https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27078
Bug ID: 27078
Summary: Starman hanging in 3-node Koha cluster when 1 node goes offline
Change sponsored?: ---
Product: Koha
Version: 20.05
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: Architecture, internals, and plumbing
Assignee: koha-bugs@lists.koha-community.org
Reporter: rcmcdonal...@gmail.com
QA Contact: testo...@bugs.koha-community.org

So I've got a pretty interesting case that I've been playing with for the past several weeks. My goal has been to build a converged 3-node Koha cluster. The architecture looks like this:

1. Each node runs the "standalone" Koha stack (Koha, Starman, Apache, ElasticSearch, Memcached, MariaDB). For the sake of example, these nodes are 10.10.100.51, 10.10.100.52, and 10.10.100.53.
2. Galera is used to build the 3-node MariaDB cluster, and each Koha node simply talks to the MariaDB server at localhost. This has worked fine, and Koha is blissfully unaware of the underlying Galera cluster.
3. ElasticSearch is built as a 3-node cluster. koha-conf.xml is configured to use all three ES nodes (again at 10.10.100.51, 10.10.100.52, and 10.10.100.53). Again, Koha doesn't seem to mind this at all.
4. All three nodes run Memcached, and koha-conf.xml is configured to use all three Memcached nodes (again at 10.10.100.51-3).
5. Plack is enabled on all nodes using koha-plack --enable instancename && koha-plack --start instancename.
6. GlusterFS is used to serve up a 3-node replicated volume for /etc/koha/*, /usr/share/koha/*, and /var/lib/koha* across all three nodes. Symlinks are used to present this storage in the places Koha expects. Again, this works great.
7. Two HAProxy instances sit in front of these three Koha instances (at 10.10.100.2 and 10.10.100.3). koha_trusted_proxies in koha-conf.xml is configured with these two IPs.
8. Finally, HAProxy handles SSL offloading and client stickiness. This all works fine too.

Here is the weirdness...
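For reference, the multi-server Memcached setup described in step 4 would look roughly like this in koha-conf.xml. This is a sketch only: the comma-separated host:port list in <memcached_servers> and the <memcached_namespace> element are assumed from the standard single-server template, and "instancename" is a placeholder; verify against your instance's actual file.

```
<config>
  <!-- Comma-separated list of all three Memcached nodes (assumed syntax) -->
  <memcached_servers>10.10.100.51:11211,10.10.100.52:11211,10.10.100.53:11211</memcached_servers>
  <!-- Namespace keeps this instance's keys separate; "instancename" is a placeholder -->
  <memcached_namespace>koha_instancename</memcached_namespace>
</config>
```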
When all three nodes are online, everything is absolutely fine. Everything is snappy, searching works, etc. When I change koha-conf.xml on one node, the change is replicated to the other nodes immediately.

However, when one node goes offline, the two remaining nodes become really sluggish. I've narrowed this down to a Starman/Plack issue, but I have no idea why. Here's how I arrived at that conclusion:

* I started by killing pertinent services one by one on Node A. Killing MariaDB on Node A had no effect on Nodes B and C... though, as expected, Node A started spitting out errors that the DB was unavailable.
* Next, I stopped Memcached on Node A. Again, this had no effect on Nodes B and C.
* Next, I stopped ElasticSearch on Node A. Again, this had no effect on Nodes B and C.
* Next, I stopped GlusterFS on Node A. Again, this had no effect on Nodes B and C.
* Next, I stopped koha-common on Node A. Again, this had no effect on Nodes B and C.

So at this point, Node A is still "online," but every service related to Koha is stopped (MariaDB, Memcached, ElasticSearch, Apache, koha-common, etc.). As expected, Nodes B and C keep on working just fine.

Here is the weird part: when Node A actually goes offline (i.e. loses network connectivity and/or powers down), Nodes B and C become very, very slow. They still serve traffic, but they are really sluggish. As soon as connectivity is restored to Node A, Nodes B and C speed right back up again.

So is this related to Starman/Plack? When I disable Starman/Plack on all nodes, the speed of each node doesn't change when a single node goes offline.
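One plausible explanation for the "stopped services are fine, but a dead host is slow" pattern (not confirmed by the report, just a hypothesis): when a service is stopped but the host is up, the kernel answers connection attempts with an immediate RST, so clients fail fast; when the host is powered off or unreachable, SYN packets are silently dropped and each connection attempt from a Starman worker blocks until its timeout expires. The sketch below demonstrates the difference in plain Python sockets; 10.255.255.1 is assumed to be a non-routable address on your network.

```python
import socket
import time

def probe(host, port, timeout=2.0):
    """Try to open a TCP connection; return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # Host up, service down: the kernel replies with RST immediately.
        outcome = "refused"
    except socket.timeout:
        # Host unreachable: SYNs are dropped, so we burn the full timeout.
        outcome = "timeout"
    except OSError as exc:
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start

# Service stopped but host online: fails in well under a millisecond.
print(probe("127.0.0.1", 9))
# Host offline (assumed non-routable address): blocks for the full timeout.
print(probe("10.255.255.1", 11211))
```

If this is what is happening, every Starman worker that touches the dead Memcached/ES node pays that timeout per request, which would make the surviving nodes feel sluggish without any service actually being broken.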