https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=27078
Bug ID: 27078
Summary: Starman hanging in 3-node Koha cluster when 1 node goes offline
Change sponsored?: ---
Product: Koha
Version: 20.05
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: Architecture, internals, and plumbing
Assignee: koha-bugs@lists.koha-community.org
Reporter: rcmcdonal...@gmail.com
QA Contact: testo...@bugs.koha-community.org

So I've got a pretty interesting case that I've been playing with for the past several weeks. My goal has been to build a converged 3-node Koha cluster. The architecture looks like this:

1. Each node runs the "standalone" Koha stack (Koha, Starman, Apache, ElasticSearch, Memcached, MariaDB). For the sake of example, these nodes are 10.10.100.51, 10.10.100.52, and 10.10.100.53.
2. Galera is used to build the 3-node MariaDB cluster, and each Koha node simply talks to the MariaDB server at localhost. This has worked fine, and Koha is blissfully unaware of the underlying Galera cluster.
3. ElasticSearch is built as a 3-node cluster. koha-conf.xml is configured to use all three ES nodes (again at 10.10.100.51, 10.10.100.52, and 10.10.100.53). Again, Koha doesn't seem to mind this at all.
4. All three nodes run Memcached, and koha-conf.xml is configured to use all three Memcached nodes (again at 10.10.100.51-3).
5. Plack is enabled on all nodes using koha-plack --enable instancename && koha-plack --start instancename.
6. GlusterFS is used to serve up a 3-node replicated volume for /etc/koha/*, /usr/share/koha/*, and /var/lib/koha* across all three nodes. Symlinks are used to present this storage in the places Koha expects. Again, this works great.
7. Two HAProxy instances sit in front of these three Koha instances (at 10.10.100.2 and 10.10.100.3). koha_trusted_proxies in koha-conf.xml is configured with these two IPs.
8. Finally, HAProxy handles SSL offloading and client stickiness. This all works fine too.

Here is the weirdness...
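For reference, the multi-server Memcached setup described in step 4 would look roughly like this in koha-conf.xml. This is a sketch only: the comma-separated host:port list in <memcached_servers> and the <memcached_namespace> element are assumed from the standard single-server template, and "instancename" is a placeholder; verify against your instance's actual file.

```
<config>
  <!-- Comma-separated list of all three Memcached nodes (assumed syntax) -->
  <memcached_servers>10.10.100.51:11211,10.10.100.52:11211,10.10.100.53:11211</memcached_servers>
  <!-- Namespace keeps this instance's keys separate; "instancename" is a placeholder -->
  <memcached_namespace>koha_instancename</memcached_namespace>
</config>
```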
When all three nodes are online, everything is absolutely fine. Everything is snappy, searching works, etc. When I change koha-conf.xml on one node, the change is replicated to the other nodes immediately.

However, when one node goes offline, the two remaining nodes become really sluggish. I've narrowed this down to a Starman/Plack issue, but I have no idea why. Here's how I arrived at that conclusion:

* I started by killing pertinent services one by one on Node A. Killing MariaDB on Node A had no effect on Nodes B and C... though, as expected, Node A started spitting out errors that the DB was unavailable.
* Next, I stopped Memcached on Node A. Again, this had no effect on Nodes B and C.
* Next, I stopped ElasticSearch on Node A. Again, this had no effect on Nodes B and C.
* Next, I stopped GlusterFS on Node A. Again, this had no effect on Nodes B and C.
* Next, I stopped koha-common on Node A. Again, this had no effect on Nodes B and C.

So at this point, Node A is still "online," but every service related to Koha is stopped (MariaDB, Memcached, ElasticSearch, Apache, koha-common, etc.). As expected, Nodes B and C keep on working just fine.

Here is the weird part: when Node A actually goes offline (i.e. loses network connectivity and/or powers down), Nodes B and C become very, very slow. They still serve traffic, but they are really sluggish. As soon as connectivity is restored to Node A, Nodes B and C speed right back up again.

So is this related to Starman/Plack? When I disable Starman/Plack on all nodes, the speed of each node doesn't change when a single node goes offline.
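One plausible explanation for the "stopped services are fine, but a dead host is slow" pattern (not confirmed by the report, just a hypothesis): when a service is stopped but the host is up, the kernel answers connection attempts with an immediate RST, so clients fail fast; when the host is powered off or unreachable, SYN packets are silently dropped and each connection attempt from a Starman worker blocks until its timeout expires. The sketch below demonstrates the difference in plain Python sockets; 10.255.255.1 is assumed to be a non-routable address on your network.

```python
import socket
import time

def probe(host, port, timeout=2.0):
    """Try to open a TCP connection; return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # Host up, service down: the kernel replies with RST immediately.
        outcome = "refused"
    except socket.timeout:
        # Host unreachable: SYNs are dropped, so we burn the full timeout.
        outcome = "timeout"
    except OSError as exc:
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start

# Service stopped but host online: fails in well under a millisecond.
print(probe("127.0.0.1", 9))
# Host offline (assumed non-routable address): blocks for the full timeout.
print(probe("10.255.255.1", 11211))
```

If this is what is happening, every Starman worker that touches the dead Memcached/ES node pays that timeout per request, which would make the surviving nodes feel sluggish without any service actually being broken.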