Hi, all. I'm having another problem that I hope someone can help me with, or at least point me in the right direction for diagnosing. It's a little weird, and I'm running out of ideas to test.
I'm in the middle of testing a two-node LVS balancing setup (see http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.localnode.html#two_box_lvs) to balance SSH connections. But for reasons that aren't clear, some of my connections are hanging. Here's the general info on my setup:

- I'm running on (basically) a modified RHEL 6.2 image with an updated kernel package
- This setup uses LVS-DR
- The two nodes have the IP addresses 192.168.25.14 and 192.168.25.15 for direct access
- The virtual IP that's being balanced is 192.168.25.16
- The testing client is 192.168.25.17
- I'm using keepalived to manage the setup, VRRP, etc.
- I've already started using an FWMARK balancing setup, using iptables, to avoid the double-balancing/packet-storm/battling-directors issue described in section 9.3 of the LVS-HOWTO (URL above)
- All connections that go to the active director, and get handled locally, seem to be fine
- Some, but not all, of the connections that go through the active director, and are forwarded to the backup director (also acting as a realserver), are hanging
- When I do something short (e.g. a loop around "ssh 192.168.25.16 hostname"), I can frequently get several good connections through to the backup director before one of them hangs.
- If I try to send a larger stream of data, e.g. scp a file, then my connection stalls/hangs every time I'm sent to the backup director/RS
- There doesn't seem to be any pattern yet as to the number of good connections, packet count, or amount of data before the hang occurs
- When the problem occurs, I see a very rapid packet/byte rate on the "lo" interfaces, which seems to be a lot of SSH packet retransmissions from 192.168.25.17 (client) to 192.168.25.16 (VIP). Why this is ending up on "lo" is a mystery to me.
- The problem only occurs when using the floating VIP to connect, and only when the connection is redirected to the backup director host. Connecting directly to that same host (e.g. 192.168.25.14 or 192.168.25.15) works just fine every time.
- I've already tried flushing iptables completely on the backup director, and it didn't seem to help.

I'm going to attach copies of several files (keepalived.conf, iptables setup, etc.) to see if they're helpful. If anyone can point me in the right direction to figure this out, I'd appreciate it greatly.

Thanks again,

--
Lloyd Brown
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
http://marylou.byu.edu
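For anyone who wants to reproduce the short-connection test described in the list above (a loop around "ssh 192.168.25.16 hostname"), here is a minimal Python sketch of that kind of loop. It is not the script actually used; the function names, the 10-second timeout, the "<timeout>"/"<error>" labels, and the use of ssh's BatchMode option (to avoid interactive password prompts) are all assumptions made for illustration. It tallies which host answered each attempt, so hangs can be counted per backend.

```python
import subprocess
from collections import Counter

VIP = "192.168.25.16"  # the balanced virtual IP from the setup description


def run_ssh_hostname(target, timeout_s):
    """Run `ssh <target> hostname` once.

    Returns the remote hostname on success, "<timeout>" if the command
    did not finish within timeout_s seconds (a likely hang), or
    "<error>" if ssh exited with a non-zero status.
    """
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", target, "hostname"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "<timeout>"
    if result.returncode != 0:
        return "<error>"
    return result.stdout.strip()


def tally_connections(target, attempts, timeout_s=10.0, runner=run_ssh_hostname):
    """Connect `attempts` times and count the outcomes.

    `runner` is injectable (hypothetical design choice) so the tally
    logic can be exercised without a real SSH server.
    """
    counts = Counter()
    for _ in range(attempts):
        counts[runner(target, timeout_s)] += 1
    return counts


# Example use (assumes key-based SSH access to the VIP):
#   for outcome, n in tally_connections(VIP, attempts=20).items():
#       print(f"{outcome}: {n}")
```

If the hangs really are tied to the backup director, the tally should show "<timeout>" entries alongside a lower-than-expected count for that host's name, while the active director's count stays clean.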
lvs_diagnosis_21August2014.tar.gz
Description: GNU Zip compressed data
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-requ...@linuxvirtualserver.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users