[ https://issues.apache.org/jira/browse/HBASE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035998#comment-14035998 ]
Andrew Purtell commented on HBASE-11306:
----------------------------------------

Not sure why the client gets stuck with only one stalled connection. Not sure why it drops the connections to the other 4 RegionServers. I'm sure I can reproduce this again if you have suggestions for debugging what's going on there further.

> Client connection starvation issues under high load on Amazon EC2
> -----------------------------------------------------------------
>
>                 Key: HBASE-11306
>                 URL: https://issues.apache.org/jira/browse/HBASE-11306
>             Project: HBase
>          Issue Type: Bug
>         Environment: Amazon EC2
>            Reporter: Andrew Purtell
>
> I am using YCSB 0.1.4 with Hadoop 2.2.0 and HBase 0.98.3 RC2 on an EC2 testbed (c3.8xlarge instances, SSD backed, 10 GigE networking). There are five slaves and five separate clients. I start with a prepopulated table of 100M rows over ~20 regions and run 5 YCSB clients concurrently targeting 250,000 ops/sec in aggregate, Workload A. (I can reproduce this less effectively at 100K ops/sec aggregate also.) Due to how I set up the test, the data is all in one HFile per region and very likely in cache. All writes will fit in the aggregate memstore. No flushes or compactions are observed on any server during the test, only the occasional log roll. Despite these favorable conditions, developed over time to isolate this issue, a few of the clients will stop making progress until sockets time out after 60 seconds, leading to very large op latency outliers. With the above detail plus some extra logging we can rule out storage layer effects. Turning to the network is where things get interesting.
>
> I used {{while true ; do clear ; ss -a -o|grep ESTAB|grep 8120 ; sleep 5 ; done}} (8120 is the configured RS data port) to watch receive and send socket queues and TCP-level timers on all of the clients and servers simultaneously during the run.
>
> I have Nagle disabled on the clients and servers and JVM networking set up to use IPv4 only. The YCSB clients are configured to use 20 threads. These threads are expected to share 5 active connections, one to each RegionServer. When the test starts we see exactly what we'd expect: 5 established TCPv4 connections.
>
> On all servers the recv and send queues were usually empty when sampled. I never saw more than 10K waiting. The servers occasionally retransmitted, but with timers ~200ms and retry counts ~0.
>
> The client side is another story. We see serious problems like:
> {noformat}
> tcp    ESTAB    0    8733    10.220.15.45:41428    10.220.2.115:8120    timer:(on,38sec,7)
> {noformat}
> That is about 9K of data still waiting to be sent after 7 TCP-level retransmissions. There is some unfair queueing and packet dropping happening at the network level, but we should be handling this better.
>
> During the periods when YCSB is not making progress, there is only that one connection to one RS in the established state. There should be 5 established connections, one to each RS, but the other 4 have been dropped somehow. The one distressed connection remains established for the duration of the problem, while the retransmission counter on the connection ticks upward. It is dropped once the socket times out at the app level. Why are the connections to the other RegionServers dropped? Why are all threads blocked waiting on the one connection for the socket timeout interval (60 seconds)? After the socket timeout we see the stuck connection dropped and 5 new connections immediately established.
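>
> To make the sharing model concrete, here is a minimal sketch of the one-multiplexed-connection-per-server pattern (class and method names are invented for illustration; this is not the actual HBase client code):
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Sketch only: every thread calling the same RegionServer shares one
> // socket, so 20 worker threads over 5 servers use just 5 connections.
> public class SharedConnectionPool {
>     private final ConcurrentHashMap<InetSocketAddress, ServerConnection> connections =
>         new ConcurrentHashMap<>();
>
>     ServerConnection getConnection(InetSocketAddress server) {
>         return connections.computeIfAbsent(server, ServerConnection::new);
>     }
>
>     static class ServerConnection {
>         private final InetSocketAddress server;
>
>         ServerConnection(InetSocketAddress server) {
>             this.server = server;
>         }
>
>         // Calls are serialized onto the single shared socket. If its TCP
>         // send queue backs up (the climbing retransmission counter in the
>         // ss output above), every thread mapped here queues behind this
>         // lock until the app-level socket timeout (60s) fails the call.
>         synchronized byte[] call(byte[] request) throws java.io.IOException {
>             // write the request and block for the response on the one socket ...
>             throw new java.io.IOException("sketch: socket I/O elided");
>         }
>     }
> }
> {code}
> Under this model a single distressed socket is enough to stall every thread whose target region lives on that server, which matches the observed 60-second stalls, but it doesn't by itself explain why the other 4 connections disappear.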
> YCSB doesn't do anything that would lead to this behavior; it is using separate HTable instances for each client thread and not closing the table references until test cleanup. These behaviors seem internal to the HBase client.
>
> Is maintaining only a single multiplexed connection to each RegionServer the best approach?
>
> A related issue is that we collect zombie sockets in ESTABLISHED state on the server. Also likely not our fault per se. Keepalives are enabled so they will eventually be garbage collected by the OS, but on Linux systems this takes 2 hours by default. We might want to drop connections where we don't see activity sooner than that (a sketch of this idea follows at the end of this description). Before HBASE-11277 we were spinning indefinitely on a core for each connection in this state.
>
> I have tried this using a narrow range of recent Java 7 and Java 8 runtimes and they all produce the same results. I have also launched several separate EC2-based test clusters and they all produce the same results, so this is a generic platform issue.
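>
> Returning to the idle-connection idea above, a minimal sketch (invented names and an illustrative threshold; not actual HBase server code):
> {code:java}
> import java.io.IOException;
> import java.net.Socket;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Sketch only: track last activity per accepted connection and close
> // anything idle past a threshold, instead of waiting ~2 hours for the
> // OS keepalive mechanism to reap it.
> public class IdleConnectionReaper implements Runnable {
>     private final Map<Socket, Long> lastActivity = new ConcurrentHashMap<>();
>     private final long maxIdleMillis; // e.g. 10 minutes, not 2 hours
>
>     public IdleConnectionReaper(long maxIdleMillis) {
>         this.maxIdleMillis = maxIdleMillis;
>     }
>
>     // Call from the read/write paths whenever a connection does work.
>     public void touch(Socket s) {
>         lastActivity.put(s, System.currentTimeMillis());
>     }
>
>     // Run periodically, e.g. from a ScheduledExecutorService.
>     @Override
>     public void run() {
>         long cutoff = System.currentTimeMillis() - maxIdleMillis;
>         lastActivity.entrySet().removeIf(e -> {
>             if (e.getValue() < cutoff) {
>                 try {
>                     e.getKey().close(); // drop the zombie ESTABLISHED socket
>                 } catch (IOException ignored) {
>                 }
>                 return true;
>             }
>             return false;
>         });
>     }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)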