[ https://issues.apache.org/jira/browse/HBASE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035998#comment-14035998 ]

Andrew Purtell commented on HBASE-11306:
----------------------------------------

Not sure why the client gets stuck with only one stalled connection, or why it 
drops the connections to the other 4 RegionServers. I'm sure I can reproduce 
this again if you have suggestions for debugging it further. 

> Client connection starvation issues under high load on Amazon EC2
> -----------------------------------------------------------------
>
>                 Key: HBASE-11306
>                 URL: https://issues.apache.org/jira/browse/HBASE-11306
>             Project: HBase
>          Issue Type: Bug
>         Environment: Amazon EC2
>            Reporter: Andrew Purtell
>
> I am using YCSB 0.1.4 with Hadoop 2.2.0 and HBase 0.98.3 RC2 on an EC2 
> testbed (c3.8xlarge instances, SSD backed, 10 GigE networking). There are 
> five slaves and five separate clients. I start with a prepopulated table of 
> 100M rows over ~20 regions and run 5 YCSB clients concurrently targeting 
> 250,000 ops/sec in aggregate. (I can reproduce this, though less reliably, at 
> 100k ops/sec aggregate.) The workload is YCSB Workload A. Due to how I set up 
> the test, the 
> data is all in one HFile per region and very likely in cache. All writes will 
> fit in the aggregate memstore. No flushes or compactions are observed on any 
> server during the test, only the occasional log roll. Despite these favorable 
> conditions developed over time to isolate this issue, a few of the clients 
> will stop making progress until socket timeouts after 60 seconds, leading to 
> very large op latency outliers. With the above detail plus some added extra 
> logging we can rule out storage layer effects. Turning to the network, this 
> is where things get interesting.
> I used {{while true ; do clear ; ss -a -o|grep ESTAB|grep 8120 ; sleep 5 ; 
> done}} (8120 is the configured RS data port) to watch receive and send socket 
> queues and TCP level timers on all of the clients and servers simultaneously 
> during the run. 
> I have Nagle disabled on the clients and servers and JVM networking set up to 
> use IPv4 only. The YCSB clients are configured to use 20 threads. These 
> threads are expected to share 5 active connections, one to each RegionServer. 
> When the test starts we see exactly what we'd expect: 5 established TCPv4 
> connections.
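> For concreteness, the client-side settings above look roughly like the sketch 
> below (the config key name is from memory, so treat it as an assumption; the 
> IPv4-only setting is a JVM flag rather than client code):
> {noformat}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
>
> // Client-side RPC socket tuning as described above (key name assumed).
> // IPv4-only networking is set on the JVM command line, not here:
> //   -Djava.net.preferIPv4Stack=true
> Configuration conf = HBaseConfiguration.create();
> conf.setBoolean("hbase.ipc.client.tcpnodelay", true); // disable Nagle on client RPC sockets
> {noformat}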
> On all servers the recv and send queues were usually empty when sampled; I 
> never saw more than 10K waiting. The servers occasionally retransmitted, but 
> with timers around 200 ms and retry counts of ~0.
> The client side is another story. We see serious problems like:
> {noformat}
> tcp    ESTAB      0      8733   10.220.15.45:41428   10.220.2.115:8120     
> timer:(on,38sec,7)
> {noformat}
> That is about 9K of data still waiting to be sent after 7 TCP level 
> retransmissions. 
> There is some unfair queueing and packet drops happening at the network 
> level, but we should be handling this better.
> During the periods when YCSB is not making progress, there is only that one 
> connection to one RS in established state. There should be 5 established 
> connections, one to each RS, but the other 4 have been dropped somehow. The 
> one distressed connection remains established for the duration of the 
> problem, while the retransmission timer count on the connection ticks upward. 
> It is dropped once the socket times out at the app level. Why are the 
> connections to the other RegionServers dropped? Why are all threads blocked 
> waiting on the one connection for the socket timeout interval (60 seconds)? 
> After the socket timeout we see the stuck connection dropped and 5 new 
> connections immediately established. YCSB doesn't do anything that would lead 
> to this behavior; it is using separate HTable instances for each client 
> thread and not closing the table references until test cleanup. These 
> behaviors seem internal to the HBase client. 
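> The usage pattern is essentially the simplified sketch below (class name, row 
> key, and the single Get are invented for illustration; "usertable" is just 
> YCSB's default table). All HTable instances created from the same 
> Configuration share the process-wide HConnection, which is why 20 threads end 
> up multiplexed over one TCP connection per RegionServer:
> {noformat}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class PerThreadTables {
>   public static void main(String[] args) throws Exception {
>     final Configuration conf = HBaseConfiguration.create();
>     Thread[] workers = new Thread[20];
>     for (int i = 0; i < workers.length; i++) {
>       workers[i] = new Thread(new Runnable() {
>         public void run() {
>           try {
>             // One HTable per thread, held for the whole run, as YCSB does.
>             // All of them ride on the shared process-wide HConnection.
>             HTable table = new HTable(conf, "usertable");
>             table.get(new Get(Bytes.toBytes("user12345")));
>             table.close(); // in the real test this happens only at test cleanup
>           } catch (IOException e) {
>             e.printStackTrace();
>           }
>         }
>       });
>       workers[i].start();
>     }
>     for (Thread t : workers) {
>       t.join();
>     }
>   }
> }
> {noformat}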
> Is maintaining only a single multiplexed connection to each RegionServer the 
> best approach? 
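> If it isn't, one knob worth experimenting with is the client-side RPC 
> connection pool. If I remember the keys and accepted values correctly (treat 
> both as assumptions), something like the following would give the client 
> several parallel connections per RegionServer to spread the threads across:
> {noformat}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
>
> // Hypothetical tuning sketch: pool several RPC connections per RegionServer
> // instead of multiplexing everything over a single one. The key names and the
> // pool type value are from memory.
> Configuration conf = HBaseConfiguration.create();
> conf.set("hbase.client.ipc.pool.type", "RoundRobin");
> conf.setInt("hbase.client.ipc.pool.size", 5);
> {noformat}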
> A related issue is that we collect zombie sockets in ESTABLISHED state on the 
> server. Also likely not our fault per se. Keepalives are enabled, so they will 
> eventually be garbage collected by the OS. On Linux systems this will take 2 
> hours. We might want to drop connections on which we see no activity sooner 
> than that. Before HBASE-11277 we were spinning indefinitely on a core for 
> each connection in this state.
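> For reference on the two hour figure: keepalive is enabled at the socket level 
> as sketched below, but the probe schedule is owned by the OS, and on Linux 
> net.ipv4.tcp_keepalive_time defaults to 7200 seconds (the address and port in 
> the sketch are illustrative):
> {noformat}
> import java.io.IOException;
> import java.net.Socket;
>
> public class KeepaliveDemo {
>   public static void main(String[] args) throws IOException {
>     // SO_KEEPALIVE only marks the socket; the kernel decides when to probe.
>     // On Linux the first probe fires after net.ipv4.tcp_keepalive_time (7200 s
>     // by default), hence the ~2 hour delay before dead peers are cleaned up.
>     Socket s = new Socket("10.220.2.115", 8120); // illustrative RS address/port
>     s.setKeepAlive(true);
>     s.close();
>   }
> }
> {noformat}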
> I have tried this using a narrow range of recent Java 7 and Java 8 runtimes, 
> and on several separately launched EC2-based test clusters; all of them 
> produce the same results, so this is a generic issue on the platform.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
