[ https://issues.apache.org/jira/browse/HBASE-24155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113749#comment-17113749 ]
Mark Robert Miller commented on HBASE-24155:
--------------------------------------------
It took me a bit longer, but I ended up tracking this down a bit further.
Raising the socket cache size and expiration for HDFS helped a fair amount,
but there were still 50% as many sockets being created. I traced a lot of
them to *ReplicationSourceWALReader* and its reset to look for additional
data to read.
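
For anyone who wants to watch the TIME_WAIT count while the tests run, here
is a minimal sketch of one way to do it on Linux. It is not part of HBase or
of any patch on this issue; the class name TimeWaitCounter and its method are
made up for illustration. It assumes a Linux host where /proc/net/tcp and
/proc/net/tcp6 are readable, and relies on the kernel reporting the socket
state as a hex code in the fourth column (06 is TIME_WAIT).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not an HBase class: counts TIME_WAIT sockets on Linux.
final class TimeWaitCounter {
    // Hex state code the Linux kernel prints for TIME_WAIT in /proc/net/tcp.
    private static final String TIME_WAIT = "06";

    // Counts the lines whose fourth whitespace-separated field is the
    // TIME_WAIT code. The header line has "st" in that position, so it is
    // never counted.
    static long countTimeWait(List<String> lines) {
        return lines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(fields -> fields.length > 3
                        && TIME_WAIT.equalsIgnoreCase(fields[3]))
                .count();
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        // IPv4 and IPv6 sockets are listed in separate files.
        for (String file : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path path = Paths.get(file);
            if (Files.isReadable(path)) {
                total += countTimeWait(Files.readAllLines(path));
            }
        }
        System.out.println("TIME_WAIT sockets: " + total);
    }
}
```

Running it before and after a test module (or in a loop during one) gives a
rough picture of which tests leave the most sockets behind. On other
operating systems the same idea would need a different data source.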
> When running the tests, a tremendous number of connections are put into
> TIME_WAIT.
> ----------------------------------------------------------------------------------
>
> Key: HBASE-24155
> URL: https://issues.apache.org/jira/browse/HBASE-24155
> Project: HBase
> Issue Type: Test
> Components: test
> Reporter: Mark Robert Miller
> Priority: Major
>
> When you run the test suite and monitor the number of connections in
> TIME_WAIT, it appears that a very large number of connections do not end up
> with a proper connection close lifecycle or perhaps proper reuse.
> Given that connections can stay in TIME_WAIT for 1-4 minutes depending on
> OS/Env, running the tests faster or with more tests in parallel increases the
> TIME_WAIT connection buildup. Some tests spin up a very, very large number of
> connections and if the wrong ones run at the same time, this can also greatly
> increase the number of connections put into TIME_WAIT. This can have a
> dramatic effect on performance (as it can take longer to create a new
> connection) or cause tests to fail outright or time out.
> In my experience, a much, much smaller number of connections in a test suite
> would end up in TIME_WAIT when connection handling is all correct.
> Notes to come in comments below.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)