[jira] [Commented] (HBASE-24158) [Flakey Tests] TestAsyncTableGetMultiThreaded

Michael Stack (Jira) Wed, 15 Apr 2020 23:11:08 -0700


    [ https://issues.apache.org/jira/browse/HBASE-24158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084574#comment-17084574 ]


Michael Stack commented on HBASE-24158:
---------------------------------------

Got this checking local runs...
{code}
2020-04-15 20:27:40,882 ERROR [RPCClient-NioEventLoopGroup-6-3] util.FutureUtils(70): Unexpected error caught when processing CompletableFuture
java.lang.NullPointerException
   at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.canUpdateOnError(AsyncRegionLocatorHelper.java:49)
   at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.updateCachedLocationOnError(AsyncRegionLocatorHelper.java:61)
   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.updateCachedLocationOnError(AsyncNonMetaRegionLocator.java:610)
...
{code}

I added logging to the background threads that were doing continuous gets. When 
the test timed out, the thread dump still showed one or two of those background 
threads. Tracing back through the logs to see where they had last been, in at 
least one case the region had just split and the locator couldn't find it. The 
handling of the resulting null location threw the NPE above, which messed up the 
background thread... It got stuck.
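How an NPE in an error callback can leave a caller stuck can be sketched in plain Java. The log line suggests that `FutureUtils` catches and logs whatever a CompletableFuture callback throws rather than propagating it. If that is the case, the callback stops partway through, and the step that would have completed the caller's future never runs. Everything below is hypothetical and is not HBase's actual code or the attached patch: `Location`, `addListener`, `canUpdateUnguarded`/`canUpdateGuarded` and `callerCompletes` only mimic the shape of the failure. The guarded variant shows one possible fix, treating a null location as "nothing to update".

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical sketch; none of these names are HBase's real classes or methods.
class StuckCallbackDemo {

  // Hypothetical stand-in for a cached region location. A null reference models
  // "the locator found nothing" right after a region split.
  static final class Location {
    final String regionName;
    Location(String regionName) { this.regionName = regionName; }
  }

  // Hypothetical helper modelled on what the log line suggests FutureUtils does:
  // run the callback, but catch and log anything it throws instead of propagating it.
  static <T> void addListener(CompletableFuture<T> future,
                              BiConsumer<? super T, ? super Throwable> action) {
    future.whenComplete((resp, error) -> {
      try {
        action.accept(resp, error);
      } catch (Throwable t) {
        System.err.println("Unexpected error caught when processing CompletableFuture: " + t);
      }
    });
  }

  // Unguarded check: dereferences the location, so it throws an NPE when loc is null.
  static boolean canUpdateUnguarded(Location loc) {
    return !loc.regionName.isEmpty();
  }

  // Guarded check: a missing location simply means there is nothing to update.
  static boolean canUpdateGuarded(Location loc) {
    return loc != null && !loc.regionName.isEmpty();
  }

  // Simulates one failed get. On error the callback consults the cache and then
  // completes the caller's future. Returns whether the caller's future completed.
  static boolean callerCompletes(boolean guarded) {
    CompletableFuture<String> rpc = new CompletableFuture<>();
    CompletableFuture<String> callerFuture = new CompletableFuture<>();
    Location cached = null; // region just split; no location available

    addListener(rpc, (resp, error) -> {
      if (error != null) {
        if (guarded ? canUpdateGuarded(cached) : canUpdateUnguarded(cached)) {
          // A real locator would refresh or evict its cache entry here.
        }
        // Never reached in the unguarded case: the NPE above is swallowed by addListener.
        callerFuture.completeExceptionally(error);
      } else {
        callerFuture.complete(resp);
      }
    });

    // The callback was registered first, so it runs synchronously in this thread.
    rpc.completeExceptionally(new RuntimeException("region not found"));

    // If this is false, a thread blocked in callerFuture.get() would wait forever.
    return callerFuture.isDone();
  }

  public static void main(String[] args) {
    System.out.println("unguarded completes: " + callerCompletes(false)); // false
    System.out.println("guarded completes: " + callerCompletes(true));    // true
  }
}
```

The fix here is a null guard at the point of use. Another option would be to make the listener helper complete the caller's future exceptionally when a callback throws. Which of these, if either, the actual patch does is not shown here.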

Testing patch locally....

> [Flakey Tests] TestAsyncTableGetMultiThreaded
> ---------------------------------------------
>
>                 Key: HBASE-24158
>                 URL: https://issues.apache.org/jira/browse/HBASE-24158
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: 
> 0001-HBASE-24158-Flakey-Tests-TestAsyncTableGetMultiThrea.patch
>
>
> I've already cut down the number of threads used by this test, but it failed 
> in nightly last night, unable to close out its xml, and locally it failed too 
> in a run overnight. I ran it under harness and it seems well-behaved. It 
> doesn't use much memory -- 700MB -- and thread counts are usual (~450). It 
> does use near 100% CPU, which is a little unusual. Otherwise, it looks fine.
> Let me keep an eye on it. Could cut the thread count down further and use fewer 
> processes... this makes it use less CPU. There does seem to be a good deal of 
> overlap with tests done elsewhere.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
