[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108303#comment-14108303
 ] 

Hadoop QA commented on HBASE-11813:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663910/11813.master.txt
  against trunk revision .
  ATTACHMENT ID: 12663910

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     

     {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):       
at 
org.apache.hadoop.hbase.http.TestHttpServerLifecycle.testStartedServerWithRequestLog(TestHttpServerLifecycle.java:92)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//console

This message is automatically generated.

> CellScanner#advance may infinitely recurse
> ------------------------------------------
>
>                 Key: HBASE-11813
>                 URL: https://issues.apache.org/jira/browse/HBASE-11813
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.99.0, 2.0.0, 0.98.6
>
>         Attachments: 11813.098.txt, 11813.098.txt, 11813.master.txt
>
>
> On user@hbase, johannes.schab...@visual-meta.com reported:
> {quote}
> we face a serious issue with our HBase production cluster for two days now. 
> Every couple minutes, a random RegionServer gets stuck and does not process 
> any requests. In addition this causes the other RegionServers to freeze 
> within a minute which brings down the entire cluster. Stopping the affected 
> RegionServer unblocks the cluster and everything comes back to normal.
> {quote}
> Subsequent troubleshooting reveals that RPC is getting stuck because we are 
> losing RPC handlers. In the .out files we have this:
> {noformat}
> Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
> java.lang.StackOverflowError
>         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
>         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
>         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
>         at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> [...]
> Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=18,queue=0,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=23,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=24,queue=0,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=2,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=11,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=25,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=20,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=19,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=15,queue=0,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=1,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=7,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=4,queue=1,port=60020"
> java.lang.StackOverflowError​
> {noformat}
> That is the anonymous CellScanner instance we create from 
> CellUtil#createCellScanner:
> {code}
> ​    return new CellScanner() {
>       private final Iterator<? extends CellScannable> iterator = 
> cellScannerables.iterator();
>       private CellScanner cellScanner = null;
>       @Override
>       public Cell current() {
>         return this.cellScanner != null? this.cellScanner.current(): null;
>       }
>       @Override
>       public boolean advance() throws IOException {
>         if (this.cellScanner == null) {
>           if (!this.iterator.hasNext()) return false;
>           this.cellScanner = this.iterator.next().cellScanner();
>         }
>         if (this.cellScanner.advance()) return true;
>         this.cellScanner = null;
> --->        return advance();
>       }
>     };
> {code}
> That final return statement is the immediate problem.
> We should also fix this so the RegionServer aborts if it loses a handler to 
> an Error. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to