[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109410#comment-14109410
 ] 

stack commented on HBASE-11813:
---

The javadoc warning is unrelated. I can fix it on commit though:

[WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java:96: warning - Tag @link: reference not found: cellCodecClsName
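
For context, that warning means the {@link} tag names something the javadoc tool cannot resolve (cellCodecClsName is a plain identifier, most likely a method parameter, rather than a linkable type or member). A minimal, hypothetical sketch of the usual fix is to switch the tag to {@code}:

{code}
/*
 * Before -- javadoc cannot resolve the name, hence "reference not found":
 *     ... the codec class named by {@link cellCodecClsName} ...
 * After  -- render the identifier in code font instead of trying to link it:
 *     ... the codec class named by {@code cellCodecClsName} ...
 */
{code}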

> CellScanner#advance may infinitely recurse
> --
>
> Key: HBASE-11813
> URL: https://issues.apache.org/jira/browse/HBASE-11813
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: stack
>Priority: Blocker
> Fix For: 0.99.0, 2.0.0, 0.98.6
>
> Attachments: 11813.098.txt, 11813.098.txt, 11813.master.txt, 
> 11813.master.txt, 11813v2.master.txt, 11813v3.master.txt, 
> catch_all_exceptions.txt
>
>
> On user@hbase, johannes.schab...@visual-meta.com reported:
> {quote}
> we have been facing a serious issue with our HBase production cluster for two days now. 
> Every couple of minutes, a random RegionServer gets stuck and does not process 
> any requests. In addition, this causes the other RegionServers to freeze 
> within a minute, which brings down the entire cluster. Stopping the affected 
> RegionServer unblocks the cluster and everything comes back to normal.
> {quote}
> Subsequent troubleshooting reveals that RPC is getting stuck because we are 
> losing RPC handlers. In the .out files we have this:
> {noformat}
> Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
> java.lang.StackOverflowError
> at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:210)
> [...]
> Exception in thread "defaultRpcServer.handler=5,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=18,queue=0,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=23,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=24,queue=0,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=2,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=11,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=25,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=20,queue=2,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=19,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=15,queue=0,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=1,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=7,queue=1,port=60020"
> java.lang.StackOverflowError
> Exception in thread "defaultRpcServer.handler=4,queue=1,port=60020"
> java.lang.StackOverflowError​
> {noformat}
> That is the anonymous CellScanner instance we create from 
> CellUtil#createCellScanner:
> {code}
> return new CellScanner() {
>   private final Iterator<? extends CellScannable> iterator = cellScannerables.iterator();
>   private CellScanner cellScanner = null;
>   @Override
>   public Cell current() {
>     return this.cellScanner != null? this.cellScanner.current(): null;
>   }
>   @Override
>   public boolean advance() throws IOException {
>     if (this.cellScanner == null) {
>       if (!this.iterator.hasNext()) return false;
>       this.cellScanner = this.iterator.next().cellScanner();
>     }
>     if (this.cellScanner.advance()) return true;
>     this.cellScanner = null;
> --->return advance();
>   }
> };
> {code}
> That final return statement is the immediate problem.
> We should also fix this so the RegionServer aborts if it loses a handler to 
> an Error. 
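
The committed change is in the attached patches; as a rough illustrative sketch (not the exact patch), the fix is to turn that tail recursion into a loop, so a long run of empty or exhausted scanners costs loop iterations instead of stack frames:

{code}
@Override
public boolean advance() throws IOException {
  while (true) {
    if (this.cellScanner == null) {
      if (!this.iterator.hasNext()) return false;
      this.cellScanner = this.iterator.next().cellScanner();
    }
    if (this.cellScanner.advance()) return true;
    // This scanner is exhausted; move on to the next one by looping rather than
    // recursing, which is what overflowed the stack on large batches.
    this.cellScanner = null;
  }
}
{code}

The second part of the description, aborting the RegionServer when a handler thread dies with an Error, is presumably what the catch_all_exceptions.txt attachment addresses.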



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109407#comment-14109407
 ] 

stack commented on HBASE-11813:
---

Review?



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109309#comment-14109309
 ] 

Hadoop QA commented on HBASE-11813:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664157/11813v3.master.txt
  against trunk revision .
  ATTACHMENT ID: 12664157

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10566//console

This message is automatically generated.


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109187#comment-14109187
 ] 

Hadoop QA commented on HBASE-11813:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664156/11813v2.master.txt
  against trunk revision .
  ATTACHMENT ID: 12664156

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 Anti-pattern{color}.  The patch appears to 
have anti-pattern where BYTES_COMPARATOR was omitted:
 +NavigableMap> m = new TreeMap>();.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail.

Compilation errors resume:
[ERROR] COMPILATION ERROR : 
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[79,10]
 error: TestCellUtil.TestCell is not abstract and does not override abstract 
method getTagsLength() in Cell
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[186,17]
 error: getTagsLength() in TestCellUtil.TestCell cannot implement 
getTagsLength() in Cell
[ERROR]   return type short is not compatible with int
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[191,4]
 error: method does not override or implement a method from a supertype
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
(default-testCompile) on project hbase-common: Compilation failure: Compilation 
failure:
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[79,10]
 error: TestCellUtil.TestCell is not abstract and does not override abstract 
method getTagsLength() in Cell
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[186,17]
 error: getTagsLength() in TestCellUtil.TestCell cannot implement 
getTagsLength() in Cell
[ERROR] return type short is not compatible with int
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[185,4]
 error: method does not override or implement a method from a supertype
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase-common/src/test/java/org/apache/hadoop/hbase/TestCellUtil.java:[191,4]
 error: method does not override or implement a method from a supertype
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-common


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10565//console

This message is automatically generated.
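
Two notes on the failures above, with hedged illustrative snippets rather than the actual patch contents. The compile errors say the TestCell stub in TestCellUtil must use the return type that the Cell interface declares on this branch; inside the stub, roughly:

{code}
// Illustrative only: declaring this as 'short' is what produces
// "return type short is not compatible with int".
@Override
public int getTagsLength() {
  return 0;
}
{code}

The BYTES_COMPARATOR anti-pattern flag means a TreeMap keyed by byte[] was constructed without an explicit comparator. byte[] has no natural ordering (it does not implement Comparable), so HBase code normally passes Bytes.BYTES_COMPARATOR; the generic parameters below are assumptions, not copied from the patch:

{code}
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.util.Bytes;

public class ByteArrayKeyedMapExample {
  // Without the comparator, TreeMap's first put() throws ClassCastException
  // because it tries to cast the byte[] key to Comparable.
  NavigableMap<byte[], List<Cell>> m =
      new TreeMap<byte[], List<Cell>>(Bytes.BYTES_COMPARATOR);
}
{code}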


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-25 Thread Johannes Schaback (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109055#comment-14109055
 ] 

Johannes Schaback commented on HBASE-11813:
---

Quick update from me. 

We have had Stack's patch running for a day now. The StackOverflowErrors have not 
occurred again, all RegionServers are operational, and the cluster has not hung. We will 
keep our home-compiled HBase running until the patch makes it into an official release.

Our client code still issues very large batches at times; random access of 100k records 
in a single query is not unusual for us, and such large batches were the original 
trigger of this issue. With the recursion issue resolved, we have since observed two 
non-dramatic cases where the client timed out and a ChannelClosedException was thrown on 
the server side without killing the RegionServer. Stack and I suspect that a large query 
is taking too long to process/transmit, but we haven't figured out the root cause yet 
(the region is consistent). We will adjust our logging a bit and keep an eye on it. 
Apart from these two cases, nothing has happened so far.

Thank you all for the quick fix and the responsiveness!


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108755#comment-14108755
 ] 

ramkrishna.s.vasudevan commented on HBASE-11813:


bq. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestCellUtil
The test failure seems related to the patch.



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108679#comment-14108679
 ] 

Hadoop QA commented on HBASE-11813:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12664071/11813.master.txt
  against trunk revision .
  ATTACHMENT ID: 12664071

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestCellUtil

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10557//console

This message is automatically generated.


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108647#comment-14108647
 ] 

Hadoop QA commented on HBASE-11813:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12664068/catch_all_exceptions.txt
  against trunk revision .
  ATTACHMENT ID: 12664068

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10556//console

This message is automatically generated.



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108443#comment-14108443
 ] 

Andrew Purtell commented on HBASE-11813:


Thanks for reporting back [~Schabby]. Please let us know if this is still 
looking good after a day or so. 



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Johannes Schaback (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108429#comment-14108429
 ] 

Johannes Schaback commented on HBASE-11813:
---

The patch has been live in our production cluster for about two hours now. So far, no RS 
crash...



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Johannes Schaback (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108391#comment-14108391
 ] 

Johannes Schaback commented on HBASE-11813:
---

Ah, never mind. I believe I just have to apply the patch attached to this bug ticket.



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Johannes Schaback (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108387#comment-14108387
 ] 

Johannes Schaback commented on HBASE-11813:
---

Sorry for asking, but where exactly do I get the patch? The last commits on 
git://git.apache.org/hbase.git are from about 15 hours ago.

Thanks, Johannes



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108382#comment-14108382
 ] 

Qiang Tian commented on HBASE-11813:



Oops, it already points to line 210 (I have a fever, so my brain is not so clear).
Thanks, Stack.



[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108303#comment-14108303
 ] 

Hadoop QA commented on HBASE-11813:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663910/11813.master.txt
  against trunk revision .
  ATTACHMENT ID: 12663910

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 6 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.http.TestHttpServerLifecycle.testStartedServerWithRequestLog(TestHttpServerLifecycle.java:92)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10550//console

This message is automatically generated.


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108300#comment-14108300
 ] 

stack commented on HBASE-11813:
---

[~Schabby] Suggest you enable DEBUG. This patch below should catch the overflow error, dump some detail on the particular invocation, and allow you to keep going:

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
index 31484bb..da2afe0 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
@@ -136,9 +136,9 @@ public class CallRunner {
   "this means that the server was processing a " +
   "request but the client went away. The error message was: " +
   cce.getMessage());
-} catch (Exception e) {
+} catch (Throwable e) {
   RpcServer.LOG.warn(Thread.currentThread().getName()
-  + ": caught: " + StringUtils.stringifyException(e));
+  + ": caught: " + StringUtils.stringifyException(e) + " call=" + getCall());
 }
   }
{code}

No guarantees! I tried it and it works when there are no problems.
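
For context on why the patch widens the catch from Exception to Throwable: StackOverflowError is an Error, not an Exception, so the original catch block never sees it and the handler thread simply dies. A small standalone illustration (hypothetical class name, not HBase code):

{code}
public class CatchWidthDemo {
  public static void main(String[] args) {
    try {
      recurse();
    } catch (Exception e) {
      System.out.println("never reached: StackOverflowError is not an Exception");
    } catch (Throwable t) {
      System.out.println("caught here instead: " + t); // java.lang.StackOverflowError
    }
  }

  private static void recurse() {
    recurse();
  }
}
{code}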


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-23 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108251#comment-14108251
 ] 

Qiang Tian commented on HBASE-11813:


I'd suspect this one:

{code}
  /**
   * Flatten the map of cells out under the CellScanner
   * @param map Map of Cell Lists; for example, the map of families to Cells that is used
   * inside Put, etc., keeping Cells organized by family.
   * @return CellScanner interface over cellIterable
   */
  public static CellScanner createCellScanner(final NavigableMap<byte[], List<Cell>> map) {
    return new CellScanner() {
      private final Iterator<Entry<byte[], List<Cell>>> entries = map.entrySet().iterator();
      private Iterator<Cell> currentIterator = null;
      private Cell currentCell;

      @Override
      public Cell current() {
        return this.currentCell;
      }

      @Override
      public boolean advance() {
        if (this.currentIterator == null) {
          if (!this.entries.hasNext()) return false;
          this.currentIterator = this.entries.next().getValue().iterator();
        }
        if (this.currentIterator.hasNext()) {
          this.currentCell = this.currentIterator.next();
          return true;
        }
        this.currentCell = null;
        this.currentIterator = null;
        return advance();
      }
    };
  }
{code}
It looks like the variant Andrew mentioned would not trigger the advance method on the 
server side, while this other one is widely used in server-side code paths (coprocessor 
or endpoint related).


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-23 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108197#comment-14108197
 ] 

Andrew Purtell commented on HBASE-11813:


Not that a scanner that never returns is better, but can we do this without recursion?
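
For illustration, a loop-based advance() over the same two fields in the quoted anonymous class could look like the sketch below. This is only a sketch of the non-recursive shape, not necessarily the committed patch:

{code}
@Override
public boolean advance() throws IOException {
  // Same fields as in the quoted code (iterator, cellScanner); loop instead of recursing,
  // so a long run of exhausted scanners costs no extra stack frames.
  while (true) {
    if (this.cellScanner == null) {
      if (!this.iterator.hasNext()) return false;
      this.cellScanner = this.iterator.next().cellScanner();
    }
    if (this.cellScanner.advance()) return true;
    this.cellScanner = null; // exhausted; fall through and try the next scannable
  }
}
{code}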




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108190#comment-14108190
 ] 

stack commented on HBASE-11813:
---

The code has been in HBase a good while now.  The issue, I think, is 
this.cellScanner = this.iterator.next().cellScanner(), where the iterator never 
finishes.  I cannot repro it locally.  It's some particular combo of cell count and 
lists of cell scanners that is triggering it.
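
A self-contained way to see that "particular combo" at work: each scannable that yields no cells adds exactly one stack frame to advance(), so a single request carrying a very long run of empty scanners is enough. The sketch below mimics the quoted logic with stand-in types (Scanner and chain are invented names, not HBase classes):

{code}
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class RecursionDepthDemo {
  interface Scanner { boolean advance(); }

  // Mirrors the recursive advance() from CellUtil's anonymous CellScanner.
  static Scanner chain(final List<Scanner> scanners) {
    return new Scanner() {
      private final Iterator<Scanner> it = scanners.iterator();
      private Scanner current = null;
      @Override public boolean advance() {
        if (current == null) {
          if (!it.hasNext()) return false;
          current = it.next();
        }
        if (current.advance()) return true;
        current = null;
        return advance(); // one extra frame per exhausted scanner
      }
    };
  }

  public static void main(String[] args) {
    // A million scanners that never yield a cell: recursion depth equals the list size.
    List<Scanner> empties = Collections.nCopies(1_000_000, (Scanner) () -> false);
    try {
      chain(empties).advance();
    } catch (StackOverflowError e) {
      System.out.println("overflowed after recursing once per empty scanner");
    }
  }
}
{code}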




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-23 Thread Johannes Schaback (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108143#comment-14108143
 ] 

Johannes Schaback commented on HBASE-11813:
---

Great, we are eagerly looking forward to the patch :)




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11813) CellScanner#advance may infinitely recurse

2014-08-23 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108112#comment-14108112
 ] 

Andrew Purtell commented on HBASE-11813:


Ping [~stack], this came in on HBASE-7899




--
This message was sent by Atlassian JIRA
(v6.2#6252)