It is interesting, though, because I have been running my local perf testing with this change included and have not seen this issue.
-- Lars

----- Original Message -----
From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]>
Cc:
Sent: Thursday, August 23, 2012 6:05 PM
Subject: Re: HBase-0.94.2-SNAPSHOT Scanning Bug

This:

"IPC Server handler 43 on 10304" daemon prio=10 tid=0x00007f16b8b1f000 nid=0x6414 runnable [0x00007f16b47c6000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.KeyValue.createFirstOnRowColTS(KeyValue.java:1893)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.requestSeek(StoreFileScanner.java:310)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:297)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:256)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - locked <0x00000006cec5bd88> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.requestSeek(NonLazyKeyValueScanner.java:38)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:297)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:256)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3508)

points to my change: https://issues.apache.org/jira/browse/HBASE-6577

The trace is interesting: RegionScannerImpl.nextRow now seeks to the last KV in the row and then iterates as before. However, then the reseek internally seeks to the first KV of the column, and somehow this interaction makes no progress forward.

I'll revert that change.
-- Lars

________________________________
From: Elliott Clark <[email protected]>
To: [email protected]
Sent: Thursday, August 23, 2012 5:39 PM
Subject: HBase-0.94.2-SNAPSHOT Scanning Bug

I recently tried to update one of our clusters to a version of 0.94.2 seen here:
https://github.com/stumbleupon/hbase/commits/su_prod_94

When doing that, all of the nodes started taking all available CPU time. There was not much of interest in the logs; however, the jstacks looked like this:
http://pastebin.com/raw.php?i=fw6P5RKE

Everything is spinning in scans. A version of 0.94.1 works perfectly, and reverting solved all issues. I don't really have enough data to point at any JIRA as the cause; I was just wondering if anyone had some insight into the few commits between the 0.94.1 release and the head of the above GitHub branch that could cause scans to spin.

Thanks
