I reverted that change for now.


----- Original Message -----
From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]>
Cc: 
Sent: Thursday, August 23, 2012 6:05 PM
Subject: Re: HBase-0.94.2-SNAPSHOT Scanning Bug

This:

"IPC Server handler 43 on 10304" daemon prio=10 tid=0x00007f16b8b1f000 nid=0x6414 runnable [0x00007f16b47c6000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hbase.KeyValue.createFirstOnRowColTS(KeyValue.java:1893)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.requestSeek(StoreFileScanner.java:310)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:297)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:256)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
    - locked <0x00000006cec5bd88> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.requestSeek(NonLazyKeyValueScanner.java:38)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:297)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:256)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3508)

points to my change: https://issues.apache.org/jira/browse/HBASE-6577

The trace is interesting: RegionScannerImpl.nextRow now seeks to the last KV in
the row and then iterates as before.
However, the reseek then internally seeks to the first KV of the column, and
somehow this interaction makes no forward progress.
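To make that kind of interaction concrete, here is a toy model in Java. It is not HBase code: Key, firstOnRow, lastOnRow, seek and scanFinishes are hypothetical names. It assumes a scanner that skips to the next row by seeking to a key that sorts after every cell in the current row, and shows that if the target is instead rewritten to a key sorting before the row's first cell, the cursor lands where it already is and the loop never advances. For simplicity the rewritten target is "first on row" rather than "first on row/column", and this is only a sketch of the failure mode, not a claim about what HBASE-6577 actually changed.

```java
import java.util.List;

// Toy model only: this is NOT HBase code. Key, firstOnRow, lastOnRow, seek and
// scanFinishes are hypothetical names used to illustrate the idea.
final class NextRowSpinDemo {

    // A simplified cell key ordered by row, then column.
    record Key(String row, String col) implements Comparable<Key> {
        @Override
        public int compareTo(Key other) {
            int c = row.compareTo(other.row);
            return c != 0 ? c : col.compareTo(other.col);
        }
    }

    // Sentinel keys (assumption: real column names are non-empty ordinary text).
    // The empty column sorts before every real column of the row.
    static Key firstOnRow(String row) {
        return new Key(row, "");
    }

    // The max char column sorts after every real column of the row.
    static Key lastOnRow(String row) {
        return new Key(row, "\uffff");
    }

    // Returns the index of the first key >= target (binary search on a sorted list).
    static int seek(List<Key> keys, Key target) {
        int lo = 0;
        int hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys.get(mid).compareTo(target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Repeatedly "skips to the next row" by seeking relative to the current row.
    // If rewriteToFirstOnRow is true, the seek target is (hypothetically) replaced by
    // a key that sorts BEFORE the current row's cells, so the cursor never moves.
    // maxIterations stands in for "spinning forever"; returns true if the scan ends.
    static boolean scanFinishes(List<Key> keys, boolean rewriteToFirstOnRow, int maxIterations) {
        int pos = 0;
        for (int i = 0; i < maxIterations; i++) {
            if (pos >= keys.size()) {
                return true; // moved past the last row
            }
            String row = keys.get(pos).row();
            Key target = rewriteToFirstOnRow ? firstOnRow(row) : lastOnRow(row);
            pos = seek(keys, target);
        }
        return pos >= keys.size();
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
                new Key("r1", "a"), new Key("r1", "b"),
                new Key("r2", "a"),
                new Key("r3", "a"), new Key("r3", "c"));
        System.out.println("seek to last-on-row finishes: " + scanFinishes(keys, false, 100));
        System.out.println("rewritten to first-on-row finishes: " + scanFinishes(keys, true, 100));
    }
}
```

Running main prints true for the last-on-row case and false for the rewritten case, where only the 100-iteration cap stops the loop. In this toy, checking that each seek leaves the cursor strictly past its previous position would be enough to detect the spin instead of burning CPU.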

I'll revert that change.

-- Lars


________________________________
From: Elliott Clark <[email protected]>
To: [email protected] 
Sent: Thursday, August 23, 2012 5:39 PM
Subject: HBase-0.94.2-SNAPSHOT Scanning Bug

I recently tried to update one of our clusters to a 0.94.2 build from this
branch: https://github.com/stumbleupon/hbase/commits/su_prod_94

When I did that, all of the nodes started consuming all available CPU
time. There was nothing interesting in the logs, but the jstacks looked
like this: http://pastebin.com/raw.php?i=fw6P5RKE  Everything is
spinning in scans. A 0.94.1 build works perfectly, and reverting to it
solved all issues. I don't have enough data to point at any particular
JIRA as the cause; I was just wondering whether anyone had insight into
which of the few commits between the 0.94.1 release and the head of the
above GitHub branch could cause scans to spin.

Thanks
