[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659524#comment-14659524 ]
Duo Zhang commented on HBASE-14178:
-----------------------------------

Yes, we do one more round of checking with the lock because another thread may already have cached the block for us. The situation here is that BC is disabled for the given family, so no other thread can do the work for us; we can just read from HDFS and skip the second round of checking BC. And as I mentioned above, there are lots of configurations for BC, and {{family.isBlockCacheEnabled()}} is treated as {{cacheDataOnRead}} (you can see the code pasted by [~chenheng]; maybe it is a mistake, but I don't think it matters for this issue, and we could open another issue for it). So the safe way to decide whether we need the second 'read BC with lock' round is to check whether we will put the block back into BC after reading it from HDFS. This is why we introduce a {{shouldLockOnCacheMiss}} method here. Maybe we could change the name to {{shouldReadAgainWithLockOnCacheMiss}}?

Thanks.

> regionserver blocks because of waiting for offsetLock
> -----------------------------------------------------
>
>                 Key: HBASE-14178
>                 URL: https://issues.apache.org/jira/browse/HBASE-14178
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>            Reporter: Heng Chen
>            Priority: Critical
>             Fix For: 0.98.6
>
>         Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch,
> HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch,
> HBASE-14178_v4.patch, HBASE-14178_v5.patch, HBASE-14178_v6.patch, jstack
>
>
> My regionserver blocks, and all client RPCs time out.
> I printed the regionserver's jstack; it seems a lot of threads were blocked waiting for offsetLock. Detailed information is below:
> PS: my table's block cache is off
> {code}
> "B.DefaultRpcServer.handler=2,queue=2,port=60020" #82 daemon prio=5 os_prio=0 tid=0x0000000001827000 nid=0x2cdc in Object.wait() [0x00007f3831b72000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:502)
>         at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
>         - locked <0x0000000773af7c18> (a org.apache.hadoop.hbase.util.IdLock$Entry)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
>         at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
>         at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
>         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
>         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
>         at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
>         at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
>         - locked <0x00000005e5c55ad0> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
>         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
>         at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>         at java.lang.Thread.run(Thread.java:745)
>
>    Locked ownable synchronizers:
>         - <0x00000005e5c55c08> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
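To make the double-checked read path from Duo Zhang's comment concrete, here is a minimal, hypothetical Java sketch. It is not HBase's {{HFileReaderV2.readBlock}} code: the class {{BlockReadSketch}}, the {{ConcurrentHashMap}} standing in for the block cache, the per-offset {{ReentrantLock}} map standing in for {{IdLock}}, {{readFromHdfs}}, and the counters are all assumptions for illustration. Only the method name {{shouldLockOnCacheMiss}} comes from the discussion; here it simply returns an assumed {{cacheDataOnRead}} flag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the double-checked block read discussed in HBASE-14178.
// NOT HBase code: the map-based cache, the per-offset locks (standing in for IdLock)
// and readFromHdfs() are illustrative assumptions.
class BlockReadSketch {
    private final Map<Long, byte[]> blockCache = new ConcurrentHashMap<>();
    // One lock per block offset; this sketch never removes entries.
    private final Map<Long, ReentrantLock> offsetLocks = new ConcurrentHashMap<>();
    private final boolean cacheDataOnRead; // assumed: is the block cached after an HDFS read?
    final AtomicInteger hdfsReads = new AtomicInteger();
    final AtomicInteger lockAcquisitions = new AtomicInteger();

    BlockReadSketch(boolean cacheDataOnRead) {
        this.cacheDataOnRead = cacheDataOnRead;
    }

    // The second, locked cache lookup only pays off if some thread will put the
    // block into the cache; otherwise waiting on the lock can never produce a hit.
    boolean shouldLockOnCacheMiss() {
        return cacheDataOnRead;
    }

    byte[] readBlock(long offset) {
        // Round 1: lock-free cache lookup.
        byte[] block = blockCache.get(offset);
        if (block != null) {
            return block;
        }
        if (!shouldLockOnCacheMiss()) {
            // Nobody will cache this block, so skip the lock and the second lookup.
            return readFromHdfs(offset);
        }
        ReentrantLock lock = offsetLocks.computeIfAbsent(offset, k -> new ReentrantLock());
        lock.lock();
        lockAcquisitions.incrementAndGet();
        try {
            // Round 2: another thread may have cached the block while we waited.
            block = blockCache.get(offset);
            if (block == null) {
                block = readFromHdfs(offset);
                blockCache.put(offset, block);
            }
            return block;
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for the real HDFS read; counts calls and returns a dummy block.
    private byte[] readFromHdfs(long offset) {
        hdfsReads.incrementAndGet();
        return new byte[] {(byte) offset};
    }
}
```

For example, with {{new BlockReadSketch(true)}} two {{readBlock(7L)}} calls read HDFS once and take the lock once; with {{new BlockReadSketch(false)}} they read HDFS twice and never touch the lock, so in this sketch no handler thread can end up parked on an offset lock the way the jstack shows threads waiting in {{IdLock.getLockEntry}}.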