Xiaolin Ha created HBASE-26155:
----------------------------------

             Summary: JVM crash when rpc calls close scanner
                 Key: HBASE-26155
                 URL: https://issues.apache.org/jira/browse/HBASE-26155
             Project: HBase
          Issue Type: Bug
          Components: Scanners
    Affects Versions: 3.0.0-alpha-1
            Reporter: Xiaolin Ha


Closing scanners causes regionserver JVM core dumps on our production 
clusters.

{code:java}
Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000],  sp=0x00007fca4b1cb0d8,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x7fd314]
J 2810  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
j  org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j  org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j  org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j  org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
j  org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
{code}

Nothing seems to guarantee that a scanner is used by only one RPC call at a 
time, right? 
For example, when a client disconnects, the RS may not stop the in-progress 
scanner next calls until it checks the `rpcCall.disconnectSince()` time. 
Before that happens, another scan RPC may use the same scanner, which the RS 
keeps cached in a RegionScannerHolder. The two calls then change the 
`previousCell` in the scanner from different threads...
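
To make the suspected race concrete, here is a minimal sketch of one way to 
keep two calls from touching the same cached scanner at once. It is not HBase 
code and not a proposed patch: `ExclusiveScannerHolder`, `tryAcquire`, 
`release`, and this `shipped(byte[])` signature are hypothetical names made up 
for illustration, and a plain byte array stands in for `previousCell`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a cached scanner entry; NOT HBase's RegionScannerHolder.
class ExclusiveScannerHolder {
    // True while some RPC call is using the scanner.
    private final AtomicBoolean inUse = new AtomicBoolean(false);
    // Stand-in for the scanner state that two threads must not mutate concurrently.
    private byte[] previousCell;

    // Returns true only for the single caller that wins the compare-and-set.
    boolean tryAcquire() {
        return inUse.compareAndSet(false, true);
    }

    // Hands the scanner back so a later call can acquire it.
    void release() {
        inUse.set(false);
    }

    // Copies the current cell on-heap. The check only confirms that the scanner
    // is held by someone; it does not verify which thread holds it.
    void shipped(byte[] currentCell) {
        if (!inUse.get()) {
            throw new IllegalStateException("scanner used without being acquired");
        }
        previousCell = currentCell.clone();
    }

    // Returns a defensive copy of the saved cell, or null if none was saved.
    byte[] previousCell() {
        return previousCell == null ? null : previousCell.clone();
    }
}

class ExclusiveScannerDemo {
    public static void main(String[] args) {
        ExclusiveScannerHolder holder = new ExclusiveScannerHolder();
        boolean first = holder.tryAcquire();   // first call gets the scanner
        boolean second = holder.tryAcquire();  // overlapping call is refused
        holder.shipped(new byte[] {1, 2, 3});
        holder.release();
        System.out.println("first=" + first + ", second=" + second);
    }
}
```

In this sketch a call that loses `tryAcquire()` would have to fail or retry 
instead of proceeding, so `shipped` never runs on two threads for the same 
scanner. Whether rejecting the second call is acceptable for real scan RPCs is 
an open question.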

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
