Xiaolin Ha created HBASE-26155:
----------------------------------
Summary: JVM crash when rpc calls close scanner
Key: HBASE-26155
URL: https://issues.apache.org/jira/browse/HBASE-26155
Project: HBase
Issue Type: Bug
Components: Scanners
Affects Versions: 3.0.0-alpha-1
Reporter: Xiaolin Ha
Closing a scanner has been causing RegionServer JVM crashes (core dumps) on
our production clusters.
{code:java}
Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000], sp=0x00007fca4b1cb0d8, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x7fd314]
J 2810 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
j org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
j org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
{code}
Nothing guarantees that an RPC call has exclusive use of its scanner, right?
For example, when a client disconnects, the RS may not stop the scanner's
in-progress next() calls until it checks the `rpcCall.disconnectSince()` time.
Before that check happens, another scan RPC may use the same scanner, which
the RS keeps cached in a RegionScannerHolder. The two calls then modify the
scanner's `previousCell` from different threads...
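
To make the suspected race concrete, here is a minimal sketch of one way a cached scanner could be limited to a single RPC handler at a time. It is not HBase code and not a proposed patch: `ExclusiveScannerHolder`, `tryAcquire()`, `release()` and the `String` stand-in for `previousCell` are all hypothetical names made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical holder (not an HBase class) that lets only one RPC handler
 * use a cached scanner at a time.
 */
public class ExclusiveScannerHolder {
    // True while some RPC call owns the scanner.
    private final AtomicBoolean inUse = new AtomicBoolean(false);
    // Stand-in for the scanner's mutable state such as previousCell (assumption).
    private String previousCell;

    /** Returns true if the caller now owns the scanner, false if another call does. */
    public boolean tryAcquire() {
        return inUse.compareAndSet(false, true);
    }

    /** Gives up ownership; the owner calls this after next()/shipped()/close() finish. */
    public void release() {
        inUse.set(false);
    }

    /** Mutates scanner state; refuses to run when nobody has acquired the scanner. */
    public void setPreviousCell(String cell) {
        if (!inUse.get()) {
            throw new IllegalStateException("scanner not acquired");
        }
        previousCell = cell;
    }

    public String getPreviousCell() {
        return previousCell;
    }

    public static void main(String[] args) {
        ExclusiveScannerHolder holder = new ExclusiveScannerHolder();
        boolean first = holder.tryAcquire();   // first RPC call takes the scanner
        boolean second = holder.tryAcquire();  // overlapping RPC call is rejected
        holder.setPreviousCell("row1/cf:q");
        holder.release();
        boolean third = holder.tryAcquire();   // a later call can take it again
        System.out.println(first + " " + second + " " + third);
    }
}
```

With a guard like this, the second scan RPC would have to fail or wait rather than touch `previousCell` while the first call is still running `shipped()`. Whether rejecting, queueing, or some other approach is right for the RS is an open question for this issue.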