[ https://issues.apache.org/jira/browse/HBASE-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398397#comment-17398397 ]
Hudson commented on HBASE-26155:
--------------------------------

Results for branch branch-2.4
	[build #179 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/179/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/179/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/179/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/179/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/179/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> JVM crash when scan
> -------------------
>
>                 Key: HBASE-26155
>                 URL: https://issues.apache.org/jira/browse/HBASE-26155
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.6, 2.3.7
>
>         Attachments: scan-error.png
>
>
> We have seen regionserver JVM coredumps caused by scanner close on our production clusters.
> {code:java}
> Stack: [0x00007fca4b0cc000,0x00007fca4b1cd000], sp=0x00007fca4b1cb0d8, free space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0x7fd314]
> J 2810 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x00007fdae55a9e61 [0x00007fdae55a9d80+0xe1]
> j org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
> j org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
> j org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
> j org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
> j org.apache.hadoop.hbase.KeyValueUtil.appendKeyTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+43
> J 14724 C2 org.apache.hadoop.hbase.regionserver.StoreScanner.shipped()V (51 bytes) @ 0x00007fdae6a298d0 [0x00007fdae6a29780+0x150]
> J 21387 C2 org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run()V (53 bytes) @ 0x00007fdae622bab8 [0x00007fdae622acc0+0xdf8]
> J 26353 C2 org.apache.hadoop.hbase.ipc.ServerCall.setResponse(Lorg/apache/hbase/thirdparty/com/google/protobuf/Message;Lorg/apache/hadoop/hbase/CellScanner;Ljava/lang/Throwable;Ljava/lang/String;)V (384 bytes) @ 0x00007fdae7f139d8 [0x00007fdae7f12980+0x1058]
> J 26226 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1554 bytes) @ 0x00007fdae959f68c [0x00007fdae959e400+0x128c]
> J 19598% C2 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V (338 bytes) @ 0x00007fdae81c54d4 [0x00007fdae81c53e0+0xf4]
> {code}
> There are also scan RPC errors on the handler when the coredump happens:
> !scan-error.png|width=585,height=235!
> I found a clue in the logs: a cached block may be replaced when its nextBlockOnDiskSize is smaller than that of the newly read block, in the method
>
> {code:java}
> public static boolean shouldReplaceExistingCacheBlock(BlockCache blockCache,
>     BlockCacheKey cacheKey, Cacheable newBlock) {
>   if (cacheKey.toString().indexOf(".") != -1) { // reference file
>     LOG.warn("replace existing cached block, cache key is : " + cacheKey);
>     return true;
>   }
>   Cacheable existingBlock = blockCache.getBlock(cacheKey, false, false, false);
>   if (existingBlock == null) {
>     return true;
>   }
>   try {
>     int comparison = BlockCacheUtil.validateBlockAddition(existingBlock, newBlock, cacheKey);
>     if (comparison < 0) {
>       LOG.warn("Cached block contents differ by nextBlockOnDiskSize, the new block has "
>           + "nextBlockOnDiskSize set. Caching new block.");
>       return true;
>     ......{code}
>
> The block is then replaced if it is not in the RAMCache but is in the BucketCache, using
>
> {code:java}
> private void putIntoBackingMap(BlockCacheKey key, BucketEntry bucketEntry) {
>   BucketEntry previousEntry = backingMap.put(key, bucketEntry);
>   if (previousEntry != null && previousEntry != bucketEntry) {
>     ReentrantReadWriteLock lock = offsetLock.getLock(previousEntry.offset());
>     lock.writeLock().lock();
>     try {
>       blockEvicted(key, previousEntry, false);
>     } finally {
>       lock.writeLock().unlock();
>     }
>   }
> }
> {code}
> To avoid leaking the previous bucket entry's memory, the replacement forcibly releases that entry regardless of any RPC references still held on it.
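> The hazard boils down to this: an entry's backing memory must not be freed while an in-flight RPC still reads it. A minimal, self-contained Java sketch of that guard follows; the class and method names here (RefCountedEntry, SimpleBucketCache) are hypothetical and only illustrate the idea, not HBase's actual BucketEntry/BucketCache API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a bucket entry whose memory can be read by RPC handlers.
class RefCountedEntry {
    private final AtomicInteger rpcRefs = new AtomicInteger(0);

    void retainByRpc() { rpcRefs.incrementAndGet(); }    // scan RPC starts reading
    void releaseByRpc() { rpcRefs.decrementAndGet(); }   // scan RPC finished (shipped)
    boolean isRpcRef() { return rpcRefs.get() > 0; }
}

// Hypothetical stand-in for the cache's backing map.
class SimpleBucketCache {
    private final Map<String, RefCountedEntry> backingMap = new ConcurrentHashMap<>();

    /** Returns true if the entry was (re)placed, false if skipped due to RPC refs. */
    boolean cacheBlock(String key, RefCountedEntry newEntry) {
        RefCountedEntry existing = backingMap.get(key);
        if (existing != null && existing.isRpcRef()) {
            // An in-flight scan still reads the old entry's memory; replacing it
            // now would free that memory underneath the scanner and can crash the JVM.
            return false;
        }
        backingMap.put(key, newEntry);
        return true;
    }
}
```

> With this guard, a replacement attempt while a scan holds a reference is simply skipped, and the new block can be cached on a later attempt once the RPC releases its reference.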
> The forced release happens in
>
> {code:java}
> void blockEvicted(BlockCacheKey cacheKey, BucketEntry bucketEntry, boolean decrementBlockNumber) {
>   bucketAllocator.freeBlock(bucketEntry.offset());
>   realCacheSize.add(-1 * bucketEntry.getLength());
>   blocksByHFile.remove(cacheKey);
>   if (decrementBlockNumber) {
>     this.blockNumber.decrement();
>   }
> }
> {code}
> I added a check for RPC references before replacing the bucket entry, and it works: no coredumps so far.
>
> That is:
> {code:java}
> public void cacheBlockWithWait(BlockCacheKey cacheKey, Cacheable cachedItem, boolean inMemory,
>     boolean wait) {
>   if (cacheEnabled) {
>     if (backingMap.containsKey(cacheKey) || ramCache.containsKey(cacheKey)) {
>       if (BlockCacheUtil.shouldReplaceExistingCacheBlock(this, cacheKey, cachedItem)) {
>         BucketEntry bucketEntry = backingMap.get(cacheKey);
>         if (bucketEntry != null && bucketEntry.isRpcRef()) {
>           // avoid replacing the block when there are RPC refs for its bucket entry
>           return;
>         }
>         cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
>       }
>     } else {
>       cacheBlockWithWaitInternal(cacheKey, cachedItem, inMemory, wait);
>     }
>   }
> }
> {code}


-- This message was sent by Atlassian Jira (v8.3.4#803005)