ranpanfeng created HBASE-23084:
----------------------------------

             Summary: Assertion "Added a key not lexically larger than previous" or "prevKey followed by a smaller key" causes the RS to abort
                 Key: HBASE-23084
                 URL: https://issues.apache.org/jira/browse/HBASE-23084
             Project: HBase
          Issue Type: Bug
          Components: regionserver
         Environment: We store many Phoenix tables in HBase; only one table triggers this issue.

When two adjacent keys are in the wrong order, we observe the following:

1. the prevKey was written into HBase via a Put RPC

2. the current key was written via a Phoenix UPSERT statement (a Multi RPC in the underlying HBase)

3. the current Cell exceeds 1 MB

4. the error is easy to reproduce
            Reporter: ranpanfeng


When an auto flush or a manual flush is invoked, the cells in an immutable memstore segment are scanned one by one to build the data blocks and bloom filter blocks. During this scan, each pair of adjacent keys is compared to assert that they are in lexicographical order. Regretfully, the assertion fails in some scenarios, and whenever the RS encounters such an assertion failure it calls the abort function and terminates abnormally.
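
The order invariant that the flush scan asserts can be illustrated with a minimal, self-contained sketch. This is a simplification using raw byte keys (the real check in StoreScanner.checkScanOrder compares full cell keys via a CellComparator); the class and method names here are hypothetical, for illustration only:

```java
public class ScanOrderCheck {

    // Unsigned lexicographic comparison, the order HBase uses for keys:
    // negative if a < b, zero if equal, positive if a > b.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        // Shared prefix: the shorter key sorts first.
        return a.length - b.length;
    }

    // Mirrors the spirit of the flush-time assertion: every key must compare
    // >= its predecessor. Returns the index of the first violation, or -1.
    static int firstOutOfOrder(byte[][] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (compareUnsigned(keys[i - 1], keys[i]) > 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[][] keys = {{1}, {2}, {1}};  // third key sorts before the second
        System.out.println("first out-of-order index: " + firstOutOfOrder(keys));
    }
}
```

With a key sequence like the one in main, the sketch flags index 2, which is the situation in which the RS aborts instead of continuing the flush.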

 

The version is commit 0ba7cc01a13dbb7fec8e6c5d599fc8b4936bae61 (tag: rel/2.1.5).

 

At least three failure cases have been observed.

*error case#1: getQualifierArray throws java.lang.ArrayIndexOutOfBoundsException*

 
{code:java}
17:57:50.328 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.ArrayIndexOutOfBoundsException
        at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1155) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.copyQualifierTo(CellUtil.java:309) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.cloneQualifier(CellUtil.java:121) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray(ByteBufferKeyValue.java:112) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1306) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.CellUtil.toString(CellUtil.java:1483) ~[hbase-common-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.ByteBufferKeyValue.toString(ByteBufferKeyValue.java:296) ~[hbase-common-2.1.5.jar:2.1.5]
        at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_144]
        at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_144]
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
        ... 8 more
17:57:50.332 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
{code}
 

 

*error case#2: StoreScanner.checkScanOrder triggers "prevKey followed by a smaller key"*

 
{code:java}
Exception in thread "RS_IN_MEMORY_COMPACTION-regionserver/${regionserver}" java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.refillKVS(MemStoreCompactorSegmentsIterator.java:150)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.hasNext(MemStoreCompactorSegmentsIterator.java:72)
        at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.initializeCellSet(CellChunkImmutableSegment.java:146)
        at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.<init>(CellChunkImmutableSegment.java:66)
        at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegment(SegmentFactory.java:133)
        at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegmentByCompaction(SegmentFactory.java:59)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.createSubstitution(MemStoreCompactor.java:217)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:171)
        at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.start(MemStoreCompactor.java:109)
        at org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:422)
        at org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:534)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
12:34:07.695 [MemStoreFlusher.0] INFO  org.apache.hadoop.hbase.regionserver.HRegion - Flushing 1/1 column families, dataSize=239.26 MB heapSize=256.00 MB
12:34:08.476 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
        at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
        ... 8 more
{code}
 

 

 

*error case#3: BloomContext.sanityCheck triggers "Added a key not lexically larger than previous."*

So, to gather more detail, I altered the failing copy code to catch the error and log the operands instead of throwing:
{code:java}
--- a/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
+++ b/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
@@ -33,6 +33,8 @@ import org.apache.hadoop.hbase.io.util.StreamUtils;
 import org.apache.hadoop.io.IOUtils;
 import org.apache.hadoop.io.WritableUtils;
 import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import sun.nio.ch.DirectBuffer;
 
 import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
@@ -51,6 +53,7 @@ public final class ByteBufferUtils {
   @VisibleForTesting
   final static boolean UNSAFE_AVAIL = UnsafeAvailChecker.isAvailable();
   public final static boolean UNSAFE_UNALIGNED = UnsafeAvailChecker.unaligned();
+  private static Logger LOG = LoggerFactory.getLogger(ByteBufferUtils.class);
 
   private ByteBufferUtils() {
   }
@@ -1152,7 +1155,18 @@ public final class ByteBufferUtils {
   public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset,
       int destinationOffset, int length) {
     if (in.hasArray()) {
-      System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
+      try {
+        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
+      } catch (Throwable e) {
+        if (out == null) {
+          LOG.warn("[QINGGUI] out=null");
+        } else if (in == null || in.array() == null) {
+          LOG.warn("[QINGGUI] in=null");
+        } else {
+          LOG.warn("[QINGGUI] out.length={}, in.length={}, sourceOffset={} destinationOffset={} length={}",
+            out.length, in.array().length, sourceOffset, destinationOffset, length);
+        }
+      }
     } else if (UNSAFE_AVAIL) {
       UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
     } else {
{code}
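
For context on error case#1, here is a standalone sketch (a hypothetical helper, not HBase code) showing how System.arraycopy raises IndexOutOfBoundsException when the requested offsets or length are inconsistent with the backing arrays, which is the condition the logging in the patch is meant to capture:

```java
public class ArrayCopyBoundsDemo {

    // Returns true if copying len bytes from src[srcOff] to dst[dstOff]
    // fails the JVM's bounds check, mimicking the failure in
    // ByteBufferUtils.copyFromBufferToArray when a cell's qualifier
    // offset/length are corrupt.
    static boolean copyFails(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
        try {
            System.arraycopy(src, srcOff, dst, dstOff, len);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // System.arraycopy throws (a subclass of) IndexOutOfBoundsException
            // when the ranges do not fit inside src or dst.
            return true;
        }
    }

    public static void main(String[] args) {
        // Destination deliberately too small for the requested length.
        System.out.println("copy fails: " + copyFails(new byte[8], 0, new byte[4], 0, 8));
    }
}
```

In the real failure, the corrupt values would be the qualifier offset and length of a ByteBufferKeyValue, which is why logging them (as the patch does) helps narrow down which cell is damaged.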
 

After re-deploying my patched binary, the failure drifts to the bloom filter building code.
{code:java}
12:54:39.966 [rs(${regionserver})-flush-proc-pool3-thread-2] WARN  org.apache.hadoop.hbase.regionserver.HStore - Failed flushing store file, retrying num=9
java.io.IOException: Added a key not lexically larger than previous. Current cell ...
        at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
12:54:39.979 [rs(${regionserver})-flush-proc-pool3-thread-2] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = ...
        at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
        ... 12 more

{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
