[ https://issues.apache.org/jira/browse/HBASE-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940175#comment-16940175 ]
ranpanfeng commented on HBASE-23084:
------------------------------------

[~zhangduo] I use the property hbase.hregion.compacting.memstore.type=EAGER. Can out-of-order adjacent keys occur within the same memstore segment? My guess is that the prevKey sits at the end of the former segment and the next key at the beginning of the latter one. Is that right?

> Assertion "Added a key not lexically larger than previous" or "prevKey followed by a smaller key" caused abortion of RS
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-23084
>                 URL: https://issues.apache.org/jira/browse/HBASE-23084
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>        Environment: We store many Phoenix tables in HBase; only one table triggers this issue.
> When two adjacent keys are in the wrong order, we observe the following:
> 1. the prevKey is written into HBase via a Put RPC
> 2. the current key arrives via a Phoenix upsert statement (a Multi RPC in the underlying HBase)
> 3. the current Cell exceeds 1 MB
> 4. the error is easy to reproduce
>           Reporter: ranpanfeng
>           Priority: Critical
>
> When auto flush or manual flush is invoked, the cells in an immutable memstore segment are scanned one by one to build the data blocks and bloom filter blocks. During this pass, each pair of adjacent keys is compared to assert that they are in lexicographical order. Regretfully, the assertion fails in some scenarios, and whenever the RS encounters such an assertion failure it calls abort and terminates abnormally.
>
> The version is 0ba7cc01a13dbb7fec8e6c5d599fc8b4936bae61, tag: rel/2.1.5.
>
> At least three cases are found.
>
> *error case#1: getQualifierArray throws java.lang.ArrayIndexOutOfBoundsException*
>
> {code:java}
> 17:57:50.328 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required.
> Forcing server shutdown *****
> org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1155) ~[hbase-common-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.CellUtil.copyQualifierTo(CellUtil.java:309) ~[hbase-common-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.CellUtil.cloneQualifier(CellUtil.java:121) ~[hbase-common-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray(ByteBufferKeyValue.java:112) ~[hbase-common-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1306) ~[hbase-common-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.CellUtil.toString(CellUtil.java:1483) ~[hbase-common-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.ByteBufferKeyValue.toString(ByteBufferKeyValue.java:296) ~[hbase-common-2.1.5.jar:2.1.5]
>     at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_144]
>     at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_144]
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
>     ... 8 more
> 17:57:50.332 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - RegionServer abort: loaded coprocessors are: [org.apache.phoenix.coprocessor.ScanRegionObserver, org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver, org.apache.phoenix.hbase.index.Indexer, org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver, org.apache.phoenix.coprocessor.ServerCachingEndpointImpl]
> {code}
>
> *error case#2: StoreScanner.checkScanOrder triggers "prevKey followed by a smaller key"*
>
> {code:java}
> Exception in thread "RS_IN_MEMORY_COMPACTION-regionserver/${regionserver}" java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007)
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582)
>     at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.refillKVS(MemStoreCompactorSegmentsIterator.java:150)
>     at org.apache.hadoop.hbase.regionserver.MemStoreCompactorSegmentsIterator.hasNext(MemStoreCompactorSegmentsIterator.java:72)
>     at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.initializeCellSet(CellChunkImmutableSegment.java:146)
>     at org.apache.hadoop.hbase.regionserver.CellChunkImmutableSegment.<init>(CellChunkImmutableSegment.java:66)
>     at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegment(SegmentFactory.java:133)
>     at org.apache.hadoop.hbase.regionserver.SegmentFactory.createImmutableSegmentByCompaction(SegmentFactory.java:59)
>     at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.createSubstitution(MemStoreCompactor.java:217)
>     at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.doCompaction(MemStoreCompactor.java:171)
>     at org.apache.hadoop.hbase.regionserver.MemStoreCompactor.start(MemStoreCompactor.java:109)
>     at org.apache.hadoop.hbase.regionserver.CompactingMemStore.flushInMemory(CompactingMemStore.java:422)
>     at org.apache.hadoop.hbase.regionserver.CompactingMemStore$InMemoryFlushRunnable.run(CompactingMemStore.java:534)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 12:34:07.695 [MemStoreFlusher.0] INFO org.apache.hadoop.hbase.regionserver.HRegion - Flushing 1/1 column families, dataSize=239.26 MB heapSize=256.00 MB
> 12:34:08.476 [MemStoreFlusher.0] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
> org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:612) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:581) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:361) [hbase-server-2.1.5.jar:2.1.5]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
> Caused by: java.lang.AssertionError: Key ${prevKey} followed by a smaller key ${key}
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.checkScanOrder(StoreScanner.java:1007) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:582) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:127) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
>     ... 8 more
> {code}
>
> *error case#3: BloomContext.sanityCheck triggers "Added a key not lexically larger than previous."*
>
> So I altered the throwing code into an if statement that logs the details instead:
> {code:java}
> --- a/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
> +++ b/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
> @@ -33,6 +33,8 @@ import org.apache.hadoop.hbase.io.util.StreamUtils;
>  import org.apache.hadoop.io.IOUtils;
>  import org.apache.hadoop.io.WritableUtils;
>  import org.apache.yetus.audience.InterfaceAudience;
> +import org.slf4j.Logger;
> +import org.slf4j.LoggerFactory;
>  import sun.nio.ch.DirectBuffer;
>
>  import org.apache.hbase.thirdparty.com.google.common.annotations.VisibleForTesting;
> @@ -51,6 +53,7 @@ public final class ByteBufferUtils {
>    @VisibleForTesting
>    final static boolean UNSAFE_AVAIL = UnsafeAvailChecker.isAvailable();
>    public final static boolean UNSAFE_UNALIGNED = UnsafeAvailChecker.unaligned();
> +  private static Logger LOG = LoggerFactory.getLogger(ByteBufferUtils.class);
>
>    private ByteBufferUtils() {
>    }
> @@ -1152,7 +1155,18 @@ public final class ByteBufferUtils {
>    public static void copyFromBufferToArray(byte[] out, ByteBuffer in, int sourceOffset,
>        int destinationOffset, int length) {
>      if (in.hasArray()) {
> -      System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
> +      try {
> +        System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out, destinationOffset, length);
> +      } catch (Throwable e) {
> +        if (out == null) {
> +          LOG.warn("[QINGGUI] out=null");
> +        } else if (in == null || in.array() == null) {
> +          LOG.warn("[QINGGUI] in=null");
> +        } else {
> +          LOG.warn("[QINGGUI] out.length={}, in.length={}, sourceOffset={} destinationOffset={} length={}",
> +            out.length, in.array().length, sourceOffset, destinationOffset, length);
> +        }
> +      }
>      } else if (UNSAFE_AVAIL) {
>        UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
>      } else {
> {code}
>
> After redeploying my binary, the failure drifts to the bloom filter building code.
>
> {code:java}
> 12:54:39.966 [rs(${regionserver})-flush-proc-pool3-thread-2] WARN org.apache.hadoop.hbase.regionserver.HStore - Failed flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current cell ...
>     at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
> 12:54:39.979 [rs(${regionserver})-flush-proc-pool3-thread-2] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer - ***** ABORTING region server ${regionserver}: Replay of WAL required. Forcing server shutdown *****
> org.apache.hadoop.hbase.DroppedSnapshotException: region: ${region}.
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2788) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2467) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2439) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2329) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2232) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:68) [hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure$RegionFlushTask.call(FlushTableSubprocedure.java:56) [hbase-server-2.1.5.jar:2.1.5]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_144]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_144]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
> Caused by: java.io.IOException: Added a key not lexically larger than previous. Current cell = ...
>     at org.apache.hadoop.hbase.util.BloomContext.sanityCheck(BloomContext.java:63) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.util.BloomContext.writeBloom(BloomContext.java:54) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFileWriter.appendGeneralBloomfilter(StoreFileWriter.java:213) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(StoreFileWriter.java:232) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:133) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:69) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1021) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2259) ~[hbase-server-2.1.5.jar:2.1.5]
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2733) ~[hbase-server-2.1.5.jar:2.1.5]
>     ... 12 more
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
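For readers unfamiliar with the invariant behind all three stack traces: StoreScanner.checkScanOrder and BloomContext.sanityCheck both assert that every cell emitted during a flush is lexicographically larger than its predecessor. The sketch below is not HBase code; it is a minimal, self-contained illustration of that check over plain byte-array keys, assuming unsigned lexicographic ordering (real HBase compares row/family/qualifier/timestamp/type through CellComparator).

```java
import java.util.Arrays;

public class KeyOrderCheck {

    /** Unsigned lexicographic byte comparison, in the spirit of HBase's Bytes.compareTo. */
    public static int compareKeys(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            // Compare bytes as unsigned values (0..255), not signed Java bytes.
            int diff = (left[i] & 0xFF) - (right[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // On a common prefix, the shorter key sorts first.
        return left.length - right.length;
    }

    /** Mimics the flush-time assertion: every key must be strictly larger than the one before it. */
    public static void checkScanOrder(byte[][] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (compareKeys(keys[i - 1], keys[i]) >= 0) {
                throw new AssertionError("Key " + Arrays.toString(keys[i - 1])
                        + " followed by a smaller key " + Arrays.toString(keys[i]));
            }
        }
    }

    public static void main(String[] args) {
        // Strictly increasing keys pass silently.
        checkScanOrder(new byte[][] { {1}, {1, 0}, {2} });
        // An out-of-order pair is the shape of failure that aborts the RS.
        try {
            checkScanOrder(new byte[][] { {2}, {1} });
        } catch (AssertionError expected) {
            System.out.println("order violation detected");
        }
    }
}
```

In the reported bug the inputs to this check are produced by in-memory compaction of CompactingMemStore segments, which is why the question above asks whether the out-of-order pair can straddle a segment boundary.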