[ https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918366#comment-16918366 ]
Alex Batyrshin edited comment on HBASE-22862 at 8/29/19 7:48 AM:
-----------------------------------------------------------------
[~openinx] Yes, Phoenix installs coprocessors, and they are used in our write path:
{code}
coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'
{code}
In our case coprocessor$5 is used. We don't have any useful stack traces from Phoenix when the UPSERTs happen. Our table DDL looks like this:
{code}
CREATE TABLE IF NOT EXISTS TBL_TABLE_CODE (
    "c" VARCHAR NOT NULL PRIMARY KEY,
    "d"."apd" TIMESTAMP,
    "d"."emd" TIMESTAMP,
    "d"."prid" VARCHAR,
    "d"."o" VARCHAR,
    "d"."elr" UNSIGNED_TINYINT,
    "d"."st" UNSIGNED_TINYINT
    ...
);

CREATE INDEX "IDX_TBL_TABLE_CODE_O" ON "TBL_TABLE_CODE" ("d"."o", "d"."emd")
    INCLUDE ("d"."elr", "apd", "d"."st", ...);

CREATE INDEX "IDX_TBL_TABLE_CODE_PRID" ON "TBL_TABLE_CODE" ("d"."prid", "d"."emd")
    INCLUDE ("d"."elr", "d"."apd", "d"."st", ...);

-- batched writes (pseudocode):
for (int i = 0; i < batches.size; i += 1) {
    for (int j = 0; j < batches[i].size; j += 1) {
        UPSERT INTO TBL_TABLE_CODE VALUES (?, ?, ?
...);
    }
    commit;
}
{code}

was (Author: 0x62ash): [earlier revision of the comment above; identical except the indexes were named "IDX_CIS_O" and "IDX_CIS_PRID"]

> Region Server crash with: Added a key not lexically larger than previous
> ------------------------------------------------------------------------
>
>                 Key: HBASE-22862
>                 URL: https://issues.apache.org/jira/browse/HBASE-22862
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.4.10
>         Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed mode)
> {code}
>            Reporter: Alex Batyrshin
>            Assignee: Zheng Hu
>            Priority: Critical
>         Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
> We observe the error "Added a key not lexically larger than previous", which causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO [MemStoreFlusher.0] regionserver.HRegion: Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous.
> Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
> lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>     at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous.
> Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
> lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>     at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,777 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server prod006,60020,1565873610692: Replay of WAL required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: TBL_TABLE_CODE,\x0904606203097821slG=sPD,1563070299676.5110b3395ca64a51cea99c6572a4c3d9.
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2675)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Added a key not lexically larger than previous.
> Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
> lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     ... 9 more
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
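For context on the failed check: the IOException comes from the HFile writer's invariant that cells must be appended in strictly increasing sort order. The sketch below is a simplified illustration of that ordering, not the real comparator (which is org.apache.hadoop.hbase.CellComparator); the row value is a stand-in, and only the fields visible in the log are modeled. It shows why a DeleteColumn arriving after a Put with the same row, qualifier, and timestamp trips the check.

```python
# Simplified sketch of HBase cell ordering; NOT the real CellComparator.
# Type codes from org.apache.hadoop.hbase.KeyValue.Type: Put=4, DeleteColumn=12.
PUT, DELETE_COLUMN = 4, 12

def compare_cells(a, b):
    """Compare (row, qualifier, timestamp, type) tuples in the order the
    HFile writer expects: rows and qualifiers ascending, timestamps
    descending (newest first), and on a timestamp tie, higher type codes
    first -- so deletes sort before puts at the same timestamp."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        return -1 if a[1] < b[1] else 1
    if a[2] != b[2]:
        return -1 if a[2] > b[2] else 1   # newer timestamp sorts first
    if a[3] != b[3]:
        return -1 if a[3] > b[3] else 1   # DeleteColumn (12) before Put (4)
    return 0

# Stand-ins for the two cells in the log: same row, qualifier, timestamp.
last_cell    = ("row", "d:elr", 1565881054828, PUT)            # seqid=44457770
current_cell = ("row", "d:elr", 1565881054828, DELETE_COLUMN)  # seqid=44456567

# Each appended cell must sort strictly after the previous one; here the
# DeleteColumn sorts BEFORE the Put it follows, which is exactly the
# "Added a key not lexically larger than previous" failure.
print(compare_cells(current_cell, last_cell))  # -1
```

Note that the seqids in the log suggest the Put (seqid=44457770) was written after the DeleteColumn (seqid=44456567), yet the flush emitted the Put first.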