Raman Ch created HBASE-17901:
--------------------------------

             Summary: HBase region server stops because of a failure during memstore flush
                 Key: HBASE-17901
                 URL: https://issues.apache.org/jira/browse/HBASE-17901
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 1.2.2
         Environment: Ubuntu 14.04.5 LTS
HBase Version 1.2.2, revision=1
            Reporter: Raman Ch
Once every several days, the region server fails to flush a memstore and stops.

April 8:
{code}
2017-04-08 00:10:57,737 WARN  [MemStoreFlusher.1] regionserver.HStore: Failed flushing store file, retrying num=9
java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column:
	at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkVersions(ScanWildcardColumnTracker.java:117)
	at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:464)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:119)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:915)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2271)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2375)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2105)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2067)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1958)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1884)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:244)
	at java.lang.Thread.run(Thread.java:745)
2017-04-08 00:10:57,737 FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING region server datanode13.webmeup.com,16020,1491573320653: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: di_ordinal_tmp,gov.ok.data/browse?page=2&category=Natural%20Resources&limitTo=datasets&tags=ed,1489764397211.9d7ca11018672c4aace7f30c8f4253f3.
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2428)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2105)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2067)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1958)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1884)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:244)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column:
	at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkVersions(ScanWildcardColumnTracker.java:117)
	at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:464)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:529)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:119)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:915)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2271)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2375)
	... 9 more
{code}

After a restart, the region server functioned properly for a couple of days.

April 10:
{code}
2017-04-10 22:36:32,147 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=9
java.io.IOException: Non-increasing Bloom keys: de.tina-eicke.blog/category/garten/\x09h after de.uina-eicke.blog/category/fruehling/\x09h
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:936)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:969)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:125)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:915)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2271)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2375)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2105)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2067)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1958)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1884)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
	at java.lang.Thread.run(Thread.java:745)
2017-04-10 22:36:32,147 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server datanode13.webmeup.com,16020,1491828707088: Replay of WAL required.
Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: di_ordinal_tmp,de.thschroeer/lmo/lmo.php?action=results&file=archiv/BLW2-2013.l98&endtab=8&st=8&tabtype=2\x09hw,1489764397211.b07eaba657affc2ba29f84b59c672836.
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2428)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2105)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2067)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1958)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1884)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Non-increasing Bloom keys: de.tina-eicke.blog/category/garten/\x09h after de.uina-eicke.blog/category/fruehling/\x09h
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.appendGeneralBloomfilter(StoreFile.java:936)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:969)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:125)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:915)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2271)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2375)
	... 9 more
{code}

Table description:
{code}
'di_ordinal_tmp', {TABLE_ATTRIBUTES => {DURABILITY => 'ASYNC_WAL', MAX_FILESIZE => '8589934592'},
 {NAME => 'di', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
 KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF',
 TTL => '10368000 SECONDS (120 DAYS)', COMPRESSION => 'GZ', MIN_VERSIONS => '0',
 BLOCKCACHE => 'false', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0',
 METADATA => {'COMPRESSION_COMPACT' => 'GZ'}}
{code}

The table is being populated only using put operations. There has never been any bulk loading into this table.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)