[ https://issues.apache.org/jira/browse/HBASE-16394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
qgxiaozhan updated HBASE-16394:
-------------------------------
Description:
One regionserver in my cluster died because of "Compaction is trying to add a bad range".
Here is the log:
[2016-08-09T18:30:19.094+08:00] [INFO] regionserver.ReplicationSource : Log hdfs://athene/hbase/oldWALs/MJQ-HBASE-ATHENE-11139%2C16020%2C1470729882622.default.1470736608897 was moved to hdfs://athene/hbase/oldWALs/MJQ-HBASE-ATHENE%2C16020%2C1470729882622.default.1470736608897
[2016-08-09T18:30:30.225+08:00] [INFO] regionserver.MemStoreFlusher : Waited 90070ms on a compaction to clean up 'TOO MANY STORE FILES'; waited long enough... proceeding with flush of tjs4:popt_info,160608008474430,1470737160711.7900baab5204e4f36fa49379c30cd584.
[2016-08-09T18:30:30.226+08:00] [INFO] regionserver.HRegion : Started memstore flush for tjs4:popt_info,160608008474430,1470737160711.7900baab5204e4f36fa49379c30cd584., current region memstore size 769.41 MB, and 1/1 column families' memstores are being flushed.
[2016-08-09T18:30:30.549+08:00] [INFO] regionserver.StripeStoreFileManager : 3 conflicting files (likely created by a flush) of size 156153021 are moved to L0 due to concurrent stripe change
[2016-08-09T18:30:31.199+08:00] [INFO] regionserver.HStore : Completed compaction of 203 file(s) in c of tjs4:popt_info,160608008474430,1470737160711.7900baab5204e4f36fa49379c30cd584. into 20347d203d09442cac30c42b424adda6(size=3.0 G), ded362eab9cf4a819675cd35992d4974(size=3.0 G), 281b1039ed2643679e5b0a3820f5059d(size=2.4 G), total size for store is 8.6 G. This selection was in queue for 0sec, and took 10mins, 16sec to execute.
[2016-08-09T18:30:31.200+08:00] [INFO] regionserver.CompactSplitThread : Completed compaction: Request = regionName=tjs4:popt_info,160608008474430,1470737160711.7900baab5204e4f36fa49379c30cd584., storeName=c, fileCount=203, fileSize=7.2 G, priority=-3, time=5388162916126535; duration=10mins, 16sec
[2016-08-09T18:30:31.201+08:00] [INFO] regionserver.HRegion : Starting compaction on ci in region ad_union:union_click,3487f383ad484bcbb5cef727b69cec2a,1466484980245.c5772fc60c54f64cc977ba9cc01d74ad.
[2016-08-09T18:30:31.201+08:00] [INFO] regionserver.HStore : Starting compaction of 14 file(s) in ci of ad_union:union_click,3487f383ad484bcbb5cef727b69cec2a,1466484980245.c5772fc60c54f64cc977ba9cc01d74ad. into tmpdir=hdfs://athene/hbase/data/ad_union/union_click/c5772fc60c54f64cc977ba9cc01d74ad/.tmp, totalSize=75.0 M
[2016-08-09T18:30:31.206+08:00] [INFO] hfile.CacheConfig : blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@52659482, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
[2016-08-09T18:30:32.893+08:00] [INFO] regionserver.ReplicationSource : Log hdfs://athene/hbase/oldWALs/MJQ-HBASE-ATHENE-11139l%2C16020%2C1470729882622.default.1470736612825 was moved to hdfs://athene/hbase/oldWALs/MJQ-HBASE-ATHENE-11139.%2C16020%2C1470729882622.default.1470736612825
[2016-08-09T18:30:34.373+08:00] [INFO] regionserver.HStore : Added hdfs://athene/hbase/data/tjs4/popt_info/7900baab5204e4f36fa49379c30cd584/c/775e8956cd2a48aaae70b9eded4457e9, entries=4336457, sequenceid=582528, filesize=48.7 M
[2016-08-09T18:30:34.373+08:00] [FATAL] regionserver.HRegionServer : ABORTING region server MJQ-HBASE-ATHENE-11139.,16020,1470729882622: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: tjs4:popt_info,160608008474430,1470737160711.7900baab5204e4f36fa49379c30cd584.
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2354)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2057)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2019)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1911)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1837)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Compaction is trying to add a bad range.
	at org.apache.hadoop.hbase.regionserver.StripeStoreFileManager$CompactionOrFlushMergeCopy.processNewCandidateStripes(StripeStoreFileManager.java:837)
	at org.apache.hadoop.hbase.regionserver.StripeStoreFileManager$CompactionOrFlushMergeCopy.mergeResults(StripeStoreFileManager.java:672)
	at org.apache.hadoop.hbase.regionserver.StripeStoreFileManager.insertNewFiles(StripeStoreFileManager.java:144)
	at org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1052)
	at org.apache.hadoop.hbase.regionserver.HStore.access$500(HStore.java:128)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2231)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2315)

> What causes "Compaction is trying to add a bad range", and should it stop the
> regionserver?
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-16394
>                 URL: https://issues.apache.org/jira/browse/HBASE-16394
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 1.1.2
>         Environment: hadoop-2.6.1 hbase-1.1.2
>            Reporter: qgxiaozhan

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)