[ https://issues.apache.org/jira/browse/HBASE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602322#comment-15602322 ]
Yu Li commented on HBASE-16931:
-------------------------------

All timed-out cases failed because of OOME (why are OOMEs so frequent?...)
{noformat}
Running org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
Exception in thread "Thread-2475" java.lang.OutOfMemoryError: Java heap space
Running org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster
Running org.apache.hadoop.hbase.tool.TestCanaryTool
Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2505" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2507" java.lang.OutOfMemoryError: Java heap space
{noformat}

And the failed case seems to have encountered an environment issue (_Unable to create region directory_):
{noformat}
Running org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 23.592 sec <<< FAILURE! - in org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
testScanYZYToEmpty(org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat)  Time elapsed: 0.044 sec  <<< ERROR!
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Unable to create region directory: /tmp/scantest1_snapshot__8235bb48-4e7b-4e00-ad80-b2ce716c8522/data/default/scantest1/519e450e89d832d702a416a9bca04b5d
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:527)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:234)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:170)
	at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:736)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshot(MultiTableSnapshotInputFormatImpl.java:249)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshots(MultiTableSnapshotInputFormatImpl.java:243)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.setInput(MultiTableSnapshotInputFormatImpl.java:80)
	at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat.setInput(MultiTableSnapshotInputFormat.java:106)
	at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initMultiTableSnapshotMapperJob(TableMapReduceUtil.java:319)
	at org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat.initJob(TestMultiTableSnapshotInputFormat.java:72)
{noformat}

I ran the above 4 cases locally and confirmed that all of them pass.

> Setting cell's seqId to zero in compaction flow might cause RS down.
> --------------------------------------------------------------------
>
>                 Key: HBASE-16931
>                 URL: https://issues.apache.org/jira/browse/HBASE-16931
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>         Attachments: HBASE-16931-master.patch, HBASE-16931.branch-1.patch,
> HBASE-16931.branch-1.v2.patch, HBASE-16931_master_v2.patch,
> HBASE-16931_master_v3.patch, HBASE-16931_master_v4.patch,
> HBASE-16931_master_v5.patch
>
>
> Compactor#performCompaction
>       do {
>         hasMore = scanner.next(cells, scannerContext);
>         // output to writer:
>         for (Cell c : cells) {
>           if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
>             CellUtil.setSequenceId(c, 0);
>           }
>           writer.append(c);
>         }
>         cells.clear();
>       } while (hasMore);
> scanner.next returns at most "hbase.hstore.compaction.kv.max" kvs per call,
> and the last cell is still referenced by StoreScanner.prevCell. So if
> cleanSeqId resets that cell's seqId to zero, the next scanner.next call may
> throw an exception from StoreScanner.checkScanOrder and cause the
> regionserver to go down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
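
For illustration, the ordering problem described in the issue can be reproduced with a minimal, self-contained sketch. It does not use HBase classes: `SimpleCell`, `compare` and `inScanOrder` are hypothetical stand-ins, and the sketch assumes a comparator that sorts by key and then by descending seqId, which may differ in detail from what StoreScanner.checkScanOrder actually compares.

```java
// Simplified model of the issue; none of these names are HBase APIs.
class SeqIdOrderSketch {

    /** Hypothetical stand-in for a Cell: a key plus a mutable sequence id. */
    static final class SimpleCell {
        final String key;
        long seqId;

        SimpleCell(String key, long seqId) {
            this.key = key;
            this.seqId = seqId;
        }
    }

    /** Assumed ordering: by key ascending, then by seqId descending. */
    static int compare(SimpleCell a, SimpleCell b) {
        int byKey = a.key.compareTo(b.key);
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(b.seqId, a.seqId); // higher seqId sorts first
    }

    /** Models a scan-order check: prev must not sort after next. */
    static boolean inScanOrder(SimpleCell prev, SimpleCell next) {
        return prev == null || compare(prev, next) <= 0;
    }

    public static void main(String[] args) {
        // Last cell of one batch; the scanner still holds it as "prev".
        SimpleCell prev = new SimpleCell("row1", 10);
        // First cell of the following batch, same key, older seqId.
        SimpleCell next = new SimpleCell("row1", 5);

        System.out.println("before reset: " + inScanOrder(prev, next));

        // The compaction loop zeroes the seqId on the shared object.
        prev.seqId = 0;

        System.out.println("after reset:  " + inScanOrder(prev, next));
    }
}
```

In this model the check passes before the reset and fails after it, because the caller mutates an object the scanner still holds as its previous cell while the next cell has not been reset yet.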