[jira] [Commented] (HBASE-16931) Setting cell's seqId to zero in compaction flow might cause RS down.

Yu Li (JIRA) Mon, 24 Oct 2016 08:31:29 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602322#comment-15602322
 ]


Yu Li commented on HBASE-16931:
-------------------------------

All timed-out cases failed because of OOME (why are OOMEs so frequent?)
{noformat}
Running org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd
Exception in thread "Thread-2475" java.lang.OutOfMemoryError: Java heap space
Running org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster
Running org.apache.hadoop.hbase.tool.TestCanaryTool
Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2505" java.lang.OutOfMemoryError: Java heap space
Exception in thread "Thread-2507" java.lang.OutOfMemoryError: Java heap space
{noformat}

And the failed case seems to have encountered an environment issue (_Unable to 
create region directory_):
{noformat}
Running org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 23.592 sec <<< FAILURE! - in org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat
testScanYZYToEmpty(org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat)  Time elapsed: 0.044 sec  <<< ERROR!
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Unable to create region directory: /tmp/scantest1_snapshot__8235bb48-4e7b-4e00-ad80-b2ce716c8522/data/default/scantest1/519e450e89d832d702a416a9bca04b5d
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:180)
        at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:527)
        at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:234)
        at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:170)
        at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:736)
        at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshot(MultiTableSnapshotInputFormatImpl.java:249)
        at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.restoreSnapshots(MultiTableSnapshotInputFormatImpl.java:243)
        at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormatImpl.setInput(MultiTableSnapshotInputFormatImpl.java:80)
        at org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat.setInput(MultiTableSnapshotInputFormat.java:106)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initMultiTableSnapshotMapperJob(TableMapReduceUtil.java:319)
        at org.apache.hadoop.hbase.mapreduce.TestMultiTableSnapshotInputFormat.initJob(TestMultiTableSnapshotInputFormat.java:72)
{noformat}

Ran the above 4 cases locally and confirmed that all of them pass.

> Setting cell's seqId to zero in compaction flow might cause RS down.
> --------------------------------------------------------------------
>
>                 Key: HBASE-16931
>                 URL: https://issues.apache.org/jira/browse/HBASE-16931
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>         Attachments: HBASE-16931-master.patch, HBASE-16931.branch-1.patch, 
> HBASE-16931.branch-1.v2.patch, HBASE-16931_master_v2.patch, 
> HBASE-16931_master_v3.patch, HBASE-16931_master_v4.patch, 
> HBASE-16931_master_v5.patch
>
>
> Compactor#performCompaction
>       do {
>         hasMore = scanner.next(cells, scannerContext);
>         // output to writer:
>         for (Cell c : cells) {
>           if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
>             CellUtil.setSequenceId(c, 0);
>           }
>           writer.append(c);
>         }
>         cells.clear();
>       } while (hasMore);
> scanner.next returns at most "hbase.hstore.compaction.kv.max" kvs per call, 
> and the last cell is still referenced by StoreScanner.prevCell. So if 
> cleanSeqId is applied to that cell, StoreScanner.checkScanOrder may throw an 
> exception during the next scanner.next call and bring the regionserver down.
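
The failure mode described above can be illustrated with a small standalone model. This is only a sketch, not HBase code: ToyCell, ToyScanner, compareCells and the resetInPlace flag are hypothetical names, and the ordering rule (same row sorts by larger seqId first) is an assumption about how the real comparator treats sequence ids. The real condition on smallestReadPoint is reduced to a boolean for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class SeqIdResetSketch {

    /** Toy stand-in for a cell: a row key plus a mutable sequence id. */
    static final class ToyCell {
        final String row;
        long seqId;
        ToyCell(String row, long seqId) { this.row = row; this.seqId = seqId; }
    }

    /** Assumed ordering: row ascending, then larger (newer) seqId first. */
    static int compareCells(ToyCell a, ToyCell b) {
        int byRow = a.row.compareTo(b.row);
        return byRow != 0 ? byRow : Long.compare(b.seqId, a.seqId);
    }

    /** Toy scanner: returns cells in batches and keeps a reference to the last one. */
    static final class ToyScanner {
        private final List<ToyCell> source;
        private final int batchLimit;   // plays the role of the per-call kv limit
        private int pos = 0;
        private ToyCell prevCell;       // still points at a cell handed to the caller

        ToyScanner(List<ToyCell> source, int batchLimit) {
            this.source = source;
            this.batchLimit = batchLimit;
        }

        boolean next(List<ToyCell> out) {
            int emitted = 0;
            while (pos < source.size() && emitted < batchLimit) {
                ToyCell cell = source.get(pos++);
                // Order check against the previously returned cell.
                if (prevCell != null && compareCells(prevCell, cell) > 0) {
                    throw new IllegalStateException("cells out of order");
                }
                prevCell = cell;
                out.add(cell);
                emitted++;
            }
            return pos < source.size();
        }
    }

    /** Mirrors the shape of the compaction loop; returns the number of cells "written". */
    static int compact(List<ToyCell> source, int batchLimit, boolean resetInPlace) {
        ToyScanner scanner = new ToyScanner(source, batchLimit);
        List<ToyCell> cells = new ArrayList<>();
        int written = 0;
        boolean hasMore;
        do {
            hasMore = scanner.next(cells);
            for (ToyCell c : cells) {
                if (resetInPlace) {
                    c.seqId = 0; // mutates the object the scanner holds as prevCell
                }
                written++;
            }
            cells.clear();
        } while (hasMore);
        return written;
    }

    /** Two versions of the same row, newer first, as the assumed ordering requires. */
    static List<ToyCell> sample() {
        List<ToyCell> cells = new ArrayList<>();
        cells.add(new ToyCell("r1", 5));
        cells.add(new ToyCell("r1", 3));
        return cells;
    }

    public static void main(String[] args) {
        System.out.println("no reset: wrote " + compact(sample(), 1, false));
        try {
            compact(sample(), 1, true);
        } catch (IllegalStateException e) {
            System.out.println("in-place reset: " + e.getMessage());
        }
    }
}
```

In this model the exception only appears when a batch boundary falls between two versions of the same row: with a batch limit of 2 both cells pass the order check before either seqId is reset.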



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
