[ 
https://issues.apache.org/jira/browse/HBASE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601075#comment-15601075
 ] 

ramkrishna.s.vasudevan commented on HBASE-16931:
------------------------------------------------

Checking this patch. 
The fix looks OK, but one question. During compaction we would otherwise hold 
all the blocks until the compaction completes, so to avoid an OOME we call 
shipped() after each batch of cells is written. Because we call shipped(), 
the blocks may be released, and the write flow holds on to 'lastCell' etc., 
so those could get corrupted once the block is released. That is why we added 
the beforeShipped() call. Now, even setting aside this bug in the read path, 
we will end up with the same problem in the write path, right? 
The lastCell in the write path, written just before cleanSeqId starts taking 
effect, will have a seqId, but the next Cell's seqId will become 0. So I 
believe this is going to be a problem in the Writer#checkKey() method.
One more question: immediately after append(), can't we set the seqId back to 
what it was?
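
To make the last question concrete, here is a minimal, self-contained sketch. 
It does not use the HBase classes: SimpleCell, SimpleScanner, compare() and 
compactSucceeds() are hypothetical stand-ins for Cell, StoreScanner, the cell 
comparator and Compactor#performCompaction. It assumes that cells with the 
same key sort by descending seqId, and that the writer copies the cell's 
contents during append(), so restoring the seqId afterwards would not change 
what was written.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SeqIdAliasingSketch {

  /** Hypothetical mutable cell: a key plus a sequence id. */
  static final class SimpleCell {
    final String key;
    long seqId;
    SimpleCell(String key, long seqId) { this.key = key; this.seqId = seqId; }
  }

  /** Assumed ordering: key ascending, then seqId descending (newer first). */
  static int compare(SimpleCell a, SimpleCell b) {
    int c = a.key.compareTo(b.key);
    return c != 0 ? c : Long.compare(b.seqId, a.seqId);
  }

  /** Hypothetical scanner that keeps a reference to the last cell it returned. */
  static final class SimpleScanner {
    private final Iterator<SimpleCell> it;
    private SimpleCell prevCell;
    SimpleScanner(List<SimpleCell> sorted) { this.it = sorted.iterator(); }

    /** Adds at most 'limit' cells to 'out'; returns true if more cells remain. */
    boolean next(List<SimpleCell> out, int limit) {
      while (out.size() < limit && it.hasNext()) {
        SimpleCell c = it.next();
        // Order check against the retained reference, not a copy.
        if (prevCell != null && compare(prevCell, c) > 0) {
          throw new IllegalStateException("scan order violated at key " + c.key);
        }
        prevCell = c;
        out.add(c);
      }
      return it.hasNext();
    }
  }

  /** Runs the compaction-style loop; returns false if the order check throws. */
  static boolean compactSucceeds(boolean restoreSeqId) {
    List<SimpleCell> input = new ArrayList<>();
    input.add(new SimpleCell("row1", 5));
    input.add(new SimpleCell("row1", 3));
    SimpleScanner scanner = new SimpleScanner(input);
    List<Long> writtenSeqIds = new ArrayList<>(); // stand-in for the writer
    List<SimpleCell> cells = new ArrayList<>();
    long smallestReadPoint = 10;
    boolean hasMore;
    try {
      do {
        hasMore = scanner.next(cells, 1); // batch limit of one cell
        for (SimpleCell c : cells) {
          long original = c.seqId;
          if (c.seqId <= smallestReadPoint) {
            c.seqId = 0; // also mutates the cell the scanner still references
          }
          writtenSeqIds.add(c.seqId); // "append": the value is copied here
          if (restoreSeqId) {
            c.seqId = original; // put the seqId back right after append
          }
        }
        cells.clear();
      } while (hasMore);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("without restore: " + compactSucceeds(false));
    System.out.println("with restore: " + compactSucceeds(true));
  }
}
```

In this sketch the run without the restore fails the order check on the 
second batch, because the scanner's retained cell now has seqId 0 and sorts 
after the next cell with seqId 3. With the restore the check passes. Whether 
the same approach is safe in the real write path depends on what the writer 
retains, which this sketch does not model.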

> Setting cell's seqId to zero in compaction flow might cause RS down.
> --------------------------------------------------------------------
>
>                 Key: HBASE-16931
>                 URL: https://issues.apache.org/jira/browse/HBASE-16931
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>         Attachments: HBASE-16931-master.patch
>
>
> Compactor#performCompaction
>       do {
>         hasMore = scanner.next(cells, scannerContext);
>         // output to writer:
>         for (Cell c : cells) {
>           if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
>             CellUtil.setSequenceId(c, 0);
>           }
>           writer.append(c);
>         }
>         cells.clear();
>       } while (hasMore);
> scanner.next returns at most "hbase.hstore.compaction.kv.max" KVs per call, 
> and the last cell is still referenced by StoreScanner.prevCell. So if 
> cleanSeqId resets that cell's seqId, StoreScanner.checkScanOrder may throw 
> an exception on the next scanner.next call and bring the regionserver down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
