[ https://issues.apache.org/jira/browse/HBASE-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601075#comment-15601075 ]
ramkrishna.s.vasudevan commented on HBASE-16931:
------------------------------------------------

Checking this patch. The fix looks OK, but one question. We call shipped() after a batch of cells is written to avoid an OOME, because otherwise compaction holds on to all the blocks until the compaction completes. But because we call shipped(), there is a chance that the blocks are released, and in the write flow we hold on to 'lastCell' etc., so those could get corrupted once the block is released. That is why we added the beforeShipped() call. Now, although this bug was in the read path, we will end up with the same problem in the write path too, right? The lastCell in the write path from just before cleanSeqId started taking effect will have a seqId, but the next Cell's seqId will now become 0. So I believe it is going to be a problem in the Writer#checkKey() method.

One more question: can't we set the lastSeqId back again immediately after append()?

> Setting cell's seqId to zero in compaction flow might cause RS down.
> --------------------------------------------------------------------
>
> Key: HBASE-16931
> URL: https://issues.apache.org/jira/browse/HBASE-16931
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 2.0.0
> Reporter: binlijin
> Assignee: binlijin
> Priority: Critical
> Attachments: HBASE-16931-master.patch
>
>
> Compactor#performCompaction
>   do {
>     hasMore = scanner.next(cells, scannerContext);
>     // output to writer:
>     for (Cell c : cells) {
>       if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
>         CellUtil.setSequenceId(c, 0);
>       }
>       writer.append(c);
>     }
>     cells.clear();
>   } while (hasMore);
> scanner.next returns at most "hbase.hstore.compaction.kv.max" kvs per call, and
> the last cell is still referenced by StoreScanner.prevCell. So if cleanSeqId
> resets that cell's seqId, then on the next scanner.next call
> StoreScanner.checkScanOrder may throw an exception and bring the regionserver down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
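
To make the failure mode from the issue description concrete, here is a minimal Java sketch. It does not use any HBase classes: SimpleCell, compare() and inOrder() are hypothetical stand-ins, and the only assumption is that, for otherwise identical keys, the cell with the higher seqId sorts first. It shows how resetting the seqId of a cell that is still referenced as the "previous" cell can make a later order check fail.

```java
// Hypothetical stand-ins only; none of these names are HBase APIs.
class SeqIdOrderSketch {

    // Minimal cell: a key plus a mutable sequence id.
    static final class SimpleCell {
        final String key;
        long seqId;

        SimpleCell(String key, long seqId) {
            this.key = key;
            this.seqId = seqId;
        }
    }

    // Assumed ordering: by key first, then higher seqId first for equal keys.
    static int compare(SimpleCell a, SimpleCell b) {
        int byKey = a.key.compareTo(b.key);
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(b.seqId, a.seqId);
    }

    // Order check in the spirit of a scan-order or key-order assertion:
    // the previous cell must not sort after the current one.
    static boolean inOrder(SimpleCell prev, SimpleCell current) {
        return compare(prev, current) <= 0;
    }

    public static void main(String[] args) {
        SimpleCell prev = new SimpleCell("row1", 5); // last cell of a batch
        SimpleCell next = new SimpleCell("row1", 3); // first cell of the next batch

        System.out.println(inOrder(prev, next)); // order holds before the reset

        // The compaction loop resets the seqId in place while another
        // component still holds a reference to the same object.
        prev.seqId = 0;

        System.out.println(inOrder(prev, next)); // order check now fails
    }
}
```

In this sketch the check passes before the in-place reset and fails after it, because the retained reference now sorts after a cell that legitimately follows it. Whether the same happens in Writer#checkKey() depends on which cells that method compares, which this sketch does not model.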