Jean-Daniel Cryans created HBASE-10958:
------------------------------------------

             Summary: [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
                 Key: HBASE-10958
                 URL: https://issues.apache.org/jira/browse/HBASE-10958
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.18, 0.98.1, 0.96.2
            Reporter: Jean-Daniel Cryans
            Priority: Blocker
             Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3


We found a data loss issue with bulk loads that are assigned sequence ids
(HBASE-6630); it is triggered when recovered edits are replayed. We're
nicknaming this issue *Blindspot*.

The problem is that the sequence id given to a bulk loaded file is higher than
those of the edits still in the region's memstore. When replaying recovered
edits, the rule for skipping an edit is that its sequence id is _lower than the
highest sequence id_ found in the store files. In other words, edits that have
a sequence id lower than the highest one in the store files *should* have also
been flushed. That doesn't hold for bulk loaded files, since we now have an
HFile with a sequence id higher than those of edits that were never flushed.
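
To make the skip rule concrete, here is a toy Python model, not HBase code
(`StoreFile`, `max_flushed_seq_id` and `edits_to_replay` are hypothetical
names). It assumes an edit is skipped when its sequence id is at or below the
highest sequence id across the store files; the exact boundary comparison in
HBase is an assumption here and doesn't change the outcome of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    # Hypothetical model of an HFile: only its highest sequence id matters here.
    max_seq_id: int


def max_flushed_seq_id(store_files):
    # Naive rule: everything up to the highest sequence id found in any
    # store file is assumed to be flushed already (-1 means "nothing").
    return max((f.max_seq_id for f in store_files), default=-1)


def edits_to_replay(wal_edits, store_files):
    # Assumption: recovered edits at or below the "flushed" mark are skipped.
    cutoff = max_flushed_seq_id(store_files)
    return [(seq_id, row) for seq_id, row in wal_edits if seq_id > cutoff]


# Edit 1 only lives in the WAL/memstore; a bulk loaded file got seqid 2.
wal = [(1, "row1")]
files = [StoreFile(max_seq_id=2)]
print(edits_to_replay(wal, files))  # [] -> row1 is never replayed
```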

The log recovery code takes this into account by simply skipping the bulk
loaded files when looking for that highest sequence id, but this "bulk loaded
status" is *lost* on compaction. Edits in the logs that have a sequence id lower
than that of a compacted bulk loaded file fall into a blind spot and are skipped
during replay.
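
A similar toy model (hypothetical names, not HBase's classes) shows how
ignoring bulk loaded files protects the unflushed edits, and how compaction
undoes that protection. It assumes the compacted file keeps the highest input
sequence id and is written without any bulk loaded flag, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    max_seq_id: int
    # Hypothetical flag standing in for the "bulk loaded status" of a file.
    bulk_loaded: bool = False


def max_flushed_seq_id(store_files):
    # Recovery ignores bulk loaded files: their sequence id says nothing
    # about which memstore edits were flushed (-1 means "nothing flushed").
    flushed = [f.max_seq_id for f in store_files if not f.bulk_loaded]
    return max(flushed, default=-1)


def compact(store_files):
    # The output inherits the highest input sequence id but is written as an
    # ordinary file, so the bulk loaded status is lost.
    top = max(f.max_seq_id for f in store_files)
    return [StoreFile(top, bulk_loaded=False)]


files = [StoreFile(2, bulk_loaded=True), StoreFile(3, bulk_loaded=True)]
print(max_flushed_seq_id(files))           # -1: no edit is skipped
print(max_flushed_seq_id(compact(files)))  # 3: edits up to seqid 3 are skipped
```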

Here's the easiest way to recreate this issue:
 - Create an empty table
 - Put one row in it (let's say it gets seqid 1)
 - Bulk load one file (it gets seqid 2). I used ImportTsv with
hbase.mapreduce.bulkload.assign.sequenceNumbers set to true.
 - Bulk load a second file the same way (it gets seqid 3).
 - Major compact the table (the new file has seqid 3 and isn't considered bulk 
loaded).
 - Kill the region server that holds the table's region.
 - Scan the table once the region is made available again. The first row, at 
seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
everything that came before it was flushed.
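
The reproduction steps can be walked through in a self-contained toy
simulation. `Region` and its methods are hypothetical stand-ins for real region
server behaviour. The model assumes that bulk loaded files bypass the WAL and
memstore, that the compacted file keeps the highest input sequence id without
the bulk loaded flag, and that recovery replays only WAL edits above the
highest sequence id of non-bulk-loaded files.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoreFile:
    max_seq_id: int
    bulk_loaded: bool


@dataclass
class Region:
    # Toy model of one region: a WAL, a memstore and a list of store files.
    wal: list = field(default_factory=list)
    memstore: dict = field(default_factory=dict)
    files: list = field(default_factory=list)
    next_seq_id: int = 1

    def _seq(self):
        seq_id = self.next_seq_id
        self.next_seq_id += 1
        return seq_id

    def put(self, row):
        seq_id = self._seq()
        self.wal.append((seq_id, row))
        self.memstore[row] = seq_id

    def bulk_load(self):
        # In this model a bulk load skips the WAL and memstore but, with
        # sequence id assignment enabled, still takes the next sequence id.
        self.files.append(StoreFile(self._seq(), bulk_loaded=True))

    def major_compact(self):
        # Output keeps the highest input seqid but loses the bulk loaded flag.
        top = max(f.max_seq_id for f in self.files)
        self.files = [StoreFile(top, bulk_loaded=False)]

    def crash_and_recover(self):
        # The memstore is lost; replay WAL edits above the "flushed" mark.
        cutoff = max((f.max_seq_id for f in self.files if not f.bulk_loaded),
                     default=-1)
        self.memstore = {row: s for s, row in self.wal if s > cutoff}


region = Region()
region.put("row1")      # seqid 1, only in the WAL and memstore
region.bulk_load()      # seqid 2
region.bulk_load()      # seqid 3
region.major_compact()  # one file with seqid 3, not marked bulk loaded
region.crash_and_recover()
print("row1" in region.memstore)  # False: the row is lost
```

Dropping the `major_compact()` call in the model leaves `row1` recoverable,
which matches the description: the bulk loaded files alone are ignored during
replay, and it's the compacted output that hides the edit.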



--
This message was sent by Atlassian JIRA
(v6.2#6252)
