[ 
https://issues.apache.org/jira/browse/HBASE-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated HBASE-12363:
---------------------------------
    Labels: Phoenix  (was: )

> KEEP_DELETED_CELLS considered harmful?
> --------------------------------------
>
>                 Key: HBASE-12363
>                 URL: https://issues.apache.org/jira/browse/HBASE-12363
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Lars Hofhansl
>              Labels: Phoenix
>
> Brainstorming...
> This morning on the train (of all places) I realized a fundamental issue in 
> how KEEP_DELETED_CELLS is implemented.
> The problem is around knowing when it is safe to remove a delete marker (we 
> cannot remove it unless all cells affected by it have otherwise been removed).
> This was particularly hard for family markers, since they sort before all 
> cells of a row, and hence scanning forward through an HFile you cannot know 
> whether the family markers are still needed until at least the entire row is 
> scanned.
> My solution was to keep the TS of the oldest put in any given HFile, and only 
> remove delete markers older than that TS.
> That sounds good on the face of it... But now imagine you write a version of 
> ROW 1 and then never update it again. Then later you write a billion other 
> rows and delete them all. Since the TS of the cells in ROW 1 is older than 
> all the delete markers for the other billion rows, these will never be 
> collected... At least for the region that hosts ROW 1 after a major 
> compaction.
> I don't see a good way out of this. In the parent issue I outlined these 
> four options:
> # Only allow the new flag set on CFs with TTL set. MIN_VERSIONS would not 
> apply to deleted rows or delete marker rows (wouldn't know how long to keep 
> family deletes in that case). (MAX)VERSIONS would still be enforced on all 
> row types except for family delete markers.
> # Translate family delete markers to column delete markers at (major) 
> compaction time.
> # Change HFileWriterV* to keep track of the earliest put TS in a store and 
> write it to the file metadata. Use that to expire delete markers that are 
> older and hence can't affect any puts in the file.
> # Have Store.java keep track of the earliest put in internalFlushCache and 
> compactStore and then append it to the file metadata. That way HFileWriterV* 
> would not need to know about KVs.
> And I implemented #4.
> I'd love to get input on ideas.
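
The retention rule described above (drop only those delete markers that are older than the earliest put in the file) and the ROW 1 scenario can be illustrated with a small toy model. This is a minimal sketch, not HBase code: the class `EarliestPutSketch`, the `Cell` type, and both helper methods are hypothetical names invented for illustration, and a "file" is modeled as a plain list of cells. It also assumes, for simplicity, that the puts of the deleted rows are no longer in the file, so only their markers and the single old ROW 1 put remain.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the "earliest put timestamp" rule. Hypothetical names, not HBase classes. */
public class EarliestPutSketch {

    /** A cell is either a put or a delete marker, each carrying a timestamp. */
    public static final class Cell {
        final String row;
        final long ts;
        final boolean deleteMarker;

        public Cell(String row, long ts, boolean deleteMarker) {
            this.row = row;
            this.ts = ts;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Earliest put timestamp in the file, or Long.MAX_VALUE if the file holds no puts. */
    public static long earliestPutTs(List<Cell> file) {
        long earliest = Long.MAX_VALUE;
        for (Cell c : file) {
            if (!c.deleteMarker) {
                earliest = Math.min(earliest, c.ts);
            }
        }
        return earliest;
    }

    /** Delete markers the rule allows a compaction to drop: those older than every put. */
    public static List<Cell> collectibleMarkers(List<Cell> file) {
        long earliest = earliestPutTs(file);
        List<Cell> result = new ArrayList<>();
        for (Cell c : file) {
            if (c.deleteMarker && c.ts < earliest) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> file = new ArrayList<>();
        // ROW 1 is written once and never updated again.
        file.add(new Cell("ROW 1", 100L, false));
        // Stand-in for the "billion other rows": only their delete markers remain.
        for (int i = 0; i < 5; i++) {
            file.add(new Cell("row-" + i, 200L + i, true));
        }
        // Every marker is newer than ROW 1's put, so none may be dropped.
        System.out.println(collectibleMarkers(file).size()); // prints 0
    }
}
```

In this model the single old put pins the earliest put timestamp at 100, so every later delete marker survives, which is the behavior the description worries about for the region hosting ROW 1.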



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
