[
https://issues.apache.org/jira/browse/HADOOP-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HADOOP-1784:
--------------------------
Status: In Progress (was: Patch Available)
I suppose it's possible that compaction could start after the cacheflush thread
finishes (it doesn't on my old single-processor Linux box).
> [hbase] delete
> --------------
>
> Key: HADOOP-1784
> URL: https://issues.apache.org/jira/browse/HADOOP-1784
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/hbase
> Reporter: stack
> Assignee: stack
> Fix For: 0.15.0
>
> Attachments: delete1.patch, delete2.patch, delete3.patch
>
>
> Delete is incomplete in hbase. What's there is inconsistent. Deleted records
> currently persist and are never cleaned up. This issue is about making
> delete behavior coherent across gets, scans and compaction.
> Below is from a bit of back and forth between Jim and myself where Jim takes
> a stab at outlining a model for delete, taking inspiration from how Digital's
> versioned file system used to work:
> {code}
> Let's say you have 5 versions with timestamps T1, T2, ..., T5 where
> timestamps are increasing from T1 to T5 (so T5 is the newest).
> Before any deletes occur, if you don't specify a timestamp and request N
> versions, you should get T5 first, then T4, T3, ... until you have
> reached N or you run out of versions.
> Now add deletes:
> (In the following, timestamp refers to the timestamp associated with
> the delete operation)
> 1. If no timestamp is specified we are deleting the latest version.
> If a get or scanner specifies that it wants N versions, then it
> should get T4, T3, ..., until we have N versions or we run out of
> older versions. After compaction, the deletion record and T5 should
> be elided from the HStore.
> 2. If a timestamp is specified and it exactly matches a version (say
> T4) and a get or scanner requests N versions, then the client
> receives T5, T3, T2, ... until we satisfy N or run out of versions.
> After a compaction, the deletion record and T4 should be elided
> from the HStore.
> 3. If a timestamp is specified and does not exactly match a version,
> it means delete every version older than this timestamp. If the
> timestamp is greater than T5 all versions are considered to be
> deleted and a get or a scanner will return no results even if
> the get or scanner specifies an older time. This is consistent
> with the concept of delete all versions older than timestamp.
> After a compaction, the delete record and all the values should
> be elided.
> If the specified timestamp falls between two older versions (say
> T4 and T3) then T3, T2 and T1 are considered to be deleted (again
> this is all versions older than timestamp). A get or scanner
> that specifies no time but requests N versions can only get T5
> and T4. A get or scanner that requests a time of T3 or earlier
> will get no results because those versions are deleted. After
> a compaction, the deletion record and the deleted versions
> are elided from the HStore.
> {code}
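
The three rules quoted above can be sketched as a small in-memory model. This is
only an illustration, not the HStore code or anything in the attached patches:
the Cell class and its delete, get and compact methods are hypothetical names.
It assumes that "latest version" in rule 1 means the newest version not already
deleted, and that a get at a given time returns versions at or before that time.
Writes that arrive after a delete are not modeled.

```python
class Cell:
    """Hypothetical model of one cell's versions; not an HBase class."""

    def __init__(self, versions):
        self.versions = dict(versions)  # timestamp -> value
        self.deleted = set()            # timestamps masked by delete records

    def _live(self):
        # Versions that no delete record has masked yet.
        return [ts for ts in self.versions if ts not in self.deleted]

    def delete(self, timestamp=None):
        live = self._live()
        if timestamp is None:
            # Rule 1: no timestamp, so delete the latest version.
            # Assumption: "latest" means the newest still-visible version.
            if live:
                self.deleted.add(max(live))
        elif timestamp in self.versions:
            # Rule 2: exact match, so delete only that version.
            self.deleted.add(timestamp)
        else:
            # Rule 3: no exact match, so delete every older version.
            self.deleted.update(ts for ts in self.versions if ts < timestamp)

    def get(self, n, timestamp=None):
        # Newest first, skipping deleted versions, at most n results.
        live = sorted(self._live(), reverse=True)
        if timestamp is not None:
            # Assumption: a timed get sees versions at or before that time.
            live = [ts for ts in live if ts <= timestamp]
        return live[:n]

    def compact(self):
        # Compaction elides the deleted versions and the delete records.
        for ts in self.deleted:
            self.versions.pop(ts, None)
        self.deleted.clear()


if __name__ == "__main__":
    # T1..T5 are modeled as 10, 20, 30, 40, 50.
    cell = Cell({10: "v1", 20: "v2", 30: "v3", 40: "v4", 50: "v5"})
    cell.delete(35)          # falls between T3 and T4, so rule 3 applies
    print(cell.get(5))       # visible versions, newest first
    print(cell.get(5, 30))   # a get at T3
    cell.compact()
    print(sorted(cell.versions))
```

Resolving each delete into a set of masked timestamps at the time it is applied
keeps the sketch short. A real store would presumably keep the delete record
itself until compaction, as the description says, so that gets and scans can
apply it to data spread across several files.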