[
https://issues.apache.org/jira/browse/HBASE-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858046#action_12858046
]
Jonathan Gray commented on HBASE-2453:
--------------------------------------
I agree that if you have an IO-bound workload and a significant number of
deletes, then this may not be a win. This seems like a fairly rare use case,
limited to applications which do a ton of deleting. In those cases, you could
always trigger majors.
What we get by taking this out is not having to track deletes during minors
(yes, we should measure this), retaining deleted records for the sake of
recovery/snapshotscanner type stuff, and (possibly) being able to do minor
compactions against any storefiles (not just neighbors).
Looking at the old minor and scan delete tracker code paths, it's not a ton of
work in the case that there are no deletes, so I imagine it's not a big
performance impact. However, once you take the delete tracking out, minors can
actually be refactored to not keep track row by row, as they still do even
after this patch. Between the row compares for every kv and the previous
delete tracking, we could see some incremental boost from stripping down
minors to the bare minimum.
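To make the trade-off concrete, here is a rough sketch of the two behaviors
being discussed. It is a toy model, not HBase code: the KV tuple, the sort
order, and the function names are all made up for illustration, and the only
delete type modeled is a marker that masks every version of a column at or
before its timestamp. The minor path is a plain merge that tracks nothing and
retains everything; the major path applies deletes and drops the markers.

```python
import heapq
from typing import List, NamedTuple


class KV(NamedTuple):
    """Hypothetical key/value record; not the real HBase KeyValue."""
    row: str
    col: str
    ts: int
    is_delete: bool
    value: str = ""


def sort_key(kv: KV):
    # Assumed ordering: row and column ascending, newest timestamp first,
    # and a delete marker ahead of a put with the same timestamp.
    return (kv.row, kv.col, -kv.ts, 0 if kv.is_delete else 1)


def minor_compact(files: List[List[KV]]) -> List[KV]:
    """Plain k-way merge: no delete tracking, no row-by-row state.

    Delete markers and the records they mask are both retained.
    """
    return list(heapq.merge(*files, key=sort_key))


def major_compact(files: List[List[KV]]) -> List[KV]:
    """Merge every file, apply deletes, and drop the markers themselves."""
    out: List[KV] = []
    current = None    # (row, col) currently being processed
    delete_ts = None  # newest delete timestamp seen for that column
    for kv in heapq.merge(*files, key=sort_key):
        if (kv.row, kv.col) != current:
            current = (kv.row, kv.col)
            delete_ts = None
        if kv.is_delete:
            if delete_ts is None:
                delete_ts = kv.ts  # newest marker arrives first
            continue               # markers are never written out
        if delete_ts is not None and kv.ts <= delete_ts:
            continue               # put is masked by a delete
        out.append(kv)
    return out


# Each "file" must already be sorted by sort_key.
older = [KV("r1", "a", 1, False, "v1"), KV("r2", "a", 1, False, "x")]
newer = [KV("r1", "a", 3, False, "v3"), KV("r1", "a", 2, True)]

minor_result = minor_compact([older, newer])
major_result = major_compact([older, newer])
```

In this model the minor result still holds all four records, including the
marker and the put it masks, while the major result keeps only the two live
puts. Note that the minor path needs no per-row state at all, which is the
kind of stripping down suggested above.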
Let's keep the conversation going on these topics.
Of note, but not really that important: I believe the Bigtable paper explicitly
talks about deleted data only being removed during major compactions.
> Revisit compaction policies after HBASE-2248 commit
> ---------------------------------------------------
>
> Key: HBASE-2453
> URL: https://issues.apache.org/jira/browse/HBASE-2453
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: Jonathan Gray
> Assignee: Jonathan Gray
> Priority: Critical
> Fix For: 0.20.4, 0.20.5, 0.21.0
>
> Attachments: HBASE-2453-v1.patch
>
>
> HBASE-2248 turned Gets into Scans server-side. It also removed the invariant
> that deletes in a file only apply to other files and not itself (no longer
> processes MemStore deletes when the delete happens). This has implications
> for our minor compaction policy.
> We are currently processing deletes during minor compactions in a way that
> makes it so we do the actual deleting as we compact, but we retain the delete
> records themselves. This makes it so we retain the invariant of deletes only
> applying to other files.
> Since this is now gone post HBASE-2248, we should revisit our compaction
> policies.