[ https://issues.apache.org/jira/browse/HBASE-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258599#comment-13258599 ]
Lars Hofhansl commented on HBASE-3967:
--------------------------------------
HFileOutputFormat just stores KeyValues so it already handles Deletes. The
question is: How do you get Deletes output from a Mapper?
One solution (the one I took in HBASE-5440) is to have the Mapper emit
KeyValues directly and then use KeyValueSortReducer in the reduce phase.
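To illustrate why that works, here is a minimal sketch with no HBase
dependency. Cell, Type, compare and sortForHFile are hypothetical stand-ins,
not HBase classes; the type codes (Put = 4, Delete = 8) and the ordering (row,
family, qualifier, newest timestamp first, higher type code first) are my
understanding of how KeyValues compare, so treat them as assumptions. The point
is only that a delete marker emitted by a Mapper sorts like any other KeyValue
and lands just ahead of the put it masks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class DeleteSortSketch {
    // Hypothetical stand-in for KeyValue type codes (assumed: Put = 4, Delete = 8).
    public enum Type {
        PUT(4), DELETE(8);
        public final int code;
        Type(int code) { this.code = code; }
    }

    // Hypothetical stand-in for a KeyValue, reduced to the fields that drive ordering.
    public static final class Cell {
        public final String row, family, qualifier;
        public final long timestamp;
        public final Type type;

        public Cell(String row, String family, String qualifier, long timestamp, Type type) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + timestamp + "/" + type;
        }
    }

    // Assumed KeyValue-style ordering. HBase compares raw bytes; Strings are
    // used here only to keep the sketch short (equivalent for ASCII keys).
    public static int compare(Cell a, Cell b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);       // newest first
        if (c != 0) return c;
        return Integer.compare(b.type.code, a.type.code); // delete before put
    }

    // Mimics the reduce step: gather the emitted cells in a sorted set and
    // return them in the order they would be written out.
    public static List<Cell> sortForHFile(List<Cell> emitted) {
        TreeSet<Cell> sorted = new TreeSet<>(DeleteSortSketch::compare);
        sorted.addAll(emitted);
        return new ArrayList<>(sorted);
    }

    public static void main(String[] args) {
        List<Cell> emitted = Arrays.asList(
            new Cell("row1", "f", "q", 100L, Type.PUT),
            new Cell("row1", "f", "q", 100L, Type.DELETE),
            new Cell("row0", "f", "q", 50L, Type.PUT));
        for (Cell cell : sortForHFile(emitted)) {
            System.out.println(cell);
        }
    }
}
```

In a real job the Mapper would emit actual KeyValue objects with a delete type
and KeyValueSortReducer would do the sorting; nothing in the sketch above is
specific to deletes beyond the type code.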
Another approach is this jira, which is what Facebook has internally (as far as
I know).
Yet another approach could be to use Mutation (which is new in 0.92) and write
a new SortReducer for it.
I think the point of this jira is to get the Facebook approach into 0.94/trunk
in order to make upgrading more palatable for Facebook.
Kannan, correct me if I am wrong.
> Support deletes in HFileOutputFormat based bulk import mechanism
> ----------------------------------------------------------------
>
> Key: HBASE-3967
> URL: https://issues.apache.org/jira/browse/HBASE-3967
> Project: HBase
> Issue Type: Sub-task
> Reporter: Kannan Muthukkaruppan
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: diff.patch
>
>
> During bulk imports, it'll be useful to be able to do delete mutations
> (either to delete data that already exists in HBase or was inserted earlier
> during this run of the import).
> For example, we have a use case, where we are processing a log of data which
> may have both inserts and deletes in the mix and we want to upload that into
> HBase using the bulk import mechanism.