[ 
https://issues.apache.org/jira/browse/HBASE-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258599#comment-13258599
 ] 

Lars Hofhansl commented on HBASE-3967:
--------------------------------------

HFileOutputFormat just stores KeyValues so it already handles Deletes. The 
question is: How do you get Deletes output from a Mapper?

One solution (the one I took in HBASE-5440) is to have the Mapper emit KeyValues 
and then use KeyValueSortReducer in the reduce phase.
Another approach is this jira, which is what Facebook has internally (as far as 
I know).
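
To make the first approach concrete, here is a minimal, self-contained sketch of the idea. It does not use the HBase API: Cell, Type, map and sortCells are hypothetical stand-ins for KeyValue, the KeyValue type, the Mapper and the sort done in the reduce phase, and the log line format is made up for illustration. The ordering (row and column ascending, newest timestamp first, delete marker ahead of a put at the same coordinates) is my assumption of how HBase orders KeyValues, so treat it as an illustration rather than the real comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the idea only; none of these types are HBase classes.
class DeleteMarkerSortSketch {

    // Simplified cell types. Declaring DELETE first makes a delete marker sort
    // ahead of a put at the same row/column/timestamp (assumed ordering).
    enum Type { DELETE, PUT }

    // Simplified stand-in for a KeyValue: coordinates plus a type and a value.
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final Type type;
        final String value;

        Cell(String row, String column, long timestamp, Type type, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + column + "/" + timestamp + "/" + type + "/" + value;
        }
    }

    // Row and column ascending, newest timestamp first, then DELETE before PUT.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return a.type.compareTo(b.type);
    };

    // Plays the role of the Mapper: one log line becomes one cell.
    // Assumed (made-up) format: "<PUT|DELETE> <row> <column> <timestamp> [value]".
    static Cell map(String line) {
        String[] f = line.trim().split("\\s+");
        Type type = Type.valueOf(f[0]);
        String value = (type == Type.PUT) ? f[4] : "";
        return new Cell(f[1], f[2], Long.parseLong(f[3]), type, value);
    }

    // Plays the role of the sort in the reduce phase: returns a sorted copy.
    static List<Cell> sortCells(List<Cell> cells) {
        List<Cell> out = new ArrayList<>(cells);
        out.sort(ORDER);
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (String line : Arrays.asList(
                "PUT row1 cf:a 100 v1",
                "DELETE row1 cf:a 100",
                "PUT row1 cf:a 200 v2",
                "PUT row0 cf:b 50 x")) {
            cells.add(map(line));
        }
        // Deletes travel through the pipeline as ordinary cells with a type.
        for (Cell c : sortCells(cells)) {
            System.out.println(c);
        }
    }
}
```

The point of the sketch is that a delete is just another cell with a different type, so the output side needs no special handling once the cells arrive sorted. In a real job the Mapper would emit org.apache.hadoop.hbase.KeyValue instances and KeyValueSortReducer would do the sorting.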

Yet another approach could be to use Mutation (which is new in 0.92) and write a 
new SortReducer.

I think the point of this jira is to get the Facebook approach into 0.94/trunk 
in order to make upgrading more palatable for Facebook.
Kannan, correct me if I am wrong.
                
> Support deletes in HFileOutputFormat based bulk import mechanism
> ----------------------------------------------------------------
>
>                 Key: HBASE-3967
>                 URL: https://issues.apache.org/jira/browse/HBASE-3967
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Kannan Muthukkaruppan
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: diff.patch
>
>
> During bulk imports, it'll be useful to be able to do delete mutations 
> (either to delete data that already exists in HBase or was inserted earlier 
> during this run of the import). 
> For example, we have a use case, where we are processing a log of data which 
> may have both inserts and deletes in the mix and we want to upload that into 
> HBase using the bulk import mechanism.


        
