[ 
https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954608#comment-14954608
 ] 

maghamravikiran commented on PHOENIX-2216:
------------------------------------------

[~gabriel.reid], [~jamestaylor]
    I have attached two patch files using different approaches.
a) HFileMultioutputFormat: [phoenix-custom-hfileoutputformat.patch]
        Most of the code is copied from HFileOutputFormat, with minor 
tweaks to write the data to different directories based on table and family 
name. All the integration tests pass.  :) 

b) MultipleOutputs: [phoenix-multipleoutputs.patch]
      The plan is to use MultipleOutputs with HFileOutputFormat2 as the 
OutputFormat from the Reducer. Tests that bulk load a single table work, but 
tests that bulk load multiple tables keep failing. It works as long as 
HFileOutputFormat produces the necessary files under the configured job 
output path; with multiple tables, it does not. 
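
To make the directory routing in approach (a) concrete, here is a minimal 
sketch in plain Java. It is not taken from either patch: the class name 
HFilePathSketch, the helper familyOutputDir, the table names, and the 
<job output>/<table>/<family> layout are all assumptions for illustration. 

```java
public class HFilePathSketch {

    // Hypothetical helper (not from the attached patches): builds the
    // directory that HFiles for one table and one column family go to,
    // assuming a <jobOutput>/<table>/<family> layout.
    public static String familyOutputDir(String jobOutput, String table, String family) {
        // Drop a single trailing slash so the result has no "//" in it.
        String base = jobOutput.endsWith("/")
                ? jobOutput.substring(0, jobOutput.length() - 1)
                : jobOutput;
        return base + "/" + table + "/" + family;
    }

    public static void main(String[] args) {
        String jobOutput = "/tmp/bulkload";
        // One directory per (table, family) pair: the data table and its
        // index table get separate subtrees under the same job output.
        System.out.println(familyOutputDir(jobOutput, "DATA_TABLE", "0"));
        System.out.println(familyOutputDir(jobOutput, "DATA_TABLE_IDX", "0"));
    }
}
```

With a layout like this, each table's subtree could be handed to the bulk 
load step separately after the single job finishes. 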
 

Please let me know which of the two approaches we should follow. 

> Support single mapper pass to CSV bulk load table and indexes
> -------------------------------------------------------------
>
>                 Key: PHOENIX-2216
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2216
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: phoenix-custom-hfileoutputformat.patch, 
> phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
