[jira] [Commented] (PHOENIX-2216) Support single mapper pass to CSV bulk load table and indexes

James Taylor (JIRA) Fri, 16 Oct 2015 11:35:18 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961163#comment-14961163 ]


James Taylor commented on PHOENIX-2216:
---------------------------------------

Good idea, [~gabriel.reid]. To force the table to span regions, you can 
pre-split it by appending SPLIT ON (1, 2, 3) to your CREATE TABLE statement, 
where 1, 2, and 3 are the region boundaries (this assumes your leading PK 
column is an INTEGER or BIGINT).
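
For illustration, a minimal sketch of building such a pre-split DDL string for 
a test. The class name PreSplitDdl, the table name, and the ID/VAL columns are 
all hypothetical and not taken from the patch. The ordering check is a 
conservative assumption of this sketch, not a documented Phoenix requirement.

```java
import java.util.StringJoiner;

/**
 * Hypothetical helper (not part of Phoenix): builds a CREATE TABLE statement
 * with a SPLIT ON clause so the table starts out spanning several regions.
 */
final class PreSplitDdl {

    private PreSplitDdl() {
        // static helper only
    }

    /**
     * @param tableName   table to create (assumed to be a valid identifier)
     * @param splitPoints region boundaries for the leading BIGINT PK column
     * @return the DDL string; N split points yield N + 1 regions
     */
    static String createTable(String tableName, long... splitPoints) {
        if (splitPoints.length == 0) {
            throw new IllegalArgumentException("at least one split point is required");
        }
        StringJoiner points = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < splitPoints.length; i++) {
            // Assumption of this sketch: require strictly ascending boundaries.
            if (i > 0 && splitPoints[i] <= splitPoints[i - 1]) {
                throw new IllegalArgumentException("split points must be strictly ascending");
            }
            points.add(Long.toString(splitPoints[i]));
        }
        // Illustrative schema: a BIGINT leading PK column plus one value column.
        return "CREATE TABLE " + tableName
                + " (ID BIGINT NOT NULL PRIMARY KEY, VAL VARCHAR)"
                + " SPLIT ON " + points;
    }
}
```

Calling PreSplitDdl.createTable("EXAMPLE", 1, 2, 3) produces a statement ending 
in SPLIT ON (1, 2, 3), which could then be executed over a Phoenix JDBC 
connection before the bulk load so that the load has to write across region 
boundaries.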

Also, one more minor nit: would you mind noting in the class-level javadoc 
where the code was copied from, and that the plan is to put together an 
HBase patch so this can be done without the copy/paste (ideally 
referencing an HBase JIRA that you file)?

> Support single mapper pass to CSV bulk load table and indexes
> -------------------------------------------------------------
>
>                 Key: PHOENIX-2216
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2216
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: phoenix-custom-hfileoutputformat-comments.patch, 
> phoenix-custom-hfileoutputformat.patch, phoenix-multipleoutputs.patch
>
>
> Instead of running separate MR jobs for CSV bulk load: once for the table and 
> then once for each secondary index, generate both the data table HFiles and 
> the index table(s) HFiles in one mapper phase.
> Not sure if we need HBASE-3727 to be implemented for this or if we can do it 
> with existing HBase APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
