[jira] [Commented] (PHOENIX-129) Improve MapReduce-based import

James Taylor (JIRA) Fri, 14 Mar 2014 19:44:25 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935948#comment-13935948 ]


James Taylor commented on PHOENIX-129:
--------------------------------------

One more comment on the "why" of the create table option. It isn't currently 
needed, so I think it's fine to remove it. The reason it existed in the first 
place was to support creating the HFiles even when you don't have connectivity 
to an HBase cluster. We had a use case like this before, but no longer do. This 
could be supported by passing the DDL statement through and then using our 
"connectionless" Connection to run all of the upsert statements (since they 
don't actually need a connection). To determine where to make your split 
points, you'd use the pre-split information in the DDL statement, the salting 
information, or potentially another argument.
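
As a rough illustration of the salting case, here is a minimal sketch in plain 
Java with no Phoenix or HBase dependency. It assumes that a salted row key 
starts with a single salt byte holding the bucket index, so that one 
single-byte split point per bucket boundary is enough. The class 
SaltSplitPoints and its methods are hypothetical names made up for this 
example; they are not Phoenix APIs, and the comment above does not say how the 
split points would actually be computed.

```java
/**
 * Hypothetical helper (not part of Phoenix): derives split points from a
 * salt bucket count without talking to an HBase cluster.
 */
final class SaltSplitPoints {
    private SaltSplitPoints() {}

    /**
     * Assumes the row key begins with one salt byte in [0, saltBuckets).
     * Returns the split keys {1}, {2}, ..., {saltBuckets - 1}.
     */
    static byte[][] splitPoints(int saltBuckets) {
        // A single salt byte can address at most 256 buckets.
        if (saltBuckets < 1 || saltBuckets > 256) {
            throw new IllegalArgumentException("saltBuckets must be in 1..256");
        }
        byte[][] splits = new byte[saltBuckets - 1][];
        for (int i = 1; i < saltBuckets; i++) {
            splits[i - 1] = new byte[] {(byte) i};
        }
        return splits;
    }

    /**
     * Returns the index of the region a row key falls into, given sorted
     * split points. A key equal to a split point belongs to the region that
     * starts at that split point.
     */
    static int regionFor(byte[] rowKey, byte[][] splits) {
        int region = 0;
        for (byte[] split : splits) {
            if (compareUnsigned(rowKey, split) < 0) {
                break;
            }
            region++;
        }
        return region;
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[][] splits = splitPoints(4);
        System.out.println("split points: " + splits.length);
        // Row key whose salt byte is 2, followed by one payload byte.
        System.out.println("region: " + regionFor(new byte[] {2, 42}, splits));
    }
}
```

An offline job could use split points like these to partition its output into 
one HFile range per region. Pre-split information from the DDL statement, or a 
separate argument, would simply replace the result of splitPoints.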

> Improve MapReduce-based import
> ------------------------------
>
>                 Key: PHOENIX-129
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-129
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: PHOENIX-129-3.0.patch, PHOENIX-129-master.patch
>
>
> In implementing PHOENIX-66, it was noted that the current MapReduce-based 
> importer implementation has a number of issues, including the following:
> * CSV handling is largely replicated from the non-MR code, with no ability to 
> specify custom separators
> * No automated tests, and code is written in a way that makes it difficult to 
> test
> * Unusual custom config loading and handling instead of using 
> GenericOptionsParser and ToolRunner and friends
> The initial work towards PHOENIX-66 included refactoring the MR importer 
> enough to use common code, up until the development of automated testing 
> exposed the fact that the MR importer could use some major refactoring.
> This ticket is a proposal to do a relatively major rework of the MR import, 
> fixing the above issues. The biggest improvements that will result from this 
> are a common codebase for handling CSV input, and the addition of automated 
> testing for the MR import.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
