[
https://issues.apache.org/jira/browse/PHOENIX-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935948#comment-13935948
]
James Taylor commented on PHOENIX-129:
--------------------------------------
One more comment on the "why" of the create table option. It isn't currently
needed, so I think it's fine if we remove it, but the reason it existed in the
first place was to support creating the HFiles even when you don't have
connectivity to an HBase cluster. We had a use case like this before, but no
longer do. If it's needed again, this could be supported by passing through
the DDL statement and then running all of the upsert statements on our
"connectionless" Connection (since they don't actually need a connection to
the cluster). The split points would come from the pre-split information in
the DDL statement, from the salting information, or potentially from another
argument.
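
To illustrate the salting option, here is a rough sketch of how split points
could be derived from a salt bucket count. It assumes the salt is a single
leading byte in the row key and that one region per bucket is wanted. The
class and method names are made up for the example and are not Phoenix APIs;
the real split keys may also need padding to the full row key length.

```java
import java.util.Arrays;

public class SaltSplitPoints {
    // Hypothetical helper, not part of Phoenix: returns one split point per
    // bucket boundary, assuming the salt is a single leading byte whose
    // value lies in the range [0, buckets).
    static byte[][] compute(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]: " + buckets);
        }
        // N buckets need N - 1 boundaries; the first region starts at the
        // empty key, so no split point is emitted for bucket 0.
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] { (byte) i };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the boundaries for a table salted into 4 buckets.
        for (byte[] split : compute(4)) {
            System.out.println(Arrays.toString(split));
        }
    }
}
```

With explicit pre-split information in the DDL statement, those keys would
presumably be used directly instead of being computed this way.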
> Improve MapReduce-based import
> ------------------------------
>
> Key: PHOENIX-129
> URL: https://issues.apache.org/jira/browse/PHOENIX-129
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Attachments: PHOENIX-129-3.0.patch, PHOENIX-129-master.patch
>
>
> In implementing PHOENIX-66, it was noted that the current MapReduce-based
> importer implementation has a number of issues, including the following:
> * CSV handling is largely replicated from the non-MR code, with no ability to
> specify custom separators
> * No automated tests, and code is written in a way that makes it difficult to
> test
> * Unusual custom config loading and handling instead of using
> GenericOptionsParser and ToolRunner and friends
> The initial work towards PHOENIX-66 included refactoring the MR importer
> enough to use common code, up until the development of automated testing
> exposed the fact that the MR importer could use some major refactoring.
> This ticket is a proposal to do a relatively major rework of the MR import,
> fixing the above issues. The biggest improvements that will result from this
> are a common codebase for handling CSV input, and the addition of automated
> testing for the MR import.