[jira] [Commented] (PHOENIX-129) Improve MapReduce-based import

James Taylor (JIRA) Fri, 14 Mar 2014 17:23:28 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935874#comment-13935874
 ]


James Taylor commented on PHOENIX-129:
--------------------------------------

Wow, fantastic! Great work, [~gabriel.reid]. And it includes unit tests - love 
it.

One question: did you try it on a real cluster as well? I think the changes you 
made make sense in terms of config options.

Some minor feedback:

Use Nick's suggested package name: package org.apache.phoenix.mapreduce;

+        conn = DriverManager.getConnection("jdbc:phoenix:" + zkQuorum);
Use the constants PhoenixRuntime.JDBC_PROTOCOL + 
PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR instead of the hard-coded 
"jdbc:phoenix:" prefix.
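
A minimal sketch of what that suggestion could look like. The two constants below are local stand-ins that are assumed to mirror the values of PhoenixRuntime.JDBC_PROTOCOL and PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR; in the patch itself the Phoenix constants would be referenced directly. The class and method names (JdbcUrlSketch, buildUrl) are hypothetical.

```java
final class JdbcUrlSketch {
    // Assumption: stand-ins for PhoenixRuntime.JDBC_PROTOCOL and
    // PhoenixRuntime.JDBC_PROTOCOL_SEPARATOR so the sketch compiles without Phoenix.
    static final String JDBC_PROTOCOL = "jdbc:phoenix";
    static final char JDBC_PROTOCOL_SEPARATOR = ':';

    /** Builds the connection URL from the ZooKeeper quorum without a hard-coded prefix. */
    static String buildUrl(String zkQuorum) {
        return JDBC_PROTOCOL + JDBC_PROTOCOL_SEPARATOR + zkQuorum;
    }
}
```

With these stand-ins, the call in the patch would become something like DriverManager.getConnection(JdbcUrlSketch.buildUrl(zkQuorum)).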

+        conn.close();
+        PhoenixDriver.INSTANCE.close();
May not matter, but throw in a 
DriverManager.deregisterDriver(PhoenixDriver.INSTANCE);
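
A rough sketch of the cleanup order, using only java.sql so it runs without Phoenix on the classpath. StubDriver is a hypothetical no-op driver standing in for PhoenixDriver.INSTANCE, and DeregisterSketch, isRegistered and shutdown are illustrative names, not part of the patch.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

final class DeregisterSketch {
    /** Hypothetical no-op driver; stands in for PhoenixDriver.INSTANCE. */
    static final class StubDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return false; }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    /** True if the given driver instance is still known to DriverManager. */
    static boolean isRegistered(Driver driver) {
        return Collections.list(DriverManager.getDrivers()).contains(driver);
    }

    /** In the patch, conn.close() and the driver's own close() would come first. */
    static void shutdown(Driver driver) throws SQLException {
        DriverManager.deregisterDriver(driver);
    }
}
```

Deregistering after closing means DriverManager no longer holds a reference to the closed driver instance.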

+            // TODO Creating a new parser for each line seems terribly inefficient but
+            // there's no public way to parse single lines via commons-csv. We should update
+            // it to create a LineParser class like this one.
Good idea. File a JIRA with Apache Commons?
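
To illustrate the kind of reusable single-line parser the TODO describes, here is a hand-rolled sketch. It is not commons-csv API and not the code from the patch; LineParserSketch and its constructor arguments are hypothetical. It handles a configurable separator and quote character, treats a doubled quote inside a quoted field as an escaped quote, and does not handle records that span multiple lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reusable single-line parser; create once, call parse() per line. */
final class LineParserSketch {
    private final char separator;
    private final char quote;
    // Buffer reused across calls, so no per-line parser object is needed.
    private final StringBuilder field = new StringBuilder();

    LineParserSketch(char separator, char quote) {
        this.separator = separator;
        this.quote = quote;
    }

    List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        field.setLength(0);
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    field.append(quote); // doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == quote) {
                inQuotes = true;         // opening quote
            } else if (c == separator) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());    // last field has no trailing separator
        return fields;
    }
}
```

A mapper could hold one instance, for example new LineParserSketch(',', '"'), and call parse() on each input line.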

+public interface ImportPreUpsertKeyValueProcessor {
Good idea - I like the idea of this new hook.

> Improve MapReduce-based import
> ------------------------------
>
>                 Key: PHOENIX-129
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-129
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: PHOENIX-129-3.0.patch, PHOENIX-129-master.patch
>
>
> In implementing PHOENIX-66, it was noted that the current MapReduce-based 
> importer implementation has a number of issues, including the following:
> * CSV handling is largely replicated from the non-MR code, with no ability to 
> specify custom separators
> * No automated tests, and code is written in a way that makes it difficult to 
> test
> * Unusual custom config loading and handling instead of using 
> GenericOptionsParser and ToolRunner and friends
> The initial work towards PHOENIX-66 included refactoring the MR importer 
> enough to use common code, up until the development of automated testing 
> exposed the fact that the MR importer could use some major refactoring.
> This ticket is a proposal to do a relatively major rework of the MR import, 
> fixing the above issues. The biggest improvements that will result from this 
> are a common codebase for handling CSV input, and the addition of automated 
> testing for the MR import.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
