[
https://issues.apache.org/jira/browse/PHOENIX-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195060#comment-15195060
]
Gabriel Reid commented on PHOENIX-2723:
---------------------------------------
{quote}
well, the logic is quite simple. If there are several input files and one table
name - all those files will be loaded to this table. Otherwise the number of
tables need to be equal number of inputs.
{quote}
This sounds like the semantics of one input parameter are then changed by the
contents of other input parameters, which I'm personally not in favor of.
I think that sticking with a single invocation for loading a single table is
the best way to stay in line with the [Principle of least
astonishment|https://en.wikipedia.org/wiki/Principle_of_least_astonishment]
(mostly because it is in line with how most other tools work), and the
advantages of not having to write shell scripts around it and reduced start-up
time don't feel like a big enough win to compromise on simplicity here. That's
just my opinion of course.
> Make BulkLoad able to load several tables at once
> -------------------------------------------------
>
> Key: PHOENIX-2723
> URL: https://issues.apache.org/jira/browse/PHOENIX-2723
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Attachments: PHOENIX-2723-1.patch
>
>
> Bulk load is usually needed for more than one table, and it is usually
> done by running the jobs one by one. The idea is to provide lists of
> tables and corresponding input sources to the MR BulkLoad job. The syntax
> could be something like:
> yarn ... CsvBulkLoadTool -t table1,table2,table3 --input input1,input2,input3
> With a map tableName => input available during the map phase, we can
> determine which table the current split belongs to and produce the
> necessary tableRowKeyPair.
> Any thoughts, suggestions?
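For concreteness, the pairing rule under discussion (one table name for all inputs, otherwise equal-length lists) and the per-split table lookup could be sketched roughly as below. This is only an illustration of the proposal, not code from PHOENIX-2723-1.patch: the class `TableInputMapping` and both method names are hypothetical, inputs are treated as plain path strings, and in a real mapper the split path would have to be taken from the job's input split.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative and not part of Phoenix.
final class TableInputMapping {

    // Builds input -> table from the comma-separated -t and --input values.
    // One table name means every input loads into that table; otherwise
    // the two lists must have the same length and are paired by position.
    public static Map<String, String> inputToTable(String tablesArg, String inputsArg) {
        String[] tables = tablesArg.split(",");
        String[] inputs = inputsArg.split(",");
        if (tables.length != 1 && tables.length != inputs.length) {
            throw new IllegalArgumentException(
                "Expected 1 or " + inputs.length + " table names, got " + tables.length);
        }
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < inputs.length; i++) {
            String table = tables.length == 1 ? tables[0] : tables[i];
            mapping.put(inputs[i].trim(), table.trim());
        }
        return mapping;
    }

    // Finds the table for a split by matching its path against the inputs.
    // If inputs are nested, the longest matching input wins (an assumption
    // made for this sketch, not something the issue specifies).
    public static String tableForSplit(Map<String, String> mapping, String splitPath) {
        String best = null;
        for (String input : mapping.keySet()) {
            boolean matches = splitPath.equals(input) || splitPath.startsWith(input + "/");
            if (matches && (best == null || input.length() > best.length())) {
                best = input;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No input covers split " + splitPath);
        }
        return mapping.get(best);
    }
}
```

Note that the meaning of `-t` depends on how many values `--input` carries, which is the coupling between parameters that the comment above objects to.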