[jira] [Commented] (PHOENIX-2632) Easier Hive->Phoenix data movement

Josh Mahonin (JIRA) Thu, 28 Jan 2016 06:36:13 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121584#comment-15121584
 ]


Josh Mahonin commented on PHOENIX-2632:
---------------------------------------

I think #1 is a great idea. I don't really have any opinions either way on #2.

It should be pretty straightforward to implement. Starting from [1], we just 
need to adjust the SaveMode case statement and add the code to create the 
table then and there. The various options you use in your config can be passed 
through from Spark as option parameters (e.g., zkUrl and table).

I had originally thought that 'Ignore' would be the right SaveMode to use, but 
looking through some examples, I'm wondering if we should take the approach 
where the default 'ErrorIfExists' attempts a 'CREATE TABLE', 'Ignore' does a 
'CREATE TABLE IF NOT EXISTS', and the existing 'Append' mode just attempts to 
write straight to the specified table.
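
A rough sketch of that mapping, to make the proposal concrete. This is not the 
phoenix-spark code: the class SaveModeSketch, the method createStatement, and 
the columnDefs parameter are hypothetical names, and the local SaveMode enum is 
only a stand-in for Spark's org.apache.spark.sql.SaveMode so the example runs 
without Spark on the classpath. The table name is assumed to arrive through 
the 'table' option.

```java
import java.util.Optional;

class SaveModeSketch {
    // Stand-in for org.apache.spark.sql.SaveMode (assumption: same constant names).
    enum SaveMode { Append, Overwrite, ErrorIfExists, Ignore }

    // Returns the DDL to run before the write, or empty when the write should
    // go straight to the table. 'columnDefs' is a hypothetical, pre-rendered
    // column list derived from the DataFrame schema.
    static Optional<String> createStatement(SaveMode mode, String table, String columnDefs) {
        switch (mode) {
            case ErrorIfExists:
                // Plain CREATE TABLE: the statement fails if the table already exists.
                return Optional.of("CREATE TABLE " + table + " (" + columnDefs + ")");
            case Ignore:
                // Create the table only when it is missing.
                return Optional.of("CREATE TABLE IF NOT EXISTS " + table + " (" + columnDefs + ")");
            case Append:
                // No DDL: attempt to write to the specified table as-is.
                return Optional.empty();
            default:
                // Overwrite is not covered by the proposal, so this sketch rejects it.
                throw new IllegalArgumentException("Mode not covered by this sketch: " + mode);
        }
    }

    public static void main(String[] args) {
        String cols = "ID BIGINT NOT NULL PRIMARY KEY, COL1 VARCHAR";
        SaveMode[] modes = { SaveMode.ErrorIfExists, SaveMode.Ignore, SaveMode.Append };
        for (SaveMode mode : modes) {
            String ddl = createStatement(mode, "OUTPUT_TABLE", cols).orElse("(no DDL, write directly)");
            System.out.println(mode + ": " + ddl);
        }
    }
}
```

In the real integration the returned statement would presumably be executed 
over a Phoenix JDBC connection built from the 'zkUrl' option before the 
DataFrame is saved; that part is left out here.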

I'm probably getting a bit ahead of myself; once you open a new JIRA, we can 
work out the details there.

> Easier Hive->Phoenix data movement
> ----------------------------------
>
>                 Key: PHOENIX-2632
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2632
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Randy Gelhausen
>
> Moving tables or query results from Hive into Phoenix today requires 
> error-prone manual schema re-definition inside HBase storage handler 
> properties. Since Hive and Phoenix support near-equivalent types, it should 
> be easier for users to pick a Hive table and load it (or query results 
> derived from it) into Phoenix.
> I'm posting this to open a design discussion, and also to submit my own project 
> https://github.com/randerzander/HiveToPhoenix for consideration as an early 
> solution. It creates a Spark DataFrame from a Hive query, uses Phoenix JDBC 
> to "create if not exists" a Phoenix equivalent table, and uses the 
> phoenix-spark artifact to store the DataFrame into Phoenix.
> I'm eager to get feedback if this is interesting/useful to the Phoenix 
> community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
