[
https://issues.apache.org/jira/browse/PHOENIX-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977252#comment-13977252
]
Nick Dimiduk commented on PHOENIX-918:
--------------------------------------
{quote}
bq. I know a good Apache open source project that already does that. Why not
work to integrate Apache Phoenix with Apache Hive rather than have some
separate/different/duplicated effort (i.e. like, as a start, implementing this
JIRA)?
FWIW, I agree with this. I think that at its core, Phoenix is an HBase schema
mapping mechanism together with a system for doing optimal scans and retrieval
of data for given queries. I think that both of these are the main focuses of
an optimal HBaseStorageHandler for Hive, and I think that Phoenix has already
largely solved these issues. That's my 2c anyhow.
{quote}
I think this could be a very good approach, benefiting users of both
projects. Can we dig into this in more detail? Basically, Hive's
HBaseStorageHandler converts the relevant portions of the Hive query into a
configured InputFormat (meaning, a scan plus properties in the hadoop.mapred
namespace) for the execution engine to consume. So for this to work with
Hive, we'd need a Phoenix-planned query that can be partitioned according to
Hive's runtime semantics.
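To make the partitioning idea concrete, here is a minimal, self-contained Java sketch of the kind of split generation an InputFormat.getSplits() implementation performs: clipping one planned scan key range against known region start keys so each task scans a single region. All class and method names here (ScanSplitter, getSplits, Split) are hypothetical illustrations, not the actual Hive, HBase, or Phoenix APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition one planned scan range into per-region
// splits, the way a storage handler's InputFormat.getSplits() would,
// so that each Hive task reads from exactly one region.
public class ScanSplitter {
    // A [start, stop) key range; stop == null means "to end of table".
    public static final class Split {
        public final String start; // inclusive
        public final String stop;  // exclusive, null = open-ended
        Split(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + ", " + stop + ")"; }
    }

    // regionStarts: start keys of each region, sorted ascending; the first
    // entry is "" for the first region of the table.
    public static List<Split> getSplits(String scanStart, String scanStop,
                                        List<String> regionStarts) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStarts.size(); i++) {
            String regStart = regionStarts.get(i);
            String regStop = (i + 1 < regionStarts.size()) ? regionStarts.get(i + 1) : null;
            // Clip the scan range to this region's boundaries.
            String s = max(scanStart, regStart);
            String e = min(scanStop, regStop);
            // Emit a split only if the clipped range is non-empty.
            if (e == null || s.compareTo(e) < 0) {
                splits.add(new Split(s, e));
            }
        }
        return splits;
    }

    private static String max(String a, String b) { return a.compareTo(b) >= 0 ? a : b; }
    private static String min(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }
}
```

Per-region splits like these are what let the execution engine schedule one task per region, preserving data locality; the open question in this discussion is whether a Phoenix query plan can expose its scan ranges in a form Hive can carve up this way.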
This isn't really ORC-related, though; perhaps we should have this
conversation on a different ticket?
> Support importing directly from ORC formatted HDFS data
> -------------------------------------------------------
>
> Key: PHOENIX-918
> URL: https://issues.apache.org/jira/browse/PHOENIX-918
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> We currently have a good way to import from CSV, but we should also add the
> ability to import from HDFS ORC files, as this would likely be common if
> folks have Hive data they'd like to import.
> [~enis], [~ndimiduk], [~devaraj] - Does this make sense, or is there a
> better, existing way? Any takers on implementing it?
--
This message was sent by Atlassian JIRA
(v6.2#6252)