[
https://issues.apache.org/jira/browse/PHOENIX-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977252#comment-13977252
]
Nick Dimiduk commented on PHOENIX-918:
--------------------------------------
{quote}
bq. I know a good Apache open source project that already does that. Why not
work to integrate Apache Phoenix with Apache Hive rather than have some
separate/different/duplicated effort (i.e. like, as a start, implementing this
JIRA)?
FWIW, I agree with this. I think that at its core, Phoenix is an HBase schema
mapping mechanism together with a system for doing optimal scans and retrieval
of data for given queries. I think that both of these are the main focuses of
an optimal HBaseStorageHandler for Hive, and I think that Phoenix has already
largely solved these issues. That's my 2c anyhow.
{quote}
I think this could be a very good approach, benefiting users of both
projects. Can we dig into this in more detail? Basically, Hive's
HBaseStorageHandler converts the relevant portions of the Hive query into a
configured InputFormat (meaning, a scan plus properties in the hadoop.mapred
namespace) for the execution engine to consume. So for this to work with
Hive, we'd need a Phoenix-planned query that can be partitioned according to
Hive's runtime semantics.
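To make the partitioning idea concrete, here is a minimal, self-contained Java sketch of the kind of split generation an InputFormat.getSplits() implementation performs: clipping one planned scan key range against known region start keys so each task scans a single region. All class and method names here (ScanSplitter, getSplits, Split) are hypothetical illustrations, not the actual Hive, HBase, or Phoenix APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition one planned scan range into per-region
// splits, the way a storage handler's InputFormat.getSplits() would,
// so that each Hive task reads from exactly one region.
public class ScanSplitter {
    // A [start, stop) key range; stop == null means "to end of table".
    public static final class Split {
        public final String start; // inclusive
        public final String stop;  // exclusive, null = open-ended
        Split(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + ", " + stop + ")"; }
    }

    // regionStarts: start keys of each region, sorted ascending; the first
    // entry is "" for the first region of the table.
    public static List<Split> getSplits(String scanStart, String scanStop,
                                        List<String> regionStarts) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStarts.size(); i++) {
            String regStart = regionStarts.get(i);
            String regStop = (i + 1 < regionStarts.size()) ? regionStarts.get(i + 1) : null;
            // Clip the scan range to this region's boundaries.
            String s = max(scanStart, regStart);
            String e = min(scanStop, regStop);
            // Emit a split only if the clipped range is non-empty.
            if (e == null || s.compareTo(e) < 0) {
                splits.add(new Split(s, e));
            }
        }
        return splits;
    }

    private static String max(String a, String b) { return a.compareTo(b) >= 0 ? a : b; }
    private static String min(String a, String b) {
        if (a == null) return b;
        if (b == null) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }
}
```

Per-region splits like these are what let the execution engine schedule one task per region, preserving data locality; the open question in this discussion is whether a Phoenix query plan can expose its scan ranges in a form Hive can carve up this way.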
This isn't really ORC-related, though; perhaps we should have this
conversation on a different ticket?
> Support importing directly from ORC formatted HDFS data
> -------------------------------------------------------
>
> Key: PHOENIX-918
> URL: https://issues.apache.org/jira/browse/PHOENIX-918
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> We currently have a good way to import from CSV, but we should also add the
> ability to import from HDFS ORC files, as this would likely be common if
> folks have Hive data they'd like to import.
> [~enis], [~ndimiduk], [~devaraj] - Does this make sense, or is there a
> better, existing way? Any takers on implementing it?
--
This message was sent by Atlassian JIRA
(v6.2#6252)