[
https://issues.apache.org/jira/browse/PHOENIX-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964785#comment-13964785
]
Nick Dimiduk commented on PHOENIX-918:
--------------------------------------
bq. I believe there's an impedance mismatch between HCatalog and Phoenix
metadata
Yeah, there's still only a rudimentary bridge between these worlds. Today,
Hive's HBaseStorageHandler offers the concept of [column
mapping|https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ColumnMapping],
whereby the columns defined in the Hive table are mapped to cf/qual and
encoding directives in HBase. The current implementation is very primitive.
(Have a look at the section "Register the HBase table" in [this
post|http://www.n10k.com/blog/hbase-via-hive-pt2/].) I'd like to replace it
with something more rigorous and supported by a "type DSL" that HBase would
honor; see HBASE-10091 for my current musings. Feedback from you fine folk is
encouraged! Basically, if we can make that language sufficient for Phoenix, I
hope it'll solve 90% of anyone else's use-cases that come along later. Your
first couple of bullet points look like an excellent start for a functional spec;
I don't know that the others apply at this level.
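To make the column-mapping idea concrete, here is a minimal sketch of how an `hbase.columns.mapping` string pairs Hive columns with HBase cf/qual and a storage hint. It is an illustration only, not the storage handler's code: `ColumnSpec` and `parse_mapping` are hypothetical names, only the single-letter `#s`, `#b` and `#-` suffixes are handled, and the map-typed forms are ignored.

```python
from typing import List, NamedTuple, Optional


class ColumnSpec(NamedTuple):
    """One Hive column and the HBase location it maps to (hypothetical type)."""
    hive_column: str
    family: Optional[str]      # None when the column is the row key
    qualifier: Optional[str]   # None for the row key or a whole-family entry
    storage: str               # "string", "binary", or "default"


# Single-letter storage suffixes; an assumption of this sketch is that
# only these three short forms appear in the mapping.
_STORAGE = {"s": "string", "b": "binary", "-": "default"}


def parse_mapping(hive_columns: List[str], mapping: str) -> List[ColumnSpec]:
    """Pair each Hive column with its entry in the mapping string, by position."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have exactly one entry per Hive column")

    specs = []
    for name, entry in zip(hive_columns, entries):
        spec, _, code = entry.partition("#")
        if code and code not in _STORAGE:
            raise ValueError(f"unsupported storage suffix in {entry!r}")
        storage = _STORAGE[code] if code else "default"

        if spec == ":key":
            # The row key has no column family or qualifier.
            specs.append(ColumnSpec(name, None, None, storage))
            continue

        family, sep, qualifier = spec.partition(":")
        if not sep or not family:
            raise ValueError(f"expected 'cf:qual' or ':key', got {entry!r}")
        specs.append(ColumnSpec(name, family, qualifier or None, storage))
    return specs


for spec in parse_mapping(["key", "name", "visits"], ":key,cf:name,cf:visits#b"):
    print(spec)
```

Because the pairing is purely positional and the encoding is a one-letter suffix, nothing here carries type information that HBase itself understands, which is the gap a "type DSL" would aim to close.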
bq. Am I correct in assuming that you had a more automated workflow in mind?
That same blog post outlines what I imagine a Hive -> HBase workflow might look
like. You could easily s/HBase/Phoenix/ and still get the point. The last step
of Hive queries hitting HBase has some serious performance issues today, but I
hope they can be improved by enhancing the storage handler's ability to push
Hive predicates down to the HBase scan/filter; or side-stepped entirely via an
HBaseSnapshotInputFormat/HIVE-6584. More and more, my money is on the latter as
a long-term solution.
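As a rough illustration of what pushing a predicate down to the scan means, the sketch below turns comparisons on the row key into a start row (inclusive) and stop row (exclusive), so the scan touches only the matching key range instead of the whole table. It is a toy under stated assumptions, not the storage handler's logic: `key_range` is a hypothetical helper, and it assumes row keys are encoded so that byte-wise lexicographic order matches the order the predicate intends.

```python
from typing import Iterable, Optional, Tuple


def key_range(
    predicates: Iterable[Tuple[str, bytes]],
) -> Tuple[bytes, Optional[bytes]]:
    """Fold row-key comparisons into (start_row, stop_row).

    start_row is inclusive and stop_row is exclusive; a stop_row of None
    means the scan runs to the end of the table.
    """
    start = b""
    stop: Optional[bytes] = None

    for op, value in predicates:
        # value + b"\x00" is the smallest byte string strictly greater than
        # value, which turns an exclusive bound into an inclusive one and
        # vice versa.
        if op == ">=":
            lo, hi = value, None
        elif op == ">":
            lo, hi = value + b"\x00", None
        elif op == "<":
            lo, hi = None, value
        elif op == "<=":
            lo, hi = None, value + b"\x00"
        elif op == "=":
            lo, hi = value, value + b"\x00"
        else:
            raise ValueError(f"unsupported operator: {op!r}")

        # Keep the tightest bounds seen so far.
        if lo is not None and lo > start:
            start = lo
        if hi is not None and (stop is None or hi < stop):
            stop = hi

    return start, stop


print(key_range([(">=", b"user100"), ("<", b"user200")]))
```

Predicates on non-key columns cannot be folded into a range like this and would need to become server-side filters, or be side-stepped by reading a snapshot directly.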
> Support importing directly from ORC formatted HDFS data
> -------------------------------------------------------
>
> Key: PHOENIX-918
> URL: https://issues.apache.org/jira/browse/PHOENIX-918
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> We currently have a good way to import from CSV, but we should also add the
> ability to import from HDFS ORC files, as this would likely be common if
> folks have Hive data they'd like to import.
> [~enis], [~ndimiduk], [~devaraj] - Does this make sense, or is there a
> better, existing way? Any takers on implementing it?