[jira] [Commented] (PHOENIX-918) Support importing directly from ORC formatted HDFS data

Nick Dimiduk (JIRA) Wed, 09 Apr 2014 16:09:13 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964785#comment-13964785
 ]


Nick Dimiduk commented on PHOENIX-918:
--------------------------------------

bq. I believe there's an impedance mismatch between HCatalog and Phoenix 
metadata

Yeah, there's still only a rudimentary bridge between these worlds. Today, 
Hive's HBaseStorageHandler offers the concept of [column 
mapping|https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ColumnMapping],
 whereby the columns defined in the Hive table are mapped to column 
family/qualifier (cf/qual) pairs and encoding directives in HBase. The current 
implementation is very primitive. 
(Have a look at the section "Register the HBase table" in [this 
post|http://www.n10k.com/blog/hbase-via-hive-pt2/].) I'd like to replace it 
with something more rigorous and supported by a "type DSL" that HBase would 
honor; see HBASE-10091 for my current musings. Feedback from you fine folk is 
encouraged! Basically, if we can make that language sufficient for Phoenix, I 
hope it'll cover 90% of the use cases anyone else brings along later. Your 
first couple of bullet points look like an excellent start for a functional 
spec; I don't know that the others apply at this level.
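
To make the column-mapping idea concrete, here is a rough sketch of how a 
mapping string of the kind passed via Hive's hbase.columns.mapping property 
could be turned into structured entries. This is not the storage handler's 
actual parser: ColumnMapping and parse_column_mapping are hypothetical names, 
and the sketch assumes only the simple ":key" and "family:qualifier[#type]" 
forms described on the wiki page linked above.

```python
from typing import List, NamedTuple, Optional

# Storage-type suffixes as described on the Hive HBaseIntegration wiki page.
_STORAGE = {"": "default", "-": "default", "s": "string", "string": "string",
            "b": "binary", "binary": "binary"}


class ColumnMapping(NamedTuple):
    hive_column: str
    family: Optional[str]     # None when the column maps to the row key
    qualifier: Optional[str]  # None for the row key or a whole-family mapping
    storage: str              # "default", "string", or "binary"


def parse_column_mapping(hive_columns: List[str], mapping: str) -> List[ColumnMapping]:
    """Pair each Hive column with its HBase target (simplified illustration)."""
    specs = [spec.strip() for spec in mapping.split(",")]
    if len(specs) != len(hive_columns):
        raise ValueError("expected one mapping entry per Hive column")
    parsed = []
    for name, spec in zip(hive_columns, specs):
        # An optional "#type" suffix carries the encoding directive.
        target, _, suffix = spec.partition("#")
        if suffix not in _STORAGE:
            raise ValueError(f"unknown storage type: {suffix!r}")
        if target == ":key":
            family, qualifier = None, None
        else:
            family, sep, qualifier = target.partition(":")
            if not sep or not family:
                raise ValueError(f"expected family:qualifier, got {target!r}")
            qualifier = qualifier or None
        parsed.append(ColumnMapping(name, family, qualifier, _STORAGE[suffix]))
    return parsed
```

For example, parse_column_mapping(["id", "name", "score"], 
":key,cf:name,cf:score#b") maps id to the row key, name to cf:name with the 
default encoding, and score to cf:score stored as binary. The positional 
pairing between Hive columns and mapping entries is part of what makes the 
current scheme feel primitive.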

bq. Am I correct in assuming that you had a more automated workflow in mind?

That same blog post outlines what I imagine a Hive -> HBase workflow might look 
like. You could easily s/HBase/Phoenix/ and still get the point. The last step, 
Hive queries hitting HBase, has some serious performance issues today, but I 
hope they can be improved by enhancing the storage handler's ability to push 
Hive predicates down to the HBase scan/filter, or side-stepped entirely via an 
HBaseSnapshotInputFormat (HIVE-6584). More and more, my money is on the latter 
as a long-term solution.
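
As a toy illustration of what "pushing predicates down to the scan" means, the 
sketch below folds simple comparisons on the row key into a scan range and 
leaves everything else to be evaluated by Hive after rows are fetched. It is 
not the storage handler's code: push_down_key_range and the 
(column, operator, literal) predicate tuples are hypothetical, keys are plain 
strings compared lexicographically, and it assumes the usual HBase scan 
convention of an inclusive start row and an exclusive stop row.

```python
from typing import List, Optional, Tuple

Predicate = Tuple[str, str, str]  # (column, operator, literal)


def push_down_key_range(
    predicates: List[Predicate], key_column: str
) -> Tuple[Optional[str], Optional[str], List[Predicate]]:
    """Split predicates into a row-key scan range and a residual list."""
    start: Optional[str] = None  # inclusive lower bound, None means unbounded
    stop: Optional[str] = None   # exclusive upper bound, None means unbounded
    residual: List[Predicate] = []

    def raise_start(value: str) -> None:
        nonlocal start
        start = value if start is None else max(start, value)

    def lower_stop(value: str) -> None:
        nonlocal stop
        stop = value if stop is None else min(stop, value)

    for column, op, literal in predicates:
        if column != key_column:
            residual.append((column, op, literal))
        elif op == ">=":
            raise_start(literal)
        elif op == "<":
            lower_stop(literal)
        elif op == "=":
            # Equality becomes the one-key range [literal, literal + "\x00").
            raise_start(literal)
            lower_stop(literal + "\x00")
        else:
            # Anything this sketch does not handle stays with the query engine.
            residual.append((column, op, literal))
    return start, stop, residual
```

With predicates id >= 'a', id < 'm' and name = 'x' on a table keyed by id, the 
scan would be limited to the range ['a', 'm') and only the name predicate 
would remain for Hive to evaluate. The snapshot-based approach avoids this 
translation step altogether, since it does not go through the scan/filter path.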

> Support importing directly from ORC formatted HDFS data
> -------------------------------------------------------
>
>                 Key: PHOENIX-918
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-918
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> We currently have a good way to import from CSV, but we should also add the 
> ability to import from HDFS ORC files, as this would likely be common if 
> folks have Hive data they'd like to import.
> [~enis], [~ndimiduk], [~devaraj] - Does this make sense, or is there a 
> better, existing way? Any takers on implementing it?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
