Stephen Durfey created CRUNCH-659:
-------------------------------------

             Summary: Upgrade to Hive 2.x
                 Key: CRUNCH-659
                 URL: https://issues.apache.org/jira/browse/CRUNCH-659
             Project: Crunch
          Issue Type: Task
            Reporter: Stephen Durfey


I've been working on CRUNCH-340 to finish implementing the HCatSource and
HCatTarget. It seems to be in a better place now that Crunch only supports
Hadoop 2. I'm looking to target as high a version of Hive/HCatalog as possible
with minimal impact on the code base and dependencies.

Hive 2.3.1 is out now. It relies on Hadoop 2.7.2, but HBase doesn't move up
to that Hadoop version until HBase 2.x. Running with Hadoop 2.7.2 causes test
failures in crunch-hbase. I'm not sure whether that would also cause runtime
issues: the minicluster wouldn't even start, because a package name change in
hadoop-hdfs (for the class StorageType) causes a class-not-found error.
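
For anyone who wants to check which StorageType location their classpath
provides, here is a minimal sketch. The two fully qualified names are my
assumption about the old and new locations (they are not taken from the
failing test output), and StorageTypeProbe is a hypothetical helper, not
part of Crunch, so verify the names against the Hadoop jars you build with.

```java
// Hypothetical helper: reports which StorageType package is on the classpath.
public class StorageTypeProbe {
    // Assumed class locations; confirm against your actual Hadoop jars.
    static final String OLD_NAME = "org.apache.hadoop.hdfs.StorageType";
    static final String NEW_NAME = "org.apache.hadoop.fs.StorageType";

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, StorageTypeProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(OLD_NAME + " present: " + isPresent(OLD_NAME));
        System.out.println(NEW_NAME + " present: " + isPresent(NEW_NAME));
    }
}
```

Running it with the same test classpath as crunch-hbase should show whether
the class the minicluster expects is actually available.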

Hive 2.1.0 relies on Hadoop 2.6.0, which plays nicely with HBase 1.x.
However, StructField (implemented inside TupleObjectInspector for ORC files)
has gained a new abstract method, in a Hive release newer than the one Crunch
currently builds against, that would need to be implemented. Other than that,
everything runs fine.

Crunch is currently on Hive 0.13.1, so it's pretty far behind. I'm looking
for feedback on which version bumps to target for my changes in CRUNCH-340.
I'd like to take care of the upgrade first, in this separate JIRA, before
introducing new code against a higher Hive version.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
