Re: CDH5, HiveContext, Parquet

2014-08-11 Thread chutium
hive-thriftserver also does not work with Parquet tables in the Hive metastore. Will this PR fix that too? And is there no need to change any pom.xml?

CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
I have a CDH5.0.3 cluster with Hive tables written in Parquet. The tables have the DeprecatedParquetInputFormat in their metadata, and when I try to select from one using Spark SQL, it blows up with a stack trace like this: java.lang.RuntimeException: java.lang.ClassNotFoundException:

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Sean Owen
As far as I can tell, the method was removed after 0.12.0 in the fix for HIVE-5223 (https://github.com/apache/hive/commit/4059a32f34633dcef1550fdef07d9f9e044c722c#diff-948cc2a95809f584eb030e2b57be3993), and that fix was back-ported in its entirety to 5.0.0+:

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
Hi Sean, Thanks for the reply. I'm on CDH 5.0.3, and upgrading the whole cluster to 5.1.0 will eventually happen, but not immediately. I've tried running the CDH spark-1.0 release and also building it from source. This, unfortunately, leads into a whole other rathole of dependencies. :-( Eric

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Sean Owen
Hm, I was thinking that the issue is that Spark has to use a forked hive-exec since hive-exec unfortunately includes a bunch of dependencies it shouldn't. It forked Hive 0.12.0: http://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/0.12.0 ... and then I was thinking maybe CDH wasn't

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
Yeah, that's what I feared. Unfortunately upgrades on very large production clusters aren't a cheap way to find out what else is broken. Perhaps I can create an RCFile table and sidestep parquet for now. On Aug 10, 2014, at 1:45 PM, Sean Owen so...@cloudera.com wrote: Hm, I was thinking

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Michael Armbrust
I imagine it's not the only instance of this kind of problem people will ever encounter. Can you rebuild Spark with this particular release of Hive? Unfortunately the Hive APIs that we use change too much from release to release to make this possible. There is a JIRA for compiling Spark SQL

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
Thanks Michael, I can try that too. I know you guys aren't in sales/marketing (thank G-d), but given all the hoopla about the CDH-DataBricks partnership, it'd be awesome if you guys were somewhat more aligned, by which I mean that the DataBricks releases on Apache that say for CDH5 would

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Yin Huai
If the link to PR/1819 is broken, here it is: https://github.com/apache/spark/pull/1819. On Sun, Aug 10, 2014 at 5:56 PM, Eric Friedman eric.d.fried...@gmail.com wrote: Thanks Michael, I can try that too. I know you guys aren't in sales/marketing (thank G-d), but given all the hoopla

Re: CDH5, HiveContext, Parquet

2014-08-10 Thread Eric Friedman
On Sun, Aug 10, 2014 at 2:43 PM, Michael Armbrust mich...@databricks.com wrote: if I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH, in order to get DeprecatedParquetInputFormat, I find out that there is an incompatibility in the SerDeUtils class. Spark's Hive snapshot