The hive-thriftserver also does not work with Parquet tables in the Hive metastore;
will this PR fix that too?
And does it not require any changes to pom.xml?
I have a CDH5.0.3 cluster with Hive tables written in Parquet.
The tables have the DeprecatedParquetInputFormat on their metadata, and
when I try to select from one using Spark SQL, it blows up with a stack
trace like this:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
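For context, a Hive Parquet table created with the old parquet-hive bindings typically carries the deprecated wrapper classes in its storage descriptor. A sketch of the DDL for such a table might look like the following; the table name and columns are made up, and the exact package of the SerDe/format classes may differ by CDH release:

```sql
-- Hypothetical DDL for a Parquet table using the deprecated Hive bindings.
-- Class names are from the parquet-hive project; verify against
-- SHOW CREATE TABLE on the actual cluster.
CREATE TABLE events (id BIGINT, payload STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
```

When Spark SQL reads the metastore entry, it tries to load that input format class, which is what produces the ClassNotFoundException above.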
As far as I can tell, the method was removed after 0.12.0 in the fix
for HIVE-5223
(https://github.com/apache/hive/commit/4059a32f34633dcef1550fdef07d9f9e044c722c#diff-948cc2a95809f584eb030e2b57be3993),
and that fix was back-ported in its entirety to 5.0.0+:
Hi Sean,
Thanks for the reply. I'm on CDH 5.0.3 and upgrading the whole cluster to
5.1.0 will eventually happen but not immediately.
I've tried running the CDH spark-1.0 release and also building it from
source. This, unfortunately, goes down a whole other rathole of
dependencies. :-(
Eric
Hm, I was thinking that the issue is that Spark has to use a forked
hive-exec since hive-exec unfortunately includes a bunch of
dependencies it shouldn't. It forked Hive 0.12.0:
http://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/0.12.0
... and then I was thinking maybe CDH wasn't
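For reference, the forked artifact mentioned above is published under the org.spark-project.hive groupId, so depending on it from Maven would look roughly like this (coordinates per the mvnrepository link above):

```xml
<!-- Spark's forked hive-exec, repackaged to shade problematic dependencies -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.12.0</version>
</dependency>
```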
Yeah, that's what I feared. Unfortunately upgrades on very large production
clusters aren't a cheap way to find out what else is broken.
Perhaps I can create an RCFile table and sidestep parquet for now.
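The RCFile workaround could be as simple as copying the data into a new table. A hedged HiveQL sketch, with made-up table names:

```sql
-- Hypothetical: copy the Parquet table's rows into an RCFile-backed table
-- so Spark SQL can read it without the deprecated Parquet input format.
CREATE TABLE events_rcfile STORED AS RCFILE AS
SELECT * FROM events_parquet;
```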
On Aug 10, 2014, at 1:45 PM, Sean Owen so...@cloudera.com wrote:
I imagine it's not the only instance of this kind of problem people
will ever encounter. Can you rebuild Spark with this particular
release of Hive?
Unfortunately, the Hive APIs that we use change too much from release to
release to make this possible. There is a JIRA for compiling Spark SQL
Thanks Michael, I can try that too.
I know you guys aren't in sales/marketing (thank G-d), but given all the hoopla
about the CDH-DataBricks partnership, it'd be awesome if you guys were
somewhat more aligned, by which I mean that the DataBricks releases on Apache
that say for CDH5 would
In case the link to PR #1819 is broken, here it is:
https://github.com/apache/spark/pull/1819.
On Sun, Aug 10, 2014 at 5:56 PM, Eric Friedman eric.d.fried...@gmail.com
wrote:
On Sun, Aug 10, 2014 at 2:43 PM, Michael Armbrust mich...@databricks.com
wrote:
If I try to add hive-exec-0.12.0-cdh5.0.3.jar to my SPARK_CLASSPATH in
order to get DeprecatedParquetInputFormat, I find that there is an
incompatibility in the SerDeUtils class. Spark's Hive snapshot
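For reference, the classpath experiment described above would look something like this; the jar path is illustrative (a typical CDH parcel layout), only the jar name comes from the message:

```shell
# Illustrative only: prepend the CDH hive-exec jar before launching Spark.
# This is the setup that surfaces the SerDeUtils incompatibility above.
export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-0.12.0-cdh5.0.3.jar:$SPARK_CLASSPATH
```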