Hi,
I am using newAPIHadoopRDD to load an RDD from HBase (using pyspark running as
yarn-client) - pretty much the standard case demonstrated in
hbase_inputformat.py from the examples... The thing is, when trying the very
same code on Spark 1.2 I am getting the error below, which based on
Problems like this are always due to having code compiled for Hadoop 1.x
run against Hadoop 2.x, or vice versa. Here, you compiled for 1.x but at
runtime Hadoop 2.x is used.
A common cause is actually bundling Spark / Hadoop classes with your app,
when the app should just use the Spark / Hadoop classes provided by the cluster.
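One quick way to check which Hadoop build the driver actually picks up at runtime is to ask the JVM directly. This is only a rough diagnostic sketch - sc._jvm is pyspark's internal py4j gateway, not a public API, and it assumes a live SparkContext named sc (e.g. the pyspark shell):

    # Ask the JVM backing the pyspark driver which Hadoop build it loaded.
    # VersionInfo is a standard Hadoop utility class; sc._jvm is internal to
    # pyspark, so treat this as a debugging aid only.
    hadoop_version = sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion()
    print(hadoop_version)  # e.g. "2.5.0-cdh5.3.0" when the cluster's copy is used

If that prints a 1.x version on a CDH 5.3.0 (Hadoop 2.5.0) cluster, something on the app's classpath is shadowing the cluster's jars.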
I have not used CDH 5.3.0, but it looks like
spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar contains some
hadoop1 classes (coming from a wrong HBase version).
I don't know the recommended way to build the spark-examples jar because the
official Spark docs do not mention how to build it.
Yes, the distribution is certainly fine and built for Hadoop 2. It sounds
like you are inadvertently including Spark code compiled for Hadoop 1 when
you run your app. The general idea is to use the cluster's copy at runtime.
Those with more pyspark experience might be able to give more useful advice.
Thanks, I found the issue - I was including
/usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar in
the classpath - this was breaking it. Now using a custom jar with just the python
converters and all works like a charm. Thanks, Antony.
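For reference, the converters in question are the ones the stock example relies on; a minimal sketch of the load, along the lines of hbase_inputformat.py (the ZooKeeper quorum and table name below are placeholders, and sc is the live SparkContext):

    # Minimal sketch of the HBase load in the style of hbase_inputformat.py;
    # "zk-host" and "my_table" are placeholders for the real quorum and table.
    conf = {"hbase.zookeeper.quorum": "zk-host",
            "hbase.mapreduce.inputtable": "my_table"}
    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
        conf=conf)
    print(hbase_rdd.count())

Only those two converter classes from the examples' pythonconverters package need to be on the classpath, which is presumably why a small custom jar with just them is enough here instead of the whole examples assembly.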
This is the official Cloudera-compiled stack, CDH 5.3.0 - nothing has been done by
me, and I presume they are pretty good at building it, so I still suspect the
classpath is now being resolved in a different way?
thx, Antony.