Dear List,

I'm having trouble getting Spark 1.1 to use the Hadoop 2 API for reading SequenceFiles. In particular, I'm seeing:

    Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source)
        ...

when invoking JavaSparkContext#newAPIHadoopFile() with the arguments validSequenceFileURI, SequenceFileInputFormat.class, Text.class, BytesWritable.class, and new Job().getConfiguration() -- pretty close to the unit test here:

https://github.com/apache/spark/blob/f0f1ba09b195f23f0c89af6fa040c9e01dfa8951/core/src/test/java/org/apache/spark/JavaAPISuite.java#L916

This error suggests to me that Spark is using an old Hadoop client to do reads. Oddly, I'm able to do /writes/ OK, i.e. I can write to my HDFS cluster via JavaPairRdd#saveAsNewAPIHadoopFile(). Do I need to explicitly build Spark for modern Hadoop?

I previously had an HDFS cluster running Hadoop 2.3.0, and I was getting a similar error there (server is using version 9, client is using version 4). I'm using the Spark 1.1 cdh4 build as well as Hadoop cdh4, from the links posted on Spark's site:

* http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-cdh4.tgz
* http://d3kbcqa49mib13.cloudfront.net/hadoop-2.0.0-cdh4.2.0.tar.gz

What distro of Hadoop is used at Databricks? Are there distros of Spark 1.1 and Hadoop that should work together out of the box? (Previously I had Spark 1.0.0 and Hadoop 2.3 working fine.)

Thanks for any help anybody can give me here!

-Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
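[Editor's note appended for context: if the answer to the "do I need to build Spark myself?" question turns out to be yes, the Spark 1.1 "Building Spark with Maven" documentation describes selecting the Hadoop client version at build time. The sketch below is based on that documentation; the profile names and -Dhadoop.version values are assumptions matching the CDH4 and Hadoop 2.3 clusters mentioned above, not commands verified against this particular setup.]

```shell
# Sketch, from the Spark 1.1 build docs (versions here are assumptions):
# Build Spark against CDH4 (Hadoop 2.0.x, YARN "alpha" API)
mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package

# Or build against stock Hadoop 2.3 with YARN
mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
```

The key point is that the Hadoop client libraries bundled into the Spark assembly must speak the same IPC version as the HDFS cluster; a prebuilt binary compiled against a different Hadoop line produces exactly the "Server IPC version X cannot communicate with client version Y" error quoted above.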