  I am trying to load a CSV file which is on HDFS. I have two
machines: IMPETUS-1466 ( and IMPETUS-1325 (
Both have Spark 1.6.0 pre-built for Hadoop 2.6 and later, but on both I
have existing Hadoop clusters running Hadoop 1.0.4. I have launched HDFS
from by running start-dfs.sh from it, copied files from the local
file system to HDFS, and can view them with hadoop fs -ls.

  However, when I try to load the CSV file from the pyspark shell
(launched by bin/pyspark --packages com.databricks:spark-csv_2.10:1.3.0)
from IMPETUS-1325 ( with the following commands:

>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> patients_df =

I get the following error:

java.io.EOFException: End of File Exception between local host is: "
IMPETUS-1325.IMPETUS.CO.IN/"; destination host is:
"IMPETUS-1466":54310; : java.io.EOFException; For more details see:

I have changed the port number from 54310 to 8020, but then I get the error

java.net.ConnectException: Call From IMPETUS-1325.IMPETUS.CO.IN/
to IMPETUS-1466:8020 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
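
The two errors differ in a way that may matter: a refused connection usually means nothing is listening on that port, whereas the EOFException on 54310 appears only after a connection has been made. A minimal sketch of how this could be checked from IMPETUS-1325, assuming plain TCP reachability is the question (port_is_open is a hypothetical helper name, not part of Spark or Hadoop):

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. It checks TCP reachability
    and says nothing about whether the RPC protocol versions match.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling port_is_open("IMPETUS-1466", 54310) and port_is_open("IMPETUS-1466", 8020) would show which of the two ports actually has a listener. If only 54310 does, the problem is more likely in the protocol than in the address.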

It seemed to me that this might result from a version mismatch between Spark's
Hadoop client and my Hadoop cluster, so I made the following changes:

1) Added the following lines to conf/spark-env.sh

export HADOOP_HOME="/usr/local/hadoop-1.0.4"
export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
export HDFS_URL="hdfs://"

2) Downloaded Spark 1.6.0, pre-built with user-provided Hadoop, and in
addition to the three lines above, added the following line to

export SPARK_DIST_CLASSPATH="/usr/local/hadoop-1.0.4/bin/hadoop"

but none of it seems to work. However, the following command works from and gives the directory listing:

/usr/local/hadoop-1.0.4/bin/hadoop fs -ls hdfs://
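
Since the hadoop command line client takes its default file system address from conf/core-site.xml, reading that file is one way to confirm which host and port the working command relies on. The sketch below assumes the Hadoop 1.x property name fs.default.name and the conf directory given above. read_hadoop_property is a hypothetical helper, not a Hadoop or Spark API:

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_property(conf_dir, name, filename="core-site.xml"):
    """Return the value of property `name` from a Hadoop XML config file.

    Hypothetical helper for illustration. Returns None when the property
    is not present in the file.
    """
    path = os.path.join(conf_dir, filename)
    root = ET.parse(path).getroot()  # root element is <configuration>
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None
```

For example, read_hadoop_property("/usr/local/hadoop-1.0.4/conf", "fs.default.name") should return the hdfs:// URI that the NameNode is configured with, which can then be compared against the URI passed to Spark.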

Any suggestion?



Bibudh Lahiri
Data Scientist, Impetus Technologies
5300 Stevens Creek Blvd
San Jose, CA 95129
