The Spark package you are using is built against Hadoop 2.6, but your
HDFS cluster is running Hadoop 1.0.4; the two are not compatible. Hadoop
2.x changed the RPC wire protocol, so a Hadoop 2.6 client cannot talk to
a Hadoop 1.0.4 NameNode — that is why the connection dies with an
EOFException. Use a Spark build that matches your cluster (e.g. the
"Pre-built for Hadoop 1.x" package), or upgrade the cluster to Hadoop 2.x.
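A quick way to see the problem is to compare the major versions of the Hadoop client bundled with Spark and the Hadoop running on the cluster (you can get the server side from `/usr/local/hadoop-1.0.4/bin/hadoop version`). The version strings below are taken from your mail; the check itself is only an illustrative sketch:

```shell
# Versions from this thread: Spark's bundled Hadoop client vs. the cluster
client="2.6.0"   # Hadoop version the Spark 1.6.0 package was pre-built against
server="1.0.4"   # Hadoop version the HDFS cluster is running

# Hadoop 1.x and 2.x use incompatible RPC wire formats, so a differing
# major version means the client cannot talk to the NameNode at all.
if [ "${client%%.*}" != "${server%%.*}" ]; then
  echo "incompatible: client Hadoop ${client} vs server Hadoop ${server}"
fi
```

No amount of port changing will fix this; the port only controls where the connection goes, not what protocol is spoken once it gets there.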

On Tue, Apr 26, 2016 at 11:18 AM, Bibudh Lahiri <bibudhlah...@gmail.com> wrote:
> Hi,
>   I am trying to load a CSV file which is on HDFS. I have two machines:
> IMPETUS-1466 (172.26.49.156) and IMPETUS-1325 (172.26.49.55). Both have
> Spark 1.6.0 pre-built for Hadoop 2.6 and later, but for both, I had existing
> Hadoop clusters running Hadoop 1.0.4. I have launched HDFS from
> 172.26.49.156 by running start-dfs.sh from it, copied files from local file
> system to HDFS and can view them by hadoop fs -ls.
>
>   However, when I am trying to load the CSV file from pyspark shell
> (launched by bin/pyspark --packages com.databricks:spark-csv_2.10:1.3.0)
> from IMPETUS-1325 (172.26.49.55) with the following commands:
>
>
>>>from pyspark.sql import SQLContext
>
>>>sqlContext = SQLContext(sc)
>
>>>patients_df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "false").load("hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")
>
>
> I get the following error:
>
>
> java.io.EOFException: End of File Exception between local host is:
> "IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55"; destination host is:
> "IMPETUS-1466":54310; : java.io.EOFException; For more details see:
> http://wiki.apache.org/hadoop/EOFException
>
>
> I have changed the port number from 54310 to 8020, but then I get the error
>
>
> java.net.ConnectException: Call From IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55
> to IMPETUS-1466:8020 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>
>
> To me it seemed like this may result from a version mismatch between Spark's
> Hadoop client and my Hadoop cluster, so I have made the following changes:
>
>
> 1) Added the following lines to conf/spark-env.sh
>
>
> export HADOOP_HOME="/usr/local/hadoop-1.0.4"
> export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
> export HDFS_URL="hdfs://172.26.49.156:8020"
>
>
> 2) Downloaded Spark 1.6.0, pre-built with user-provided Hadoop, and in
> addition to the three lines above, added the following line to
> conf/spark-env.sh
>
>
> export SPARK_DIST_CLASSPATH="/usr/local/hadoop-1.0.4/bin/hadoop"
>
>
> but none of it seems to work. However, the following command works from
> 172.26.49.55 and gives the directory listing:
>
> /usr/local/hadoop-1.0.4/bin/hadoop fs -ls hdfs://172.26.49.156:54310/
>
>
> Any suggestion?
>
>
> Thanks
>
> Bibudh
>
>
> --
> Bibudh Lahiri
> Data Scientist, Impetus Technologies
> 5300 Stevens Creek Blvd
> San Jose, CA 95129
> http://knowthynumbers.blogspot.com/
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
