Are you able to connect to the NameNode UI at MACHINE_IP:50070? Check what the URI is there.
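Beyond the web UI, the authoritative answer is whatever `fs.defaultFS` says. A minimal sketch — the `hdfs getconf` command assumes Hadoop 2.x binaries on the PATH of the NameNode host; the rest is plain POSIX shell, runnable anywhere:

```shell
# On the NameNode host (Hadoop 2.x), this prints the exact URI clients must use:
#   hdfs getconf -confKey fs.defaultFS
# Given such a URI, the host:port part can be split out in plain shell:
uri="hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv"
hostport=${uri#hdfs://}   # strip the scheme
hostport=${hostport%%/*}  # drop the path, keeping host:port
echo "NameNode RPC endpoint: $hostport"   # -> 172.26.49.156:54310
```

If the port printed here differs from the one in the client's load path, the client will get Connection refused regardless of whether HDFS is up.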
If the UI doesn't open, it means your HDFS is not up; try to start it using start-dfs.sh.

On Thu, Apr 28, 2016 at 2:59 AM, Bibudh Lahiri <bibudhlah...@gmail.com> wrote:

> Hi,
> I installed Hadoop 2.6.0 today on one of the machines (172.26.49.156),
> got HDFS running on it (both NameNode and DataNode on the same machine) and
> copied the files to HDFS. However, from the same machine, when I try to
> load the same CSV with the following statement:
>
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "false").load("hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")
>
> I get the error
>
> java.net.ConnectException: Call From impetus-i0276.impetus.co.in/127.0.0.1
> to impetus-i0276:54310 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>
> I have changed the port number to 8020 but the same error gets reported.
>
> Even the following command is not working from the command line, when
> launched from the HADOOP_HOME folder:
>
> bin/hdfs dfs -ls hdfs://172.26.49.156:54310/
>
> which was working earlier when issued from the other machine
> (172.26.49.55), from under HADOOP_HOME for Hadoop 1.0.4.
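One detail in the error above worth noting: the client resolves its own host as impetus-i0276.impetus.co.in/127.0.0.1, which often indicates the hostname maps to loopback in /etc/hosts, so the NameNode may be listening only on 127.0.0.1 and refusing connections addressed to the real IP. A sketch of the relevant core-site.xml entry, assuming Hadoop 2.x key names and that port 54310 is the intended RPC port:

```xml
<!-- etc/hadoop/core-site.xml (Hadoop 2.x); bind the NameNode RPC address
     to the machine's real IP, not localhost, so remote clients can reach it.
     The port here must match the port in every hdfs:// URI clients use. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.26.49.156:54310</value>
  </property>
</configuration>
```

After changing this, the NameNode must be restarted (stop-dfs.sh, then start-dfs.sh) for the new bind address to take effect.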
> I set up ~/.bashrc as follows, when I installed Hadoop 2.6.0:
>
> export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
> export HADOOP_HOME=/usr/local/hadoop-2.6.0
> export HADOOP_INSTALL=$HADOOP_HOME
> export HADOOP_PREFIX=$HADOOP_HOME
> export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
> export HADOOP_COMMON_HOME=$HADOOP_PREFIX
> export HADOOP_HDFS_HOME=$HADOOP_PREFIX
> export YARN_HOME=$HADOOP_PREFIX
> export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
> export SPARK_HOME=/home/impadmin/spark-1.6.0-bin-hadoop2.6
>
> PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
> export HADOOP_CONF_DIR=$HADOOP_HOME
> export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
> export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
> export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
>
> Am I getting the port number wrong, or is it some other config param
> that I should check? What's the general rule here?
>
> Thanks
> Bibudh
>
> On Tue, Apr 26, 2016 at 7:51 PM, Davies Liu <dav...@databricks.com> wrote:
>
>> The Spark package you are using is packaged with Hadoop 2.6, but the
>> HDFS is Hadoop 1.0.4, they are not compatible.
>>
>> On Tue, Apr 26, 2016 at 11:18 AM, Bibudh Lahiri <bibudhlah...@gmail.com>
>> wrote:
>> > Hi,
>> > I am trying to load a CSV file which is on HDFS. I have two machines:
>> > IMPETUS-1466 (172.26.49.156) and IMPETUS-1325 (172.26.49.55). Both have
>> > Spark 1.6.0 pre-built for Hadoop 2.6 and later, but for both, I had
>> > existing Hadoop clusters running Hadoop 1.0.4. I have launched HDFS from
>> > 172.26.49.156 by running start-dfs.sh from it, copied files from the
>> > local file system to HDFS and can view them with hadoop fs -ls.
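A small side observation on the ~/.bashrc quoted above: HADOOP_CONF_DIR is exported twice, first as $HADOOP_HOME and later as $HADOOP_PREFIX/etc/hadoop. In shell the last assignment wins, which here happens to be the correct location for Hadoop 2.x configs, but the earlier line is dead and worth deleting to avoid confusion. A minimal demonstration (plain shell, paths taken from the quoted file):

```shell
# The last export wins; after both lines run, HADOOP_CONF_DIR points at
# the etc/hadoop directory, the right place for Hadoop 2.x config files.
HADOOP_HOME=/usr/local/hadoop-2.6.0
HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME              # dead assignment, overwritten below
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop # effective value
echo "$HADOOP_CONF_DIR"   # -> /usr/local/hadoop-2.6.0/etc/hadoop
```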
>> > However, when I am trying to load the CSV file from the pyspark shell
>> > (launched by bin/pyspark --packages com.databricks:spark-csv_2.10:1.3.0)
>> > from IMPETUS-1325 (172.26.49.55) with the following commands:
>> >
>> > >>> from pyspark.sql import SQLContext
>> > >>> sqlContext = SQLContext(sc)
>> > >>> patients_df = sqlContext.read.format("com.databricks.spark.csv").option("header",
>> > ... "false").load("hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")
>> >
>> > I get the following error:
>> >
>> > java.io.EOFException: End of File Exception between local host is:
>> > "IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55"; destination host is:
>> > "IMPETUS-1466":54310; : java.io.EOFException; For more details see:
>> > http://wiki.apache.org/hadoop/EOFException
>> >
>> > I have changed the port number from 54310 to 8020, but then I get the
>> > error
>> >
>> > java.net.ConnectException: Call From IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55
>> > to IMPETUS-1466:8020 failed on connection exception:
>> > java.net.ConnectException: Connection refused; For more details see:
>> > http://wiki.apache.org/hadoop/ConnectionRefused
>> >
>> > To me it seemed like this may result from a version mismatch between the
>> > Spark Hadoop client and my Hadoop cluster, so I have made the following
>> > changes:
>> >
>> > 1) Added the following lines to conf/spark-env.sh
>> >
>> > export HADOOP_HOME="/usr/local/hadoop-1.0.4"
>> > export HADOOP_CONF_DIR="$HADOOP_HOME/conf"
>> > export HDFS_URL="hdfs://172.26.49.156:8020"
>> >
>> > 2) Downloaded Spark 1.6.0, pre-built with user-provided Hadoop, and in
>> > addition to the three lines above, added the following line to
>> > conf/spark-env.sh
>> >
>> > export SPARK_DIST_CLASSPATH="/usr/local/hadoop-1.0.4/bin/hadoop"
>> >
>> > but none of it seems to work.
>> > However, the following command works from
>> > 172.26.49.55 and gives the directory listing:
>> >
>> > /usr/local/hadoop-1.0.4/bin/hadoop fs -ls hdfs://172.26.49.156:54310/
>> >
>> > Any suggestion?
>> >
>> > Thanks
>> >
>> > Bibudh
>> >
>> > --
>> > Bibudh Lahiri
>> > Data Scientist, Impetus Technologies
>> > 5300 Stevens Creek Blvd
>> > San Jose, CA 95129
>> > http://knowthynumbers.blogspot.com/
>
> --
> Bibudh Lahiri
> Senior Data Scientist, Impetus Technologies
> 720 University Avenue, Suite 130
> Los Gatos, CA 95129
> http://knowthynumbers.blogspot.com/

--
Thanks and Regards,
Saurav Sinha
Contact: 9742879062
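The error pattern running through this thread (EOFException from one machine, Connection refused on a guessed port) usually comes down to the client building an hdfs:// URI whose host:port does not match what the NameNode actually serves. A plain-Python sketch, no Spark required, for checking a URI before handing it to Spark; the fallback of 8020 when no port is given is an assumption based on the common Hadoop 2.x default, not something stated in the thread:

```python
from urllib.parse import urlparse

def hdfs_endpoint(uri):
    """Return the (host, port) an HDFS client would dial for this URI.

    Port 8020 is assumed as the default when the URI omits one, matching
    typical Hadoop 2.x deployments (verify against your fs.defaultFS).
    """
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port or 8020

host, port = hdfs_endpoint(
    "hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")
print(host, port)  # -> 172.26.49.156 54310
```

Comparing this tuple against the output of `hdfs getconf -confKey fs.defaultFS` on the NameNode host separates "wrong address" failures from genuine protocol-version mismatches like the Hadoop 1.0.4 vs. 2.6 incompatibility Davies pointed out.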