Re: EOFException while reading from HDFS

2016-04-28 Thread Saurav Sinha
Are you able to connect to the NameNode UI at MACHINE_IP:50070?

Check what the URI is there.

If the UI doesn't open, it means your HDFS is not up; try to start it with
start-dfs.sh.
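A quick sketch of those checks, run on the NameNode host (assumes default ports and a standard Hadoop 2.x layout; substitute your own MACHINE_IP and HADOOP_HOME):

```shell
# 50070 is the default NameNode web UI port; an HTTP 200 means the UI is reachable.
curl -s -o /dev/null -w "%{http_code}\n" "http://MACHINE_IP:50070/"

# If HDFS is running, the NameNode and DataNode JVMs should both be listed.
jps | grep -E 'NameNode|DataNode'

# If they are not running, start HDFS (note: start-dfs.sh, with a hyphen).
"$HADOOP_HOME/sbin/start-dfs.sh"

# The RPC URI that clients must use comes from fs.defaultFS:
"$HADOOP_HOME/bin/hdfs" getconf -confKey fs.defaultFS
```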



On Thu, Apr 28, 2016 at 2:59 AM, Bibudh Lahiri 
wrote:

> [quoted message trimmed; it appears in full as its own post below]

Re: EOFException while reading from HDFS

2016-04-27 Thread Bibudh Lahiri
Hi,
  I installed Hadoop 2.6.0 today on one of the machines (172.26.49.156),
got HDFS running on it (both Namenode and Datanode on the same machine) and
copied the files to HDFS. However, from the same machine, when I try to
load the same CSV with the following statement:

  sqlContext.read.format("com.databricks.spark.csv").option("header",
"false").load("hdfs://
172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")

 I get the following error:

java.net.ConnectException: Call From impetus-i0276.impetus.co.in/127.0.0.1
to impetus-i0276:54310 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused

  I have changed the port number to 8020, but the same error is reported.

  Even the following command does not work when launched from the command
line from under the HADOOP_HOME folder:

  bin/hdfs dfs -ls hdfs://172.26.49.156:54310/

  which was working earlier when issued from the other machine
(172.26.49.55), from under HADOOP_HOME for Hadoop 1.0.4.

  I set up ~/.bashrc as follows when I installed Hadoop 2.6.0:

  export JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export YARN_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export SPARK_HOME=/home/impadmin/spark-1.6.0-bin-hadoop2.6
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

  Am I getting the port number wrong, or is it some other config param that
I should check? What's the general rule here?
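As a general rule, the host:port in an hdfs:// URI must match exactly what the NameNode advertises via fs.defaultFS (fs.default.name in Hadoop 1.x) in core-site.xml; trying other ports yields Connection refused. A minimal plain-Python sketch of how a client resolves the endpoint from such a URI (the helper and the default-port fallback are illustrative, not a Hadoop API):

```python
from urllib.parse import urlparse

# Hadoop 2.x NameNodes default to RPC port 8020 when the URI omits one;
# 54310 below is the port used in this thread, not a Hadoop default.
DEFAULT_NN_RPC_PORT = 8020

def namenode_endpoint(uri):
    """Return the (host, port) an HDFS client will dial for an hdfs:// URI."""
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port or DEFAULT_NN_RPC_PORT

# The URI used client-side must match fs.defaultFS on the cluster exactly.
print(namenode_endpoint(
    "hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv"
))  # -> ('172.26.49.156', 54310)
```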

Thanks
  Bibudh

On Tue, Apr 26, 2016 at 7:51 PM, Davies Liu  wrote:

> The Spark package you are using is packaged with Hadoop 2.6, but the
> HDFS is Hadoop 1.0.4, they are not compatible.
>
> On Tue, Apr 26, 2016 at 11:18 AM, Bibudh Lahiri 
> wrote:
>
> > [quoted original message trimmed; it appears in full below]

Re: EOFException while reading from HDFS

2016-04-26 Thread Davies Liu
The Spark package you are using is packaged with Hadoop 2.6, but the
HDFS is Hadoop 1.0.4, they are not compatible.
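For context, Hadoop 1.x and 2.x use incompatible RPC wire formats, so a Spark build bundling a Hadoop 2.6 client cannot talk to a Hadoop 1.0.4 NameNode. A toy sketch of the necessary (not sufficient) check (`same_major` is an illustrative helper, not part of any Hadoop API):

```python
def same_major(client_version, server_version):
    """Matching Hadoop major versions is a necessary condition for a
    client to talk to a NameNode; 1.x and 2.x wire formats differ."""
    return client_version.split(".")[0] == server_version.split(".")[0]

print(same_major("2.6.0", "1.0.4"))  # -> False: the mismatch in this thread
print(same_major("2.6.0", "2.6.0"))  # -> True
```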

On Tue, Apr 26, 2016 at 11:18 AM, Bibudh Lahiri  wrote:
> Hi,
>   I am trying to load a CSV file which is on HDFS. I have two machines:
> IMPETUS-1466 (172.26.49.156) and IMPETUS-1325 (172.26.49.55). Both have
> Spark 1.6.0 pre-built for Hadoop 2.6 and later, but for both, I had existing
> Hadoop clusters running Hadoop 1.0.4. I have launched HDFS from
> 172.26.49.156 by running start-dfs.sh from it, copied files from local file
> system to HDFS and can view them by hadoop fs -ls.
>
>   However, when I am trying to load the CSV file from pyspark shell
> (launched by bin/pyspark --packages com.databricks:spark-csv_2.10:1.3.0)
> from IMPETUS-1325 (172.26.49.55) with the following commands:
>
>
>>>from pyspark.sql import SQLContext
>
>>>sqlContext = SQLContext(sc)
>
>>>patients_df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "false").load("hdfs://172.26.49.156:54310/bibudh/healthcare/data/cloudera_challenge/patients.csv")
>
>
> I get the following error:
>
>
> java.io.EOFException: End of File Exception between local host is:
> "IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55"; destination host is:
> "IMPETUS-1466":54310; : java.io.EOFException; For more details see:
> http://wiki.apache.org/hadoop/EOFException
>
>
> I have changed the port number from 54310 to 8020, but then I get the error
>
>
> java.net.ConnectException: Call From IMPETUS-1325.IMPETUS.CO.IN/172.26.49.55
> to IMPETUS-1466:8020 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
>
>
> To me it seemed like this may result from a version mismatch between Spark
> Hadoop client and my Hadoop cluster, so I have made the following changes:
>
>
> 1) Added the following lines to conf/spark-env.sh
>
>
> export HADOOP_HOME="/usr/local/hadoop-1.0.4" export
> HADOOP_CONF_DIR="$HADOOP_HOME/conf" export
> HDFS_URL="hdfs://172.26.49.156:8020"
>
>
> 2) Downloaded Spark 1.6.0, pre-built with user-provided Hadoop, and in
> addition to the three lines above, added the following line to
> conf/spark-env.sh
>
>
> export SPARK_DIST_CLASSPATH="/usr/local/hadoop-1.0.4/bin/hadoop"
>
>
> but none of it seems to work. However, the following command works from
> 172.26.49.55 and gives the directory listing:
>
> /usr/local/hadoop-1.0.4/bin/hadoop fs -ls hdfs://172.26.49.156:54310/
>
>
> Any suggestion?
>
>
> Thanks
>
> Bibudh
>
>
> --
> Bibudh Lahiri
> Data Scientist, Impetus Technologies
> 5300 Stevens Creek Blvd
> San Jose, CA 95129
> http://knowthynumbers.blogspot.com/
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org