Hi, I am unable to read from HDFS (Intel Distribution of Hadoop, Hadoop version 1.0.3) from spark-shell (Spark version 1.2.1). I built Spark with the command mvn -Dhadoop.version=1.0.3 clean package, started spark-shell, and tried to read an HDFS file with sc.textFile(). The read fails with:

WARN hdfs.DFSClient: Failed to connect to /10.88.6.133:50010, add to deadNodes and continue
java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.88.6.131:44264 remote=/10.88.6.133:50010]
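For context, since I built with Maven, I assume the same version mismatch can happen there too: unless Maven is pointed at our internal repository, -Dhadoop.version=1.0.3 will resolve the Apache hadoop-core artifact rather than Intel's. This is the kind of repository override I have in mind for the pom — the repository URL, id, and version string below are placeholders, not our real values:

```xml
<!-- Sketch only: point Maven at an internal repository so that
     hadoop-core resolves from there before Maven Central.
     The id, URL, and version are placeholders. -->
<repositories>
  <repository>
    <id>internal-repo</id>
    <url>http://repo.example.internal/maven/</url>
  </repository>
</repositories>
```

The same could presumably go in a profile in ~/.m2/settings.xml instead of the pom, so the build files stay untouched.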
The same problem was asked in an earlier thread, "RE: Spark is unable to read from HDFS" (mail-archives.us.apache.org). As suggested in that mail: "In addition to specifying HADOOP_VERSION=1.0.3 in the ./project/SparkBuild.scala file, you will need to specify the libraryDependencies and name "spark-core" resolvers. Otherwise, sbt will fetch version 1.0.3 of hadoop-core from apache instead of Intel. You can set up your own local or remote repository that you specify." However, HADOOP_VERSION is now deprecated and -Dhadoop.version should be used instead. Can anybody please elaborate on how to specify that sbt should fetch hadoop-core from Intel's distribution, which is in our internal repository?

Thanks & Regards,
Meethu M
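In case it helps clarify what I am asking: below is the direction I understood from the earlier mail, as an sbt sketch. The resolver URL and the Intel version string are placeholders (assumptions), not our real values — I am asking where exactly this should go now that HADOOP_VERSION in project/SparkBuild.scala is deprecated:

```scala
// Sketch only: add the internal repository as a resolver so sbt looks
// there for hadoop-core, and pin the Intel artifact's version string.
// Both the URL and "1.0.3-Intel" are placeholders.
resolvers += "Internal Repo" at "http://repo.example.internal/maven/"

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.0.3-Intel"
```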