I have 20 nodes on EC2 and an application that reads the data via wholeTextFiles. I tried to copy the data into Hadoop via copyFromLocal, and I get a lot of

    14/11/24 02:00:07 INFO hdfs.DFSClient: Exception in createBlockOutputStream 172.31.2.209:50010 java.io.IOException: Bad connect ack with firstBadLink as X:50010
    14/11/24 02:00:07 INFO hdfs.DFSClient: Abandoning block blk_-8725559184260876712_2627
    14/11/24 02:00:07 INFO hdfs.DFSClient: Excluding datanode X:50010

Then I went the plain-filesystem route via copy-dir, which worked well, so now everything is under /root/txt on all nodes. I submitted the job with the "file:///root/txt/" directory for wholeTextFiles() (roughly the call sketched below) and I get

    Exception in thread "main" java.io.FileNotFoundException: File does not exist: /root/txt/3521.txt

The file exists on the root node and, according to copy-dir, should be present on every node. The HDFS variant worked fine with 3 nodes, but it starts failing with 20. I added

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>4096</value>
    </property>

to hdfs-site.xml and core-site.xml, which didn't help.
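For reference, the read is essentially the following (a minimal sketch of my job; the object name, SparkConf setup, and the count() action are stand-ins for what the real application does):

    import org.apache.spark.{SparkConf, SparkContext}

    object ReadTexts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ReadTexts")
        val sc = new SparkContext(conf)

        // wholeTextFiles returns an RDD of (path, fileContent) pairs.
        // With a file:// URI, my understanding is the directory has to
        // exist on every worker node, which copy-dir should have ensured.
        val texts = sc.wholeTextFiles("file:///root/txt/")
        println(texts.count())

        sc.stop()
      }
    }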