I have 20 nodes on EC2 and an application that reads the data via wholeTextFiles. I tried to copy the data into Hadoop via copyFromLocal, and I get a lot of

    14/11/24 02:00:07 INFO hdfs.DFSClient: Exception in createBlockOutputStream 172.31.2.209:50010 java.io.IOException: Bad connect ack with firstBadLink as X:50010
    14/11/24 02:00:07 INFO hdfs.DFSClient: Abandoning block blk_-8725559184260876712_2627
    14/11/24 02:00:07 INFO hdfs.DFSClient: Excluding datanode X:50010

Then I went the plain-filesystem route via copy-dir, which worked well, so now everything is under /root/txt on all nodes. I submitted the job with the "file:///root/txt/" directory for wholeTextFiles() (roughly the call sketched below) and I get

    Exception in thread "main" java.io.FileNotFoundException: File does not exist: /root/txt/3521.txt

The file exists on the root node and, according to copy-dir, should be present on every node. The HDFS variant worked fine with 3 nodes, but it starts failing with 20. I added

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>4096</value>
    </property>

to hdfs-site.xml and core-site.xml, which didn't help.
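For reference, the read is essentially the following (a minimal sketch of my job; the object name, SparkConf setup, and the count() action are stand-ins for what the real application does):

    import org.apache.spark.{SparkConf, SparkContext}

    object ReadTexts {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ReadTexts")
        val sc = new SparkContext(conf)

        // wholeTextFiles returns an RDD of (path, fileContent) pairs.
        // With a file:// URI, my understanding is the directory has to
        // exist on every worker node, which copy-dir should have ensured.
        val texts = sc.wholeTextFiles("file:///root/txt/")
        println(texts.count())

        sc.stop()
      }
    }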