Re: java.io.FileNotFoundException when using HDFS in cluster mode
What happens when you do:

    sc.textFile("hdfs://path/to/the_file.txt")

Thanks
Best Regards

On Mon, Mar 30, 2015 at 11:04 AM, Nick Travers <n.e.trav...@gmail.com> wrote:
[quoted text trimmed]
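The hint in the reply above is the key point: sc.textFile resolves hdfs:// URIs itself, so for a plain input file there is no need for sc.addFile at all. A minimal sketch of the WordCount driver reading and writing HDFS directly, reusing the paths from the thread (not verified against a live cluster):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    // textFile understands hdfs:// URIs directly; each executor reads its
    // own splits from HDFS, so no sc.addFile / SparkFiles step is needed.
    val counts = sc.textFile("hdfs://host.domain.ex/user/nickt/linkage")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://host.domain.ex/user/nickt/wordcounts")
    sc.stop()
  }
}
```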
RE: java.io.FileNotFoundException when using HDFS in cluster mode
I think the jar file has to be local: jars on HDFS are not supported yet in Spark. See this answer:
http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs

Date: Sun, 29 Mar 2015 22:34:46 -0700
From: n.e.trav...@gmail.com
To: user@spark.apache.org
Subject: java.io.FileNotFoundException when using HDFS in cluster mode
[quoted text trimmed]
Re: java.io.FileNotFoundException when using HDFS in cluster mode
Try running it like this:

    sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi \
      --deploy-mode cluster --master yarn \
      hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10

Caveats:
1) Make sure the permissions of /user/nick are 775 or 777.
2) No need for the hostname; try hdfs://path-to-jar

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-FileNotFoundException-when-using-HDFS-in-cluster-mode-tp22287p22303.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
java.io.FileNotFoundException when using HDFS in cluster mode
Hi List,

I'm following this example here:
https://github.com/databricks/learning-spark/tree/master/mini-complete-example

with the following:

    $SPARK_HOME/bin/spark-submit \
      --deploy-mode cluster \
      --master spark://host.domain.ex:7077 \
      --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
      hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
      hdfs://host.domain.ex/user/nickt/linkage hdfs://host.domain.ex/user/nickt/wordcounts

The jar is submitted fine and I can see it appear on the driver node (i.e. it is connecting to and reading from HDFS ok):

    -rw-r--r-- 1 nickt nickt  15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
    -rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
    -rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout

But it's failing with a java.io.FileNotFoundException saying my input file is missing:

    Caused by: java.io.FileNotFoundException: Added file file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage does not exist.

I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to all the workers, and sc.textFile(SparkFiles.get("the_file.txt")) to return the path to the file on each of the hosts. Has anyone come up against this before when reading from HDFS? No doubt I'm doing something wrong.
Full trace below:

    Launch Command: /usr/java/java8/bin/java -cp :/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar -Dakka.loglevel=WARNING -Dspark.driver.supervise=false -Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount -Dspark.akka.askTimeout=10 -Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar -Dspark.master=spark://host.domain.ex:7077 -Xms512M -Xmx512M org.apache.spark.deploy.worker.DriverWrapper akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker /home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar com.oreilly.learningsparkexamples.mini.scala.WordCount hdfs://host.domain.ex/user/nickt/linkage hdfs://host.domain.ex/user/nickt/wordcounts
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
    15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
    15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
    15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
    15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port 44201.
    15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
    15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
    15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
    15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
    15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nickt); users with modify permissions: Set(nickt)
    15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
    15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on port 33382.
    15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
    15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
    15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at /tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
    15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
    15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
    15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is /tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
    15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
    15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
    15/03/29 22:05:05 INFO AbstractConnector: Started SocketConnector@0.0.0.0:42484
    15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file server' on port 42484.
    15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
    15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
    15/03/29 22:05:06 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
    15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    15/03/29 22:05:06 INFO SparkUI: Started SparkUI at http://host5.domain.ex:4040
    15/03/29 22:05:06 ERROR SparkContext: Jar not found at
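A note on the mechanism being mixed up here: the mangled path in the exception (driver work dir with "hdfs:/..." appended) suggests the hdfs:// argument to addFile was treated as a local relative path. Independent of that, sc.addFile is meant for small side files that tasks open with ordinary local I/O; SparkFiles.get returns a node-local filesystem path, so it is not a natural argument to sc.textFile. A hedged sketch of the intended addFile pattern — "lookup.txt" is a hypothetical side file introduced only for illustration, and the snippet assumes an existing SparkContext `sc`:

```scala
import scala.io.Source
import org.apache.spark.SparkFiles

// Driver side: ship a small side file to every node in the cluster.
// "lookup.txt" is hypothetical, not a file from the thread.
sc.addFile("hdfs://host.domain.ex/user/nickt/lookup.txt")

// Executor side: open the node-local copy with plain I/O inside a task;
// SparkFiles.get resolves the local path to the shipped copy.
val filtered = sc.textFile("hdfs://host.domain.ex/user/nickt/linkage")
  .mapPartitions { lines =>
    val keep = Source.fromFile(SparkFiles.get("lookup.txt")).getLines().toSet
    lines.filter(keep.contains)
  }
```

The bulk input itself (linkage) still goes through sc.textFile directly, as in the first reply.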