Re: java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-30 Thread Akhil Das
What happens when you do:

sc.textFile("hdfs://path/to/the_file.txt")
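
For reference, a minimal sketch of reading the input directly from HDFS with
the standard Spark Scala API (the host and path are the placeholders used
elsewhere in this thread):

import org.apache.spark.{SparkConf, SparkContext}

// textFile resolves hdfs:// URIs through the Hadoop FileSystem API,
// so no addFile/SparkFiles indirection is needed for HDFS input.
val sc = new SparkContext(new SparkConf().setAppName("ReadFromHdfs"))
val lines = sc.textFile("hdfs://host.domain.ex/user/nickt/linkage")
println(lines.count())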

Thanks
Best Regards

On Mon, Mar 30, 2015 at 11:04 AM, Nick Travers n.e.trav...@gmail.com
wrote:

 Hi List,

 I'm following the example here:
 https://github.com/databricks/learning-spark/tree/master/mini-complete-example
 with the following:

 $SPARK_HOME/bin/spark-submit \
   --deploy-mode cluster \
   --master spark://host.domain.ex:7077 \
   --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
   hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
   hdfs://host.domain.ex/user/nickt/linkage \
   hdfs://host.domain.ex/user/nickt/wordcounts

 The jar is submitted fine and I can see it appear on the driver node (i.e.
 connecting to and reading from HDFS ok):

 -rw-r--r-- 1 nickt nickt  15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
 -rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
 -rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout

 But it's failing due to a java.io.FileNotFoundException saying my input
 file
 is missing:

 Caused by: java.io.FileNotFoundException: Added file
 file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
 does not exist.

 I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to
 all the workers and sc.textFile(SparkFiles.get("the_file.txt")) to return the
 path to the file on each of the hosts.

 Has anyone come up against this before when reading from HDFS? No doubt I'm
 doing something wrong.

 (Full launch command and log trimmed; it duplicates the trace in the
 original message at the bottom of this thread.)

RE: java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-30 Thread java8964
I think the jar file has to be local; jars in HDFS are not supported yet in Spark.
See this answer:
http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs
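
If the jar does have to be local, one workaround is to pull it out of HDFS
onto the submitting machine and pass a local path instead. A sketch, assuming
the jar is copied to /tmp (the destination directory is an assumption, not
from the thread):

hdfs dfs -get hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar /tmp/
$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  /tmp/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs://host.domain.ex/user/nickt/linkage \
  hdfs://host.domain.ex/user/nickt/wordcounts

(Note that in cluster deploy mode the worker that launches the driver resolves
this path, so the jar may need to exist at the same path on the workers too.)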

 Date: Sun, 29 Mar 2015 22:34:46 -0700
 From: n.e.trav...@gmail.com
 To: user@spark.apache.org
 Subject: java.io.FileNotFoundException when using HDFS in cluster mode
 
 (Quoted message trimmed; see the original message at the bottom of this
 thread.)

Re: java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-30 Thread nsalian
Try running it like this:

sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi
--deploy-mode cluster --master yarn
hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10


Caveats:
1) Make sure the permissions of /user/nickt are 775 or 777 (a sketch follows
below).
2) No need for the hostname; try hdfs:///path/to/the/jar (three slashes, as in
the example command above, so the default filesystem is used).
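
For caveat 1, a sketch using the standard HDFS CLI (the home directory is
taken from this thread; adjust as needed):

hdfs dfs -chmod -R 775 /user/nickt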






java.io.FileNotFoundException when using HDFS in cluster mode

2015-03-29 Thread Nick Travers
Hi List,

I'm following the example here:
https://github.com/databricks/learning-spark/tree/master/mini-complete-example
with the following:

$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs://host.domain.ex/user/nickt/linkage \
  hdfs://host.domain.ex/user/nickt/wordcounts

The jar is submitted fine and I can see it appear on the driver node (i.e.
connecting to and reading from HDFS ok):

-rw-r--r-- 1 nickt nickt  15K Mar 29 22:05 learning-spark-mini-example_2.10-0.0.1.jar
-rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
-rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout

But it's failing due to a java.io.FileNotFoundException saying my input file
is missing:

Caused by: java.io.FileNotFoundException: Added file
file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
does not exist.

I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to
all the workers and sc.textFile(SparkFiles.get("the_file.txt")) to return the
path to the file on each of the hosts.
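
The mangled path in the exception looks like the hdfs:// URI being resolved
against the driver's local work directory. For comparison, a minimal sketch of
the addFile/SparkFiles pattern (assuming an existing SparkContext sc; note
that SparkFiles.get takes the bare file name that was added, not a full URI):

import org.apache.spark.SparkFiles

// Ship the HDFS file to every node's local work directory.
sc.addFile("hdfs://host.domain.ex/user/nickt/linkage")

// Resolve the local copy by bare file name. Passing the full hdfs:// URI
// here instead would be joined onto the local work dir, yielding a
// nonexistent ".../work/driver-.../hdfs:/host.domain.ex/..." path like
// the one in the exception above.
val localPath = SparkFiles.get("linkage")
val rdd = sc.textFile("file://" + localPath)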

Has anyone come up against this before when reading from HDFS? No doubt I'm
doing something wrong.

Full trace below:

Launch Command: /usr/java/java8/bin/java -cp
:/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar
-Dakka.loglevel=WARNING -Dspark.driver.supervise=false
-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount
-Dspark.akka.askTimeout=10
-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar
-Dspark.master=spark://host.domain.ex:7077 -Xms512M -Xmx512M
org.apache.spark.deploy.worker.DriverWrapper
akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar
com.oreilly.learningsparkexamples.mini.scala.WordCount
hdfs://host.domain.ex/user/nickt/linkage
hdfs://host.domain.ex/user/nickt/wordcounts


log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(nickt); users
with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port
44201.
15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker
akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(nickt); users
with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on
port 33382.
15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at
/tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to
akka.tcp://sparkwor...@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1
MB
15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:05 INFO AbstractConnector: Started
SocketConnector@0.0.0.0:42484
15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file
server' on port 42484.
15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:06 INFO AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port
4040.
15/03/29 22:05:06 INFO SparkUI: Started SparkUI at
http://host5.domain.ex:4040
15/03/29 22:05:06 ERROR SparkContext: Jar not found at