Re: spark on yarn is trying to use file:// instead of hdfs://
OK, solved it. As it happened, in spark/conf I also had a file called core.site.xml (with some Tachyon-related stuff in it), so that's why it ignored /etc/hadoop/conf/core-site.xml.

On Fri, Jun 20, 2014 at 3:24 PM, Koert Kuipers wrote:
> [...]
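The root cause here, a second core-site-like file shadowing the real one, can be caught with a quick scan of every conf directory Spark reads. A minimal self-contained sketch of that scan (it builds a throwaway sandbox instead of touching real directories; the directory names are illustrative, in practice you would point it at $HADOOP_CONF_DIR and $SPARK_HOME/conf):

```shell
# Build a throwaway sandbox reproducing the bad layout: a real core-site.xml in a
# (fake) hadoop conf dir, plus a stray core.site.xml in a (fake) spark conf dir.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/hadoop-conf" "$sandbox/spark-conf"
echo '<configuration/>' > "$sandbox/hadoop-conf/core-site.xml"
echo '<configuration/>' > "$sandbox/spark-conf/core.site.xml"   # the shadowing culprit
# Any file matching core*site.xml in a conf dir on the classpath can override
# fs.defaultFS; listing them all makes a stray copy obvious.
matches=$(find "$sandbox/hadoop-conf" "$sandbox/spark-conf" -maxdepth 1 \
  -name 'core*site.xml' | wc -l | tr -d ' ')
echo "found $matches core-site candidates"
rm -rf "$sandbox"
```

If the count is ever more than one, the extra file is the first suspect.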
Re: spark on yarn is trying to use file:// instead of hdfs://
I put some logging statements in yarn.Client and that confirms it's using the local filesystem:

14/06/20 15:20:33 INFO Client: fs.defaultFS is file:///

So somehow fs.defaultFS is not being picked up from /etc/hadoop/conf/core-site.xml, but Spark does correctly pick up yarn.resourcemanager.hostname from /etc/hadoop/conf/yarn-site.xml.

Strange!

On Fri, Jun 20, 2014 at 1:26 PM, Koert Kuipers wrote:
> [...]
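One low-tech way to see what a given core-site.xml actually declares, independent of what the JVM ended up loading, is to pull the value out of the file directly (going through Hadoop itself, `hdfs getconf -confKey fs.defaultFS` answers the same question for the loaded configuration). A sketch against a local demo copy of the file; in practice you would read $HADOOP_CONF_DIR/core-site.xml:

```shell
# Demo copy of a core-site.xml; in practice point this at $HADOOP_CONF_DIR/core-site.xml.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cdh5-yarn.tresata.com:8020</value>
  </property>
</configuration>
EOF
# Crude but dependency-free: take the <value> on the line after the fs.defaultFS <name>.
fs=$(grep -A1 '<name>fs.defaultFS</name>' "$conf" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "fs.defaultFS = $fs"
rm -f "$conf"
```

If this prints an hdfs:// URI but the client still logs file:///, then some other resource on the classpath is winning.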
Re: spark on yarn is trying to use file:// instead of hdfs://
In /etc/hadoop/conf/core-site.xml:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cdh5-yarn.tresata.com:8020</value>
  </property>

Also, hdfs seems to be the default:

[koert@cdh5-yarn ~]$ hadoop fs -ls /
Found 5 items
drwxr-xr-x   - hdfs supergroup          0 2014-06-19 12:31 /data
drwxrwxrwt   - hdfs supergroup          0 2014-06-20 12:17 /lib
drwxrwxrwt   - hdfs supergroup          0 2014-06-18 14:58 /tmp
drwxr-xr-x   - hdfs supergroup          0 2014-06-18 15:02 /user
drwxr-xr-x   - hdfs supergroup          0 2014-06-18 14:59 /var

And in my spark-env.sh:

export HADOOP_CONF_DIR=/etc/hadoop/conf

On Fri, Jun 20, 2014 at 1:04 PM, bc Wong wrote:
> [...]
Re: spark on yarn is trying to use file:// instead of hdfs://
Koert, is there any chance that your fs.defaultFS isn't set up right?

On Fri, Jun 20, 2014 at 9:57 AM, Koert Kuipers wrote:
> [...]
Re: spark on yarn is trying to use file:// instead of hdfs://
Yeah, sure, see below. I strongly suspect it's something I misconfigured that is causing yarn to mistakenly use the local filesystem.

[koert@cdh5-yarn ~]$ /usr/local/lib/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --executor-cores 1 hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar 10
14/06/20 12:54:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/06/20 12:54:40 INFO RMProxy: Connecting to ResourceManager at cdh5-yarn.tresata.com/192.168.1.85:8032
14/06/20 12:54:41 INFO Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 1
14/06/20 12:54:41 INFO Client: Queue info ... queueName: root.default, queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0, queueApplicationCount = 0, queueChildQueueCount = 0
14/06/20 12:54:41 INFO Client: Max mem capabililty of a single resource in this cluster 8192
14/06/20 12:54:41 INFO Client: Preparing Local resources
14/06/20 12:54:41 WARN BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/06/20 12:54:41 INFO Client: Uploading hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar to file:/home/koert/.sparkStaging/application_1403201750110_0060/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar
14/06/20 12:54:43 INFO Client: Setting up the launch environment
14/06/20 12:54:43 INFO Client: Setting up container launch context
14/06/20 12:54:43 INFO Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx512m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.akka.retry.wait=\"3\", -Dspark.storage.blockManagerTimeoutIntervalMs=\"12\", -Dspark.storage.blockManagerHeartBeatMs=\"12\", -Dspark.app.name=\"org.apache.spark.examples.SparkPi\", -Dspark.akka.frameSize=\"1\", -Dspark.akka.timeout=\"3\", -Dspark.worker.timeout=\"3\", -Dspark.akka.logLifecycleEvents=\"true\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ApplicationMaster, --class, org.apache.spark.examples.SparkPi, --jar , hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar, --args '10' , --executor-memory, 1024, --executor-cores, 1, --num-executors , 3, 1>, /stdout, 2>, /stderr)
14/06/20 12:54:43 INFO Client: Submitting application to ASM
14/06/20 12:54:43 INFO YarnClientImpl: Submitted application application_1403201750110_0060
14/06/20 12:54:44 INFO Client: Application report from ASM:
    application identifier: application_1403201750110_0060
    appId: 60
    clientToAMToken: null
    appDiagnostics:
    appMasterHost: N/A
    appQueue: root.koert
    appMasterRpcPort: -1
    appStartTime: 1403283283505
    yarnAppState: ACCEPTED
    distributedFinalState: UNDEFINED
    appTrackingUrl: http://cdh5-yarn.tresata.com:8088/proxy/application_1403201750110_0060/
    appUser: koert
14/06/20 12:54:45 INFO Client: Application report from ASM:
    application identifier: application_1403201750110_0060
    appId: 60
    clientToAMToken: null
    appDiagnostics:
    appMasterHost: N/A
    appQueue: root.koert
    appMasterRpcPort: -1
    appStartTime: 1403283283505
    yarnAppState: ACCEPTED
    distributedFinalState: UNDEFINED
    appTrackingUrl: http://cdh5-yarn.tresata.com:8088/proxy/application_1403201750110_0060/
    appUser: koert
14/06/20 12:54:46 INFO Client: Application report from ASM:
    application identifier: application_1403201750110_0060
    appId: 60
    clientToAMToken: null
    appDiagnostics:
    appMasterHost: N/A
    appQueue: root.koert
    appMasterRpcPort: -1
    appStartTime: 1403283283505
    yarnAppState: ACCEPTED
    distributedFinalState: UNDEFINED
    appTrackingUrl: http://cdh5-yarn.tresata.com:8088/proxy/application_1403201750110_0060/
    appUser: koert
14/06/20 12:54:47 INFO Client: Application report from ASM:
    application identifier: application_1403201750110_0060
    appId: 60
    clientToAMToken: null
    appDiagnostics: Application application_1403201750110_0060 failed 2 times due to AM Container for appattempt_1403201750110_0060_02 exited with exitCode: -1000 due to: File file:/home/koert/.sparkStaging/application_1403201750110_0060/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar does not exist
    .Failing this attempt.. Failing the application.
    appMasterHost: N/A
    appQueue: root.koert
    appMasterRpcPort: -1
    appStartTime: 1403283283505
    yarnAppState: FAILED
    distributedFinalState: FAILED
    appTrackingUrl: cdh5-yarn.tresata.com:8088/cluster/app/application_1403201750110_0060
    appUser: koert

On Fri, Jun 20, 2014 at 12:42 PM, Marcelo Vanzin wrote:
> [...]
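The failure in the log above is internally consistent: the client builds the .sparkStaging path on whatever filesystem fs.defaultFS names, so with file:/// the jar lands on the submitting host's local disk, where no NodeManager can find it. An illustrative sketch of that path construction (values taken from the log; this is not Spark's actual code, just the shape of the URI it produces):

```shell
# Values taken from the log above; illustrative only, not Spark's real code path.
default_fs="file:///"                      # what the client resolved fs.defaultFS to
user="koert"
app_id="application_1403201750110_0060"
staging="${default_fs%/}/home/$user/.sparkStaging/$app_id"
echo "jar staged at: $staging"
# A NodeManager on another host resolving this URI looks at ITS local /home/koert,
# where nothing was uploaded -- hence "File ... does not exist" and exitCode -1000.
```

With fs.defaultFS pointing at hdfs:// instead, the same staging path is visible cluster-wide and the AM container can localize the jar.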
Re: spark on yarn is trying to use file:// instead of hdfs://
Hi Koert,

Could you provide more details? Job arguments, log messages, errors, etc.

On Fri, Jun 20, 2014 at 9:40 AM, Koert Kuipers wrote:
> i noticed that when i submit a job to yarn it mistakenly tries to upload
> files to the local filesystem instead of hdfs. what could cause this?
>
> in spark-env.sh i have HADOOP_CONF_DIR set correctly (and spark-submit does
> find yarn), and my core-site.xml has a fs.defaultFS that is hdfs, not the
> local filesystem.
>
> thanks! koert

--
Marcelo
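For a report like this, the first sanity check is usually whether HADOOP_CONF_DIR is actually exported to the spark-submit process and whether it really contains a core-site.xml. A self-contained sketch of that check (it uses a temp dir so it runs anywhere; in practice you would verify /etc/hadoop/conf directly):

```shell
# Sandbox demo: the real check would use HADOOP_CONF_DIR=/etc/hadoop/conf.
demo=$(mktemp -d)
echo '<configuration/>' > "$demo/core-site.xml"
export HADOOP_CONF_DIR="$demo"             # must be *exported*, not merely set
status=missing
[ -n "$HADOOP_CONF_DIR" ] && [ -f "$HADOOP_CONF_DIR/core-site.xml" ] && status=ok
echo "HADOOP_CONF_DIR check: $status"
rm -rf "$demo"
```

If the variable is set only in an interactive shell (or in a file spark-submit never sources), the client falls back to its built-in defaults, and the Hadoop default for fs.defaultFS is file:///.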