Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
ok, solved it. as it happened, in spark/conf i also had a file called
core-site.xml (with some tachyon related stuff in it), so that's why it
ignored /etc/hadoop/conf/core-site.xml
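
A minimal sketch of how one might spot this kind of shadowing, assuming (as the fix above suggests) that when the same file name exists in more than one directory on the classpath, the copy in the earliest directory is the one that gets loaded. The function name `find_shadowed` and the `/usr/local/lib/spark/conf` path are illustrative assumptions, not part of Spark or Hadoop.

```python
import os


def find_shadowed(conf_dirs, suffix="-site.xml"):
    """Return {file name: [dirs holding it]} for names present in 2+ dirs.

    conf_dirs is expected in classpath order, so under the assumption
    stated above the first directory in each list is the one that wins.
    """
    seen = {}
    for conf_dir in conf_dirs:
        if not os.path.isdir(conf_dir):
            continue  # skip directories that do not exist on this machine
        for name in sorted(os.listdir(conf_dir)):
            if name.endswith(suffix):
                seen.setdefault(name, []).append(conf_dir)
    return {name: dirs for name, dirs in seen.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Hypothetical directory order: spark's own conf dir first, then HADOOP_CONF_DIR.
    dirs = ["/usr/local/lib/spark/conf", "/etc/hadoop/conf"]
    for name, holders in find_shadowed(dirs).items():
        print(f"{name}: loaded from {holders[0]}, shadows {holders[1:]}")
```

On a setup like the one described here, this would be expected to flag core-site.xml but not yarn-site.xml, which matches the symptoms in the thread.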




On Fri, Jun 20, 2014 at 3:24 PM, Koert Kuipers  wrote:

> i put some logging statements in yarn.Client and that confirms it's using
> the local filesystem:
> 14/06/20 15:20:33 INFO Client: fs.defaultFS is file:///
>
> so somehow fs.defaultFS is not being picked up from
> /etc/hadoop/conf/core-site.xml, but spark does correctly pick up
> yarn.resourcemanager.hostname from /etc/hadoop/conf/yarn-site.xml
>
> strange!
>
>
> On Fri, Jun 20, 2014 at 1:26 PM, Koert Kuipers  wrote:
>
>> in /etc/hadoop/conf/core-site.xml:
>>   <property>
>>     <name>fs.defaultFS</name>
>>     <value>hdfs://cdh5-yarn.tresata.com:8020</value>
>>   </property>
>>
>>
>> also hdfs seems the default:
>> [koert@cdh5-yarn ~]$ hadoop fs -ls /
>> Found 5 items
>> drwxr-xr-x   - hdfs supergroup  0 2014-06-19 12:31 /data
>> drwxrwxrwt   - hdfs supergroup  0 2014-06-20 12:17 /lib
>> drwxrwxrwt   - hdfs supergroup  0 2014-06-18 14:58 /tmp
>> drwxr-xr-x   - hdfs supergroup  0 2014-06-18 15:02 /user
>> drwxr-xr-x   - hdfs supergroup  0 2014-06-18 14:59 /var
>>
>> and in my spark-env.sh:
>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>
>>
>>
>> On Fri, Jun 20, 2014 at 1:04 PM, bc Wong  wrote:
>>
>>> Koert, is there any chance that your fs.defaultFS isn't set up right?

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
i put some logging statements in yarn.Client and that confirms it's using
the local filesystem:
14/06/20 15:20:33 INFO Client: fs.defaultFS is file:///

so somehow fs.defaultFS is not being picked up from
/etc/hadoop/conf/core-site.xml, but spark does correctly pick up
yarn.resourcemanager.hostname from /etc/hadoop/conf/yarn-site.xml

strange!
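
For reference, `file:///` is the value Hadoop falls back to when no loaded config file sets fs.defaultFS, so the log line above is what one would see if the core-site.xml actually being read does not define the property. A small sketch of that lookup, using Python's standard XML parser rather than Hadoop's own Configuration class; `read_default_fs` is a hypothetical helper, the second sample's property name is made up, and the deprecated `fs.default.name` key is not handled.

```python
import xml.etree.ElementTree as ET

# Fallback used when fs.defaultFS is not set in any loaded config file.
HADOOP_DEFAULT_FS = "file:///"


def read_default_fs(core_site_xml_text):
    """Return fs.defaultFS from the given core-site.xml text, else the fallback."""
    root = ET.fromstring(core_site_xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "fs.defaultFS":
            return (prop.findtext("value") or "").strip()
    return HADOOP_DEFAULT_FS


# The property quoted in this thread.
hadoop_conf = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cdh5-yarn.tresata.com:8020</value>
  </property>
</configuration>"""

# A core-site.xml holding only unrelated settings (property name is made up).
other_conf = """<configuration>
  <property>
    <name>example.unrelated.setting</name>
    <value>something</value>
  </property>
</configuration>"""

print(read_default_fs(hadoop_conf))
print(read_default_fs(other_conf))
```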


On Fri, Jun 20, 2014 at 1:26 PM, Koert Kuipers  wrote:

> in /etc/hadoop/conf/core-site.xml:
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://cdh5-yarn.tresata.com:8020</value>
>   </property>
>
>
> also hdfs seems the default:
> [koert@cdh5-yarn ~]$ hadoop fs -ls /
> Found 5 items
> drwxr-xr-x   - hdfs supergroup  0 2014-06-19 12:31 /data
> drwxrwxrwt   - hdfs supergroup  0 2014-06-20 12:17 /lib
> drwxrwxrwt   - hdfs supergroup  0 2014-06-18 14:58 /tmp
> drwxr-xr-x   - hdfs supergroup  0 2014-06-18 15:02 /user
> drwxr-xr-x   - hdfs supergroup  0 2014-06-18 14:59 /var
>
> and in my spark-env.sh:
> export HADOOP_CONF_DIR=/etc/hadoop/conf
>
>
>
> On Fri, Jun 20, 2014 at 1:04 PM, bc Wong  wrote:
>
>> Koert, is there any chance that your fs.defaultFS isn't set up right?

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
in /etc/hadoop/conf/core-site.xml:
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cdh5-yarn.tresata.com:8020</value>
  </property>


also hdfs seems the default:
[koert@cdh5-yarn ~]$ hadoop fs -ls /
Found 5 items
drwxr-xr-x   - hdfs supergroup  0 2014-06-19 12:31 /data
drwxrwxrwt   - hdfs supergroup  0 2014-06-20 12:17 /lib
drwxrwxrwt   - hdfs supergroup  0 2014-06-18 14:58 /tmp
drwxr-xr-x   - hdfs supergroup  0 2014-06-18 15:02 /user
drwxr-xr-x   - hdfs supergroup  0 2014-06-18 14:59 /var

and in my spark-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf



On Fri, Jun 20, 2014 at 1:04 PM, bc Wong  wrote:

> Koert, is there any chance that your fs.defaultFS isn't set up right?

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread bc Wong
Koert, is there any chance that your fs.defaultFS isn't set up right?

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Koert Kuipers
yeah sure, see below. i strongly suspect it's something i misconfigured,
causing yarn to mistakenly try to use the local filesystem.

*

[koert@cdh5-yarn ~]$ /usr/local/lib/spark/bin/spark-submit --class
org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3
--executor-cores 1
hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar 10
14/06/20 12:54:40 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/06/20 12:54:40 INFO RMProxy: Connecting to ResourceManager at
cdh5-yarn.tresata.com/192.168.1.85:8032
14/06/20 12:54:41 INFO Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 1
14/06/20 12:54:41 INFO Client: Queue info ... queueName: root.default,
queueCurrentCapacity: 0.0, queueMaxCapacity: -1.0,
  queueApplicationCount = 0, queueChildQueueCount = 0
14/06/20 12:54:41 INFO Client: Max mem capabililty of a single resource in
this cluster 8192
14/06/20 12:54:41 INFO Client: Preparing Local resources
14/06/20 12:54:41 WARN BlockReaderLocal: The short-circuit local reads
feature cannot be used because libhadoop cannot be loaded.
14/06/20 12:54:41 INFO Client: Uploading
hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar to
file:/home/koert/.sparkStaging/application_1403201750110_0060/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar
14/06/20 12:54:43 INFO Client: Setting up the launch environment
14/06/20 12:54:43 INFO Client: Setting up container launch context
14/06/20 12:54:43 INFO Client: Command for starting the Spark
ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx512m,
-Djava.io.tmpdir=$PWD/tmp, -Dspark.akka.retry.wait=\"3\",
-Dspark.storage.blockManagerTimeoutIntervalMs=\"12\",
-Dspark.storage.blockManagerHeartBeatMs=\"12\",
-Dspark.app.name=\"org.apache.spark.examples.SparkPi\",
-Dspark.akka.frameSize=\"1\", -Dspark.akka.timeout=\"3\",
-Dspark.worker.timeout=\"3\",
-Dspark.akka.logLifecycleEvents=\"true\",
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.deploy.yarn.ApplicationMaster, --class,
org.apache.spark.examples.SparkPi, --jar ,
hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar,
--args  '10' , --executor-memory, 1024, --executor-cores, 1,
--num-executors , 3, 1>, /stdout, 2>, /stderr)
14/06/20 12:54:43 INFO Client: Submitting application to ASM
14/06/20 12:54:43 INFO YarnClientImpl: Submitted application
application_1403201750110_0060
14/06/20 12:54:44 INFO Client: Application report from ASM:
 application identifier: application_1403201750110_0060
 appId: 60
 clientToAMToken: null
 appDiagnostics:
 appMasterHost: N/A
 appQueue: root.koert
 appMasterRpcPort: -1
 appStartTime: 1403283283505
 yarnAppState: ACCEPTED
 distributedFinalState: UNDEFINED
 appTrackingUrl:
http://cdh5-yarn.tresata.com:8088/proxy/application_1403201750110_0060/
 appUser: koert
14/06/20 12:54:45 INFO Client: Application report from ASM:
 application identifier: application_1403201750110_0060
 appId: 60
 clientToAMToken: null
 appDiagnostics:
 appMasterHost: N/A
 appQueue: root.koert
 appMasterRpcPort: -1
 appStartTime: 1403283283505
 yarnAppState: ACCEPTED
 distributedFinalState: UNDEFINED
 appTrackingUrl:
http://cdh5-yarn.tresata.com:8088/proxy/application_1403201750110_0060/
 appUser: koert
14/06/20 12:54:46 INFO Client: Application report from ASM:
 application identifier: application_1403201750110_0060
 appId: 60
 clientToAMToken: null
 appDiagnostics:
 appMasterHost: N/A
 appQueue: root.koert
 appMasterRpcPort: -1
 appStartTime: 1403283283505
 yarnAppState: ACCEPTED
 distributedFinalState: UNDEFINED
 appTrackingUrl:
http://cdh5-yarn.tresata.com:8088/proxy/application_1403201750110_0060/
 appUser: koert
14/06/20 12:54:47 INFO Client: Application report from ASM:
 application identifier: application_1403201750110_0060
 appId: 60
 clientToAMToken: null
 appDiagnostics: Application application_1403201750110_0060 failed 2
times due to AM Container for appattempt_1403201750110_0060_02 exited
with  exitCode: -1000 due to: File
file:/home/koert/.sparkStaging/application_1403201750110_0060/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar
does not exist
.Failing this attempt.. Failing the application.
 appMasterHost: N/A
 appQueue: root.koert
 appMasterRpcPort: -1
 appStartTime: 1403283283505
 yarnAppState: FAILED
 distributedFinalState: FAILED
 appTrackingUrl:
cdh5-yarn.tresata.com:8088/cluster/app/application_1403201750110_0060
 appUser: koert
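
The telltale line in the output above is the "Uploading ... to file:/home/koert/.sparkStaging/..." one: the staging destination is a local path, which the container then cannot find. A rough sketch of pulling the destination scheme out of client output like this; `staging_scheme` is a hypothetical helper, and the regex only assumes the "INFO Client: Uploading <src> to <dst>" wording seen in this log.

```python
import re
from urllib.parse import urlparse

# Matches the "Uploading <source> to <destination>" message shown in the log above.
UPLOAD_RE = re.compile(r"INFO Client: Uploading\s+(\S+)\s+to\s+(\S+)")


def staging_scheme(log_text):
    """Return the URI scheme of the first upload destination, or None."""
    flat = " ".join(log_text.split())  # undo the line wrapping added by e-mail
    match = UPLOAD_RE.search(flat)
    if match is None:
        return None
    return urlparse(match.group(2)).scheme


sample = (
    "14/06/20 12:54:41 INFO Client: Uploading\n"
    "hdfs://cdh5-yarn/lib/spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar to\n"
    "file:/home/koert/.sparkStaging/application_1403201750110_0060/"
    "spark-examples-1.0.0-hadoop2.3.0-cdh5.0.2.jar\n"
    "14/06/20 12:54:43 INFO Client: Setting up the launch environment\n"
)

if staging_scheme(sample) == "file":
    print("staging dir is on the local filesystem, check fs.defaultFS")
```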





Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread Marcelo Vanzin
Hi Koert,

Could you provide more details? Job arguments, log messages, errors, etc.

On Fri, Jun 20, 2014 at 9:40 AM, Koert Kuipers  wrote:
> i noticed that when i submit a job to yarn it mistakenly tries to upload
> files to local filesystem instead of hdfs. what could cause this?
>
> in spark-env.sh i have HADOOP_CONF_DIR set correctly (and spark-submit does
> find yarn), and my core-site.xml has a fs.defaultFS that is hdfs, not local
> filesystem.
>
> thanks! koert



-- 
Marcelo