Not sure how to solve this, but spotted these lines in the logs:

14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as
*failed*: container_1415961020140_0325_01_000002

14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as
*failed*: container_1415961020140_0325_01_000003

And the lines following it show it requesting executor containers with 1408
(presumably MB) of memory each, but each container completes with exit status 1
right after launch. You might want to look into that.
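
Exit status 1 usually means the executor JVM itself died right after launch, so
the useful stack trace ends up in the container's own stderr rather than in the
ApplicationMaster log above. If log aggregation is enabled on your cluster,
something along these lines should pull it (the application id is taken from
your log):

    yarn logs -applicationId application_1415961020140_0325

It may also be worth passing explicit resource settings to spark-submit, e.g.
--executor-memory 1g --num-executors 2 (the values are just a guess; use
whatever your queue can actually hold), and checking that the 1408 MB request
fits under yarn.scheduler.maximum-allocation-mb and the NodeManager memory on
those hosts.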


On Tue, Nov 18, 2014 at 1:23 PM, LinCharlie <lin_q...@outlook.com> wrote:

> Hi All:
> I was submitting a spark_program.jar to a `spark on yarn` cluster from a
> driver machine in yarn-client mode. Here is the spark-submit command I
> used:
>
> ./spark-submit --master yarn-client --class
> com.charlie.spark.grax.OldFollowersExample --queue dt_spark
> ~/script/spark-flume-test-0.1-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.1.jar
>
> The queue `dt_spark` was free, and the program was submitted successfully
> and was running on the cluster. But the console repeatedly showed:
>
> 14/11/18 15:11:48 WARN YarnClientClusterScheduler: Initial job has not
> accepted any resources; check your cluster UI to ensure that workers are
> registered and have sufficient memory
>
> I checked the cluster UI logs and found no errors:
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/disk5/yarn/usercache/linqili/filecache/6957209742046754908/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/hadoop/hadoop-2.0.0-cdh4.2.1/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/11/18 14:28:16 INFO SecurityManager: Changing view acls to: hadoop,linqili
> 14/11/18 14:28:16 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users with view permissions: Set(hadoop, linqili)
> 14/11/18 14:28:17 INFO Slf4jLogger: Slf4jLogger started
> 14/11/18 14:28:17 INFO Remoting: Starting remoting
> 14/11/18 14:28:17 INFO Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187]
> 14/11/18 14:28:17 INFO Remoting: Remoting now listens on addresses: 
> [akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187]
> 14/11/18 14:28:17 INFO ExecutorLauncher: ApplicationAttemptId: 
> appattempt_1415961020140_0325_000001
> 14/11/18 14:28:17 INFO ExecutorLauncher: Connecting to ResourceManager at 
> longzhou-hdpnn.lz.dscc/192.168.19.107:12032
> 14/11/18 14:28:17 INFO ExecutorLauncher: Registering the ApplicationMaster
> 14/11/18 14:28:18 INFO ExecutorLauncher: Waiting for spark driver to be 
> reachable.
> 14/11/18 14:28:18 INFO ExecutorLauncher: Master now available: 
> 192.168.59.90:36691
> 14/11/18 14:28:18 INFO ExecutorLauncher: Listen to driver: 
> akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler
> 14/11/18 14:28:18 INFO ExecutorLauncher: Allocating 1 executors.
> 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor 
> containers with 1408 of memory each.
> 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor 
> containers with 1408 of memory each.
> 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:18 INFO RackResolver: Resolved longzhou-hdp3.lz.dscc to /rack1
> 14/11/18 14:28:18 INFO YarnAllocationHandler: launching container on 
> container_1415961020140_0325_01_000002 host longzhou-hdp3.lz.dscc
> 14/11/18 14:28:18 INFO ExecutorRunnable: Starting Executor Container
> 14/11/18 14:28:18 INFO ExecutorRunnable: Connecting to ContainerManager at 
> longzhou-hdp3.lz.dscc:12040
> 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up ContainerLaunchContext
> 14/11/18 14:28:18 INFO ExecutorRunnable: Preparing Local resources
> 14/11/18 14:28:18 INFO ExecutorLauncher: All executors have launched.
> 14/11/18 14:28:18 INFO ExecutorLauncher: Started progress reporter thread - 
> sleep time : 5000
> 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:18 INFO ExecutorRunnable: Prepared Local resources 
> Map(__spark__.jar -> resource {, scheme: "hdfs", host: 
> "longzhou-hdpnn.lz.dscc", port: 11000, file: 
> "/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar",
>  }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: 
> PRIVATE, )
> 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up executor with commands: 
> List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', 
> -Xms1024m -Xmx1024m , 
> -Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf
>  
> -Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64,
>  -Djava.io.tmpdir=$PWD/tmp,  
> -Dlog4j.configuration=log4j-spark-container.properties, 
> org.apache.spark.executor.CoarseGrainedExecutorBackend, 
> akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 1, 
> longzhou-hdp3.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 14/11/18 14:28:23 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:23 INFO YarnAllocationHandler: Completed container 
> container_1415961020140_0325_01_000002 (state: COMPLETE, exit status: 1)
> 14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as failed: 
> container_1415961020140_0325_01_000002
> 14/11/18 14:28:28 INFO ExecutorLauncher: Allocating 1 containers to make up 
> for (potentially ?) lost containers
> 14/11/18 14:28:28 INFO YarnAllocationHandler: Allocating 1 executor 
> containers with 1408 of memory each.
> 14/11/18 14:28:28 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:33 INFO ExecutorLauncher: Allocating 1 containers to make up 
> for (potentially ?) lost containers
> 14/11/18 14:28:33 INFO YarnAllocationHandler: Allocating 1 executor 
> containers with 1408 of memory each.
> 14/11/18 14:28:33 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:33 INFO RackResolver: Resolved longzhou-hdp2.lz.dscc to /rack1
> 14/11/18 14:28:33 INFO YarnAllocationHandler: launching container on 
> container_1415961020140_0325_01_000003 host longzhou-hdp2.lz.dscc
> 14/11/18 14:28:33 INFO ExecutorRunnable: Starting Executor Container
> 14/11/18 14:28:33 INFO ExecutorRunnable: Connecting to ContainerManager at 
> longzhou-hdp2.lz.dscc:12040
> 14/11/18 14:28:33 INFO ExecutorRunnable: Setting up ContainerLaunchContext
> 14/11/18 14:28:33 INFO ExecutorRunnable: Preparing Local resources
> 14/11/18 14:28:33 INFO ExecutorRunnable: Prepared Local resources 
> Map(__spark__.jar -> resource {, scheme: "hdfs", host: 
> "longzhou-hdpnn.lz.dscc", port: 11000, file: 
> "/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar",
>  }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: 
> PRIVATE, )
> 14/11/18 14:28:33 INFO ExecutorRunnable: Setting up executor with commands: 
> List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', 
> -Xms1024m -Xmx1024m , 
> -Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf
>  
> -Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64,
>  -Djava.io.tmpdir=$PWD/tmp,  
> -Dlog4j.configuration=log4j-spark-container.properties, 
> org.apache.spark.executor.CoarseGrainedExecutorBackend, 
> akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 2, 
> longzhou-hdp2.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 14/11/18 14:28:38 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:38 INFO YarnAllocationHandler: Ignoring container 
> container_1415961020140_0325_01_000004 at host longzhou-hdp2.lz.dscc, since 
> we already have the required number of containers for it.
> 14/11/18 14:28:38 INFO YarnAllocationHandler: Completed container 
> container_1415961020140_0325_01_000003 (state: COMPLETE, exit status: 1)
> 14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as failed: 
> container_1415961020140_0325_01_000003
> 14/11/18 14:28:43 INFO ExecutorLauncher: Allocating 1 containers to make up 
> for (potentially ?) lost containers
> 14/11/18 14:28:43 INFO YarnAllocationHandler: Releasing 1 containers. 
> pendingReleaseContainers : {container_1415961020140_0325_01_000004=true}
> 14/11/18 14:28:43 INFO YarnAllocationHandler: Allocating 1 executor 
> containers with 1408 of memory each.
> 14/11/18 14:28:43 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:48 INFO ExecutorLauncher: Allocating 1 containers to make up 
> for (potentially ?) lost containers
> 14/11/18 14:28:48 INFO YarnAllocationHandler: Allocating 1 executor 
> containers with 1408 of memory each.
> 14/11/18 14:28:48 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 1, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:48 INFO YarnAllocationHandler: launching container on 
> container_1415961020140_0325_01_000005 host longzhou-hdp2.lz.dscc
> 14/11/18 14:28:48 INFO ExecutorRunnable: Starting Executor Container
> 14/11/18 14:28:48 INFO ExecutorRunnable: Connecting to ContainerManager at 
> longzhou-hdp2.lz.dscc:12040
> 14/11/18 14:28:48 INFO ExecutorRunnable: Setting up ContainerLaunchContext
> 14/11/18 14:28:48 INFO ExecutorRunnable: Preparing Local resources
> 14/11/18 14:28:48 INFO ExecutorRunnable: Prepared Local resources 
> Map(__spark__.jar -> resource {, scheme: "hdfs", host: 
> "longzhou-hdpnn.lz.dscc", port: 11000, file: 
> "/user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar",
>  }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: 
> PRIVATE, )
> 14/11/18 14:28:48 INFO ExecutorRunnable: Setting up executor with commands: 
> List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', 
> -Xms1024m -Xmx1024m , 
> -Djava.security.krb5.conf=/home/linqili/proc/spark_client/hadoop/kerberos5-client/etc/krb5.conf
>  
> -Djava.library.path=/home/linqili/proc/spark_client/hadoop/lib/native/Linux-amd64-64,
>  -Djava.io.tmpdir=$PWD/tmp,  
> -Dlog4j.configuration=log4j-spark-container.properties, 
> org.apache.spark.executor.CoarseGrainedExecutorBackend, 
> akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler, 3, 
> longzhou-hdp2.lz.dscc, 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
> 14/11/18 14:28:53 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
> containers: 0, priority = 1 , capability : memory: 1408)
> 14/11/18 14:28:53 INFO YarnAllocationHandler: Ignoring container 
> container_1415961020140_0325_01_000006 at host longzhou-hdp2.lz.dscc, since 
> we already have the required number of containers for it.
>
>
> Any hints? Thanks.
>
>
