[jira] [Updated] (SPARK-19226) Report failure reason from Reporter Thread

2017-01-15 Thread Maheedhar Reddy Chappidi (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-19226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maheedhar Reddy Chappidi updated SPARK-19226:
---------------------------------------------
Description: 
With the exponential [1] increase in the executor count, the Reporter thread [2] 
fails without a meaningful error message.

==
17/01/12 09:33:44 INFO YarnAllocator: Driver requested a total number of 32767 executor(s).
17/01/12 09:33:44 INFO YarnAllocator: Will request 24576 executor containers, each with 2 cores and 5632 MB memory including 512 MB overhead
17/01/12 09:33:44 INFO YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34419 executor(s).
17/01/12 09:33:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34410 executor(s).
17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34409 executor(s).
17/01/12 09:33:52 INFO ShutdownHookManager: Shutdown hook called
==
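
For context, the climbing request totals in the log are expected: the dynamic 
allocation heuristic in [1] doubles the number of executors it asks for on 
each round while tasks remain backlogged, so the target ramps up 
exponentially until it hits a cap. A toy Scala sketch of that ramp-up 
(illustrative only, not the Spark source; the cap value below is made up):

==
object RampUpSketch {
  def main(args: Array[String]): Unit = {
    val maxNeeded = 65536 // assumed upper bound implied by pending tasks
    var target    = 1     // current executor target
    var toAdd     = 1     // doubles while the backlog persists
    while (target < maxNeeded) {
      target = math.min(target + toAdd, maxNeeded)
      toAdd *= 2
      println(s"Driver requested a total number of $target executor(s).")
    }
  }
}
==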

We were able to run the workflows by capping the executor count with 
spark.dynamicAllocation.maxExecutors, which keeps the requests from growing 
further (35k -> 65k).
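
For example (the cap of 10000 below is an arbitrary illustrative value; tune 
it per cluster):

==
# spark-defaults.conf, or pass each key with --conf on spark-submit
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.maxExecutors   10000
==
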
Also, I don't see any issues with the ApplicationMaster container's memory or 
compute. Is it possible to report a more detailed ErrorReason from the 
if/else in [2]? A possible shape is sketched after the links below.

[1] https://github.com/apache/spark/blob/6ee28423ad1b2e6089b82af64a31d77d3552bb38/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
[2] https://github.com/apache/spark/blob/01e14bf303e61a5726f3b1418357a50c1bf8b16f/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L446-L480
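
To make the ask concrete, here is a rough, self-contained Scala sketch of 
what the catch block in [2] could report (failureCount and 
reporterMaxFailures mirror the linked code; the triggering exception below 
is a made-up example, not the actual cause in our runs):

==
import scala.util.control.NonFatal

// Toy model of the Reporter-thread failure handling in [2];
// this is a sketch of the suggestion, not the Spark source.
object ReporterSketch {
  val reporterMaxFailures = 5
  var failureCount = 0

  // Stand-in for ApplicationMaster.finish(...): just print the reason.
  def finish(status: String, exitCode: Int, reason: String): Unit =
    println(s"Final app status: $status, exitCode: $exitCode, (reason: $reason)")

  def handle(e: Throwable): Unit = {
    failureCount += 1
    if (!NonFatal(e) || failureCount >= reporterMaxFailures) {
      // Proposed change: append the underlying cause to the reason string
      finish("FAILED", 12,
        s"Exception was thrown $failureCount time(s) from Reporter thread: " +
        s"${e.getClass.getName}: ${e.getMessage}")
    }
  }

  def main(args: Array[String]): Unit =
    // Hypothetical fatal error, chosen only to exercise the !NonFatal branch
    handle(new OutOfMemoryError("unable to create new native thread"))
}
==

With something like this, exit code 12 would carry the underlying exception 
class and message instead of only the failure count.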

  was:
With the exponential [1] increase in the executor count, the Reporter thread [2] 
fails without a meaningful error message.

==
17/01/12 09:33:44 INFO YarnAllocator: Driver requested a total number of 32767 executor(s).
17/01/12 09:33:44 INFO YarnAllocator: Will request 24576 executor containers, each with 2 cores and 5632 MB memory including 512 MB overhead
17/01/12 09:33:44 INFO YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34419 executor(s).
17/01/12 09:33:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34410 executor(s).
17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34409 executor(s).
17/01/12 09:33:52 INFO ShutdownHookManager: Shutdown hook called
==

We were able to run the workflows by capping the executor count with 
spark.dynamicAllocation.maxExecutors, which keeps the requests from growing 
further (35k -> 65k).
Also, I don't see any issues with the ApplicationMaster container's memory or 
compute.

[1] https://github.com/apache/spark/blob/6ee28423ad1b2e6089b82af64a31d77d3552bb38/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
[2] https://github.com/apache/spark/blob/01e14bf303e61a5726f3b1418357a50c1bf8b16f/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L446-L480


> Report failure reason from Reporter Thread 
> ------------------------------------------
>
> Key: SPARK-19226
> URL: https://issues.apache.org/jira/browse/SPARK-19226
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.2
> Environment: emr-5.2.1 with Zeppelin 0.6.2/Spark 2.0.2 and 10 r3.xl core nodes
>Reporter: Maheedhar Reddy Chappidi
>Priority: Minor
>



