Re: Spark job fails because of timeout to Driver

2019-10-06 Thread Jochen Hebbrecht
Hi Roland,

I just tried what you suggested, and it actually helped me find the
root cause. Once I had the default EMR cluster, I submitted a Spark job
from the master instance (using the 'spark-submit' command in a terminal)
instead of submitting it through Livy.
That way, I got much more logging in the terminal, and the logging
finally showed me what was causing the timeout. The timeout was related
to a service call in our company, and that service call failed due to access
constraints.

Fixing those access constraints made the Spark job succeed!

So, in conclusion: nothing related to Spark itself; it was the Livy output
logging that was hiding the real error details.
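
For anyone who finds this later: the step that surfaced the real error was
simply bypassing Livy and submitting from the master node. A minimal sketch
of that (the jar path and class name are placeholders, not our actual job):

{code}
# SSH into the EMR master node, then submit directly so the submission logs
# stream to the terminal instead of being swallowed by Livy.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyMainClass \
  s3://my-bucket/my-job.jar

# Afterwards, pull the full YARN container logs for the application:
yarn logs -applicationId <applicationId>
{code}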

Thank you all for your help! :-)

Jochen

On Fri, 4 Oct 2019 at 19:32, Roland Johann wrote:

> Hi Jochen,
>
> Can you create a small EMR cluster with all defaults and run the job there?
> This way we can ensure that the issue is not related to infrastructure or YARN
> configuration.
>
> Kind regards
>
> On Fri, 4 Oct 2019 at 19:27, Jochen Hebbrecht wrote:
>
>> Hi Roland,
>>
>> I switched to the default security groups, ran my job again but the same
>> exception pops up :-( ...
>> All traffic is open on the security groups now.
>>
>> Jochen
>>
>> On Fri, 4 Oct 2019 at 17:37, Roland Johann <
>> roland.joh...@phenetic.io> wrote:
>>
>>> These are dynamic port ranges and depend on the configuration of your
>>> cluster. Each job gets a separate application master, so there can't be
>>> just one port.
>>> If I remember correctly, the default EMR setup creates worker security
>>> groups with unrestricted traffic within the group, e.g. between the worker
>>> nodes.
>>> Depending on your security requirements, I suggest that you start with a
>>> default-like setup and determine ports and port ranges from the docs
>>> afterwards to further restrict traffic between the nodes.
>>>
>>> Kind regards
>>>
>>> On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:
>>>
 Hi Roland,

 We have indeed custom security groups. Can you tell me exactly what
 needs to be able to access what?
 For example, is it from the master instance to the driver instance? And
 which port should be open?

 Jochen

 On Fri, 4 Oct 2019 at 17:14, Roland Johann <
 roland.joh...@phenetic.io> wrote:

> Hi Jochen,
>
> Did you set up the EMR cluster with custom security groups? Can you
> confirm that the relevant EC2 instances can connect through the relevant
> ports?
>
> Best regards
>
> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:
>
>> Hi Jeff,
>>
>> Thanks! Just tried that, but the same timeout occurs :-( ...
>>
>> Jochen
>>
>> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:
>>
>>> You can try to increase the property spark.yarn.am.waitTime (by default
>>> it is 100s).
>>> Maybe you are doing some very time-consuming operation when
>>> initializing SparkContext, which causes the timeout.
>>>
>>> See this property here
>>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>>
>>>
>>> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>>>
 Hi,

 I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark
 job to the cluster. The job gets accepted, but the YARN application
 fails with:


 {code}
 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
 java.util.concurrent.TimeoutException: Futures timed out after
 [10 milliseconds]
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at
 org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at org.apache.spark.deploy.yarn.ApplicationMaster.org
 $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at
 

Re: Spark job fails because of timeout to Driver

2019-10-06 Thread Jochen Hebbrecht
Hi Igor,

No, it was not a memory issue, but thanks for your question. It could indeed
have been a resource problem :-)

Jochen

On Fri, 4 Oct 2019 at 19:51, igor cabral uchoa <
igorucho...@yahoo.com.br> wrote:

> Maybe it is a basic question, but does your cluster have enough resources to
> run your application? It is requesting 208G of RAM.
>
> Thanks,
>
> Sent from Yahoo Mail for iPhone
> 
>
> On Friday, October 4, 2019, 2:31 PM, Jochen Hebbrecht <
> jochenhebbre...@gmail.com> wrote:
>
> Hi Igor,
>
> We are deploying by submitting a batch job on a Livy server (from our
> local PC or a Jenkins node). The Livy server then deploys the Spark job on
> the cluster itself.
>
> For example:
> ---
>
> Running '/usr/lib/spark/bin/spark-submit' '--class' '##MY_MAIN_CLASS##' 
> '--conf' 'spark.driver.userClassPathFirst=true' '--conf' 
> 'spark.default.parallelism=180' '--conf' 'spark.executor.memory=52g' '--conf' 
> 'spark.driver.memory=52g' '--conf' 'spark.yarn.tags=livy-batch-0-owjPBdmC' 
> '--conf' 'spark.executor.instances=3' '--conf' 
> 'spark.executor.memoryOverhead=6144' '--conf' 'spark.driver.cores=6' '--conf' 
> 'spark.driver.memoryOverhead=6144' '--conf' 
> 'spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 
> -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled 
> -XX:OnOutOfMemoryError=\'kill -9 %p\'' '--conf' 
> 'spark.executor.userClassPathFirst=true' '--conf' 
> 'spark.submit.deployMode=cluster' '--conf' 
> 'spark.yarn.submit.waitAppCompletion=false' '--conf' 
> 'spark.executor.extraClassPath=true' '-- ...
>
> ---
>
> Jochen
>
On Fri, 4 Oct 2019 at 17:42, igor cabral uchoa <
igorucho...@yahoo.com.br> wrote:
>
> Hi Roland!
>
> What deploy mode are you using when you submit your applications? Is it
> client or cluster mode?
>
> Regards,
>
>
> Sent from Yahoo Mail for iPhone
> 
>
> On Friday, October 4, 2019, 12:37 PM, Roland Johann
>  wrote:
>
> These are dynamic port ranges and depend on the configuration of your cluster.
> Each job gets a separate application master, so there can't be just one
> port.
> If I remember correctly, the default EMR setup creates worker security
> groups with unrestricted traffic within the group, e.g. between the worker
> nodes.
> Depending on your security requirements, I suggest that you start with a
> default-like setup and determine ports and port ranges from the docs
> afterwards to further restrict traffic between the nodes.
>
> Kind regards
>
> On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:
>
> Hi Roland,
>
> We have indeed custom security groups. Can you tell me exactly what
> needs to be able to access what?
> For example, is it from the master instance to the driver instance? And
> which port should be open?
>
> Jochen
>
> On Fri, 4 Oct 2019 at 17:14, Roland Johann <
> roland.joh...@phenetic.io> wrote:
>
> Hi Jochen,
>
> Did you set up the EMR cluster with custom security groups? Can you confirm
> that the relevant EC2 instances can connect through the relevant ports?
>
> Best regards
>
> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:
>
> Hi Jeff,
>
> Thanks! Just tried that, but the same timeout occurs :-( ...
>
> Jochen
>
> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:
>
> You can try to increase the property spark.yarn.am.waitTime (by default it is
> 100s).
> Maybe you are doing some very time-consuming operation when initializing
> SparkContext, which causes the timeout.
>
> See this property here
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
>
> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>
> Hi,
>
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
> to the cluster. The job gets accepted, but the YARN application fails
> with:
>
>
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [10
> milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.ja

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread igor cabral uchoa
Maybe it is a basic question, but does your cluster have enough resources to run
your application? It is requesting 208G of RAM.
Thanks,

Sent from Yahoo Mail for iPhone


On Friday, October 4, 2019, 2:31 PM, Jochen Hebbrecht 
 wrote:

Hi Igor,
We are deploying by submitting a batch job on a Livy server (from our local PC 
or a Jenkins node). The Livy server then deploys the Spark job on the cluster 
itself.

For example:
---
Running '/usr/lib/spark/bin/spark-submit' '--class' '##MY_MAIN_CLASS##' 
'--conf' 'spark.driver.userClassPathFirst=true' '--conf' 
'spark.default.parallelism=180' '--conf' 'spark.executor.memory=52g' '--conf' 
'spark.driver.memory=52g' '--conf' 'spark.yarn.tags=livy-batch-0-owjPBdmC' 
'--conf' 'spark.executor.instances=3' '--conf' 
'spark.executor.memoryOverhead=6144' '--conf' 'spark.driver.cores=6' '--conf' 
'spark.driver.memoryOverhead=6144' '--conf' 
'spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 
-XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled 
-XX:OnOutOfMemoryError=\'kill -9 %p\'' '--conf' 
'spark.executor.userClassPathFirst=true' '--conf' 
'spark.submit.deployMode=cluster' '--conf' 
'spark.yarn.submit.waitAppCompletion=false' '--conf' 
'spark.executor.extraClassPath=true' '-- ...
---

Jochen
On Fri, 4 Oct 2019 at 17:42, igor cabral uchoa wrote:

Hi Roland!
What deploy mode are you using when you submit your applications? Is it client
or cluster mode?
Regards,


Sent from Yahoo Mail for iPhone


On Friday, October 4, 2019, 12:37 PM, Roland Johann 
 wrote:

These are dynamic port ranges and depend on the configuration of your cluster. Each
job gets a separate application master, so there can't be just one port. If I
remember correctly, the default EMR setup creates worker security groups with
unrestricted traffic within the group, e.g. between the worker nodes. Depending
on your security requirements, I suggest that you start with a default-like
setup and determine ports and port ranges from the docs afterwards to further
restrict traffic between the nodes.
Kind regards
On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:

Hi Roland,
We have indeed custom security groups. Can you tell me exactly what needs to
be able to access what?
For example, is it from the master instance to the driver instance? And which
port should be open?

Jochen
On Fri, 4 Oct 2019 at 17:14, Roland Johann wrote:

Hi Jochen,
Did you set up the EMR cluster with custom security groups? Can you confirm that
the relevant EC2 instances can connect through the relevant ports?
Best regards
On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:

Hi Jeff,
Thanks! Just tried that, but the same timeout occurs :-( ...

Jochen
On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:

You can try to increase the property spark.yarn.am.waitTime (by default it is 100s).
Maybe you are doing some very time-consuming operation when initializing
SparkContext, which causes the timeout.
See this property here http://spark.apache.org/docs/latest/running-on-yarn.html

On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:


Hi,

I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
to the cluster. The job gets accepted, but the YARN application fails with:


{code}
19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [10 
milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 
13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures 
ti

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Igor,

We are deploying by submitting a batch job on a Livy server (from our local
PC or a Jenkins node). The Livy server then deploys the Spark job on the
cluster itself.

For example:
---

Running '/usr/lib/spark/bin/spark-submit' '--class'
'##MY_MAIN_CLASS##' '--conf' 'spark.driver.userClassPathFirst=true'
'--conf' 'spark.default.parallelism=180' '--conf'
'spark.executor.memory=52g' '--conf' 'spark.driver.memory=52g'
'--conf' 'spark.yarn.tags=livy-batch-0-owjPBdmC' '--conf'
'spark.executor.instances=3' '--conf'
'spark.executor.memoryOverhead=6144' '--conf' 'spark.driver.cores=6'
'--conf' 'spark.driver.memoryOverhead=6144' '--conf'
'spark.executor.extraJavaOptions=-XX:ThreadStackSize=2048
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
-XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled
-XX:OnOutOfMemoryError=\'kill -9 %p\'' '--conf'
'spark.executor.userClassPathFirst=true' '--conf'
'spark.submit.deployMode=cluster' '--conf'
'spark.yarn.submit.waitAppCompletion=false' '--conf'
'spark.executor.extraClassPath=true' '-- ...

---
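
The Livy side of this is just a REST call. A minimal sketch of what gets
posted (host, jar and class name are placeholders; the real call passes the
full conf shown above):

{code}
# Submit a batch job via Livy's REST API (POST /batches).
curl -s -X POST http://livy-host:8998/batches \
  -H 'Content-Type: application/json' \
  -d '{
        "file": "s3://my-bucket/my-job.jar",
        "className": "com.example.MyMainClass",
        "conf": {"spark.submit.deployMode": "cluster"}
      }'

# The (truncated) driver output that Livy captures can be fetched with:
# GET http://livy-host:8998/batches/<batchId>/log
{code}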

Jochen

On Fri, 4 Oct 2019 at 17:42, igor cabral uchoa <
igorucho...@yahoo.com.br> wrote:

> Hi Roland!
>
> What deploy mode are you using when you submit your applications? Is it
> client or cluster mode?
>
> Regards,
>
>
> Sent from Yahoo Mail for iPhone
> 
>
> On Friday, October 4, 2019, 12:37 PM, Roland Johann
>  wrote:
>
> These are dynamic port ranges and depend on the configuration of your cluster.
> Each job gets a separate application master, so there can't be just one
> port.
> If I remember correctly, the default EMR setup creates worker security
> groups with unrestricted traffic within the group, e.g. between the worker
> nodes.
> Depending on your security requirements, I suggest that you start with a
> default-like setup and determine ports and port ranges from the docs
> afterwards to further restrict traffic between the nodes.
>
> Kind regards
>
> On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:
>
> Hi Roland,
>
> We have indeed custom security groups. Can you tell me exactly what
> needs to be able to access what?
> For example, is it from the master instance to the driver instance? And
> which port should be open?
>
> Jochen
>
> On Fri, 4 Oct 2019 at 17:14, Roland Johann <
> roland.joh...@phenetic.io> wrote:
>
> Hi Jochen,
>
> Did you set up the EMR cluster with custom security groups? Can you confirm
> that the relevant EC2 instances can connect through the relevant ports?
>
> Best regards
>
> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:
>
> Hi Jeff,
>
> Thanks! Just tried that, but the same timeout occurs :-( ...
>
> Jochen
>
> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:
>
> You can try to increase the property spark.yarn.am.waitTime (by default it is
> 100s).
> Maybe you are doing some very time-consuming operation when initializing
> SparkContext, which causes the timeout.
>
> See this property here
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
>
> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>
> Hi,
>
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
> to the cluster. The job gets accepted, but the YARN application fails
> with:
>
>
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [10
> milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
> exitCode: 13, (reason: Uncaught exception:
> java.util.concurrent.TimeoutEx

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Roland Johann
Hi Jochen,

Can you create a small EMR cluster with all defaults and run the job there?
This way we can ensure that the issue is not related to infrastructure or YARN
configuration.
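
Something like this should do for a throwaway cluster (name, instance type
and key pair are placeholders):

{code}
# Spin up a small EMR cluster with default roles and security groups.
aws emr create-cluster \
  --name "spark-timeout-repro" \
  --release-label emr-5.24.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair
{code}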

Kind regards

On Fri, 4 Oct 2019 at 19:27, Jochen Hebbrecht wrote:

> Hi Roland,
>
> I switched to the default security groups, ran my job again but the same
> exception pops up :-( ...
> All traffic is open on the security groups now.
>
> Jochen
>
> On Fri, 4 Oct 2019 at 17:37, Roland Johann <
> roland.joh...@phenetic.io> wrote:
>
>> These are dynamic port ranges and depend on the configuration of your
>> cluster. Each job gets a separate application master, so there can't be
>> just one port.
>> If I remember correctly, the default EMR setup creates worker security
>> groups with unrestricted traffic within the group, e.g. between the worker
>> nodes.
>> Depending on your security requirements, I suggest that you start with a
>> default-like setup and determine ports and port ranges from the docs
>> afterwards to further restrict traffic between the nodes.
>>
>> Kind regards
>>
>> On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:
>>
>>> Hi Roland,
>>>
>>> We have indeed custom security groups. Can you tell me exactly what
>>> needs to be able to access what?
>>> For example, is it from the master instance to the driver instance? And
>>> which port should be open?
>>>
>>> Jochen
>>>
>>> On Fri, 4 Oct 2019 at 17:14, Roland Johann <
>>> roland.joh...@phenetic.io> wrote:
>>>
 Hi Jochen,

 Did you set up the EMR cluster with custom security groups? Can you
 confirm that the relevant EC2 instances can connect through the relevant ports?

 Best regards

 On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:

> Hi Jeff,
>
> Thanks! Just tried that, but the same timeout occurs :-( ...
>
> Jochen
>
 On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:

>> You can try to increase the property spark.yarn.am.waitTime (by default
>> it is 100s).
>> Maybe you are doing some very time-consuming operation when
>> initializing SparkContext, which causes the timeout.
>>
>> See this property here
>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>
>>
>> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>>
>>> Hi,
>>>
>>> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark
>>> job to the cluster. The job gets accepted, but the YARN application
>>> fails with:
>>>
>>>
>>> {code}
>>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after
>>> [10 milliseconds]
>>> at
>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>> at
>>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>> at
>>> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
>>> exitCode: 13, (reason: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [10
>>> milliseconds]
>>> at
>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>> at
>>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>> at
>>> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>> at org.apache.spark.deploy.yarn.Applica

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Roland,

I switched to the default security groups, ran my job again but the same
exception pops up :-( ...
All traffic is open on the security groups now.

Jochen

On Fri, 4 Oct 2019 at 17:37, Roland Johann wrote:

> These are dynamic port ranges and depend on the configuration of your cluster.
> Each job gets a separate application master, so there can't be just one
> port.
> If I remember correctly, the default EMR setup creates worker security
> groups with unrestricted traffic within the group, e.g. between the worker
> nodes.
> Depending on your security requirements, I suggest that you start with a
> default-like setup and determine ports and port ranges from the docs
> afterwards to further restrict traffic between the nodes.
>
> Kind regards
>
> On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:
>
>> Hi Roland,
>>
>> We have indeed custom security groups. Can you tell me exactly what
>> needs to be able to access what?
>> For example, is it from the master instance to the driver instance? And
>> which port should be open?
>>
>> Jochen
>>
>> On Fri, 4 Oct 2019 at 17:14, Roland Johann <
>> roland.joh...@phenetic.io> wrote:
>>
>>> Hi Jochen,
>>>
>>> Did you set up the EMR cluster with custom security groups? Can you
>>> confirm that the relevant EC2 instances can connect through the relevant ports?
>>>
>>> Best regards
>>>
>>> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:
>>>
 Hi Jeff,

 Thanks! Just tried that, but the same timeout occurs :-( ...

 Jochen

 On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:

> You can try to increase the property spark.yarn.am.waitTime (by default
> it is 100s).
> Maybe you are doing some very time-consuming operation when
> initializing SparkContext, which causes the timeout.
>
> See this property here
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
>
> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>
>> Hi,
>>
>> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark
>> job to the cluster. The job gets accepted, but the YARN application
>> fails with:
>>
>>
>> {code}
>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>> java.util.concurrent.TimeoutException: Futures timed out after
>> [10 milliseconds]
>> at
>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>> at
>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>> at
>> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
>> exitCode: 13, (reason: Uncaught exception:
>> java.util.concurrent.TimeoutException: Futures timed out after [10
>> milliseconds]
>> at
>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>> at
>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>> at
>> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread igor cabral uchoa
Hi Roland!
What deploy mode are you using when you submit your applications? Is it client
or cluster mode?
Regards,


Sent from Yahoo Mail for iPhone


On Friday, October 4, 2019, 12:37 PM, Roland Johann 
 wrote:

These are dynamic port ranges and depend on the configuration of your cluster. Each
job gets a separate application master, so there can't be just one port. If I
remember correctly, the default EMR setup creates worker security groups with
unrestricted traffic within the group, e.g. between the worker nodes. Depending
on your security requirements, I suggest that you start with a default-like
setup and determine ports and port ranges from the docs afterwards to further
restrict traffic between the nodes.
Kind regards
On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:

Hi Roland,
We have indeed custom security groups. Can you tell me exactly what needs to
be able to access what?
For example, is it from the master instance to the driver instance? And which 
port should be open?

Jochen
On Fri, 4 Oct 2019 at 17:14, Roland Johann wrote:

Hi Jochen,
Did you set up the EMR cluster with custom security groups? Can you confirm that
the relevant EC2 instances can connect through the relevant ports?
Best regards
On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:

Hi Jeff,
Thanks! Just tried that, but the same timeout occurs :-( ...

Jochen
On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:

You can try to increase the property spark.yarn.am.waitTime (by default it is 100s).
Maybe you are doing some very time-consuming operation when initializing
SparkContext, which causes the timeout.
See this property here http://spark.apache.org/docs/latest/running-on-yarn.html

On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:


Hi,

I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
to the cluster. The job gets accepted, but the YARN application fails with:


{code}
19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [10 
milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 
13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures 
timed out after [10 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at or

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Roland Johann
These are dynamic port ranges and depend on the configuration of your cluster.
Each job gets a separate application master, so there can't be just one
port.
If I remember correctly, the default EMR setup creates worker security groups
with unrestricted traffic within the group, e.g. between the worker nodes.
Depending on your security requirements, I suggest that you start with a
default-like setup and determine ports and port ranges from the docs
afterwards to further restrict traffic between the nodes.
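
For what it's worth, the EMR-managed behaviour of unrestricted traffic within
the group can be reproduced on a custom group roughly like this (the security
group id is a placeholder):

{code}
# Allow all traffic between instances that share this security group,
# mirroring what the default EMR-managed groups do.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol -1 \
  --source-group sg-0123456789abcdef0
{code}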

Kind regards

On Fri, 4 Oct 2019 at 17:16, Jochen Hebbrecht wrote:

> Hi Roland,
>
> We have indeed custom security groups. Can you tell me exactly what
> needs to be able to access what?
> For example, is it from the master instance to the driver instance? And
> which port should be open?
>
> Jochen
>
> On Fri, 4 Oct 2019 at 17:14, Roland Johann <
> roland.joh...@phenetic.io> wrote:
>
>> Hi Jochen,
>>
>> Did you set up the EMR cluster with custom security groups? Can you
>> confirm that the relevant EC2 instances can connect through the relevant ports?
>>
>> Best regards
>>
>> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:
>>
>>> Hi Jeff,
>>>
>>> Thanks! Just tried that, but the same timeout occurs :-( ...
>>>
>>> Jochen
>>>
>>> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:
>>>
 You can try to increase the property spark.yarn.am.waitTime (by default it
 is 100s).
 Maybe you are doing some very time-consuming operation when
 initializing SparkContext, which causes the timeout.

 See this property here
 http://spark.apache.org/docs/latest/running-on-yarn.html


 On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:

> Hi,
>
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark
> job to the cluster. The job gets accepted, but the YARN application
> fails with:
>
>
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [10
> milliseconds]
> at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
> exitCode: 13, (reason: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [10
> milliseconds]
> at
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInfo

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Roland,

We have indeed custom security groups. Can you tell me exactly what needs
to be able to access what?
For example, is it from the master instance to the driver instance? And
which port should be open?

Jochen

On Fri, 4 Oct 2019 at 17:14, Roland Johann wrote:

> Hi Jochen,
>
> Did you set up the EMR cluster with custom security groups? Can you confirm
> that the relevant EC2 instances can connect through the relevant ports?
>
> Best regards
>
> On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:
>
>> Hi Jeff,
>>
>> Thanks! Just tried that, but the same timeout occurs :-( ...
>>
>> Jochen
>>
>> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:
>>
>>> You can try to increase the property spark.yarn.am.waitTime (by default it
>>> is 100s).
>>> Maybe you are doing some very time-consuming operation when initializing
>>> SparkContext, which causes the timeout.
>>>
>>> See this property here
>>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>>
>>>
>>> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>>>
 Hi,

 I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
 to the cluster. The job gets accepted, but the YARN application fails
 with:


 {code}
 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
 java.util.concurrent.TimeoutException: Futures timed out after [10
 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at org.apache.spark.deploy.yarn.ApplicationMaster.org
 $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
 exitCode: 13, (reason: Uncaught exception:
 java.util.concurrent.TimeoutException: Futures timed out after [10
 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
 at org.apache.spark.deploy.yarn.ApplicationMaster.org
 $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
 at
 org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 {code}

 It actually goes wrong at this line:
 https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468

 Now, I'm 100% sure Spark is OK and there's no bug, but there must be
 something wrong with my setup. I don't understand the code 

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Roland Johann
Hi Jochen,

Did you set up the EMR cluster with custom security groups? Can you confirm
that the relevant EC2 instances can connect through the relevant ports?
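
One quick way to check is netcat from one node to another (the internal
hostname is a placeholder; 8042 is the YARN NodeManager web port, but note
that AM and driver ports are assigned dynamically per application):

{code}
# From the master node, test whether a worker is reachable on a given port.
nc -zv ip-10-0-0-12.ec2.internal 8042
{code}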

Best regards

On Fri, 4 Oct 2019 at 17:09, Jochen Hebbrecht wrote:

> Hi Jeff,
>
> Thanks! Just tried that, but the same timeout occurs :-( ...
>
> Jochen
>
> On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:
>
>> You can try to increase the property spark.yarn.am.waitTime (by default it
>> is 100s).
>> Maybe you are doing some very time-consuming operation when initializing
>> SparkContext, which causes the timeout.
>>
>> See this property here
>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>
>>
>> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>>
>>> Hi,
>>>
>>> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
>>> to the cluster. The job gets accepted, but the YARN application fails
>>> with:
>>>
>>>
>>> {code}
>>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [10
>>> milliseconds]
>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
>>> exitCode: 13, (reason: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [10
>>> milliseconds]
>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>> at
>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>> at
>>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> {code}
>>>
>>> It actually goes wrong at this line:
>>> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>>>
>>> Now, I'm 100% sure Spark is OK and there's no bug, but there must be
>>> something wrong with my setup. I don't understand the code of the
>>> ApplicationMaster, so could somebody explain to me what it is trying to reach?
>>> Where exactly does the connection time out? So at least I can debug it
>>> further, because I don't have a clue what it is doing :-)
>>>
>>> Thanks for any help!
>>> Jochen
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
> --


*Roland Johann*
Software Developer/Data Engineer

*phenetic GmbH*
Lütticher 

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Jeff,

Thanks! Just tried that, but the same timeout occurs :-( ...

Jochen

On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote:

> You can try to increase the property spark.yarn.am.waitTime (by default it is
> 100s).
> Maybe you are doing some very time-consuming operation when initializing
> SparkContext, which causes the timeout.
>
> See this property here
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
>
> On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:
>
>> Hi,
>>
>> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
>> to the cluster. The job gets accepted, but the YARN application fails
>> with:
>>
>>
>> {code}
>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>> java.util.concurrent.TimeoutException: Futures timed out after [10
>> milliseconds]
>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
>> exitCode: 13, (reason: Uncaught exception:
>> java.util.concurrent.TimeoutException: Futures timed out after [10
>> milliseconds]
>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>> at org.apache.spark.deploy.yarn.ApplicationMaster.org
>> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>> at
>> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>> {code}
>>
>> It actually goes wrong at this line:
>> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>>
>> Now, I'm 100% sure Spark is OK and there's no bug, but there must be
>> something wrong with my setup. I don't understand the code of the
>> ApplicationMaster, so could somebody explain to me what it is trying to reach?
>> Where exactly does the connection time out? So at least I can debug it
>> further, because I don't have a clue what it is doing :-)
>>
>> Thanks for any help!
>> Jochen
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jeff Zhang
You can try to increase the property spark.yarn.am.waitTime (by default it is
100s).
Maybe you are doing some very time-consuming operation when initializing
SparkContext, which causes the timeout.

See this property here
http://spark.apache.org/docs/latest/running-on-yarn.html
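
In cluster mode this property bounds how long the ApplicationMaster waits for
your user code to create the SparkContext before it throws the
TimeoutException you see (that wait is the ApplicationMaster.scala line in
your stack trace). A sketch of bumping it at submit time (the value, class
name and jar are illustrative):

{code}
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=300s \
  --class com.example.MyMainClass \
  my-job.jar
{code}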


On Fri, 4 Oct 2019 at 22:08, Jochen Hebbrecht wrote:

> Hi,
>
> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
> to the cluster. The job gets accepted, but the YARN application fails
> with:
>
>
> {code}
> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [10
> milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
> exitCode: 13, (reason: Uncaught exception:
> java.util.concurrent.TimeoutException: Futures timed out after [10
> milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
> at org.apache.spark.deploy.yarn.ApplicationMaster.org
> $apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
> {code}
>
> It actually goes wrong at this line:
> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>
> Now, I'm 100% sure Spark is OK and there's no bug, but there must be
> something wrong with my setup. I don't understand the code of the
> ApplicationMaster, so could somebody explain to me what it is trying to reach?
> Where exactly does the connection time out? So at least I can debug it
> further, because I don't have a clue what it is doing :-)
>
> Thanks for any help!
> Jochen
>


-- 
Best Regards

Jeff Zhang


Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi,

I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to submit a Spark job
to the cluster. The job gets accepted, but the YARN application fails
with:


{code}
19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [10
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org
$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED,
exitCode: 13, (reason: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [10
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
at org.apache.spark.deploy.yarn.ApplicationMaster.org
$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at
org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
{code}

It actually goes wrong at this line:
https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468

Now, I'm 100% sure Spark is OK and there's no bug, but there must be
something wrong with my setup. I don't understand the code of the
ApplicationMaster, so could somebody explain to me what it is trying to reach?
Where exactly does the connection time out? So at least I can debug it
further, because I don't have a clue what it is doing :-)

Thanks for any help!
Jochen