Re: Spark job fails because of timeout to Driver

2019-10-04 Thread igor cabral uchoa
Maybe it is a basic question, but does your cluster have enough resources to run your application? It is requesting 208G of RAM. Thanks, Sent from Yahoo Mail for iPhone On Friday, October 4, 2019, 2:31 PM, Jochen Hebbrecht wrote: Hi Igor, We are deploying by submitting a batch job on a Livy server

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Igor, We are deploying by submitting a batch job on a Livy server (from our local PC or a Jenkins node). The Livy server then deploys the Spark job on the cluster itself. For example: --- Running '/usr/lib/spark/bin/spark-submit' '--class' '##MY_MAIN_CLASS##' '--conf'
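The deployment flow described above (client POSTs a batch to Livy, Livy runs spark-submit on the cluster) can be sketched as the JSON payload such a submission would carry. The jar path is a hypothetical placeholder, and the class name mirrors the redacted `##MY_MAIN_CLASS##` from the command shown; only the Livy `/batches` endpoint and field names are from the Livy REST API.

```python
import json

# Sketch of a Livy batch payload, assuming a hypothetical artifact path.
# Livy translates this into the spark-submit invocation shown above.
payload = {
    "file": "s3://my-bucket/my-app.jar",   # assumed jar location
    "className": "##MY_MAIN_CLASS##",       # placeholder from the original command
    "conf": {
        "spark.submit.deployMode": "cluster",
    },
}

# POSTed to http://<livy-host>:8998/batches, the Livy server then runs
# spark-submit on the cluster on the client's behalf.
body = json.dumps(payload)
```

This matters for the timeout discussion because in this setup the driver runs inside the cluster (Livy submits in cluster mode), not on the local PC or Jenkins node.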

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Roland Johann
Hi Jochen, Can you create a small EMR cluster with all defaults and run the job there? This way we can ensure that the issue is not related to infrastructure or YARN configuration. Kind regards Jochen Hebbrecht wrote on Fri, 4 Oct 2019 at 19:27: > Hi Roland, > > I switched to the default

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Roland, I switched to the default security groups and ran my job again, but the same exception pops up :-( ... All traffic is open on the security groups now. Jochen On Fri, 4 Oct 2019 at 17:37, Roland Johann wrote: > These are dynamic port ranges and depend on the configuration of your cluster. >

Re: Spark on kubernetes : missing spark.kubernetes.driver.request.cores parameter ?

2019-10-04 Thread jcdauchy
I am actually answering myself: I have checked the master 3.x branch, and this feature exists! https://issues.apache.org/jira/browse/SPARK-27754 So my understanding was correct. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Spark on kubernetes : missing spark.kubernetes.driver.request.cores parameter ?

2019-10-04 Thread jcdauchy
Hello all, I am surprised that it is not possible to define "spark.kubernetes.driver.request.cores" when submitting a spark job on kubernetes. My understanding is that it would limit the CPU requests for the driver on the k8s cluster and we could still define how many cores (threads) we use in
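The distinction the poster draws can be sketched as the two conf keys involved: the Kubernetes-side CPU request for the driver pod versus the number of cores Spark itself uses in the driver. This assumes Spark 3.x, where SPARK-27754 (linked in the reply above) added the request setting; the values are illustrative only.

```python
# Sketch of the conf split discussed above, assuming Spark 3.x (SPARK-27754).
conf = {
    # CPU *request* handed to the Kubernetes scheduler (may be fractional)
    "spark.kubernetes.driver.request.cores": "0.5",
    # Number of cores (threads) Spark uses inside the driver JVM
    "spark.driver.cores": "1",
}

# How these would appear on a spark-submit command line:
submit_args = [f"--conf {k}={v}" for k, v in conf.items()]
```

With this split, a driver can be scheduled onto a node with a small CPU request while still being configured for a given thread count, which is exactly the flexibility the question asks about.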

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread igor cabral uchoa
Hi Roland! What deploy mode are you using when you submit your applications? Is it client or cluster mode? Regards, Sent from Yahoo Mail for iPhone On Friday, October 4, 2019, 12:37 PM, Roland Johann wrote: These are dynamic port ranges and depend on the configuration of your cluster. Per job

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Roland Johann
These are dynamic port ranges and depend on the configuration of your cluster. Per job there is a separate application master, so there can't be just one port. If I remember correctly, the default EMR setup creates worker security groups with unrestricted traffic within the group, e.g. between the worker

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Roland, We have indeed custom security groups. Can you tell me where exactly I need to be able to access what? For example, is it from the master instance to the driver instance? And which port should be open? Jochen On Fri, 4 Oct 2019 at 17:14, Roland Johann wrote: > Hi Jochen, > > did

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Roland Johann
Hi Jochen, did you set up the EMR cluster with custom security groups? Can you confirm that the relevant EC2 instances can connect through the relevant ports? Best regards Jochen Hebbrecht wrote on Fri, 4 Oct 2019 at 17:09: > Hi Jeff, > > Thanks! Just tried that, but the same timeout occurs :-(
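The connectivity check suggested here can be done with a small probe run from one instance against another (the host and port would come from the cluster at hand; nothing here is specific to EMR). A minimal sketch:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from, say, the master node against the driver instance on a suspect port; a False result under open security-group rules would point at something other than the security groups.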

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi Jeff, Thanks! Just tried that, but the same timeout occurs :-( ... Jochen On Fri, 4 Oct 2019 at 16:37, Jeff Zhang wrote: > You can try to increase the property spark.yarn.am.waitTime (by default it is > 100s). Maybe you are doing some very time-consuming operation when initializing >

Re: Spark job fails because of timeout to Driver

2019-10-04 Thread Jeff Zhang
You can try to increase the property spark.yarn.am.waitTime (by default it is 100s). Maybe you are doing some very time-consuming operation when initializing the SparkContext, which causes the timeout. See this property here: http://spark.apache.org/docs/latest/running-on-yarn.html Jochen Hebbrecht
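The suggestion above can be sketched as the extra `--conf` flag a submit command would carry. The 300s value is an arbitrary illustration, not a recommendation from the thread; only the property name and its 100s default are from the linked running-on-yarn documentation.

```python
# spark.yarn.am.waitTime (default 100s) is how long the YARN ApplicationMaster
# waits for the SparkContext to be initialized before giving up.
conf = {"spark.yarn.am.waitTime": "300s"}  # assumed value for illustration

# As it would appear appended to a spark-submit invocation:
flags = " ".join(f"--conf {k}={v}" for k, v in conf.items())
```

Note that this only buys time for a slow SparkContext initialization; if the driver and the ApplicationMaster cannot reach each other at all (the security-group angle discussed elsewhere in the thread), raising it will not help.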

Spark job fails because of timeout to Driver

2019-10-04 Thread Jochen Hebbrecht
Hi, I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to send a Spark job towards the cluster. The job gets accepted, but the YARN application fails with: {code} 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after