[ https://issues.apache.org/jira/browse/AIRFLOW-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergio Soto updated AIRFLOW-5385:
---------------------------------
    Description:
Hello,

we have an issue with SparkSubmitOperator. Airflow DAGs show that some streaming applications break. I analyzed this behaviour: SparkSubmitHook is responsible for checking the driver status. We found some timeouts and tried to reproduce the status-check command. This is an execution with `time`:
{code:java}
time /opt/java/jdk1.8.0_181/jre/bin/java -cp /opt/shared/spark/client/conf/:/opt/shared/spark/client/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://spark-master.corp.com:6066 --status driver-20190901180337-2749
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/02 17:05:53 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20190901180337-2749 in spark://lgmadbdtpspk01v.corp.logitravelgroup.com:6066.
19/09/02 17:05:59 INFO RestSubmissionClient: Server responded with SubmissionStatusResponse:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20190901180337-2749",
  "success" : true,
  "workerHostPort" : "172.25.10.194:45441",
  "workerId" : "worker-20190821201014-172.25.10.194-45441"
}

real 0m11.598s
user 0m2.092s
sys 0m0.222s
{code}
We analyzed the Scala code and the Spark API. The spark-submit status command ultimately issues an HTTP GET request to a URL. Using curl, this is the time the Spark master takes to return the status:
{code:java}
time curl "http://spark-master.corp.com:6066/v1/submissions/status/driver-20190901180337-2749"
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20190901180337-2749",
  "success" : true,
  "workerHostPort" : "172.25.10.194:45441",
  "workerId" : "worker-20190821201014-172.25.10.194-45441"
}

real 0m0.011s
user 0m0.000s
sys 0m0.006s
{code}
The task takes 11.59 seconds with spark-submit versus 0.011 seconds with curl.

How can this behaviour be explained?

  was: the same text, except that the curl command used the URL "http://lgmadbdtpspk01v.corp.logitravelgroup.com:6066/v1/submissions/status/driver-20190901180337-2749".

> SparkSubmit status spend lot of time
> ------------------------------------
>
>                 Key: AIRFLOW-5385
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5385
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.10.2
>            Reporter: Sergio Soto
>            Priority: Blocker
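
For reference, the direct HTTP check described in the report could be sketched in Python roughly as follows. This is only an illustration, not how SparkSubmitHook works: the function names (`status_url`, `parse_driver_state`, `fetch_driver_state`) are hypothetical, the endpoint path and the `driverState` / `success` fields are taken from the curl command and response shown in the report, and the 5-second timeout is an arbitrary assumption. It does not explain why the spark-submit invocation takes so much longer.

```python
import json
import time
import urllib.request
from urllib.parse import urlparse


def status_url(master: str, submission_id: str) -> str:
    """Map a spark://host:port master URL to the REST status URL.

    The path layout is copied from the curl command in the report.
    """
    parsed = urlparse(master)
    if parsed.scheme != "spark" or not parsed.hostname or not parsed.port:
        raise ValueError(f"expected spark://host:port, got {master!r}")
    return (
        f"http://{parsed.hostname}:{parsed.port}"
        f"/v1/submissions/status/{submission_id}"
    )


def parse_driver_state(body: str) -> str:
    """Extract driverState from a SubmissionStatusResponse JSON body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"status request was not successful: {payload}")
    return payload["driverState"]


def fetch_driver_state(master: str, submission_id: str, timeout: float = 5.0):
    """Query the master over HTTP; return (driver state, elapsed seconds).

    The timeout value is an assumption chosen for illustration.
    """
    started = time.monotonic()
    url = status_url(master, submission_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    elapsed = time.monotonic() - started
    return parse_driver_state(body), elapsed
```

Calling `fetch_driver_state("spark://spark-master.corp.com:6066", "driver-20190901180337-2749")` against a reachable master would return the driver state together with the wall-clock time of the request, which can then be compared with the timing of the spark-submit command.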