[ 
https://issues.apache.org/jira/browse/AIRFLOW-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Soto updated AIRFLOW-5385:
---------------------------------
    Description: 
Hello,

We have an issue with SparkSubmitOperator. Airflow DAGs show that some
streaming applications break. I analyzed this behaviour. The
SparkSubmitHook is responsible for checking the driver status.

We found some timeouts and tried to reproduce the status-check command. This is
a run measured with `time`:
{code:java}
time /opt/java/jdk1.8.0_181/jre/bin/java -cp 
/opt/shared/spark/client/conf/:/opt/shared/spark/client/jars/* -Xmx1g 
org.apache.spark.deploy.SparkSubmit --master spark://spark-master.corp.com:6066 
--status driver-20190901180337-2749 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/02 17:05:53 INFO RestSubmissionClient: Submitting a request for the 
status of submission driver-20190901180337-2749 in 
spark://lgmadbdtpspk01v.corp.logitravelgroup.com:6066.
19/09/02 17:05:59 INFO RestSubmissionClient: Server responded with 
SubmissionStatusResponse:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20190901180337-2749",
  "success" : true,
  "workerHostPort" : "172.25.10.194:45441",
  "workerId" : "worker-20190821201014-172.25.10.194-45441"
}

real 0m11.598s
user 0m2.092s
sys 0m0.222s
{code}
We analyzed the Scala code and the Spark API. The spark-submit status command
ends up issuing an HTTP GET request to a URL. Using curl, this is how long the
Spark master takes to return the status:
{code:java}
 time curl 
"http://spark-master.corp.com:6066/v1/submissions/status/driver-20190901180337-2749";
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20190901180337-2749",
  "success" : true,
  "workerHostPort" : "172.25.10.194:45441",
  "workerId" : "worker-20190821201014-172.25.10.194-45441"
}
real    0m0.011s
user    0m0.000s
sys     0m0.006s
{code}
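For comparison, the same GET request can be sketched in Python using only the
standard library. This is a minimal illustration, not the code that
SparkSubmitHook runs: the helper names (build_status_url, parse_driver_state,
fetch_driver_state) and the 5-second timeout are assumptions made for this
example, while the URL layout and the JSON fields are taken from the curl
output above.

```python
import json
import urllib.request


def build_status_url(master: str, submission_id: str) -> str:
    """Map a spark://host:port master URL to the REST status URL seen in the curl call."""
    host_port = master.split("://", 1)[-1].rstrip("/")
    return f"http://{host_port}/v1/submissions/status/{submission_id}"


def parse_driver_state(body: str) -> str:
    """Return driverState from a SubmissionStatusResponse JSON document."""
    response = json.loads(body)
    if not response.get("success"):
        raise RuntimeError(f"status request failed: {response}")
    return response["driverState"]


def fetch_driver_state(master: str, submission_id: str, timeout: float = 5.0) -> str:
    """Issue the GET request and return the driver state (hypothetical helper)."""
    url = build_status_url(master, submission_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_driver_state(resp.read().decode("utf-8"))


# Example call (needs network access to the Spark master):
# fetch_driver_state("spark://spark-master.corp.com:6066",
#                    "driver-20190901180337-2749")
```

The URL building and the JSON parsing are kept apart from the network call so
that they can be checked without a running Spark master.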
The task spends 11.598 seconds with spark-submit versus 0.011 seconds with curl.
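The two measurements can be broken down a little further with the numbers
already shown. The sketch below only re-uses the `time` values and the two log
timestamps from the outputs above; the helper name parse_time_output is an
assumption made for this example. The log timestamps have one-second
resolution, so the gap between them is approximate.

```python
from datetime import datetime


def parse_time_output(value: str) -> float:
    """Convert a bash `time` value such as '0m11.598s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Values copied from the spark-submit and curl runs above.
submit_real = parse_time_output("0m11.598s")
submit_cpu = parse_time_output("0m2.092s") + parse_time_output("0m0.222s")
curl_real = parse_time_output("0m0.011s")

# Timestamps of the two RestSubmissionClient log lines (yy/MM/dd HH:mm:ss).
fmt = "%y/%m/%d %H:%M:%S"
request_gap = (
    datetime.strptime("19/09/02 17:05:59", fmt)
    - datetime.strptime("19/09/02 17:05:53", fmt)
).total_seconds()

print(f"slowdown vs curl: {submit_real / curl_real:.0f}x")
print(f"spark-submit wall time not spent on CPU: {submit_real - submit_cpu:.3f} s")
print(f"gap between request and response log lines: {request_gap:.0f} s")
```

Roughly 6 of the 11.6 seconds pass between the request and response log lines,
inside an already running JVM, which suggests that JVM startup alone may not
account for the whole delay.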

How can this behaviour be explained?

  was:
Hello,

We have an issue with SparkSubmitOperator. Airflow DAGs show that some
streaming applications break. I analyzed this behaviour. The
SparkSubmitHook is responsible for checking the driver status.

We found some timeouts and tried to reproduce the status-check command. This is
a run measured with `time`:
{code:java}
time /opt/java/jdk1.8.0_181/jre/bin/java -cp 
/opt/shared/spark/client/conf/:/opt/shared/spark/client/jars/* -Xmx1g 
org.apache.spark.deploy.SparkSubmit --master spark://spark-master.corp.com:6066 
--status driver-20190901180337-2749 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/09/02 17:05:53 INFO RestSubmissionClient: Submitting a request for the 
status of submission driver-20190901180337-2749 in 
spark://lgmadbdtpspk01v.corp.logitravelgroup.com:6066.
19/09/02 17:05:59 INFO RestSubmissionClient: Server responded with 
SubmissionStatusResponse:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20190901180337-2749",
  "success" : true,
  "workerHostPort" : "172.25.10.194:45441",
  "workerId" : "worker-20190821201014-172.25.10.194-45441"
}

real 0m11.598s
user 0m2.092s
sys 0m0.222s
{code}
We analyzed the Scala code and the Spark API. The spark-submit status command
ends up issuing an HTTP GET request to a URL. Using curl, this is how long the
Spark master takes to return the status:
{code:java}
 time curl 
"http://lgmadbdtpspk01v.corp.logitravelgroup.com:6066/v1/submissions/status/driver-20190901180337-2749";
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.2.1",
  "submissionId" : "driver-20190901180337-2749",
  "success" : true,
  "workerHostPort" : "172.25.10.194:45441",
  "workerId" : "worker-20190821201014-172.25.10.194-45441"
}
real    0m0.011s
user    0m0.000s
sys     0m0.006s
{code}
The task spends 11.598 seconds with spark-submit versus 0.011 seconds with curl.

How can this behaviour be explained?


> SparkSubmit status spend lot of time
> ------------------------------------
>
>                 Key: AIRFLOW-5385
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5385
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.10.2
>            Reporter: Sergio Soto
>            Priority: Blocker



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
