[ https://issues.apache.org/jira/browse/AIRFLOW-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-6229:
--------------------------------
    Issue Type: Bug  (was: New Feature)

> SparkSubmitOperator polls forever if status json can't find driverState tag
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6229
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6229
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: t oo
>            Priority: Major
>             Fix For: 1.10.8
>
>
> You click ‘release’ on a new Spark cluster build while the prior build is still
> processing spark-submits from Airflow. Airflow is then never able to finish the
> SparkSubmit task: it polls for the driver status on the new cluster build, which
> has no status for the driver because the submit happened on the earlier build,
> so the status loop goes on forever.
>  
> [https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L446]
> [https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L489]
> It loops forever if it can’t find the driverState tag in the JSON response. The
> new build (pointed to by the released DNS name) doesn’t know about the driver
> that was submitted to the previously released build, so the second response
> below does not contain the driverState tag.
>   
> # Response before clicking release on the new build
> [ec2-user@reda ~]$ curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "driverState" : "RUNNING",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : true,
>   "workerHostPort" : "reda:31489",
>   "workerId" : "worker-20191202133526-reda-31489"
> }
>  
> # Response after clicking release on the new build
> [ec2-user@reda ~]$ curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : false
> }
>  
> This is definitely a defect in the current code. It can be fixed by modifying
> the _process_spark_status_log function to set the driver status to UNKNOWN if
> driverState is not found in the response after iterating over all lines.
>  
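A minimal, self-contained sketch (not the Airflow source) that replays the two curl responses from the report through a line-based driverState scan, once with the stale-status behaviour described above and once with the proposed UNKNOWN fallback. The function names, the make_fetch/poll helpers, the max_polls cap and the set of terminal states are assumptions for illustration only; the real hook queries the cluster and sleeps between polls.

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple

# Responses copied from the curl output in the report (pretty-printed, " : " separators).
BEFORE_RELEASE = """{
  "action" : "SubmissionStatusResponse",
  "driverState" : "RUNNING",
  "serverSparkVersion" : "2.3.4",
  "submissionId" : "driver-20191202142207-0000",
  "success" : true,
  "workerHostPort" : "reda:31489",
  "workerId" : "worker-20191202133526-reda-31489"
}"""

AFTER_RELEASE = """{
  "action" : "SubmissionStatusResponse",
  "serverSparkVersion" : "2.3.4",
  "submissionId" : "driver-20191202142207-0000",
  "success" : false
}"""

# Assumed set of states that end polling (hypothetical, not copied from Airflow).
TERMINAL_STATES = {"FINISHED", "UNKNOWN", "KILLED", "FAILED", "ERROR"}


def _extract_state(line: str) -> str:
    # '"driverState" : "RUNNING",' -> 'RUNNING'
    return line.split(" : ")[1].replace(",", "").replace('"', "").strip()


def parse_status_buggy(lines: Iterable[str], current: Optional[str]) -> Optional[str]:
    """Only updates the status when a driverState line is present (the defect)."""
    status = current
    for line in lines:
        if "driverState" in line:
            status = _extract_state(line.strip())
    return status  # no driverState line: the stale status is silently kept


def parse_status_fixed(lines: Iterable[str], _current: Optional[str]) -> str:
    """Proposed fix: fall back to UNKNOWN if no line carried driverState."""
    found = None
    for line in lines:
        if "driverState" in line:
            found = _extract_state(line.strip())
    return found if found is not None else "UNKNOWN"


def make_fetch() -> Callable[[], str]:
    """Hypothetical stand-in for the status request: RUNNING once, then the new build."""
    responses = itertools.chain([BEFORE_RELEASE], itertools.repeat(AFTER_RELEASE))
    return lambda: next(responses)


def poll(fetch: Callable[[], str], parse, max_polls: int) -> Tuple[Optional[str], int]:
    """Poll until a terminal state; max_polls exists only so this demo cannot hang."""
    status: Optional[str] = "SUBMITTED"
    for attempt in range(1, max_polls + 1):
        status = parse(fetch().splitlines(), status)
        if status in TERMINAL_STATES:
            return status, attempt
    return status, max_polls


if __name__ == "__main__":
    print(poll(make_fetch(), parse_status_buggy, max_polls=5))  # ('RUNNING', 5): never terminal
    print(poll(make_fetch(), parse_status_fixed, max_polls=5))  # ('UNKNOWN', 2)
```

With the stale-status scan the simulated loop only stops because of the artificial max_polls cap, which mirrors the endless polling in the report; with the UNKNOWN fallback it stops on the first poll that hits the new build.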



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
