[ https://issues.apache.org/jira/browse/AIRFLOW-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kaxil Naik updated AIRFLOW-6229:
--------------------------------
    Issue Type: Bug  (was: New Feature)

> SparkSubmitOperator polls forever if status json can't find driverState tag
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6229
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6229
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: t oo
>            Priority: Major
>             Fix For: 1.10.8
>
>
> You click 'release' on a new Spark cluster while the prior Spark cluster is
> still processing spark-submits from Airflow. Airflow is then never able to
> finish the SparkSubmitOperator task: it polls for status on the new Spark
> cluster build, which cannot find the status because the submit happened on the
> earlier cluster build, so the status loop runs forever.
>
> https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L446
> https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L489
>
> The hook loops forever if it can't find the driverState tag in the JSON
> response: the new build (pointed to by the released DNS name) doesn't know
> about the driver submitted on the previously released build, so the second
> response below does not contain the driverState tag.
>
> # Response before clicking release on the new build:
> [ec2-user@reda ~]$ curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "driverState" : "RUNNING",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : true,
>   "workerHostPort" : "reda:31489",
>   "workerId" : "worker-20191202133526-reda-31489"
> }
>
> # Response after clicking release on the new build:
> [ec2-user@reda ~]$ curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : false
> }
>
> This is definitely a defect in the current code. It can be fixed by modifying
> the _process_spark_status_log function to set the driver status to UNKNOWN if
> driverState is not in the response after iterating all lines.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
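A minimal sketch of the suggested fix, assuming the 1.10.6 hook parses the status response line by line and extracts the value after the driverState key. `process_spark_status_log` here is a hypothetical free-function stand-in for the hook's `_process_spark_status_log` method (which sets `self._driver_status` instead of returning); the key change is the UNKNOWN default after the loop, so the polling code can treat a missing driverState as terminal rather than looping forever:

```python
def process_spark_status_log(lines):
    """Extract driverState from a Spark REST status response.

    Returns the driver state (e.g. "RUNNING") if a driverState line is
    present, otherwise "UNKNOWN" -- instead of leaving the status unset,
    which is what made the poll loop spin forever.
    """
    driver_status = None
    for line in lines:
        line = line.strip()
        if "driverState" in line:
            # e.g. '"driverState" : "RUNNING",' -> 'RUNNING'
            driver_status = line.split(' : ')[1].replace(',', '').replace('"', '')
    # The fix: default to UNKNOWN after iterating all lines.
    return driver_status if driver_status is not None else "UNKNOWN"
```

Fed the second response from the ticket (no driverState tag), this returns "UNKNOWN", and the polling loop can then fail the task instead of retrying indefinitely.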