[ https://issues.apache.org/jira/browse/AIRFLOW-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kaxil Naik resolved AIRFLOW-3035.
---------------------------------
    Resolution: Fixed
Fix Version/s:     (was: 1.10.1)
                   2.0.0

Resolved by https://github.com/apache/incubator-airflow/pull/3884

> gcp_dataproc_hook should treat CANCELLED job state consistently
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-3035
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3035
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib
>    Affects Versions: 1.10.0
>            Reporter: Jeffrey Payne
>            Assignee: Jeffrey Payne
>            Priority: Minor
>              Labels: dataproc
>             Fix For: 2.0.0
>
>
> When a Dataproc job is cancelled, {{gcp_dataproc_hook.py}} treats the
> {{CANCELLED}} state in an inconsistent and non-intuitive manner:
> # The API internal to {{gcp_dataproc_hook.py}} returns {{False}} from
> {{_DataProcJob.wait_for_done()}}, so {{raise_error()}} is called for
> cancelled jobs, yet {{raise_error()}} only raises {{Exception}} if the
> job state is {{ERROR}}.
> # From the perspective of {{dataproc_operator.py}}, a cancelled job
> therefore appears to have succeeded, and the success callback is called.
> This is surprising: in my experience, a cancelled job is rarely
> considered successful.
> Simply changing {{raise_error()}} from:
> {code:python}
> if 'ERROR' == self.job['status']['state']:
> {code}
> to
> {code:python}
> if self.job['status']['state'] in ('ERROR', 'CANCELLED'):
> {code}
> would fix both issues.
> Another, perhaps better, option would be to have the Dataproc job
> operators accept a list of {{error_states}} that could be passed into
> {{raise_error()}}, allowing the caller to determine which states should
> result in "failure" of the task. I would lean towards that option.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)