[jira] [Assigned] (AIRFLOW-2009) DataFlowHook does not use correct service account
[ https://issues.apache.org/jira/browse/AIRFLOW-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Lu reassigned AIRFLOW-2009:

    Assignee: Feng Lu

> DataFlowHook does not use correct service account
> -------------------------------------------------
>
>                 Key: AIRFLOW-2009
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2009
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: Dataflow, hooks
>    Affects Versions: 2.0.0
>            Reporter: Jessica Laughlin
>            Assignee: Feng Lu
>            Priority: Major
>
> We have been using the DataFlowOperator to schedule Dataflow jobs.
> We found that the DataFlowHook used by the DataFlowOperator doesn't actually
> use the passed `gcp_conn_id` to schedule the Dataflow job, but only to read
> the results afterwards.
> Code (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/gcp_dataflow_hook.py#L158):
> _Dataflow(cmd).wait_for_done()
> _DataflowJob(self.get_conn(), variables['project'],
>              name, self.poll_sleep).wait_for_done()
> The first line here should also be using self.get_conn().
> For this reason, our tasks using the DataFlowOperator have actually been
> using the default Google Compute Engine service account (which has Dataflow
> permissions) to schedule Dataflow jobs. It is only when our provided service
> account (which does not have Dataflow permissions) is used in the second line
> that we see a permissions error.
> I would like to fix this bug, but have to work around it at the moment due to
> time constraints.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
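The root cause described above is that `_Dataflow(cmd)` launches the job as a subprocess, which authenticates with whatever ambient credentials the worker machine has (the default GCE service account), while `self.get_conn()` is only consulted afterwards for polling. One way a fix could work is to export the connection's key file into the subprocess environment so the launcher picks up the intended service account. The helper below is a minimal, hypothetical sketch of that idea; `build_dataflow_env` and its parameters are not part of the actual Airflow codebase.

```python
import os


def build_dataflow_env(key_file_path, base_env=None):
    """Build the environment for the Dataflow launch subprocess.

    Hypothetical helper: setting GOOGLE_APPLICATION_CREDENTIALS makes the
    spawned launcher process authenticate with the connection's service
    account instead of the machine's default GCE service account, so the
    same identity is used to schedule the job and to poll its status.
    """
    # Start from the provided environment (or the current process env).
    env = dict(base_env if base_env is not None else os.environ)
    if key_file_path:
        env["GOOGLE_APPLICATION_CREDENTIALS"] = key_file_path
    return env
```

A hook could then pass this environment when spawning the launch command (e.g. via `subprocess.Popen(cmd, env=...)`), so the scheduling step no longer silently falls back to the default credentials.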
[jira] [Assigned] (AIRFLOW-2009) DataFlowHook does not use correct service account
[ https://issues.apache.org/jira/browse/AIRFLOW-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilson Lian reassigned AIRFLOW-2009:

    Assignee: (was: Wilson Lian)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2009) DataFlowHook does not use correct service account
[ https://issues.apache.org/jira/browse/AIRFLOW-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wilson Lian reassigned AIRFLOW-2009:

    Assignee: Wilson Lian

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)