vchiapaikeo commented on issue #32870:
URL: https://github.com/apache/airflow/issues/32870#issuecomment-1775458002

   > Can you please advise how to get gcp_conn_id? I have only used the default value so far. Thanks!
   
   So here's a simple dag example:
   
   ```py
   from airflow import DAG
   from airflow.providers.google.cloud.operators.bigquery import BigQueryValueCheckOperator


   DEFAULT_TASK_ARGS = {
       "owner": "gcp-data-platform",
       "start_date": "2023-03-13",
       "retries": 1,
       "retry_delay": 300,
   }


   with DAG(
       schedule_interval="@daily",
       max_active_runs=1,
       max_active_tasks=5,
       catchup=False,
       dag_id="test_bigquery_value_check",
       default_args=DEFAULT_TASK_ARGS,
   ) as dag:
       value_check_on_same_project_without_impersonation = BigQueryValueCheckOperator(
           task_id="value_check_on_same_project_without_impersonation",
           sql="select count(1) from `airflow-vchiapaikeo.test.table1`",
           pass_value=1,
           tolerance=0.15,
           use_legacy_sql=False,
           location="US",
           gcp_conn_id="google_cloud_default",
           # deferrable=True,
           # impersonation_chain=["airf...@airflow-vchiapaikeo.iam.gserviceaccount.com"],
       )

       value_check_on_diff_project_with_impersonation = BigQueryValueCheckOperator(
           task_id="value_check_on_diff_project_without_impersonation_expect_fail",
           sql="select count(1) from `airflow2-vchiapaikeo.test.table1`",
           pass_value=1,
           tolerance=0.15,
           use_legacy_sql=False,
           location="US",
           gcp_conn_id="google_cloud_default2",
           # deferrable=True,
           impersonation_chain=["airfl...@airflow2-vchiapaikeo.iam.gserviceaccount.com"],
       )
   ```
   
   I define two different `gcp_conn_id`s: one with project A and the other with project B.
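   Fwiw, the second connection can also be defined outside the UI, e.g. as an `AIRFLOW_CONN_*` environment variable. A rough sketch of building that value (the project id, key path, and the `__extra__` encoding here are assumptions on my part, and the extra field names can vary across provider versions):
   
   ```python
   import json
   from urllib.parse import quote
   
   # Hypothetical settings for a second connection pointing at project B.
   extra = {
       "project": "airflow2-vchiapaikeo",   # assumption: project id for conn B
       "key_path": "/path/to/sa-key.json",  # assumption: service-account key file
   }
   
   # Airflow resolves AIRFLOW_CONN_<CONN_ID> env vars into connections;
   # the connection type for GCP is "google-cloud-platform".
   conn_uri = "google-cloud-platform://?__extra__=" + quote(json.dumps(extra))
   print(f"AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT2={conn_uri}")
   ```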
   
   <img width="1440" alt="image" src="https://github.com/apache/airflow/assets/9200263/def44dd2-99b0-42cf-aaf1-074fae41bb7f">
   
   You can see the second operator gets executed in the correct project and 
with the correct service account here: 
   
   <img width="1299" alt="image" src="https://github.com/apache/airflow/assets/9200263/ea2c7d1d-2267-454d-a6a1-92f16adc58e9">
   
   
   > And as I read the source code of other operators, they use a hook to pass the impersonation chain and send the request via the hook, instead of sending the request directly. I guess this might be the reason? Is it possible to use a hook in this operator as well?
   
   
   This is actually a little complicated, and I don't fully understand all of it. Part of the hook uses the [soon-to-be-deprecated discovery API](https://github.com/apache/airflow/blob/789222cb1378079e2afd24c70c1a6783b57e27e6/airflow/providers/google/cloud/hooks/bigquery.py#L149) and the other part uses the [BigQuery client](https://github.com/apache/airflow/blob/789222cb1378079e2afd24c70c1a6783b57e27e6/airflow/providers/google/cloud/hooks/bigquery.py#L37). The part that uses the discovery API infers the project id from the gcp_conn_id connection. The common code shared with the DbApiHook probably needs to be refactored to move off the discovery API and onto the BigQuery client... but that will be quite difficult 😓. Please correct me if I am wrong, anyone who knows this code better than I do.
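   To illustrate the chain handling: as far as I can tell, the provider's credential helpers treat the last account in `impersonation_chain` as the target principal and everything before it as delegates, roughly like this sketch (the account names below are made up, and the helper shown is mine, not the provider's):
   
   ```python
   # Sketch of the chain-splitting convention (assumption: this mirrors how the
   # Google provider's credential helpers interpret impersonation_chain).
   def split_impersonation_chain(chain):
       """Return (target_principal, delegates) from an impersonation chain."""
       if isinstance(chain, str):
           # A single account may be passed as a plain string.
           chain = [chain]
       *delegates, target_principal = chain
       return target_principal, delegates
   
   # The hook would then build impersonated credentials for the target,
   # hopping through the delegate accounts in order.
   print(split_impersonation_chain([
       "deleg...@projA.iam.gserviceaccount.com",
       "final...@projB.iam.gserviceaccount.com",
   ]))
   ```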

