BarBuccianti opened a new issue, #68908:
URL: https://github.com/apache/airflow/issues/68908

   ### Description
   
   Add provider-maintained deferrable support for invoking and waiting on 
Google Cloud Functions / HTTP Cloud Run functions from 
apache-airflow-providers-google.
   
   Today, the Google provider supports deferrable execution for several Google 
Cloud services, including BigQuery, GCS, Dataflow, Pub/Sub, and Cloud Run Jobs 
via CloudRunExecuteJobOperator. However, there does not appear to be an 
equivalent deferrable operator/sensor/trigger pattern for HTTP Cloud Functions 
or HTTP Cloud Run functions.
   
   CloudFunctionInvokeFunctionOperator is synchronous and documented as 
intended for testing purposes with limited traffic. For production workflows 
that trigger a function and then need to wait for asynchronous completion, 
users currently need to either:
   
   - maintain custom Airflow trigger/sensor/operator code, including Google auth, 
polling, timeout, retry, and failure semantics; or
   - implement an indirect durable-status pattern, such as having the function 
write completion state to BigQuery/GCS and waiting on that state with an 
existing deferrable sensor.
   
   It would be useful to have a first-class deferrable pattern in the Google 
provider for this use case, for example a deferrable Cloud Functions / HTTP 
Cloud Run function operator or sensor that handles invocation, authenticated 
HTTP requests, polling/completion checks, timeout handling, retries, and 
failure propagation.
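
   To make the requested behavior concrete, here is a minimal sketch of the polling/timeout/failure loop such a trigger might run. It uses only `asyncio` and is not provider code: `wait_for_completion`, `check_status`, the status strings, and the event dict shape are all hypothetical names chosen for illustration. In a real implementation this loop would presumably live in a trigger's async `run()` method and yield a `TriggerEvent`, with `check_status` doing an authenticated HTTP call.

```python
import asyncio
import time
from typing import Any, AsyncIterator, Awaitable, Callable, Dict


async def wait_for_completion(
    check_status: Callable[[], Awaitable[str]],  # hypothetical status probe
    poke_interval: float = 5.0,
    timeout: float = 600.0,
) -> AsyncIterator[Dict[str, Any]]:
    """Poll until the function reports a terminal state, then yield one event.

    Assumption: the function exposes a status that is one of
    "RUNNING", "SUCCEEDED", or "FAILED". These values are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = await check_status()
        if status == "SUCCEEDED":
            yield {"status": "success"}
            return
        if status == "FAILED":
            # Failure propagation: the operator would raise on this event.
            yield {"status": "error", "message": "function reported failure"}
            return
        if time.monotonic() >= deadline:
            yield {"status": "error", "message": "timed out waiting for completion"}
            return
        # Non-blocking wait: no worker slot is held while sleeping.
        await asyncio.sleep(poke_interval)


async def first_event(events: AsyncIterator[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the first (and only) event produced by the polling loop."""
    async for event in events:
        return event
    raise RuntimeError("polling loop ended without an event")
```

   With a fake probe that returns "RUNNING" twice and then "SUCCEEDED", `asyncio.run(first_event(wait_for_completion(probe, poke_interval=0.01)))` returns the success event. Retries of the status call itself are left out of the sketch to keep it short.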
   
   
   ### Use case/motivation
   
   We have Airflow DAGs that trigger Google Cloud Functions / HTTP Cloud Run 
functions to perform asynchronous work. The function invocation itself is 
short, but the downstream processing can take longer, and Airflow needs to wait 
for the work to complete before continuing the DAG.
   
   Because there is no provider-maintained deferrable Cloud Functions / HTTP 
Cloud Run function sensor/trigger today, using a synchronous task or regular 
sensor would occupy worker resources while waiting. To avoid that, we currently 
use a workaround where the function writes completion/status data to BigQuery, 
and Airflow waits on that status using an existing deferrable BigQuery sensor.
   
   This works, but it adds extra infrastructure and indirection only to 
compensate for the missing deferrable Cloud Functions / HTTP function pattern. 
A provider-supported deferrable operator/sensor would reduce maintenance 
burden, avoid custom triggerer code, and make this pattern more consistent with 
other Google provider integrations such as BigQuery, GCS, Pub/Sub, Dataflow, 
dbt-style async workflows, and Cloud Run Jobs.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

