Hello folks,

My team and I are interested in contributing a BigQueryDataFramesOperator / BigFramesOperator that would take some Python code, similar to the PythonOperator or PythonVirtualenvOperator. The idea is that the operator could automatically apply some best-practice defaults before running the supplied code, as demonstrated in my blog post here: https://medium.com/google-cloud/creating-a-production-ready-data-pipeline-with-apache-airflow-and-bigframes-bead7d7d164b . I have a few questions before I begin implementation:
1. BigFrames is a large-ish package with a pretty big dependency tree, and I'm wary of having an operator depend on it directly. Have other folks found ways to avoid this?

2. One idea for isolating dependencies is the virtualenv operator. Would it be acceptable for the new operator to wrap a PythonVirtualenvOperator, or to subclass it? If so, which would be preferred?

3. Another feature I'd like to make sure we handle is fetching credentials using the BigQueryHook. There are a lot of complex auth scenarios, such as impersonation_scopes ( https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/hooks/bigquery/index.html#airflow.providers.google.cloud.hooks.bigquery.BigQueryHook.impersonation_scopes ), that I want to make sure we support. One idea we are bouncing around is using get_python_source() ( https://github.com/apache/airflow/blob/1c5aa24ccd63b5a5052eaebc52959c7a20fc298a/providers/standard/src/airflow/providers/standard/operators/python.py#L493 ) and injecting the custom initialization code after the function definition; a rough sketch of this follows below.

I'd love to hear your thoughts.
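To make ideas 2 and 3 concrete, here is a minimal sketch of the subclassing approach, assuming we override get_python_source() to inject initialization code. BigFramesOperator, _BIGFRAMES_PREAMBLE, and the specific option defaults are hypothetical names for discussion, not working code:

```python
# Rough sketch only: BigFramesOperator and _BIGFRAMES_PREAMBLE are
# hypothetical names for discussion, not an existing API.
import textwrap

from airflow.providers.standard.operators.python import PythonVirtualenvOperator

# Best-practice initialization to run before the user's code; the
# specific options shown here are illustrative (see the blog post above).
_BIGFRAMES_PREAMBLE = textwrap.dedent(
    """\
    import bigframes.pandas as bpd

    bpd.options.bigquery.ordering_mode = "partial"
    """
)


class BigFramesOperator(PythonVirtualenvOperator):
    """Run user code in a virtualenv with bigframes installed and
    best-practice session options applied first."""

    def __init__(self, *, requirements=None, **kwargs):
        # Ensure bigframes ends up in the isolated environment, so the
        # operator never imports it in the Airflow worker process itself.
        requirements = list(requirements or [])
        if not any(r.startswith("bigframes") for r in requirements):
            requirements.append("bigframes")
        super().__init__(requirements=requirements, **kwargs)

    def get_python_source(self):
        # Append the initialization code after the user's function
        # definition, so it executes inside the virtualenv subprocess
        # before the generated script invokes the callable. Credential
        # handoff from BigQueryHook (question 3) would slot in here too.
        source = super().get_python_source()
        return source + "\n" + _BIGFRAMES_PREAMBLE
```

The open question for us is whether wrapping (composition) would be cleaner than this subclassing approach, and how best to serialize the BigQueryHook credential configuration into the subprocess.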
Tim Sweña (Swast)
Team Lead, BigQuery DataFrames
Google Cloud Platform
Chicago, IL, USA