brian-lavery opened a new pull request #11111:
URL: https://github.com/apache/airflow/pull/11111
This is a request to add the GreatExpectationsBigQueryOperator to Airflow. I'm probably bending protocol a bit, because this is what I call a 'preliminary' pull request. Before going to the effort of writing tests for the new operator and submitting a more refined pull request, I'd like feedback on whether this operator could be accepted into Airflow and on any other major flaws I need to address.

I'll start with some background. What is Great Expectations (GE)? It's an open-source Python project that automates data QA testing and data profiling. A user's data '[expectations](https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html)' can be laid out in a JSON file, and GE checks those expectations against a table or query result by firing off a series of SQL queries. The results of the checks are output as JSON and HTML files. I've created a [short video](https://photos.app.goo.gl/PPHgCySA6iprVSWz5) that explains how GE and the operator work.

I'm looking for any feedback I can get, but I have three immediate questions:

1. Does this sound like a contribution that would be accepted into Airflow?
2. Is any special testing needed because the operator imports parts of the Great Expectations package, which has never been used in Airflow before?
3. Is it a problem that the operator connects to BigQuery in a somewhat unusual way compared with other operators? The operator takes a BigQuery connection as a parameter, but the actual connection to BigQuery is made inside the GE code. The operator therefore pulls some connection info from the hook and simply passes it to GE so that GE can make the connection.

Thanks,
Brian

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: [email protected]
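To make the background on how GE works more concrete ("expectations laid out in a JSON file, checked by firing off SQL queries, results output as JSON"), here is a toy sketch of that idea. It is not Great Expectations' implementation or API. The suite only mirrors GE's `expectation_type`/`kwargs` layout. The SQL templates, the `validate` function, and the `orders` table are hypothetical, and an in-memory `sqlite3` database stands in for BigQuery.

```python
import json
import sqlite3

# A small expectation suite modeled on GE's JSON layout ("expectation_type" + "kwargs").
# The suite name, table, and columns are hypothetical examples.
SUITE_JSON = """
{
  "expectation_suite_name": "orders_suite",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "order_id"}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "amount", "min_value": 0, "max_value": 1000}}
  ]
}
"""

# Hypothetical SQL templates (not GE's): each query counts rows that VIOLATE an expectation.
# Identifiers are interpolated directly, so this is only safe for trusted suites.
SQL_TEMPLATES = {
    "expect_column_values_to_not_be_null":
        "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
    "expect_column_values_to_be_between":
        "SELECT COUNT(*) FROM {table} "
        "WHERE {column} < {min_value} OR {column} > {max_value}",
}


def validate(conn, table, suite):
    """Run one violation-counting query per expectation and return a JSON-serializable report."""
    results = []
    for exp in suite["expectations"]:
        sql = SQL_TEMPLATES[exp["expectation_type"]].format(table=table, **exp["kwargs"])
        (unexpected,) = conn.execute(sql).fetchone()
        results.append({
            "expectation_type": exp["expectation_type"],
            "kwargs": exp["kwargs"],
            "success": unexpected == 0,
            "unexpected_count": unexpected,
        })
    return {"success": all(r["success"] for r in results), "results": results}


if __name__ == "__main__":
    # sqlite3 stands in for BigQuery purely so the sketch runs locally.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 25.0), (2, 1500.0), (None, 10.0)])
    report = validate(conn, "orders", json.loads(SUITE_JSON))
    print(json.dumps(report, indent=2))
```

With the sample rows, both expectations fail: one row has a NULL `order_id`, and one `amount` exceeds 1000. In the real operator, GE itself would generate and run the queries against BigQuery and write the JSON and HTML results.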
