brian-lavery opened a new pull request #11111:
URL: https://github.com/apache/airflow/pull/11111


   This is a request to add the GreatExpectationsBigQueryOperator to Airflow.  
I'm probably bending protocol a bit because this is what I call a 'preliminary' 
pull request.  Before going to the effort of writing the tests for the new 
operator and submitting a more refined pull request, I'd like feedback on whether
this operator is likely to be accepted into Airflow and on any other major
flaws I need to address.
   
   I'll start with some background.  What is Great Expectations (GE)?  It's an
open source Python project that automates data QA testing and data profiling.
A user's data
'[expectations](https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html)'
can be laid out in a JSON file, and GE checks those expectations against a
table or query result by issuing a series of SQL queries.  The results of the
checks are written out as JSON and HTML files.  I've created a [short
video](https://photos.app.goo.gl/PPHgCySA6iprVSWz5) to explain how GE and the
operator work.
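   To make the description above concrete, here is a toy sketch of the idea:
an expectation suite stored as JSON, with each expectation turned into a SQL
query and the pass/fail results collected into a JSON report.  This is NOT the
Great Expectations API or its implementation.  The suite layout is only loosely
modeled on GE's, the `validate`/`check_expectation` helpers are hypothetical,
and an in-memory SQLite table stands in for BigQuery so the example runs
anywhere.

```python
import json
import sqlite3

# Hypothetical suite, loosely modeled on GE's expectation-suite JSON layout.
SUITE_JSON = """
{
  "expectation_suite_name": "orders_suite",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "order_id"}},
    {"expectation_type": "expect_table_row_count_to_be_between",
     "kwargs": {"min_value": 1, "max_value": 100}}
  ]
}
"""


def check_expectation(conn, table, expectation):
    """Translate one expectation into a SQL query and evaluate it (toy version).

    Table and column names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    etype = expectation["expectation_type"]
    kwargs = expectation["kwargs"]
    if etype == "expect_column_values_to_not_be_null":
        # Success means no row has a NULL in the column.
        sql = f"SELECT COUNT(*) FROM {table} WHERE {kwargs['column']} IS NULL"
        (unexpected,) = conn.execute(sql).fetchone()
        success = unexpected == 0
    elif etype == "expect_table_row_count_to_be_between":
        sql = f"SELECT COUNT(*) FROM {table}"
        (count,) = conn.execute(sql).fetchone()
        success = kwargs["min_value"] <= count <= kwargs["max_value"]
    else:
        raise ValueError(f"unsupported expectation: {etype}")
    return {"expectation_type": etype, "success": success}


def validate(conn, table, suite):
    """Run every expectation in the suite; overall success requires all to pass."""
    results = [check_expectation(conn, table, e) for e in suite["expectations"]]
    return {"success": all(r["success"] for r in results), "results": results}


# Usage: a small table with one NULL order_id, so the first check fails.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.5), (2, 3.0), (None, 1.0)])
report = validate(conn, "orders", json.loads(SUITE_JSON))
print(json.dumps(report, indent=2))
```

GE itself renders such results as JSON and HTML; the sketch only shows the
"expectations in, SQL queries out, results back" flow.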
   
   I'm looking for any feedback I can get but I have three immediate questions:
   1. Does this sound like a contribution that would be accepted into Airflow?
   2. Is there any special testing that needs to be done because the operator
imports parts of the Great Expectations package, which has never been used
before in Airflow?
   3. Is it a problem that the operator connects to BigQuery in a somewhat
unusual way compared with other operators?  The operator takes a BigQuery
connection as a parameter, but the actual connection to BigQuery is made inside
the GE code.  The operator therefore only pulls some connection info from the
hook and passes it to GE, which then makes the connection.
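   The connection hand-off in question 3 might look roughly like the sketch
below: the operator reads fields from the Airflow connection and builds
settings for GE, without ever opening a BigQuery session itself.  Everything
here is an assumption for illustration: `FakeConnection` stands in for an
Airflow `Connection` (only its `extra_dejson` dict is modeled), the extras key
names follow the Airflow 1.10/2.0-era Google Cloud connection naming, the
`bigquery://<project>/<dataset>` URL assumes the pybigquery SQLAlchemy dialect,
and `ge_bigquery_datasource_config` is a hypothetical helper, not code from
this PR.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConnection:
    """Hypothetical stand-in for an Airflow Connection; only extras are modeled."""
    conn_id: str
    extra_dejson: dict = field(default_factory=dict)


def ge_bigquery_datasource_config(conn, dataset):
    """Build hypothetical GE datasource settings from a BigQuery connection.

    No BigQuery client is created here; the values are simply passed along so
    that GE (via SQLAlchemy) can open the connection itself.
    """
    extras = conn.extra_dejson
    # Assumed extras key names (Airflow 1.10/2.0-era Google Cloud connection).
    project = extras["extra__google_cloud_platform__project"]
    key_path = extras.get("extra__google_cloud_platform__key_path")
    # Assumed pybigquery-style SQLAlchemy URL.
    config = {"connection_string": f"bigquery://{project}/{dataset}"}
    if key_path:
        # Only forward a service-account key path when one is configured.
        config["credentials_path"] = key_path
    return config


# Usage with made-up connection values.
gcp_conn = FakeConnection(
    "my_gcp_conn",
    {
        "extra__google_cloud_platform__project": "my-project",
        "extra__google_cloud_platform__key_path": "/keys/sa.json",
    },
)
config = ge_bigquery_datasource_config(gcp_conn, "analytics")
print(config)
```

The design point is that the hook acts only as a source of credentials; the
query traffic flows through GE's own SQLAlchemy engine rather than through
`BigQueryHook`, which is what makes this operator unusual.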
   
   Thanks,
   Brian


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

