> Yeah I'm not saying there shouldn't be an airflow library. It's just
> unclear to me what its purpose would be and it would be helpful in
> evaluating the question to have some kind of a sketch of it. What
> interface it would introduce, how it would be used etc.
>
Yep. Very reasonable questions to ask. Mind, it's not hashed out. It's not a "proposal" yet, just a "discussion", mostly to find out whether this one raises someone's eyebrows in terms of "yeah, we would love to have it" or "meh - not worth getting into details". So I definitely do not have many answers.

But roughly speaking: Ibis defines a Python API for data processing (that's simplifying things, of course), mapping a Pythonic/Dataframe interface to the underlying DB engines. You can essentially use the same code to access data in a local in-memory DuckDB in dev and BigQuery in prod:

    g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")

As mentioned before, initially (similarly to common.io) Airflow connection IDs could be used to instantiate the Ibis connection. Instead of

    con = ibis.connect("duckdb://")

it would be (for example - it likely could be better):

    con = common.dataframe.Dataframe.get(conn_id)

So nothing really fancy. No new API to define, just glue to existing Airflow connections.

Eventually (and as Gil mentioned, that might be in the future), thanks to the unified Ibis API, lineage information could be extracted automatically at the level of the Ibis API, rather than having to be implemented separately for each of the engines supported by Ibis (and future ones).

I hope that's enough to hear whether that seems like something of interest :).

J.
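
P.S. To make the "glue" idea a bit more concrete, here is a very rough sketch of what resolving a connection into something `ibis.connect()` accepts might look like. Everything in it is an assumption for illustration only: `ConnInfo` is a hypothetical stand-in for the fields an Airflow connection carries, `to_ibis_url` is a hypothetical helper, and Postgres is used as the "prod" engine just because its URL form is simple. It is not a proposed API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class ConnInfo:
    """Hypothetical stand-in for the fields an Airflow connection carries."""
    conn_type: str
    host: Optional[str] = None
    login: Optional[str] = None
    password: Optional[str] = None
    port: Optional[int] = None
    schema: Optional[str] = None


def to_ibis_url(conn: ConnInfo) -> str:
    """Hypothetical helper: build a URL string to pass to ibis.connect()."""
    if conn.conn_type == "duckdb":
        # In-memory DuckDB, the same string as in ibis.connect("duckdb://").
        return "duckdb://"
    auth = ""
    if conn.login:
        # Percent-encode credentials so special characters do not break the URL.
        auth = quote(conn.login, safe="")
        if conn.password:
            auth += ":" + quote(conn.password, safe="")
        auth += "@"
    port = f":{conn.port}" if conn.port else ""
    database = f"/{conn.schema}" if conn.schema else ""
    return f"{conn.conn_type}://{auth}{conn.host or ''}{port}{database}"


dev_url = to_ibis_url(ConnInfo(conn_type="duckdb"))
prod_url = to_ibis_url(
    ConnInfo("postgres", "db.example.com", "etl", "s3cret", 5432, "penguins")
)

# With Ibis installed, the code after this point would be identical for both:
#   con = ibis.connect(dev_url)   # or prod_url
#   t = con.table("penguins")
#   g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
```

The point is only that the DAG author would pick a connection ID, and the engine-specific part would stay hidden behind the connection.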