Yeah, I'm not saying there shouldn't be an Airflow library. It's just

> unclear to me what its purpose would be and it would be helpful in
> evaluating the question to have some kind of a sketch of it.  What
> interface it would introduce, how it would be used etc.
>

Yep. Very reasonable questions to ask.

Mind you - it's not hashed out. It's not a "proposal" yet, just a
"discussion" - mostly to find out whether it raises eyebrows in the sense
of "yeah, we would love to have it" or "meh - not worth getting into
details". So I definitely do not have many answers yet.

But roughly speaking: Ibis defines a Python API for data processing
(that's simplifying things, of course) - mapping a Pythonic/DataFrame
interface to underlying DB engines. You can essentially use the same code
to access data in a local in-memory DuckDB in dev and BigQuery in prod.

g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")

As mentioned before - initially (similarly to common.io) - Airflow
connection IDs could be used to instantiate an Ibis connection.

instead of

con = ibis.connect("duckdb://")

it would be (for example - it could likely be better):

con = common.dataframe.Dataframe.get(conn_id)

So nothing really fancy. No new API to define, just glue to existing
Airflow connections.
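To make the "glue" idea concrete, here's a rough sketch of what that
mapping could look like. To be clear: none of these names exist in Airflow
today - the function and its signature are purely hypothetical, just
showing the idea of turning Airflow Connection fields into an ibis.connect()
URI so the same DAG code can hit DuckDB in dev and e.g. Postgres in prod:

```python
# Hypothetical sketch only - airflow_conn_to_ibis_uri does not exist;
# it illustrates translating Airflow Connection fields into an Ibis URI.

def airflow_conn_to_ibis_uri(conn_type, host=None, port=None,
                             schema=None, login=None, password=None):
    """Build an ibis.connect()-style URI from Airflow Connection fields."""
    if conn_type == "duckdb":
        # In-memory DuckDB when no database path is given
        return f"duckdb://{schema or ''}"
    auth = f"{login}:{password}@" if login else ""
    port_part = f":{port}" if port else ""
    return f"{conn_type}://{auth}{host}{port_part}/{schema or ''}"

# Dev: airflow_conn_to_ibis_uri("duckdb") -> "duckdb://"
# Prod: airflow_conn_to_ibis_uri("postgres", host="db", port=5432,
#           schema="prod", login="u", password="p")
#       -> "postgres://u:p@db:5432/prod"
```

The result would then just be passed to ibis.connect(uri) - the helper
itself is the only new piece, everything else is existing Ibis/Airflow
machinery.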

Eventually (and as Gil mentioned, that might be the future) - thanks to the
unified Ibis API, lineage information could be extracted automatically at
the level of the Ibis API, rather than having to be implemented separately
for each of the engines Ibis supports (and future ones).
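The reason that's even possible is that all backends share one expression
tree, so lineage can be read off the tree once instead of per engine. A toy
illustration of that principle (this is not real Ibis internals - the Table
and Join classes here are made up just to show the idea of walking one
expression tree to find source tables):

```python
# Illustrative only: a single unified expression tree can be walked once
# to collect source tables, regardless of which backend executes it.

class Table:
    """Leaf node - a named source table (hypothetical, not Ibis's class)."""
    def __init__(self, name):
        self.name = name
        self.children = []

class Join:
    """Inner node combining two sub-expressions (again, hypothetical)."""
    def __init__(self, left, right):
        self.children = [left, right]

def source_tables(expr):
    """Collect leaf table names from an expression tree - crude lineage."""
    if isinstance(expr, Table):
        return {expr.name}
    found = set()
    for child in expr.children:
        found |= source_tables(child)
    return found

# source_tables(Join(Table("orders"), Table("customers")))
# -> {"orders", "customers"}
```

One such walk over the shared tree would cover every engine, which is the
whole appeal compared to per-backend lineage extractors.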

I hope that's enough to gauge whether this seems like something of
interest :).

J.
