Hello!

Ibis core developer here.  This is an exciting proposal, and we'd love to see
Ibis and Airflow working together smoothly.  To Daniel's point: yes, Ibis is a
common dataframe library on its own, but, to reiterate Jarek's response, we
don't really handle orchestration, especially across multiple connections or
multiple backends.

Having Airflow handle passing around authentication, as well as various 
connection metadata, would, I think, be a pretty powerful chunk of tooling for 
interacting with one or more SQL (or other) compute engines.
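
To make that concrete, here is a rough sketch of what such glue might look
like.  Everything in it is an assumption for illustration: `ConnInfo` is a
hypothetical stand-in that mirrors a few Airflow Connection fields (conn_type,
host, login, password, port, schema), and `to_ibis_url` is a made-up helper
that is not part of either project.  It only builds the kind of URL string that
`ibis.connect` accepts; it does not import Airflow or Ibis.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class ConnInfo:
    """Hypothetical stand-in mirroring a few fields of an Airflow Connection."""

    conn_type: str
    host: str
    login: Optional[str] = None
    password: Optional[str] = None
    port: Optional[int] = None
    schema: Optional[str] = None


def to_ibis_url(conn: ConnInfo) -> str:
    """Build a URL string of the kind that could be handed to ibis.connect()."""
    auth = ""
    if conn.login:
        # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
        auth = quote(conn.login, safe="")
        if conn.password:
            auth += ":" + quote(conn.password, safe="")
        auth += "@"
    port = f":{conn.port}" if conn.port else ""
    database = f"/{conn.schema}" if conn.schema else ""
    return f"{conn.conn_type}://{auth}{conn.host}{port}{database}"


# Illustrative values only; a real provider would read these from Airflow.
example = ConnInfo("postgres", "db.example.com", "gil", "p@ss", 5432, "analytics")
url = to_ibis_url(example)
# The resulting string could then be passed along as ibis.connect(url).
```

A real provider would presumably read these fields through an Airflow Hook
rather than a hand-built object, and backends that are not configured by URL
would need their own handling.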

Please do open issues with any feature requests or questions about hooking into 
Ibis.  We're also happy to sync up at some point if it would be helpful to 
sketch out a general pattern for how to integrate Ibis.

To Jarek's last point: we don't currently offer a public API for inspecting
column lineage, but Ibis has a deferred computation model, so we have all of
that information available.  Last year I wrote up a longish response in a GitHub
discussion outlining the general method for inspecting the lineage of a given
expression, which is (I think) still up to date:
https://github.com/ibis-project/ibis/discussions/7248#discussioncomment-7138710

- Gil

On 2024/06/25 02:51:04 Jarek Potiuk wrote:
> And another - far more important reason (and reason why we have common.sql)
> - we could VERY LIKELY (maybe Maciej and Kacper could comment on that) - we
> could have equivalent of column-level lineage implemented once for all the
> engines - by adding "common.dataframe" open-lineage information.
> 
> On Tue, Jun 25, 2024 at 4:46 AM Jarek Potiuk <[email protected]> wrote:
> 
> > That is a very good question and I forgot to mention it. The main reason
> > is the same as in common.io - we could make it work with our standard
> > "Hook/Connection" framework so that you could get
> > authentication information from Airflow Connections, plugging in the
> > Secrets/ DB Connection information.
> >
> > So basically that would be a glue between Airflow configuration of
> > authentication and Ibis.
> >
> > J.
> >
> >
> > On Tue, Jun 25, 2024 at 12:45 AM Daniel Standish
> > <[email protected]> wrote:
> >
> >> There might be a good case for "why ibis", but why should airflow wrap
> >> ibis? Why do we need a common dataframe library?  Is ibis not "that"
> >> already?
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Mon, Jun 24, 2024 at 3:31 PM Kaxil Naik <[email protected]> wrote:
> >>
> >> > Yeah, the other option is to include it in the common.sql package since
> >> > they are related. But I am okay with the common.dataframe, too.
> >> >
> >> >
> >> >
> >> > On Mon, 24 Jun 2024 at 20:04, Jarek Potiuk <[email protected]> wrote:
> >> >
> >> > > Hello here,
> >> > >
> >> > > At Pycon US earlier this year I had a number of interesting
> >> > > conversations and one of the - very interesting - conversations I
> >> > > had was with the Ibis team and I thought maybe we should consider
> >> > > releasing "common.dataframe" provider for Airflow - following up
> >> > > after "common.sql" and "common.io".
> >> > >
> >> > > Ibis is gaining a lot of popularity recently and it might be at
> >> > > more-or-less the same "place" as fsspec when Bolke added
> >> > > "common.io". Plus if airflow adds it as a community provider, it
> >> > > might also bring Ibis' popularity up.
> >> > >
> >> > > In short - Ibis is a "Portable Python dataframe library". It becomes
> >> > > more and more popular and it not only serves 20+ dataframe backends
> >> > > with the same, portable API, but also allows to mix SQL with
> >> > > dataframes and few more things. Some time ago there were some ideas
> >> > > that we could add "SQLAlchemy" as an additional "common" interface
> >> > > in "common.sql" - but actually it seems that Ibis provides a much
> >> > > better abstraction that unifies SQL and Dataframe approach nicely -
> >> > > way better suited for the "data science" world of Airflow.
> >> > >
> >> > > You can see very nice overview "why Ibis" here:
> >> > > https://ibis-project.org/why
> >> > > - and I think it would be pretty natural thing to add on top of
> >> > > "common.sql" and "common.io" - following "Airflow As a Platform"
> >> > > mantra.
> >> > >
> >> > > WDYT?
> >> > >
> >> > > J.
> >> > >
> >> >
> >>
> >
> 
