Yeah, I'm not saying there shouldn't be an Airflow library.  It's just
unclear to me what its purpose would be, and it would be helpful in
evaluating the question to have some kind of sketch of it: what
interface it would introduce, how it would be used, etc.

On Tue, Jun 25, 2024 at 6:51 AM Gil Forsyth <g...@forsyth.dev> wrote:

> Hello!
>
> Ibis core developer here.  This is an exciting proposal, we'd love to see
> Ibis and Airflow working together smoothly.  To Daniel's point, yes, Ibis
> is a common dataframe library on its own, but to reiterate Jarek's
> response, we don't really handle orchestration, especially between multiple
> connections or multiple backends.
>
> Having Airflow handle passing around authentication, as well as various
> connection metadata, would, I think, be a pretty powerful chunk of tooling
> for interacting with one or more SQL (or other) compute engines.
>
> Please do open issues with any feature requests or questions about hooking
> into Ibis.  We're also happy to sync up at some point if it would be
> helpful to sketch out a general pattern for how to integrate Ibis.
>
> To Jarek's last point, we don't currently offer a public API for
> inspecting column lineage, but Ibis has a deferred computation model, so we
> have all of that information available.  I wrote up a long-ish response in
> a GitHub discussion last year outlining the general method of inspecting
> the lineage of a given expression that is (I think) still up-to-date:
> https://github.com/ibis-project/ibis/discussions/7248#discussioncomment-7138710
>
> - Gil
>
> On 2024/06/25 02:51:04 Jarek Potiuk wrote:
> > And another, far more important reason (and the reason why we have
> > common.sql): we could VERY LIKELY (maybe Maciej and Kacper could comment
> > on that) have the equivalent of column-level lineage implemented once for
> > all the engines, by adding "common.dataframe" open-lineage information.
> >
> > On Tue, Jun 25, 2024 at 4:46 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > That is a very good question and I forgot to mention it. The main
> > > reason is the same as in common.io - we could make it work with our
> > > standard "Hook/Connection" framework so that you could get
> > > authentication information from Airflow Connections, plugging in the
> > > Secrets/DB Connection information.
> > >
> > > So basically that would be the glue between Airflow's configuration
> > > of authentication and Ibis.
> > >
> > > J.
> > >
> > >
> > > On Tue, Jun 25, 2024 at 12:45 AM Daniel Standish
> > > <da...@astronomer.io.invalid> wrote:
> > >
> > >> There might be a good case for "why ibis", but why should airflow wrap
> > >> ibis? Why do we need a common dataframe library?  Is ibis not "that"
> > >> already?
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Jun 24, 2024 at 3:31 PM Kaxil Naik <ka...@gmail.com> wrote:
> > >>
> > >> > Yeah, the other option is to include it in the common.sql package,
> > >> > since they are related. But I am okay with common.dataframe, too.
> > >> >
> > >> >
> > >> >
> > >> > On Mon, 24 Jun 2024 at 20:04, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >> >
> > >> > > Hello here,
> > >> > >
> > >> > > At PyCon US earlier this year I had a number of interesting
> > >> > > conversations, and one of the most interesting was with the Ibis
> > >> > > team. I thought maybe we should consider releasing a
> > >> > > "common.dataframe" provider for Airflow, following up on
> > >> > > "common.sql" and "common.io".
> > >> > >
> > >> > > Ibis is gaining a lot of popularity recently and it might be at
> > >> > > more-or-less the same "place" as fsspec was when Bolke added
> > >> > > "common.io". Plus, if Airflow adds it as a community provider, it
> > >> > > might also bring Ibis' popularity up.
> > >> > >
> > >> > > In short, Ibis is a "Portable Python dataframe library". It is
> > >> > > becoming more and more popular, and it not only serves 20+
> > >> > > dataframe backends with the same, portable API, but also allows
> > >> > > mixing SQL with dataframes, plus a few more things. Some time ago
> > >> > > there were some ideas that we could add "SQLAlchemy" as an
> > >> > > additional "common" interface in "common.sql", but it actually
> > >> > > seems that Ibis provides a much better abstraction that unifies
> > >> > > the SQL and Dataframe approaches nicely, and is way better suited
> > >> > > for the "data science" world of Airflow.
> > >> > >
> > >> > > You can see a very nice overview of "why Ibis" here:
> > >> > > https://ibis-project.org/why
> > >> > > I think it would be a pretty natural thing to add on top of
> > >> > > "common.sql" and "common.io", following the "Airflow As a Platform"
> > >> > > mantra.
> > >> > >
> > >> > > WDYT?
> > >> > >
> > >> > > J.
> > >> > >
> > >> >
> > >>
> > >
> >
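
For illustration only, here is a minimal sketch of the kind of "glue" discussed in this thread: taking the fields an Airflow Connection stores and turning them into a backend URL that Ibis could connect with. Nothing here is an existing or proposed provider API. `ConnectionInfo` and `to_backend_url` are hypothetical names, the field names merely mirror those commonly found on an Airflow Connection, and the host and credentials are made up. The sketch deliberately imports neither Airflow nor Ibis.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class ConnectionInfo:
    """Hypothetical stand-in for the fields an Airflow Connection carries."""

    conn_type: str
    host: str
    login: Optional[str] = None
    password: Optional[str] = None
    port: Optional[int] = None
    schema: Optional[str] = None


def to_backend_url(conn: ConnectionInfo) -> str:
    """Build a scheme://user:password@host:port/database URL from connection fields.

    Hypothetical helper: credentials are percent-encoded so characters
    such as "@" or "/" in a password do not corrupt the URL.
    """
    auth = ""
    if conn.login:
        auth = quote(conn.login, safe="")
        if conn.password:
            auth += ":" + quote(conn.password, safe="")
        auth += "@"
    port = f":{conn.port}" if conn.port else ""
    database = f"/{conn.schema}" if conn.schema else ""
    return f"{conn.conn_type}://{auth}{conn.host}{port}{database}"


# Made-up connection details, as they might be stored in Airflow.
conn = ConnectionInfo(
    conn_type="postgres",
    host="db.example.com",
    login="analyst",
    password="p@ss/word",
    port=5432,
    schema="warehouse",
)
url = to_backend_url(conn)
print(url)
```

Under these assumptions, a URL of this shape is something that could be handed to `ibis.connect(...)`, so a DAG author would only reference a connection id and never handle credentials directly. Whether a real provider would go through URLs at all, rather than backend-specific keyword arguments, is exactly the sort of interface question raised at the top of this thread.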
