There are cases where people need DataFusion but not a SQL parser. For
example, people building a composable query engine for graph or other data
modalities may not choose SQL as the DSL. Decoupling the two seems like a good
idea.

On Tue, Feb 27, 2024, 6:20 AM Mehmet Ozan Kabak <o...@synnada.ai> wrote:

> In this case, maybe we can bring sqlparser-rs into the ASF umbrella
> following the arrow-datafusion model?
>
> Once DataFusion becomes a top-level project, we could move it to
> datafusion-sqlparser-rs — it would be a quasi-independent project just like
> how DataFusion is today w.r.t. Arrow. But it would get most benefits of
> having a community behind it.
>
> > On Feb 27, 2024, at 2:11 AM, Andrew Lamb <al...@influxdata.com> wrote:
> >
> > Julian, thank you for your insight. I very much agree with it.
> >
> >> I think the ASF is wrong on this. I think it needs to provide a home
> >> for medium-sized projects such as sqlparser-rs in an existing
> >> top-level project;
> >
> > It could be said that DataFusion fits this model -- it isn't really an
> > "Arrow" project but needed a place to live and grow, and the Arrow ASF
> > community provided that.
> >
> > Andrew
> >
> > On Mon, Feb 26, 2024 at 1:09 PM Julian Hyde <jh...@apache.org> wrote:
> >
> >> I am torn on this.
> >>
> >> On one hand, I am a big fan of components that are standalone - have
> >> no more dependencies than necessary, and are self-evidently
> >> standalone. So, I think that re-absorbing sqlparser-rs back into
> >> DataFusion would not be a good step. It would reduce the perception
> >> that it is standalone.
> >>
> >> On the other hand, it sounds as if sqlparser-rs would benefit by
> >> having an Apache-like community around it. DataFusion isn't a perfect
> >> fit - there is not much overlap between DataFusion and sqlparser-rs
> >> users - but it takes a lot of effort to create and run a top-level
> >> project, and DataFusion is already up and running.
> >>
> >> The tension is that people want to consume components that they
> >> perceive to be standalone, and yet the ASF wants to create communities
> >> that produce either a single large component or sets of highly-coupled
> >> components. The ASF used to do 'umbrella projects' whose sub-projects
> >> were in the same subject area but had little or no dependencies. For
> >> example, Apache DB [ https://db.apache.org/ ] has JDO, Derby and
> >> Torque. And Apache Commons included many useful Java libraries. Umbrella
> >> projects caused problems during the Jakarta and Hadoop eras, and now
> >> are strongly discouraged at the ASF.
> >>
> >> I think the ASF is wrong on this. I think it needs to provide a home
> >> for medium-sized projects such as sqlparser-rs in an existing
> >> top-level project; maybe those projects grow into top-level projects,
> >> or maybe they remain medium-sized projects. This is especially
> >> necessary in the Rust community, where there are many exciting
> >> projects, but they are almost all happening outside ASF. (This is
> >> exactly where Java was in ~2005. Maybe we need a rust-commons or
> >> rust-db?)
> >>
> >> My conclusion is to leave sqlparser-rs where it is for now, but to
> >> continue talking about what might be an attractive home for it in ASF.
> >>
> >> Julian
> >>
> >> On Mon, Feb 26, 2024 at 8:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> >>>
> >>> Sorry for the late reply,
> >>>
> >>> I think sqlparser-rs users are quite a bit more varied than DataFusion
> >>> users, and there is not a large overlap between the contributors of the
> >>> two projects. I currently seem to be the one reviewing/merging most
> >>> sqlparser-rs pull requests, and I would definitely love some more help.
> >>>
> >>> However, given that the project is not an Apache project, I did not have
> >>> good luck attracting help. A related discussion is here [1].
> >>>
> >>> If the DataFusion community would like to accelerate releases, we can also
> >>> try to do that without bringing it into Apache governance. Specifically, it
> >>> would be great to have help reviewing the PRs -- the actual release process
> >>> is pretty low overhead. The reviews are what take the vast majority of the
> >>> maintenance time.
> >>>
> >>> Andrew
> >>>
> >>> [1]: https://github.com/sqlparser-rs/sqlparser-rs/issues/818
> >>>
> >>> On Sat, Feb 17, 2024 at 4:44 PM Aldrin <octalene....@pm.me.invalid> wrote:
> >>>
> >>>> Do users of sqlparser-rs mostly use DataFusion? I don't know the
> >>>> community, but it seems like it would be an annoying change for users who
> >>>> use it with a different query engine. Just a thought.
> >>>>
> >>>> On Sat, Feb 17, 2024 at 10:26, Andy Grove <andygrov...@gmail.com> wrote:
> >>>>
> >>>> I agree that it simplifies shipping new SQL features in DataFusion since
> >>>> we can develop the changes in the parser concurrently with the changes in
> >>>> other DataFusion crates and then release them all together.
> >>>>
> >>>> The name of the crate would not need to change, so downstream users
> >>>> should see no impact.
> >>>>
> >>>> We would need to decide if we want to keep a separate version number or
> >>>> bring it in line with DataFusion version numbers (I have no preference
> >>>> either way).
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Feb 17, 2024 at 11:09 AM Mehmet Ozan Kabak <o...@synnada.ai> wrote:
> >>>>
> >>>>> Doing this will probably reduce the time-to-ship for DataFusion features
> >>>>> that need parsing support due to increased convenience, so I’m inclined
> >>>>> to see it in a positive light.
> >>>>>
> >>>>> What would be the impact of doing this on people who use only
> >>>>> sqlparser-rs, if any?
> >>>>>
> >>>>> On Feb 17, 2024, at 7:16 PM, Andy Grove <andygrov...@gmail.com> wrote:
> >>>>>>
> >>>>>> The sqlparser-rs project [1] seems to have become the de facto SQL
> >>>>>> parser for Rust, with almost 4 million downloads so far. This was
> >>>>>> originally part of DataFusion very early on, and I moved it into a
> >>>>>> separate project because it seemed useful for other projects. This was
> >>>>>> before DataFusion was known as a composable query engine, and with
> >>>>>> hindsight, I probably should have left it as part of the DataFusion
> >>>>>> project.
> >>>>>>
> >>>>>> Now that DataFusion has a reputation as a composable query engine, I
> >>>>>> think it would make sense to move this code back into DataFusion, where
> >>>>>> it would benefit from a larger community of maintainers.
> >>>>>>
> >>>>>> I would like to hear thoughts from the Apache Arrow / DataFusion
> >>>>>> community. Does this seem like a good idea?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Andy.
> >>>>>>
> >>>>>> [1] https://github.com/sqlparser-rs/sqlparser-rs