There are cases where people need DataFusion but not a SQL parser. For example, people building a composable query engine for graphs or other data modalities may not choose SQL as their DSL. Decoupling the two seems like a good idea.
On Tue, Feb 27, 2024, 6:20 AM Mehmet Ozan Kabak <o...@synnada.ai> wrote:
> In this case, maybe we can bring sqlparser-rs under the ASF umbrella following the arrow-datafusion model?
>
> Once DataFusion becomes a top-level project, we could move it to datafusion-sqlparser-rs -- it would be a quasi-independent project, just like DataFusion is today w.r.t. Arrow. But it would get most of the benefits of having a community behind it.
>
> > On Feb 27, 2024, at 2:11 AM, Andrew Lamb <al...@influxdata.com> wrote:
> >
> > Julian, thank you for your insight. I very much agree with it.
> >
> >> I think the ASF is wrong on this. I think it needs to provide a home for medium-sized projects such as sqlparser-rs in an existing top-level project;
> >
> > It could be said that DataFusion fits this model -- it isn't really an "Arrow" project but needed a place to live and grow, and the Arrow ASF community provided that.
> >
> > Andrew
> >
> > On Mon, Feb 26, 2024 at 1:09 PM Julian Hyde <jh...@apache.org> wrote:
> >
> >> I am torn on this.
> >>
> >> On one hand, I am a big fan of components that are standalone - have no more dependencies than necessary, and are self-evidently standalone. So, I think that re-absorbing sqlparser-rs back into DataFusion would not be a good step. It would reduce the perception that it is standalone.
> >>
> >> On the other hand, it sounds as if sqlparser-rs would benefit from having an Apache-like community around it. DataFusion isn't a perfect fit - there is not much overlap between DataFusion and sqlparser-rs users - but it takes a lot of effort to create and run a top-level project, and DataFusion is already up and running.
> >>
> >> The tension is that people want to consume components that they perceive to be standalone, and yet the ASF wants to create communities that produce either a single large component or sets of highly-coupled components.
> >> The ASF used to do 'umbrella projects' whose sub-projects were in the same subject area but had few or no dependencies on each other. For example, Apache DB [ https://db.apache.org/ ] has JDO, Derby and Torque. And Commons included many useful Java libraries. Umbrella projects caused problems during the Jakarta and Hadoop eras, and are now strongly discouraged at the ASF.
> >>
> >> I think the ASF is wrong on this. I think it needs to provide a home for medium-sized projects such as sqlparser-rs in an existing top-level project; maybe those projects grow into top-level projects, or maybe they remain medium-sized projects. This is especially necessary in the Rust community, where there are many exciting projects, but they are almost all happening outside the ASF. (This is exactly where Java was in ~2005. Maybe we need a rust-commons or rust-db?)
> >>
> >> My conclusion is to leave sqlparser-rs where it is for now, but to continue talking about what might be an attractive home for it in the ASF.
> >>
> >> Julian
> >>
> >> On Mon, Feb 26, 2024 at 8:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> >>>
> >>> Sorry for the late reply.
> >>>
> >>> I think sqlparser-rs users are quite a bit more varied than DataFusion users, and there is not a large overlap between the contributors of the two projects. I currently seem to be the one reviewing / merging most sqlparser-rs PRs, and I would definitely love some more help.
> >>>
> >>> However, given that the project is not an Apache project, I did not have good luck attracting help. A related discussion is here [1].
> >>>
> >>> If the DataFusion community would like to accelerate releases, we can also try to do that without bringing it into Apache governance. Specifically, it would be great to have help reviewing the PRs -- the actual release process is pretty low overhead.
> >>> The reviews are what take the vast majority of the maintenance time.
> >>>
> >>> Andrew
> >>>
> >>> [1]: https://github.com/sqlparser-rs/sqlparser-rs/issues/818
> >>>
> >>> On Sat, Feb 17, 2024 at 4:44 PM Aldrin <octalene....@pm.me.invalid> wrote:
> >>>
> >>>> Do users of sqlparser-rs mostly use DataFusion? I don't know the community, but it seems like it would be an annoying change for users who use it with a different query engine. Just a thought.
> >>>>
> >>>> On Sat, Feb 17, 2024 at 10:26, Andy Grove <andygrov...@gmail.com> wrote:
> >>>>
> >>>> I agree that it simplifies shipping new SQL features in DataFusion, since we can develop the changes in the parser concurrently with the changes in other DataFusion crates and then release them all together.
> >>>>
> >>>> The name of the crate would not need to change, so downstream users should see no impact.
> >>>>
> >>>> We would need to decide whether we want to keep a separate version number or bring it in line with DataFusion version numbers (I have no preference either way).
> >>>>
> >>>> On Sat, Feb 17, 2024 at 11:09 AM Mehmet Ozan Kabak <o...@synnada.ai> wrote:
> >>>>
> >>>>> Doing this will probably reduce the time-to-ship for DataFusion features that need parsing support, due to increased convenience, so I’m inclined to see it in a positive light.
> >>>>>
> >>>>> What would be the impact of doing this on people who use only sqlparser-rs, if any?
> >>>>>
> >>>>>> On Feb 17, 2024, at 7:16 PM, Andy Grove <andygrov...@gmail.com> wrote:
> >>>>>>
> >>>>>> The sqlparser-rs project [1] seems to have become the de facto SQL parser for Rust, with almost 4 million downloads so far.
> >>>>>> This was originally part of DataFusion very early on, and I moved it into a separate project because it seemed useful for other projects. This was before DataFusion was known as a composable query engine, and with hindsight, I probably should have left it as part of the DataFusion project.
> >>>>>>
> >>>>>> Now that DataFusion has a reputation as a composable query engine, I think it would make sense to move this code back into DataFusion, where it would benefit from a larger community of maintainers.
> >>>>>>
> >>>>>> I would like to hear thoughts from the Apache Arrow / DataFusion community. Does this seem like a good idea?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Andy.
> >>>>>>
> >>>>>> [1] https://github.com/sqlparser-rs/sqlparser-rs