To touch on the question about supported features -- is it possible to advertise arbitrary/custom "capabilities" in GetSqlInfo? Say that you want to represent some set of behaviors that FlightSQL services can support.
Stuff like "Supports grouping by multiple distinct aggregates", "Supports
self-joins on aliased tables", etc.

This is going to be unique to each implementation, but I couldn't determine
whether there was a way to express arbitrary capabilities.

Also, in case it's helpful, I put together an ASCII diagram of what I'm trying
to do with FlightSQL. If anyone has a moment, I would appreciate input on
whether it's feasible/a good idea:
https://pastebin.com/raw/VF2r0F3f

Thank you =)

On Fri, Mar 4, 2022 at 2:37 PM David Li <lidav...@apache.org> wrote:

> We could also add say CommandSubstraitQuery as a distinct message, and
> older servers would just reject it as an unknown request type.
>
> -David
>
> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
> >>
> >> 1. How does a server report that it supports each command type? Initial
> >> thought is a property in GetSqlInfo.
> >
> >
> > This sounds reasonable.
> >
> >
> >> What happens to client code written prior to changing the command type
> >> to be a oneOf field? Same for servers.
> >
> >
> > It is transparent from older clients (I'm 99% sure the wire protocol
> > doesn't change). Servers is a little harder. The one saving grace is I
> > don't think an empty/not-present SQL string would be something most servers
> > could handle, so they would probably error with something that while
> > not-obvious would give a clue to the clients (but hopefully this would be a
> > non-issue because the capabilities would be checked for clients wishing
> > to use this feature first).
> >
> > -Micah
> >
> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <jam...@bitquilltech.com.invalid>
> > wrote:
> >
> >> It sounds like an interesting and useful project to use Substrait as an
> >> alternative to SQL strings.
> >>
> >> Important aspects to spec out are:
> >> 1. How does a server report that it supports each command type? Initial
> >> thought is a property in GetSqlInfo.
> >> 2. What happens to client code written prior to changing the command type
> >> to be a oneOf field? Same for servers.
> >> More generally, how should backward compatibility work, and what should
> >> happen if a client sends an unsupported command type to a server?
> >> 3. Should inputs to catalog RPC calls also accept Substrait structures?
> >>
> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> >>
> >> > @James Duong <jam...@bitquilltech.com>
> >> >
> >> > You are absolutely right, I realized this and confirmed whether this
> >> > would be possible with Jacques to double-check.
> >> > It would amount to what I might call "dollar-store Substrait." It's not
> >> > elegant or a good solution, but definitely presents a good duct-tape hack
> >> > and is a crafty idea.
> >> >
> >> > I agree with Jacques -- when you think about FlightSQL, what you are
> >> > attempting with a query isn't necessarily SQL, but a general data-compute
> >> > operation.
> >> > SQL just so happens to be a fairly universal way to express them, with an
> >> > ANSI standard, but FlightSQL doesn't recognize any particular subset of it
> >> > and for all intents and purposes it doesn't matter what the operation
> >> > string contains.
> >> >
> >> > Substrait would make a fantastic logical next-feature because it's
> >> > targeted as a specification for expressing relational algebra and
> >> > data-compute operations.
> >> > This more-or-less equates to SQL strings (in my mind at least) with a much
> >> > better toolkit and Dev UX. If there is anything I can do to help move this
> >> > forward, please let me know because I am extremely motivated to do so.
> >> >
> >> > @David Li <git...@lidavidm.me>
> >> >
> >> > Also agreed. Substrait is put together by folks much smarter than myself,
> >> > and if I had to hedge my bets, I'd put money on it being the future of
> >> > data-compute interop.
> >> > I would love nothing more than to adopt this technology and push it
> >> > along.
> >> >
> >> >> Your project does sound interesting - basically, it sounds like a tabular
> >> >> data storage service with query pushdown?
> >> >>
> >> >
> >> > Yeah this is more or less the details of it (my personal email, with
> >> > discretion assumed, is always open)
> >> >
> >> > Imagine an environment where a backend wants to advertise some kind of
> >> > schema/data catalog
> >> >
> >> > And then a central service introspects these backends, and dynamically
> >> > generates an API from the data catalogues/schemas, where requests get
> >> > proxied to the underlying backend service for each schema to actually be
> >> > executed
> >> >
> >> > In text, the flow would look something like:
> >> >
> >> >                                                    <----> Data Provider Backend 0
> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
> >> >                                                    <----> Data Provider Backend 2
> >> >
> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <lidav...@apache.org> wrote:
> >> >
> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an alternative to
> >> >> Substrait, at least one that isn't even more nascent or one that's very
> >> >> tied to a particular language, so perhaps it might be better to get
> >> >> involved in Substrait and see if it suits your needs? Convincing a team to
> >> >> try something new can be hard, though, and it is somewhat of a moving
> >> >> target - but Flight SQL is in a similar spot, I think, as it's still
> >> >> getting enhancements.
> >> >>
> >> >> Your project does sound interesting - basically, it sounds like a tabular
> >> >> data storage service with query pushdown?
> >> >>
> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
> >> >> > James, I agree that you could use JSON but that feels a bit hacky
> >> >> > (mis-use of the paradigm).
> >> >> > Instead, I'd really like to do something like David is
> >> >> > suggesting: support Substrait as an alternative to a SQL string.
> >> >> > Something like this:
> >> >> >
> >> >> > https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
> >> >> >
> >> >> > It would be great if someone wanted to pick this up. It would be a nice
> >> >> > enhancement to FlightSQL (and provide a structured way to express
> >> >> > operations).
> >> >> >
> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <jam...@bitquilltech.com.invalid>
> >> >> > wrote:
> >> >> >
> >> >> >> In the same way that you could write an ODBC driver that takes in text
> >> >> >> that's not SQL, you could write a Flight SQL server that takes in text
> >> >> >> that's JSON.
> >> >> >> Flight SQL doesn't parse the query, so you could create commands that are
> >> >> >> just JSON text.
> >> >> >>
> >> >> >> Is that the only bit you need, Gavin?
> >> >> >>
> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> >> >> >>
> >> >> >> > I am enthusiastic about Substrait and have followed its progress eagerly
> >> >> >> > =D
> >> >> >> >
> >> >> >> > When I presented it as a tentative option, there were reservations because
> >> >> >> > of the project/spec being young and the functionality still being
> >> >> >> > fleshed out.
> >> >> >> > I think if I were having this conversation in say, 8-16 months, it would
> >> >> >> > have been an easy choice, no doubt.
> >> >> >> >
> >> >> >> > On a public mailing list (and I can share more details in private if you're
> >> >> >> > curious), the gist of it is this:
> >> >> >> >
> >> >> >> > Some well-defined/backed-by-mature tech solution for expressing data
> >> >> >> > compute operations between services would be a useful thing to have
> >> >> >> > (Especially if it's language-agnostic)
> >> >> >> >
> >> >> >> > The goal is for an "implementing service" to have:
> >> >> >> > - An introspectable schema (IE, "describe yourself to me")
> >> >> >> > - A query/operation execution endpoint (IE: "perform this operation on your
> >> >> >> >   data")
> >> >> >> >
> >> >> >> > With FlightSQL this is possible I believe, but it requires the operation to
> >> >> >> > be expressed as a SQL string which isn't ideal.
> >> >> >> >
> >> >> >> > Working with some programmatic, structured object that has the same
> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query would have, would be
> >> >> >> > a better experience
> >> >> >> > (Jacques is on to something here!)
> >> >> >> >
> >> >> >> > This interface between services would be somewhat the equivalent of an
> >> >> >> > "SDK", so it would be nice to have a strongly-typed library for expressing
> >> >> >> > and building-up query/data-compute ops.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <lidav...@apache.org> wrote:
> >> >> >> >
> >> >> >> > > You probably want Substrait: https://substrait.io/
> >> >> >> > >
> >> >> >> > > Which is being worked on by several people, including Arrow community
> >> >> >> > > members.
> >> >> >> > >
> >> >> >> > > It might be interesting to generalize Flight SQL to include support for
> >> >> >> > > Substrait. I'm curious what your application is, if you're able to share
> >> >> >> > > more.
> >> >> >> > >
> >> >> >> > > -David
> >> >> >> > >
> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
> >> >> >> > > > Hiya,
> >> >> >> > > >
> >> >> >> > > > I am drafting a proposal for a way to enable services to express data
> >> >> >> > > > compute operations to each other.
> >> >> >> > > >
> >> >> >> > > > However I think it'll be difficult to get buy-in if the only representation
> >> >> >> > > > for queries is as SQL strings.
> >> >> >> > > >
> >> >> >> > > > Is there any kind of lower-level API that can be used to express operations?
> >> >> >> > > >
> >> >> >> > > > IE instead of "SELECT name FROM user"
> >> >> >> > > >
> >> >> >> > > > A structured representation like:
> >> >> >> > > > {
> >> >> >> > > >   "op": "query",
> >> >> >> > > >   "schema": "user",
> >> >> >> > > >   "project": ["name"]
> >> >> >> > > > }
> >> >> >> > > >
> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
> >> >> >> > > >
> >> >> >> > > > Thank you =)
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >> --
> >> >> >>
> >> >> >> *James Duong*
> >> >> >> Lead Software Developer
> >> >> >> Bit Quill Technologies Inc.
> >> >> >> Direct: +1.604.562.6082 | jam...@bitquilltech.com
> >> >> >> https://www.bitquilltech.com
> >> >> >>
> >> >> >> This email message is for the sole use of the intended recipient(s) and may
> >> >> >> contain confidential and privileged information. Any unauthorized review,
> >> >> >> use, disclosure, or distribution is prohibited. If you are not the
> >> >> >> intended recipient, please contact the sender by reply email and destroy
> >> >> >> all copies of the original message. Thank you.
> >> >> >>
> >> >>
> >> >
> >>
> >> --
> >>
> >> *James Duong*
> >> Lead Software Developer
> >> Bit Quill Technologies Inc.
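
To make the capability question at the top of this message a bit more concrete, here is a rough Python sketch of the flow discussed in the thread: a server advertises what it supports through a GetSqlInfo-style lookup, the client checks that before choosing between a SQL string and a Substrait plan, and a server that does not know a command type rejects it. Everything here is hypothetical and for illustration only: the info IDs, the capability strings, and the ToyServer/SqlCommand/SubstraitCommand names are made up and are not taken from the Flight SQL spec or any Arrow library.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical info IDs. Flight SQL defines its own SqlInfo values; these
# numbers are invented for this sketch and are NOT part of the spec.
SUPPORTS_SUBSTRAIT = 10_000
CUSTOM_CAPABILITIES = 10_001


@dataclass
class SqlCommand:
    query: str


@dataclass
class SubstraitCommand:
    plan: bytes  # a serialized plan, treated as opaque bytes here


Command = Union[SqlCommand, SubstraitCommand]


class UnsupportedCommand(Exception):
    """Raised when a server receives a command type it does not handle."""


@dataclass
class ToyServer:
    # Maps info ID -> advertised value (bool, list of strings, ...).
    sql_info: Dict[int, object] = field(default_factory=dict)

    def get_sql_info(self, *info_ids: int) -> Dict[int, object]:
        # Return only the requested IDs this server knows about.
        return {i: self.sql_info[i] for i in info_ids if i in self.sql_info}

    def execute(self, command: Command) -> str:
        if isinstance(command, SqlCommand):
            return f"ran SQL: {command.query}"
        if isinstance(command, SubstraitCommand) and self.sql_info.get(SUPPORTS_SUBSTRAIT):
            return f"ran Substrait plan ({len(command.plan)} bytes)"
        # An older server simply rejects the unknown request type.
        raise UnsupportedCommand(type(command).__name__)


def choose_command(server: ToyServer, sql: str, plan: bytes) -> Command:
    # Check the advertised capability first, fall back to SQL otherwise.
    info = server.get_sql_info(SUPPORTS_SUBSTRAIT)
    if info.get(SUPPORTS_SUBSTRAIT):
        return SubstraitCommand(plan)
    return SqlCommand(sql)


if __name__ == "__main__":
    new_server = ToyServer({
        SUPPORTS_SUBSTRAIT: True,
        CUSTOM_CAPABILITIES: ["self_joins_on_aliased_tables"],
    })
    cmd = choose_command(new_server, "SELECT name FROM user", b"\x00\x01")
    print(new_server.execute(cmd))
    print(new_server.get_sql_info(CUSTOM_CAPABILITIES))
```

Whether free-form capability strings like the one above can actually be carried in GetSqlInfo is exactly the open question; the sketch only shows what a client-side check might look like if they could.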