David, thanks for the review. Sorry, that line is horribly ambiguous. Spark preconfigures SQL generation for the most popular dialects here <https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc>, and there's also a way for a user to provide a "plugin" dialect. The goal of the PR is to avoid all that and enable drivers themselves to be self-describing.
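
To make "self-describing" a bit more concrete, here is a rough sketch of how a client might pick a row-limit syntax from the SQL_SUPPORTED_LIMIT_OFFSET bitmask instead of a hardcoded dialect table. Only the code 577 and the three grammar families come from the proposal; the bit positions, the `LimitOffsetSupport` name, and the `render_limit` helper are hypothetical and made up for illustration.

```python
from enum import IntFlag
from typing import Optional

# SqlInfo code proposed in the PR.
SQL_SUPPORTED_LIMIT_OFFSET = 577


class LimitOffsetSupport(IntFlag):
    """Hypothetical bit layout; the real enum values live in the PR."""
    LIMIT_OFFSET = 1 << 0  # SELECT ... LIMIT n
    OFFSET_FETCH = 1 << 1  # SELECT ... OFFSET m ROWS FETCH NEXT n ROWS ONLY
    TOP = 1 << 2           # SELECT TOP n ...


def render_limit(query: str, limit: int, supported: int) -> Optional[str]:
    """Return `query` with a row limit in the first grammar the driver
    reports, or None if the limit cannot be pushed down."""
    if supported & LimitOffsetSupport.LIMIT_OFFSET:
        return f"{query} LIMIT {limit}"
    if supported & LimitOffsetSupport.OFFSET_FETCH:
        return f"{query} OFFSET 0 ROWS FETCH NEXT {limit} ROWS ONLY"
    if supported & LimitOffsetSupport.TOP and query[:7].upper() == "SELECT ":
        # TOP sits right after the SELECT keyword.
        return f"SELECT TOP {limit} {query[7:]}"
    return None  # caller applies the limit locally instead


# A driver reporting both LIMIT/OFFSET and OFFSET/FETCH (as PostgreSQL might).
mask = LimitOffsetSupport.LIMIT_OFFSET | LimitOffsetSupport.OFFSET_FETCH
print(render_limit("SELECT a FROM t", 10, mask))
```

The point is only that the client branches on what the driver reports, so no per-dialect code is needed on the Spark side.
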
Having said that, JDBC also does some capabilities reporting (probably not nearly enough, though). From a practical perspective, convincing all major JDBC driver vendors to agree on that kind of metadata inclusion is next to impossible at this point. Since ADBC drivers are new, relatively few, and concentrated in fewer hands, it seems realistic to me that we might succeed in making it a norm.

On Thu, Jun 4, 2026 at 10:16 AM David Li <[email protected]> wrote:

> This seems reasonable to me.
>
> One thing that I'm curious about:
>
> >> The immediate motivation is enabling the ADBC data source in Spark
> >> (apache/spark#54603 <https://github.com/apache/spark/issues/54603>)
> >> without hardcoded per-dialect configuration in Spark code, the way
> >> the JDBC source does today.
>
> What are the corresponding definitions in JDBC? (Or should I read this as
> "the JDBC source currently has to hardcode per-dialect configuration [and
> we would like to avoid that for ADBC if possible]"?)
>
> On Thu, Jun 4, 2026, at 15:09, Tornike Gurgenidze wrote:
> > Hi, a gentle reminder that the PR's still waiting for a review.
> >
> > Thanks,
> > Tornike
> >
> > On Sat, May 16, 2026 at 7:32 AM Tornike Gurgenidze <[email protected]>
> > wrote:
> >
> >> Hi all,
> >>
> >> I'd like to propose adding four new SqlInfo codes to FlightSql.proto to
> >> fill gaps in dialect metadata that clients need when compiling SQL
> >> per-backend:
> >>
> >> - SQL_SUPPORTED_LIMIT_OFFSET (577): row-limit / offset grammar
> >>   (LIMIT/OFFSET, OFFSET…FETCH, TOP)
> >> - SQL_SUPPORTED_NULLS_ORDERING (578): explicit NULLS FIRST / NULLS LAST
> >>   support in ORDER BY (distinct from the existing SQL_NULL_ORDERING
> >>   (507), which reports the server's *default* null ordering)
> >> - SQL_SUPPORTED_BOOLEAN_LITERAL (579): accepted boolean literal forms
> >>   (TRUE/FALSE, 1/0)
> >> - SQL_SUPPORTED_DATETIME_LITERAL (580): accepted date/time/timestamp
> >>   literal forms (ANSI DATE '…' keyword vs. bare quoted string)
> >>
> >> The goal here is intentionally narrow: to give clients just enough
> >> dialect metadata to emit correct SQL for common pushdown operations
> >> (predicate pushdown, projection pushdown, LIMIT/OFFSET, ORDER BY). It is
> >> explicitly not an attempt to describe enough of each dialect to support
> >> general-purpose SQL generation; Substrait is probably the right
> >> long-term answer for engines that need to push arbitrary plans across
> >> backends. These codes are a pragmatic solution for the much smaller
> >> surface area that pushdown requires.
> >>
> >> All four are int32 bitmasks (not scalar enums), following the existing
> >> SQL_SUPPORTED_GROUP_BY / SupportedSqlGrammar convention: dialects
> >> frequently accept multiple forms (e.g. PostgreSQL supports both
> >> LIMIT/OFFSET and OFFSET/FETCH; MySQL accepts both TRUE/FALSE and 1/0).
> >> The accompanying enums are intentionally minimal: just enough for
> >> current use cases.
> >>
> >> The immediate motivation is enabling the ADBC data source in Spark
> >> (apache/spark#54603 <https://github.com/apache/spark/issues/54603>)
> >> without hardcoded per-dialect configuration in Spark code, the way the
> >> JDBC source does today. Since ADBC reuses Flight SQL's SqlInfo codes,
> >> the change applies to both.
> >>
> >> - Issue: https://github.com/apache/arrow/issues/49792
> >> - PR: https://github.com/apache/arrow/pull/49796
> >>
> >> Thanks,
> >> Tornike
