Re: [VOTE] Adopt ADBC database client connectivity specification

Kun Liu Thu, 22 Sep 2022 18:03:40 -0700

+1
(non-binding)

On Fri, Sep 23, 2022 at 03:40, Gavin Ray <[email protected]> wrote:


> Ah yeah that's true, good point
>
>
>
> On Thu, Sep 22, 2022 at 2:38 PM David Li <[email protected]> wrote:
>
> > I suppose the separator would have to be known to the client somehow
> > (perhaps as metadata) - you'd have the same problem in the opposite
> > direction if the result were a list right? You wouldn't be able to
> > concatenate the parts together without knowing a safe separator to use.
> >
> > On Thu, Sep 22, 2022, at 14:23, Gavin Ray wrote:
> > > Wait, what happens if a datasource's spec allows dots as valid
> > > identifiers?
> > >
> > > On Thu, Sep 22, 2022 at 2:22 PM Gavin Ray <[email protected]> wrote:
> > >
> > >> Ah okay, yeah that's a reasonable angle too haha
> > >>
> > >>
> > >> On Thu, Sep 22, 2022 at 1:59 PM David Li <[email protected]> wrote:
> > >>
> > >>> Frankly it was from a "not drastically refactoring things"
> > >>> perspective :)
> > >>>
> > >>> At least for Arrow: list[utf8] is effectively a utf8 array with an
> > >>> extra array of offsets, so there's relatively little overhead. (In
> > >>> particular, there's not an extra allocation per array; there's just
> > >>> an overall allocation of a bitmap/offsets buffer.)
> > >>>
> > >>> On Thu, Sep 22, 2022, at 13:46, Gavin Ray wrote:
> > >>> > I suppose you're thinking from a memory/performance perspective
> > >>> > right?
> > >>> > Allocating a dot character is a lot better than allocating multiple
> > >>> > arrays
> > >>> >
> > >>> > Yeah I don't see why not -- this could even be a library internal
> > >>> > where the fact that it's dotted is an implementation detail
> > >>> > Then in the Java implementation or whatnot, you can call
> > >>> > ".getFullyQualifiedTableName()" which will do the allocating parse
> > >>> > to a List<String> for you, or whatnot
> > >>> >
> > >>> > The array was mostly for convenience's sake (our API is JSON and
> > >>> > not particularly performance-oriented)
> > >>> >
> > >>> > On Thu, Sep 22, 2022 at 1:40 PM David Li <[email protected]> wrote:
> > >>> >
> > >>> >> Ah, interesting…
> > >>> >>
> > >>> >> A self-recursive schema wouldn't work in Arrow's schema system, so
> > >>> >> it'd have to be the latter solution. Or, would it work to have a
> > >>> >> dotted name in the schema name column? Would parsing that back out
> > >>> >> (for applications that want to work with the full hierarchy) be too
> > >>> >> much trouble?
> > >>> >>
> > >>> >> On Thu, Sep 22, 2022, at 13:14, Gavin Ray wrote:
> > >>> >> > Antoine, I can't comment on the Go code (not qualified) but to
> > >>> >> > me, the "verification" test examples look like a mixture between
> > >>> >> > JDBC and Java FlightSQL driver usage, and seem solid.
> > >>> >> >
> > >>> >> > There was one reservation I had about the ability to handle
> > >>> >> > datasource namespacing that I brought up early on in the proposal
> > >>> >> > discussions
> > >>> >> > (David responded to it but I got busy and forgot to reply again)
> > >>> >> >
> > >>> >> > If you have a datasource which provides possibly arbitrary levels
> > >>> >> > of schema namespace (something like Apache Calcite, for example)
> > >>> >> > How do you represent the table/schema names?
> > >>> >> >
> > >>> >> > Suppose I have a service with a DB layout like this:
> > >>> >> >
> > >>> >> > / foo
> > >>> >> >     / bar
> > >>> >> >         / baz
> > >>> >> >             /qux
> > >>> >> >               / table1
> > >>> >> >                 - column1
> > >>> >> >
> > >>> >> > At my day job, we have a technology which is very similar to
> > >>> >> > ADBC/FlightSQL
> > >>> >> > (would be great to adopt Substrait + ADBC once they're mature
> > >>> >> > enough)
> > >>> >> > - https://github.com/hasura/graphql-engine/blob/master/dc-agents/README.md#data-connectors
> > >>> >> > - https://techcrunch.com/2022/06/28/hasura-now-lets-developers-turn-any-data-source-into-a-graphql-api/
> > >>> >> >
> > >>> >> > We wound up having to redesign the specification to handle
> > >>> >> > datasources that don't fit the "database-schema-table" or
> > >>> >> > "database-table" mould
> > >>> >> >
> > >>> >> > In the ADBC schema for schema metadata, it looks like it expects a
> > >>> >> > single "schema" struct:
> > >>> >> > https://github.com/apache/arrow-adbc/blob/7866a566f5b7b635267bfb7a87ea49b01dfe89fa/java/core/src/main/java/org/apache/arrow/adbc/core/StandardSchemas.java#L132-L152
> > >>> >> >
> > >>> >> > If you want to be flexible, IMO it would be good to either:
> > >>> >> >
> > >>> >> > 1. Have DB_SCHEMA_SCHEMA be self-recursive, so that schemas (with
> > >>> >> >    or without tables) can be nested arbitrarily deep underneath
> > >>> >> >    each other
> > >>> >> >       - Fully-Qualified-Table-Name (FQTN) can then be computed by
> > >>> >> >         walking up from a table and concatenating the schema name
> > >>> >> >         until the root schema is reached
> > >>> >> >
> > >>> >> > 2. Make "catalog" and "schema" go away entirely, and tables just
> > >>> >> >    have a FQTN that is an array, a database is a collection of
> > >>> >> >    tables
> > >>> >> >      - You can compute what would have been the catalog + schema
> > >>> >> >        hierarchy by doing a .reduce() over the list of tables and
> > >>> >> >
> > >>> >> > Or maybe there is another, better way. But that's my $0.02 and the
> > >>> >> > only real concern about the API I have, without actually trying to
> > >>> >> > build something with it.
> > >>> >> >
> > >>> >> >
> > >>> >> >
> > >>> >> >
> > >>> >> >
> > >>> >> > On Thu, Sep 22, 2022 at 5:40 AM Antoine Pitrou <[email protected]> wrote:
> > >>> >> >
> > >>> >> >>
> > >>> >> >> Hello,
> > >>> >> >>
> > >>> >> >> I would urge people to review the proposed ADBC APIs, especially
> > >>> >> >> the Go and Java APIs which probably benefitted from less feedback
> > >>> >> >> than the C one.
> > >>> >> >>
> > >>> >> >> Regards
> > >>> >> >>
> > >>> >> >> Antoine.
> > >>> >> >>
> > >>> >> >>
> > >>> >> >> On 21/09/2022 at 17:40, David Li wrote:
> > >>> >> >> > Hello,
> > >>> >> >> >
> > >>> >> >> > We have been discussing [1] standard interfaces for Arrow-based
> > >>> >> >> > database access and have been working on implementations of the
> > >>> >> >> > proposed interfaces [2], all under the name "ADBC". This proposal
> > >>> >> >> > aims to provide a unified client abstraction across Arrow-native
> > >>> >> >> > database protocols (like Flight SQL) and non-Arrow database
> > >>> >> >> > protocols, which can then be used by Arrow projects like
> > >>> >> >> > Dataset/Acero and ecosystem projects like Ibis.
> > >>> >> >> >
> > >>> >> >> > For details, see the RFC here:
> > >>> >> >> > https://github.com/apache/arrow/pull/14079
> > >>> >> >> >
> > >>> >> >> > I would like to propose that the Arrow project adopt this RFC,
> > >>> >> >> > along with apache/arrow-adbc commit 7866a56 [3], as version 1.0.0
> > >>> >> >> > of the ADBC API standard.
> > >>> >> >> >
> > >>> >> >> > Please vote to adopt the specification as described above. (This
> > >>> >> >> > is not a vote to release any components.)
> > >>> >> >> >
> > >>> >> >> > This vote will be open for at least 72 hours.
> > >>> >> >> >
> > >>> >> >> > [ ] +1 Adopt the ADBC specification
> > >>> >> >> > [ ]  0
> > >>> >> >> > [ ] -1 Do not adopt the specification because...
> > >>> >> >> >
> > >>> >> >> > Thanks to the DuckDB and R DBI projects for providing feedback on
> > >>> >> >> > and implementations of the proposal.
> > >>> >> >> >
> > >>> >> >> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
> > >>> >> >> > [2]: https://github.com/apache/arrow-adbc
> > >>> >> >> > [3]: https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
> > >>> >> >> >
> > >>> >> >> > Thank you,
> > >>> >> >> > David
> > >>> >> >>
> > >>> >>
> > >>>
> > >>
> >
>
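
The separator concern raised in the thread (a dotted name is ambiguous when an identifier may itself contain a dot) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the helper names join_name, split_name and build_hierarchy, and the backslash-escape convention, are hypothetical and are not part of the ADBC specification or of any Arrow library. The sketch shows one possible way to make a dotted name round-trip, plus the ".reduce() over the list of tables" idea for recovering a namespace hierarchy from array-style fully qualified names.

```python
from functools import reduce

SEP = "."    # assumed separator; the thread notes the client must know it
ESC = "\\"   # assumed escape character (hypothetical convention)


def join_name(parts, sep=SEP, esc=ESC):
    """Join name parts into one string, escaping sep and esc inside each part."""
    escaped = [p.replace(esc, esc + esc).replace(sep, esc + sep) for p in parts]
    return sep.join(escaped)


def split_name(name, sep=SEP, esc=ESC):
    """Inverse of join_name: split on unescaped separators and drop escapes."""
    parts, current, i = [], [], 0
    while i < len(name):
        ch = name[i]
        if ch == esc and i + 1 < len(name):
            # An escaped character is taken literally.
            current.append(name[i + 1])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def build_hierarchy(tables):
    """Fold a list of name-part lists into a nested dict of namespaces."""
    def add(tree, parts):
        node = tree
        for part in parts:
            node = node.setdefault(part, {})
        return tree
    return reduce(add, tables, {})


# A naive dotted join loses information when a part contains a dot.
original = ["foo", "bar.baz", "table1"]
naive_round_trip = ".".join(original).split(".")

# The escaped form round-trips.
escaped_round_trip = split_name(join_name(original))

# The layout from the thread, expressed as one array-style name.
tree = build_hierarchy([["foo", "bar", "baz", "qux", "table1"]])
```

With a list-typed column no separator or escaping is needed at all, which is the trade-off the thread weighs against keeping the existing single schema-name column.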
