Re: [VOTE] Adopt ADBC database client connectivity specification

David Li Thu, 22 Sep 2022 11:38:05 -0700

I suppose the separator would have to be known to the client somehow (perhaps 
as metadata). You'd have the same problem in the opposite direction if the 
result were a list, right? You wouldn't be able to concatenate the parts 
together without knowing a safe separator to use.
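
To make the separator problem concrete, here is a minimal sketch. It is not 
part of the ADBC specification: the helper names join_name and split_name, and 
the choice of backslash escaping, are assumptions made purely for illustration.

```python
def join_name(parts, sep=".", esc="\\"):
    """Join name parts, escaping any separator or escape character inside a part."""
    escaped = []
    for part in parts:
        # Escape the escape character first, then the separator.
        escaped.append(part.replace(esc, esc + esc).replace(sep, esc + sep))
    return sep.join(escaped)


def split_name(name, sep=".", esc="\\"):
    """Inverse of join_name: split only on unescaped separators."""
    parts, current, i = [], [], 0
    while i < len(name):
        ch = name[i]
        if ch == esc and i + 1 < len(name):
            # Escaped character: take the next character literally.
            current.append(name[i + 1])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


original = ["foo", "bar.baz", "table1"]

# Naive joining loses the boundary inside "bar.baz".
naive = ".".join(original)
naive_parts = naive.split(".")

# Joining with an agreed separator and escape rule round-trips.
joined = join_name(original)
roundtrip = split_name(joined)
```

Either way, the client and the driver have to agree on the separator (and on 
any escaping rule) ahead of time, which is where the metadata would come in.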


On Thu, Sep 22, 2022, at 14:23, Gavin Ray wrote:
> Wait, what happens if a datasource's spec allows dots in valid identifiers?
>
> On Thu, Sep 22, 2022 at 2:22 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>
>> Ah okay, yeah that's a reasonable angle too haha
>>
>>
>> On Thu, Sep 22, 2022 at 1:59 PM David Li <lidav...@apache.org> wrote:
>>
>>> Frankly it was from a "not drastically refactoring things" perspective :)
>>>
>>> At least for Arrow: list[utf8] is effectively a utf8 array with an extra
>>> array of offsets, so there's relatively little overhead. (In particular,
>>> there's not an extra allocation per array; there's just an overall
>>> allocation of a bitmap/offsets buffer.)
>>>
>>> On Thu, Sep 22, 2022, at 13:46, Gavin Ray wrote:
>>> > I suppose you're thinking from a memory/performance perspective, right?
>>> > Allocating a dot character is a lot better than allocating multiple arrays
>>> >
>>> > Yeah I don't see why not -- this could even be a library internal where the
>>> > fact that it's dotted is an implementation detail
>>> > Then in the Java implementation or whatnot, you can call
>>> > ".getFullyQualifiedTableName()" which will do the allocating parse to a
>>> > List<String> for you, or whatnot
>>> >
>>> > The array was mostly for convenience's sake (our API is JSON and not
>>> > particularly performance-oriented)
>>> >
>>> > On Thu, Sep 22, 2022 at 1:40 PM David Li <lidav...@apache.org> wrote:
>>> >
>>> >> Ah, interesting…
>>> >>
>>> >> A self-recursive schema wouldn't work in Arrow's schema system, so it'd
>>> >> have to be the latter solution. Or, would it work to have a dotted name in
>>> >> the schema name column? Would parsing that back out (for applications that
>>> >> want to work with the full hierarchy) be too much trouble?
>>> >>
>>> >> On Thu, Sep 22, 2022, at 13:14, Gavin Ray wrote:
>>> >> > Antoine, I can't comment on the Go code (not qualified) but to me, the
>>> >> > "verification" test examples look like a mixture between JDBC and Java
>>> >> > FlightSQL driver usage, and seem solid.
>>> >> >
>>> >> > There was one reservation I had about the ability to handle datasource
>>> >> > namespacing that I brought up early on in the proposal discussions
>>> >> > (David responded to it but I got busy and forgot to reply again)
>>> >> >
>>> >> > If you have a datasource which provides possibly arbitrary levels of schema
>>> >> > namespace (something like Apache Calcite, for example)
>>> >> > How do you represent the table/schema names?
>>> >> >
>>> >> > Suppose I have a service with a DB layout like this:
>>> >> >
>>> >> > / foo
>>> >> >     / bar
>>> >> >         / baz
>>> >> >             / qux
>>> >> >                 / table1
>>> >> >                     - column1
>>> >> >
>>> >> > At my dayjob, we have a technology which is very similar to ADBC/FlightSQL
>>> >> > (would be great to adopt Substrait + ADBC once they're mature enough)
>>> >> > - https://github.com/hasura/graphql-engine/blob/master/dc-agents/README.md#data-connectors
>>> >> > - https://techcrunch.com/2022/06/28/hasura-now-lets-developers-turn-any-data-source-into-a-graphql-api/
>>> >> >
>>> >> > We wound up having to redesign the specification to handle datasources that
>>> >> > don't fit the "database-schema-table" or "database-table" mould
>>> >> >
>>> >> > In the ADBC schema for schema metadata, it looks like it expects a single
>>> >> > "schema" struct:
>>> >> > https://github.com/apache/arrow-adbc/blob/7866a566f5b7b635267bfb7a87ea49b01dfe89fa/java/core/src/main/java/org/apache/arrow/adbc/core/StandardSchemas.java#L132-L152
>>> >> >
>>> >> > If you want to be flexible, IMO it would be good to either:
>>> >> >
>>> >> > 1. Have DB_SCHEMA_SCHEMA be self-recursive, so that schemas (with or
>>> >> > without tables) can be nested arbitrarily deep underneath each other
>>> >> >       - Fully-Qualified-Table-Name (FQTN) can then be computed by walking
>>> >> > up from a table and concatenating the schema names until the root schema
>>> >> > is reached
>>> >> >
>>> >> > 2. Make "catalog" and "schema" go away entirely, and tables just have a
>>> >> > FQTN that is an array; a database is a collection of tables
>>> >> >      - You can compute what would have been the catalog + schema hierarchy
>>> >> > by doing a .reduce() over the list of tables and
>>> >> >
>>> >> > Or maybe there is another, better way. But that's my $0.02 and the only
>>> >> > real concern about the API I have, without actually trying to build
>>> >> > something with it.
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Thu, Sep 22, 2022 at 5:40 AM Antoine Pitrou <anto...@python.org> wrote:
>>> >> >
>>> >> >>
>>> >> >> Hello,
>>> >> >>
>>> >> >> I would urge people to review the proposed ADBC APIs, especially the Go
>>> >> >> and Java APIs which probably benefitted from less feedback than the C one.
>>> >> >>
>>> >> >> Regards
>>> >> >>
>>> >> >> Antoine.
>>> >> >>
>>> >> >>
>>> >> >> On 21/09/2022 at 17:40, David Li wrote:
>>> >> >> > Hello,
>>> >> >> >
>>> >> >> > We have been discussing [1] standard interfaces for Arrow-based database
>>> >> >> > access and have been working on implementations of the proposed interfaces
>>> >> >> > [2], all under the name "ADBC". This proposal aims to provide a unified
>>> >> >> > client abstraction across Arrow-native database protocols (like Flight SQL)
>>> >> >> > and non-Arrow database protocols, which can then be used by Arrow projects
>>> >> >> > like Dataset/Acero and ecosystem projects like Ibis.
>>> >> >> >
>>> >> >> > For details, see the RFC here: https://github.com/apache/arrow/pull/14079
>>> >> >> >
>>> >> >> > I would like to propose that the Arrow project adopt this RFC, along
>>> >> >> > with apache/arrow-adbc commit 7866a56 [3], as version 1.0.0 of the ADBC API
>>> >> >> > standard.
>>> >> >> >
>>> >> >> > Please vote to adopt the specification as described above. (This is not
>>> >> >> > a vote to release any components.)
>>> >> >> >
>>> >> >> > This vote will be open for at least 72 hours.
>>> >> >> >
>>> >> >> > [ ] +1 Adopt the ADBC specification
>>> >> >> > [ ]  0
>>> >> >> > [ ] -1 Do not adopt the specification because...
>>> >> >> >
>>> >> >> > Thanks to the DuckDB and R DBI projects for providing feedback on and
>>> >> >> > implementations of the proposal.
>>> >> >> >
>>> >> >> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
>>> >> >> > [2]: https://github.com/apache/arrow-adbc
>>> >> >> > [3]: https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
>>> >> >> >
>>> >> >> > Thank you,
>>> >> >> > David
>>> >> >>
>>> >>
>>>
>>
