Wait, what happens if a datasource's spec allows dots as valid identifiers?
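To make the dotted-name question concrete: a naive split on every `.` turns `foo."bar.baz".table1` into four parts instead of three. One way around this is to escape literal dots (and the escape character itself) inside each identifier before joining. This is an assumption, not something the ADBC spec defines. The sketch below also shows what a `.getFullyQualifiedTableName()`-style parse back into a `List<String>` might look like. The class `DottedNames` and its `join`/`split` methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ADBC: encodes a table's namespace path as a
// single dotted string, escaping '.' and '\' inside identifiers so that the
// path can be split back apart unambiguously.
final class DottedNames {
    private DottedNames() {}

    // Join identifiers with '.', prefixing any '\' or '.' they contain with '\'.
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append('.');
            for (char c : parts.get(i).toCharArray()) {
                if (c == '\\' || c == '.') sb.append('\\');
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Split on unescaped '.', dropping the escape characters.
    static List<String> split(String dotted) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < dotted.length(); i++) {
            char c = dotted.charAt(i);
            if (c == '\\' && i + 1 < dotted.length()) {
                current.append(dotted.charAt(++i)); // keep the escaped char literally
            } else if (c == '.') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }
}
```

For example, `join(List.of("foo", "bar.baz", "table1"))` yields `foo.bar\.baz.table1`, and `split` turns that back into the three original identifiers. A plain split on `.` would yield four.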
On Thu, Sep 22, 2022 at 2:22 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
> Ah okay, yeah that's a reasonable angle too haha
>
>
> On Thu, Sep 22, 2022 at 1:59 PM David Li <lidav...@apache.org> wrote:
>
>> Frankly it was from a "not drastically refactoring things" perspective :)
>>
>> At least for Arrow: list[utf8] is effectively a utf8 array with an extra
>> array of offsets, so there's relatively little overhead. (In particular,
>> there's not an extra allocation per array; there's just an overall
>> allocation of a bitmap/offsets buffer.)
>>
>> On Thu, Sep 22, 2022, at 13:46, Gavin Ray wrote:
>> > I suppose you're thinking from a memory/performance perspective, right?
>> > Allocating a dot character is a lot better than allocating multiple
>> > arrays.
>> >
>> > Yeah, I don't see why not. This could even be a library internal where
>> > the fact that it's dotted is an implementation detail.
>> > Then in the Java implementation or whatnot, you can call
>> > ".getFullyQualifiedTableName()", which will do the allocating parse to a
>> > List<String> for you.
>> >
>> > The array was mostly for convenience's sake (our API is JSON and not
>> > particularly performance-oriented).
>> >
>> > On Thu, Sep 22, 2022 at 1:40 PM David Li <lidav...@apache.org> wrote:
>> >
>> >> Ah, interesting…
>> >>
>> >> A self-recursive schema wouldn't work in Arrow's schema system, so it'd
>> >> have to be the latter solution. Or, would it work to have a dotted name
>> >> in the schema name column? Would parsing that back out (for applications
>> >> that want to work with the full hierarchy) be too much trouble?
>> >>
>> >> On Thu, Sep 22, 2022, at 13:14, Gavin Ray wrote:
>> >> > Antoine, I can't comment on the Go code (not qualified), but to me,
>> >> > the "verification" test examples look like a mixture of JDBC and Java
>> >> > Flight SQL driver usage, and seem solid.
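David's point about list[utf8] can be illustrated with a plain-Java model of the layout. It uses no Arrow library calls, omits validity bitmaps, and the class name `ListOfUtf8Model` is hypothetical. All identifiers share one UTF-8 data buffer and one string-offsets buffer. The list layer adds a single extra offsets buffer, so the number of buffers stays the same however many names are stored; there is no separate allocation per name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of Arrow's list<utf8> layout (validity bitmaps
// omitted): every identifier's bytes live in one shared data buffer, the utf8
// child has one offsets buffer, and the list layer adds one more offsets buffer.
final class ListOfUtf8Model {
    final byte[] data;         // concatenated UTF-8 bytes of every identifier
    final int[] stringOffsets; // identifier i spans data[stringOffsets[i] .. stringOffsets[i + 1])
    final int[] listOffsets;   // row r spans identifiers [listOffsets[r] .. listOffsets[r + 1])

    ListOfUtf8Model(List<List<String>> rows) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        List<Integer> strOffs = new ArrayList<>();
        List<Integer> listOffs = new ArrayList<>();
        strOffs.add(0);
        listOffs.add(0);
        int count = 0;
        for (List<String> row : rows) {
            for (String s : row) {
                byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
                buf.write(bytes, 0, bytes.length);
                strOffs.add(buf.size()); // end of this identifier = start of the next
                count++;
            }
            listOffs.add(count); // end of this row, counted in identifiers
        }
        data = buf.toByteArray();
        stringOffsets = strOffs.stream().mapToInt(Integer::intValue).toArray();
        listOffsets = listOffs.stream().mapToInt(Integer::intValue).toArray();
    }

    // Rebuild row r as a List<String>, reading only from the shared buffers.
    List<String> row(int r) {
        List<String> out = new ArrayList<>();
        for (int i = listOffsets[r]; i < listOffsets[r + 1]; i++) {
            int start = stringOffsets[i];
            out.add(new String(data, start, stringOffsets[i + 1] - start, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

With the rows `[foo, bar, table1]` and `[baz, table2]`, the data buffer holds 21 bytes. The string offsets are `0, 3, 6, 12, 15, 21` and the list offsets are `0, 3, 5`.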
>> >> >
>> >> > There was one reservation I had about the ability to handle datasource
>> >> > namespacing that I brought up early on in the proposal discussions
>> >> > (David responded to it, but I got busy and forgot to reply again).
>> >> >
>> >> > If you have a datasource that provides possibly arbitrary levels of
>> >> > schema namespace (something like Apache Calcite, for example),
>> >> > how do you represent the table/schema names?
>> >> >
>> >> > Suppose I have a service with a DB layout like this:
>> >> >
>> >> > / foo
>> >> >   / bar
>> >> >     / baz
>> >> >       / qux
>> >> >         / table1
>> >> >           - column1
>> >> >
>> >> > At my day job, we have a technology that is very similar to
>> >> > ADBC/Flight SQL (it would be great to adopt Substrait + ADBC once
>> >> > they're mature enough):
>> >> > - https://github.com/hasura/graphql-engine/blob/master/dc-agents/README.md#data-connectors
>> >> > - https://techcrunch.com/2022/06/28/hasura-now-lets-developers-turn-any-data-source-into-a-graphql-api/
>> >> >
>> >> > We wound up having to redesign the specification to handle datasources
>> >> > that don't fit the "database-schema-table" or "database-table" mould.
>> >> >
>> >> > In the ADBC schema for schema metadata, it looks like it expects a
>> >> > single "schema" struct:
>> >> > https://github.com/apache/arrow-adbc/blob/7866a566f5b7b635267bfb7a87ea49b01dfe89fa/java/core/src/main/java/org/apache/arrow/adbc/core/StandardSchemas.java#L132-L152
>> >> >
>> >> > If you want to be flexible, IMO it would be good to either:
>> >> >
>> >> > 1. Have DB_SCHEMA_SCHEMA be self-recursive, so that schemas (with or
>> >> >    without tables) can be nested arbitrarily deep underneath each other.
>> >> >    - The Fully-Qualified-Table-Name (FQTN) can then be computed by
>> >> >      walking up from a table and concatenating the schema names until
>> >> >      the root schema is reached.
>> >> >
>> >> > 2. Make "catalog" and "schema" go away entirely, so that tables just
>> >> >    have an FQTN that is an array, and a database is a collection of
>> >> >    tables.
>> >> >    - You can compute what would have been the catalog + schema
>> >> >      hierarchy by doing a .reduce() over the list of tables and
>> >> >
>> >> > Or maybe there is another, better way. But that's my $0.02, and the
>> >> > only real concern about the API I have, without actually trying to
>> >> > build something with it.
>> >> >
>> >> >
>> >> > On Thu, Sep 22, 2022 at 5:40 AM Antoine Pitrou <anto...@python.org> wrote:
>> >> >
>> >> >>
>> >> >> Hello,
>> >> >>
>> >> >> I would urge people to review the proposed ADBC APIs, especially the
>> >> >> Go and Java APIs, which probably benefitted from less feedback than
>> >> >> the C one.
>> >> >>
>> >> >> Regards
>> >> >>
>> >> >> Antoine.
>> >> >>
>> >> >>
>> >> >> On 21/09/2022 at 17:40, David Li wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > We have been discussing [1] standard interfaces for Arrow-based
>> >> >> > database access and have been working on implementations of the
>> >> >> > proposed interfaces [2], all under the name "ADBC". This proposal
>> >> >> > aims to provide a unified client abstraction across Arrow-native
>> >> >> > database protocols (like Flight SQL) and non-Arrow database
>> >> >> > protocols, which can then be used by Arrow projects like
>> >> >> > Dataset/Acero and ecosystem projects like Ibis.
>> >> >> >
>> >> >> > For details, see the RFC here:
>> >> >> > https://github.com/apache/arrow/pull/14079
>> >> >> >
>> >> >> > I would like to propose that the Arrow project adopt this RFC, along
>> >> >> > with apache/arrow-adbc commit 7866a56 [3], as version 1.0.0 of the
>> >> >> > ADBC API standard.
>> >> >> >
>> >> >> > Please vote to adopt the specification as described above. (This is
>> >> >> > not a vote to release any components.)
>> >> >> >
>> >> >> > This vote will be open for at least 72 hours.
>> >> >> >
>> >> >> > [ ] +1 Adopt the ADBC specification
>> >> >> > [ ] 0
>> >> >> > [ ] -1 Do not adopt the specification because...
>> >> >> >
>> >> >> > Thanks to the DuckDB and R DBI projects for providing feedback on
>> >> >> > and implementations of the proposal.
>> >> >> >
>> >> >> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
>> >> >> > [2]: https://github.com/apache/arrow-adbc
>> >> >> > [3]: https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
>> >> >> >
>> >> >> > Thank you,
>> >> >> > David
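Gavin's two namespacing options can be sketched side by side as plain Java objects. As David notes, a self-recursive type can't be expressed in Arrow's schema system itself, so option 1 is modelled here only at the application level. `SchemaNode` and `NamespaceTree` are hypothetical names. The loop in `fromTables` plays the role of the `.reduce()` Gavin mentions: it folds a flat list of array FQTNs into a nested hierarchy. The example assumes every FQTN is non-empty, with the table name last.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Option 1 (hypothetical model): a self-recursive schema node. The FQTN is
// computed by walking parent links from a table's schema up to the root.
final class SchemaNode {
    final String name;
    final SchemaNode parent; // null for a root schema

    SchemaNode(String name, SchemaNode parent) {
        this.name = name;
        this.parent = parent;
    }

    List<String> fullyQualifiedName(String tableName) {
        List<String> path = new ArrayList<>();
        path.add(tableName);
        for (SchemaNode n = this; n != null; n = n.parent) {
            path.add(n.name); // collected leaf-to-root
        }
        Collections.reverse(path); // now root-to-leaf
        return path;
    }
}

// Option 2 (hypothetical model): no catalog/schema objects, only tables whose
// FQTN is a list. The namespace hierarchy is rebuilt by folding over the tables.
final class NamespaceTree {
    final Map<String, NamespaceTree> children = new TreeMap<>();
    final List<String> tables = new ArrayList<>();

    static NamespaceTree fromTables(List<List<String>> fqtns) {
        NamespaceTree root = new NamespaceTree();
        for (List<String> fqtn : fqtns) {
            NamespaceTree node = root;
            // every element except the last is a namespace level
            for (String ns : fqtn.subList(0, fqtn.size() - 1)) {
                node = node.children.computeIfAbsent(ns, k -> new NamespaceTree());
            }
            node.tables.add(fqtn.get(fqtn.size() - 1));
        }
        return root;
    }
}
```

For the `/foo/bar/baz/qux/table1` layout above, option 1 gives `[foo, bar, baz, qux, table1]` from the `qux` node. Option 2 starts from that same list and nests `table1` under `foo` → `bar` → `baz` → `qux`.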