Re: How to handle the fact that different databases have different ideas of what a "db"/"schema" are?

Gavin Ray Mon, 31 Jan 2022 17:25:15 -0800

Jacques, thank you. I appreciate the weigh-in with your experience, and that
link is brilliant.


This is exactly what I am trying to do. I tried to write something like
this but
it felt shaky, I wasn't confident in it.

It took a JDBC "DataSource" and then used metadata to iterate
catalogs/schemas
and add them to itself as a child "JdbcSchema", returning them in
"getSubschemaMap()". Creating a wrapping default/root catalog if necessary.

The code here is a much better version of what I was trying to do it seems.
I am
going to unabashedly steal the overall implementation/design patterns here
=)

On Mon, Jan 31, 2022 at 8:06 PM Jacques Nadeau <[email protected]> wrote:

> If I recall correctly, my experience is you have to code per underlying
> database. I believe there are only a few patterns but I don't think there
> is sufficient odbc/jdbc info to answer behavior reliably (or at least in a
> way that feels correct/native to each database). For example, I believe
> some databases require catalog selection at the connection level and so
> while a catalog concept exists, you have to use different connections for
> each catalog whereas other databases expose catalog within a connection.
>
> This is all quite old thinking. I remember writing some simpler logic for
> this here years ago at [1]. Note that code is long sense changed but wanted
> to call it.
>
> [1]
>
> https://github.com/apache/drill/blob/18a1ae4d31cd502d2f792b331fefeb0ed2106c53/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L301
>
>
> On Thu, Jan 27, 2022 at 7:38 PM Gavin Ray <[email protected]> wrote:
>
> > The filesystem hierarchy is a great analogy, I understand it much better
> > now I think -- thank you.
> >
> > This seems like a problem with potentially very brittle solutions. Using
> > your explanation I was able to get it to work,
> > with very terrible logic that says "If catalogs are all null, then it's
> > one-level so make a fake "root" schema to hold it in the rootSchema"
> >
> > But this doesn't feel so good. Not 100% sure what the best thing to do
> here
> > is, but at least you've cleared up what's going on.
> >
> > On Thu, Jan 27, 2022 at 6:41 PM Julian Hyde <[email protected]>
> > wrote:
> >
> > > Let’s not worry about the names, and say that some DBs have two
> namespace
> > > levels and others have just one.
> > >
> > > Calcite’s word for a namespace is ’schema’. Calcite schemas are
> arranged
> > > in a hierarchy, like a filesystem, so there is no preferred depth. Any
> > > schema can contain both tables and (sub)schemas. So you can easily
> built
> > a
> > > one- or two-level namespace structure, or whatever you want.
> > >
> > > Calcite’s catalog has a single ‘root schema’ (analogous to the root
> > > directory, ‘/‘ in file systems), and you can get to anything else from
> > > there.
> > >
> > > In JDBC parlance, a a level 1 namespace is called ‘catalog’, and a
> level
> > 2
> > > namespace is a ’schema’. If a DB has a one-level namespace then catalog
> > > will be null, or the empty string, or something.
> > >
> > > If you’re running an Avatica JDBC server backed by a particular Calcite
> > > root schema, and you want your database to look like a one-level or
> > > two-level database, we probably don’t make it particularly easy.
> > >
> > > Julian
> > >
> > >
> > > > On Jan 27, 2022, at 7:25 AM, Gavin Ray <[email protected]>
> wrote:
> > > >
> > > > My RDBMS experience is nearly exclusively Postgres
> > > > While working on this project, I've made the assumption that the
> > > structure
> > > > of a database is:
> > > >
> > > > Database -> Schema -> Table
> > > >
> > > > It turns out that this isn't accurate. In MySQL for instance,
> "Schema"
> > is
> > > > an alias for "DB".
> > > > From the below StackOverflow answer, it seems like this is all over
> the
> > > > place:
> > > >
> > > > https://stackoverflow.com/a/7944489/13485494
> > > >
> > > > I have a "CalciteSchemaManager" object which has a "rootSchema" to
> > which
> > > > all datasources are attached
> > > > This "rootSchema" is used to generate the GraphQL API and types
> > > >
> > > > It seems like I have two options, and I'm not sure which is a better
> > > design:
> > > >
> > > > 1. Force all datasources to conform to (Database -> Schema -> Table)
> > > >
> > > > This means that adding a new MySQL database, would generate
> ("mysql_db"
> > > ->
> > > > "root" (fake schema) -> "some_table")
> > > > Adding a CSV schema too, would be something like ("csv_datasource" ->
> > > > "root" -> "some_csv_file")
> > > >
> > > > 2. Have an irregular data shape. Datasources can be of arbitrary
> > > sub-schema
> > > > depth.
> > > >
> > > > Example Postgres: ("pg_db_1" -> "public" -> "user")
> > > > Example MySQL:   ("mysql_db_1" -> "user")
> > > > Example CSV: ("some_csv_file") or maybe ("csv_source_1" ->
> > > "some_csv_file")
> > > >
> > > > What do you folks think I ought to do?
> > > > Thank you =)
> > >
> > >
> >
>

Re: How to handle the fact that different databases have different ideas of what a "db"/"schema" are?

Reply via email to