Hi, folks!
I'd like to call a discussion on SIP-95:
https://github.com/apache/superset/issues/22862
The proposal calls for a "catalog" selector in SQL Lab, where in this
context a "catalog" is "a collection of schemas". If I remember
correctly, this is called:
- A "catalog" in Presto/Trino;
- A "database" in Postgres;
- A "project" in BigQuery.
I'd like to increase the scope of SIP-95 by introducing catalogs not
only in SQL Lab, but throughout the whole application. For example, when
adding a dataset the user would be able to choose a database, then a
catalog, then a schema, and finally a table.
Last year, while working on security issues related to schemas, I
started adding the foundational support for catalogs in Superset. DB
engine specs already have these attributes and methods:
- supports_catalog
- supports_dynamic_catalog (can the catalog be changed on a per-query
basis?)
- get_catalog_names()
Additionally, many of the methods in the DB engine specs already have
`catalog` in their signatures, together with `schema`.
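To make the flags above concrete, here is a rough sketch of how a
catalog selector might consult them. The class and function names are
hypothetical stand-ins rather than the actual Superset classes, and
`get_catalog_names` takes a fake set of catalogs here instead of
querying a database:

```python
class EngineSpecSketch:
    """Hypothetical stand-in for a DB engine spec, not Superset's class."""

    supports_catalog = False          # does the engine have catalogs at all?
    supports_dynamic_catalog = False  # can the catalog change per query?

    @classmethod
    def get_catalog_names(cls, fake_catalogs):
        # A real spec would query the database; here the caller passes
        # the catalogs in so the sketch stays self-contained.
        return sorted(fake_catalogs)


class CatalogAwareSpec(EngineSpecSketch):
    """Illustrative engine that opts in to catalog support."""

    supports_catalog = True
    supports_dynamic_catalog = True


def catalog_choices(spec, fake_catalogs):
    """Catalogs to offer in a selector; empty when unsupported."""
    if not spec.supports_catalog:
        return []
    return spec.get_catalog_names(fake_catalogs)
```

With this shape, the UI can hide the catalog input whenever
`catalog_choices` comes back empty.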
Note that one of the biggest challenges in supporting catalogs is that
the SQLAlchemy URI needs to be modified depending on the selected
catalog. This will have to be done via `adjust_engine_params` in each DB
engine spec that we want to support, but the base implementation is
there already.
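As a minimal illustration of the URI rewriting, assuming an engine
whose catalog is the first path segment of the URI (as with a Postgres
database name), something like the following could work. The helper
`with_catalog` is hypothetical and uses only the standard library; a
real `adjust_engine_params` would operate on SQLAlchemy's URL object
and would differ per engine:

```python
from urllib.parse import urlsplit, urlunsplit


def with_catalog(uri: str, catalog: str) -> str:
    """Return `uri` with its first path segment replaced by `catalog`."""
    parts = urlsplit(uri)
    # The path looks like "/catalog" or "/catalog/schema"; keep whatever
    # follows the first segment untouched.
    segments = parts.path.lstrip("/").split("/")
    segments[0] = catalog
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))
```

For example, `with_catalog("postgresql://user@host:5432/sales",
"marketing")` swaps only the database name and leaves the host, port,
and credentials alone.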
The remaining work includes:
1. Refactoring the data permissions to include catalogs, similar to how
it works today for databases and schemas.
2. Introducing UI inputs for selecting catalogs when creating a dataset
or in SQL Lab.
For (2), the work needed overlaps with SIP-111 ("Proposal for improved
database, schema, and table selection UI in SQL Lab sidebar",
https://github.com/apache/superset/issues/26395), which hasn't been
officially discussed yet.
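For (1), one possible shape for the permission names is sketched
below. It assumes a bracketed `[database].[schema]` naming scheme and
simply adds the catalog as a middle component when one is given. The
function name and format are illustrative assumptions, not a settled
design:

```python
from typing import Optional


def schema_perm(database: str, catalog: Optional[str], schema: str) -> str:
    """Build a schema-level permission name, with an optional catalog."""
    # Without a catalog the name keeps its two-part form, so existing
    # permissions would not need to be renamed.
    parts = [database, schema] if catalog is None else [database, catalog, schema]
    return ".".join(f"[{part}]" for part in parts)
```

Keeping the two-part form when no catalog is set is one way to stay
backward compatible with permissions that already exist.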
Thanks,
--Beto