Hi Feng, I think this feature makes a lot of sense. If I understand correctly, what you are looking for is lazy catalog initialization.
However, I have some concerns about introducing CatalogProvider, which delegates catalog management to users. It may be hard to avoid conflicts and duplicates between CatalogProvider and CatalogManager. Is it possible to have a built-in CatalogProvider to instantiate catalogs lazily? An idea in my mind is to introduce another catalog registration API without instantiating the catalog, e.g., registerCatalog(String catalogName, Map<String, String> catalogProperties). The catalog information is stored in CatalogManager as pure strings. The catalog is instantiated and initialized when used. This new API is very similar to other pure-string metadata registration, such as "createTable(String path, TableDescriptor descriptor)" and "createFunction(String path, String className, List<ResourceUri> resourceUris)". Can this approach satisfy your requirement? Best, Jark On Mon, 6 Feb 2023 at 22:53, Timo Walther <twal...@apache.org> wrote: > Hi Feng, > > this is indeed a good proposal. > > 1) It makes sense to improve the catalog listing for platform providers. > > 2) Other feedback from the past has shown that users would like to avoid > the default in-memory catalog and offer their catalog before a > TableEnvironment session starts. > > 3) Also we might reconsider whether a default catalog and default > database make sense. Or whether this can be disabled and SHOW CATALOGS > can be used for listing first without having a default catalog. > > What do you think about option 2 and 3? > > In any case, I would propose we pass a CatalogProvider to > EnvironmentSettings and only allow a single instance. Catalogs should > never shadow other catalogs. > > We could also use the org.apache.flink.table.factories.Factory infra and > allow catalog providers via pure string properties. Not sure if we need > this in the first version though. > > Cheers, > Timo > > > On 06.02.23 11:21, Feng Jin wrote: > > Hi everyone, > > > > The original discussion address is > > https://issues.apache.org/jira/browse/FLINK-30126 > > > > Currently, Flink has access to many systems, including kafka, hive, > > iceberg, hudi, elasticsearch, mysql... The corresponding catalog name > > might be: > > kafka_cluster1, kafka_cluster2, hive_cluster1, hive_cluster2, > > iceberg_cluster2, elasticsearch_cluster1, mysql_database1_xxx, > > mysql_database2_xxxx > > > > As the platform of the Flink SQL job, we need to maintain the meta > > information of each system of the company, and when the Flink job > > starts, we need to register the catalog with the Flink table > > environment, so that users can use any table through the > > env.executeSql interface. > > > > When we only have a small number of catalogs, we can register like > > this, but when there are thousands of catalogs, I think that there > > needs to be a dynamic loading mechanism that we can register catalog > > when needed, speed up the initialization of the table environment, and > > avoid the useless catalog registration process. > > > > Preliminary thoughts: > > > > A new CatalogProvider interface can be added: > > It contains two interfaces: > > * listCatalogs() interface, which can list all the interfaces that the > > interface can provide > > * getCatalog() interface, which can get a catalog instance by catalog > name. > > > > ```java > > public interface CatalogProvider { > > > > default void initialize(ClassLoader classLoader, ReadableConfig > config) {} > > > > Optional<Catalog> getCatalog(String catalogName); > > > > Set<String> listCatalogs(); > > } > > ``` > > > > > > The corresponding implementation in CatalogManager is as follows: > > > > ```java > > public CatalogManager { > > private @Nullable CatalogProvider catalogProvider; > > > > private Map<String, Catalog> catalogs; > > > > public void setCatalogProvider(CatalogProvider catalogProvider) { > > this.catalogProvider = catalogProvider; > > } > > > > public Optional<Catalog> getCatalog(String catalogName) { > > // If there is no corresponding catalog in catalogs, > > // get catalog by catalogProvider > > if (catalogProvider != null) { > > Optional<Catalog> catalog = > catalogProvider.getCatalog(catalogName); > > } > > } > > > > } > > ``` > > > > > > > > Possible problems: > > > > 1. Catalog name conflict, how to choose when the registered catalog > > and the catalog provided by catalog-provider conflict? > > I prefer tableEnv-registered ones over catalogs provided by the > > catalog-provider. If the user wishes to reference the catalog provided > > by the catalog-provider, they can unregister the catalog in tableEnv > > through the `unregisterCatalog` interface. > > > > 2. Number of CatalogProviders, is it possible to have multiple > > catalogProvider implementations? > > I don't have a good idea of this at the moment. If multiple > > catalogProviders are supported, it brings much more convenience, But > > there may be catalog name conflicts between different > > catalogProviders. > > > > > > > > Looking forward to your reply, any feedback is appreciated! > > > > > > Best. > > > > Feng Jin > > > >