I have assigned permission to you. 

Best,
Jark

> On Feb 10, 2023, at 17:26, Feng Jin <jinfeng1...@gmail.com> wrote:
> 
> I am very happy to do it. Please help me get editing permission; my
> Jira ID is hackergin.
> 
> Thanks
> 
> Best,
> Feng
> 
> On Fri, Feb 10, 2023 at 4:02 PM Jark Wu <imj...@gmail.com> wrote:
>> 
>> Thank you Feng,
>> 
>> Feel free to start a FLIP proposal if you are interested. Looking forward
>> to it!
>> 
>> Best,
>> Jark
>> 
>>> On Feb 10, 2023, at 15:44, Feng Jin <jinfeng1...@gmail.com> wrote:
>>> 
>>> @Shengkai
>>>> About the catalog jar hot updates
>>> 
>>> We currently have no such requirement, but if the catalog
>>> management interface is opened up, it would indeed enable hot
>>> loading of catalog jars.
>>> 
>>> 
>>>> do we need to instantiate the Catalog immediately or defer to the usage
>>> 
>>> I think this can stay the same as before.
>>> 
>>> 
>>> 
>>> @Jark
>>>> There only can be a single catalog manager in TableEnvironment.
>>> 
>>> Big +1 for this. It avoids conflicts and also meets the catalog
>>> persistence requirements.
>>> 
>>> 
>>> Best,
>>> Feng
>>> 
>>> On Fri, Feb 10, 2023 at 3:09 PM Jark Wu <imj...@gmail.com> wrote:
>>>> 
>>>> Hi Feng,
>>>> 
>>>> Conflicts and inconsistencies are still likely even if we have only one
>>>> CatalogProvider, because CatalogProvider only provides read-only interfaces
>>>> (listCatalogs, getCatalog). For example, you may register a catalog X but
>>>> be unable to list it because it's not in the external metadata service.
>>>> 
>>>> To avoid catalog conflicts and keep things consistent, we can extract the
>>>> catalog management logic into a pluggable interface, including listCatalogs,
>>>> getCatalog, registerCatalog, unregisterCatalog, etc. The current
>>>> CatalogManager would be the default in-memory implementation, and you could
>>>> replace it with user-defined managers, such as:
>>>> - file-based: manages catalog information in local files, just like
>>>> how Presto/Trino manages catalogs
>>>> - metaservice-based: manages catalog information in an external
>>>> metadata service.
>>>> 
>>>> There can only be a single catalog manager in a TableEnvironment. This
>>>> guarantees data consistency and avoids conflicts. This approach can also
>>>> address another pain point of Flink SQL: catalog information is not persisted.
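
A minimal sketch of what such a pluggable contract could look like, assuming the four operations named above. The names `CatalogManagementSketch`, `InMemoryCatalogManagement`, and `SingleManagerEnvironment` are hypothetical and not Flink APIs, and a catalog is modeled as a plain property map rather than a real `Catalog` instance.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pluggable contract; a catalog is modeled as its property map only. */
interface CatalogManagementSketch {
    Set<String> listCatalogs();

    Optional<Map<String, String>> getCatalog(String name);

    void registerCatalog(String name, Map<String, String> properties);

    void unregisterCatalog(String name);
}

/** Default in-memory variant: nothing survives a restart. */
final class InMemoryCatalogManagement implements CatalogManagementSketch {
    private final Map<String, Map<String, String>> catalogs = new LinkedHashMap<>();

    @Override
    public Set<String> listCatalogs() {
        // Return a copy so callers cannot mutate the internal state.
        return new LinkedHashSet<>(catalogs.keySet());
    }

    @Override
    public Optional<Map<String, String>> getCatalog(String name) {
        return Optional.ofNullable(catalogs.get(name));
    }

    @Override
    public void registerCatalog(String name, Map<String, String> properties) {
        // Reads and writes go through the same store, so a registered
        // catalog is always listable and name clashes are rejected early.
        if (catalogs.putIfAbsent(name, new LinkedHashMap<>(properties)) != null) {
            throw new IllegalStateException("Catalog already exists: " + name);
        }
    }

    @Override
    public void unregisterCatalog(String name) {
        catalogs.remove(name);
    }
}

/** Stand-in for a table environment that owns exactly one manager. */
final class SingleManagerEnvironment {
    private final CatalogManagementSketch manager;

    SingleManagerEnvironment(CatalogManagementSketch manager) {
        this.manager = Objects.requireNonNull(manager, "manager");
    }

    CatalogManagementSketch catalogs() {
        return manager;
    }
}
```

A file-based or metaservice-based variant would implement the same interface and persist the property maps instead of holding them in memory.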
>>>> 
>>>> Can this approach satisfy your requirements?
>>>> 
>>>> Best,
>>>> Jark
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, 10 Feb 2023 at 11:21, Shengkai Fang <fskm...@gmail.com> wrote:
>>>> 
>>>>> Hi Feng.
>>>>> 
>>>>> I think your idea is very interesting!
>>>>> 
>>>>> 1. I just wonder: after initializing the Catalog, will the Session reuse
>>>>> the same Catalog instance or build a new one for later usage? If we reuse
>>>>> the same Catalog, I think it's more like lazy initialization. I lean
>>>>> slightly toward building a new one, because that makes catalog jar hot
>>>>> updates easier.
>>>>> 
>>>>> 2. When users run a `CREATE CATALOG` statement, which goes through the
>>>>> CatalogManager, do we need to instantiate the Catalog immediately or
>>>>> defer it until first use?
>>>>> 
>>>>> Best,
>>>>> Shengkai
>>>>> 
>>>>> On Thu, Feb 9, 2023 at 20:13, Feng Jin <jinfeng1...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks for your reply.
>>>>>> 
>>>>>> @Timo
>>>>>> 
>>>>>>> 2) avoid the default in-memory catalog and offer their catalog before
>>>>>>> a TableEnvironment session starts
>>>>>>> 3) whether this can be disabled and SHOW CATALOGS can be used for
>>>>>>> listing first without having a default catalog.
>>>>>> 
>>>>>> 
>>>>>> Regarding 2 and 3, I think this problem can be solved by introducing
>>>>>> catalog providers, which would let users control the default catalog
>>>>>> behavior.
>>>>>> 
>>>>>> 
>>>>>>> We could also use the org.apache.flink.table.factories.Factory infra
>>>>>>> and allow catalog providers via pure string properties
>>>>>> 
>>>>>> I think this is also very useful. Our usage scenarios usually involve
>>>>>> managing multiple clusters, so we also need to pass different
>>>>>> configurations through parameters.
>>>>>> 
>>>>>> 
>>>>>> @Jark @Huang
>>>>>> 
>>>>>>> About the lazy catalog initialization
>>>>>> 
>>>>>> Our needs may be different. If these properties already exist in an
>>>>>> external system, especially when there may be thousands of catalogs,
>>>>>> I don't think it is necessary to register all of them in the Flink
>>>>>> env at startup. What we need is to register a catalog only when it is
>>>>>> needed, fetching its properties from the external meta system at
>>>>>> that point.
>>>>>> 
>>>>>> 
>>>>>>> It may be hard to avoid conflicts and duplicates between
>>>>>>> CatalogProvider and CatalogManager
>>>>>> 
>>>>>> Conflicts are indeed easy to run into. My idea is to make the catalog
>>>>>> management of the current CatalogManager the default CatalogProvider
>>>>>> behavior, and to allow only one CatalogProvider in a Flink env. This
>>>>>> may avoid catalog conflicts.
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Feng
>>>>>> 
>>>>>> On Tue, Feb 7, 2023 at 1:01 PM Hang Ruan <ruanhang1...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Feng,
>>>>>>> I agree with what Jark said. I think what you are looking for is lazy
>>>>>>> initialization.
>>>>>>> 
>>>>>>> I don't think we should introduce the new interface CatalogProvider for
>>>>>>> lazy initialization. What we should do is store the catalog properties
>>>>>>> and initialize the catalog when we need it. Could you please describe
>>>>>>> some other scenarios, besides lazy initialization, in which we need
>>>>>>> the CatalogProvider?
>>>>>>> 
>>>>>>> If we really need the CatalogProvider, I think it is better for it to be
>>>>>>> a single instance. Multiple instances are difficult to manage, and names
>>>>>>> can conflict among providers.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Hang
>>>>>>> 
>>>>>>> On Tue, Feb 7, 2023 at 10:48, Jark Wu <imj...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Feng,
>>>>>>>> 
>>>>>>>> I think this feature makes a lot of sense. If I understand correctly,
>>>>>>>> what you are looking for is lazy catalog initialization.
>>>>>>>> 
>>>>>>>> However, I have some concerns about introducing CatalogProvider, which
>>>>>>>> delegates catalog management to users. It may be hard to avoid conflicts
>>>>>>>> and duplicates between CatalogProvider and CatalogManager. Is it possible
>>>>>>>> to have a built-in CatalogProvider to instantiate catalogs lazily?
>>>>>>>> 
>>>>>>>> An idea in my mind is to introduce another catalog registration API
>>>>>>>> that does not instantiate the catalog, e.g., registerCatalog(String
>>>>>>>> catalogName, Map<String, String> catalogProperties). The catalog
>>>>>>>> information is stored in CatalogManager as pure strings. The catalog is
>>>>>>>> instantiated and initialized when used.
>>>>>>>> 
>>>>>>>> This new API is very similar to other pure-string metadata registration
>>>>>>>> APIs, such as "createTable(String path, TableDescriptor descriptor)" and
>>>>>>>> "createFunction(String path, String className, List<ResourceUri>
>>>>>>>> resourceUris)".
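
A rough sketch of the lazy registration idea, assuming only the `registerCatalog(String, Map<String, String>)` signature mentioned above. `SketchCatalog` and `LazyCatalogRegistry` are hypothetical stand-ins, not Flink classes, and the `instantiations` counter exists only to make the deferred construction observable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Stand-in for a real catalog; constructing it represents the expensive step. */
final class SketchCatalog {
    final String name;
    final Map<String, String> properties;

    SketchCatalog(String name, Map<String, String> properties) {
        this.name = name;
        this.properties = properties;
    }
}

/** Hypothetical registry that keeps pure-string properties and instantiates on first use. */
final class LazyCatalogRegistry {
    private final Map<String, Map<String, String>> registered = new HashMap<>();
    private final Map<String, SketchCatalog> initialized = new HashMap<>();

    /** Counts how many catalogs were actually constructed (for illustration only). */
    int instantiations = 0;

    /** Stores the properties only; no catalog instance is created here. */
    void registerCatalog(String catalogName, Map<String, String> catalogProperties) {
        if (registered.containsKey(catalogName)) {
            throw new IllegalArgumentException("Catalog already registered: " + catalogName);
        }
        registered.put(catalogName, new HashMap<>(catalogProperties));
    }

    /** Creates the catalog on first access and returns the cached instance afterwards. */
    Optional<SketchCatalog> getCatalog(String catalogName) {
        SketchCatalog cached = initialized.get(catalogName);
        if (cached != null) {
            return Optional.of(cached);
        }
        Map<String, String> properties = registered.get(catalogName);
        if (properties == null) {
            return Optional.empty();
        }
        instantiations++;
        SketchCatalog created = new SketchCatalog(catalogName, properties);
        initialized.put(catalogName, created);
        return Optional.of(created);
    }
}
```

With this shape, registering thousands of catalogs costs only a map insert each, and the expensive construction happens for the few that a job actually references.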
>>>>>>>> 
>>>>>>>> Can this approach satisfy your requirement?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>> 
>>>>>>>> On Mon, 6 Feb 2023 at 22:53, Timo Walther <twal...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Hi Feng,
>>>>>>>>> 
>>>>>>>>> this is indeed a good proposal.
>>>>>>>>> 
>>>>>>>>> 1) It makes sense to improve the catalog listing for platform providers.
>>>>>>>>> 
>>>>>>>>> 2) Other feedback from the past has shown that users would like to avoid
>>>>>>>>> the default in-memory catalog and offer their catalog before a
>>>>>>>>> TableEnvironment session starts.
>>>>>>>>> 
>>>>>>>>> 3) Also we might reconsider whether a default catalog and default
>>>>>>>>> database make sense. Or whether this can be disabled and SHOW CATALOGS
>>>>>>>>> can be used for listing first without having a default catalog.
>>>>>>>>> 
>>>>>>>>> What do you think about option 2 and 3?
>>>>>>>>> 
>>>>>>>>> In any case, I would propose we pass a CatalogProvider to
>>>>>>>>> EnvironmentSettings and only allow a single instance. Catalogs should
>>>>>>>>> never shadow other catalogs.
>>>>>>>>> 
>>>>>>>>> We could also use the org.apache.flink.table.factories.Factory infra and
>>>>>>>>> allow catalog providers via pure string properties. Not sure if we need
>>>>>>>>> this in the first version though.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Timo
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 06.02.23 11:21, Feng Jin wrote:
>>>>>>>>>> Hi everyone,
>>>>>>>>>> 
>>>>>>>>>> The original discussion is at
>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-30126
>>>>>>>>>> 
>>>>>>>>>> Currently, Flink can access many systems, including Kafka, Hive,
>>>>>>>>>> Iceberg, Hudi, Elasticsearch, MySQL... The corresponding catalog
>>>>>>>>>> names might be:
>>>>>>>>>> kafka_cluster1, kafka_cluster2, hive_cluster1, hive_cluster2,
>>>>>>>>>> iceberg_cluster2, elasticsearch_cluster1, mysql_database1_xxx,
>>>>>>>>>> mysql_database2_xxxx
>>>>>>>>>> 
>>>>>>>>>> As the platform for Flink SQL jobs, we need to maintain the meta
>>>>>>>>>> information of each system in the company, and when a Flink job
>>>>>>>>>> starts, we need to register the catalogs with the Flink table
>>>>>>>>>> environment, so that users can use any table through the
>>>>>>>>>> env.executeSql interface.
>>>>>>>>>> 
>>>>>>>>>> When we only have a small number of catalogs, we can register them
>>>>>>>>>> this way, but when there are thousands of catalogs, I think there
>>>>>>>>>> needs to be a dynamic loading mechanism so that we can register a
>>>>>>>>>> catalog only when it is needed. This would speed up the initialization
>>>>>>>>>> of the table environment and avoid useless catalog registration.
>>>>>>>>>> 
>>>>>>>>>> Preliminary thoughts:
>>>>>>>>>> 
>>>>>>>>>> A new CatalogProvider interface can be added.
>>>>>>>>>> It contains two methods:
>>>>>>>>>> * listCatalogs(), which lists all the catalogs that the provider
>>>>>>>>>> can provide
>>>>>>>>>> * getCatalog(), which gets a catalog instance by catalog name.
>>>>>>>>>> 
>>>>>>>>>> ```java
>>>>>>>>>> public interface CatalogProvider {
>>>>>>>>>> 
>>>>>>>>>>    default void initialize(ClassLoader classLoader, ReadableConfig config) {}
>>>>>>>>>> 
>>>>>>>>>>    Optional<Catalog> getCatalog(String catalogName);
>>>>>>>>>> 
>>>>>>>>>>    Set<String> listCatalogs();
>>>>>>>>>> }
>>>>>>>>>> ```
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The corresponding implementation in CatalogManager is as follows:
>>>>>>>>>> 
>>>>>>>>>> ```java
>>>>>>>>>> public class CatalogManager {
>>>>>>>>>>    private @Nullable CatalogProvider catalogProvider;
>>>>>>>>>> 
>>>>>>>>>>    private final Map<String, Catalog> catalogs = new HashMap<>();
>>>>>>>>>> 
>>>>>>>>>>    public void setCatalogProvider(CatalogProvider catalogProvider) {
>>>>>>>>>>        this.catalogProvider = catalogProvider;
>>>>>>>>>>    }
>>>>>>>>>> 
>>>>>>>>>>    public Optional<Catalog> getCatalog(String catalogName) {
>>>>>>>>>>        // Catalogs registered in the table environment are checked first.
>>>>>>>>>>        Catalog registered = catalogs.get(catalogName);
>>>>>>>>>>        if (registered != null) {
>>>>>>>>>>            return Optional.of(registered);
>>>>>>>>>>        }
>>>>>>>>>>        // If there is no corresponding catalog in catalogs,
>>>>>>>>>>        // get catalog by catalogProvider
>>>>>>>>>>        if (catalogProvider != null) {
>>>>>>>>>>            return catalogProvider.getCatalog(catalogName);
>>>>>>>>>>        }
>>>>>>>>>>        return Optional.empty();
>>>>>>>>>>    }
>>>>>>>>>> 
>>>>>>>>>> }
>>>>>>>>>> ```
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Possible problems:
>>>>>>>>>> 
>>>>>>>>>> 1. Catalog name conflicts: which one should win when a registered
>>>>>>>>>> catalog and a catalog provided by the catalog-provider conflict?
>>>>>>>>>> I prefer tableEnv-registered ones over catalogs provided by the
>>>>>>>>>> catalog-provider. If the user wishes to reference the catalog provided
>>>>>>>>>> by the catalog-provider, they can unregister the catalog in tableEnv
>>>>>>>>>> through the `unregisterCatalog` interface.
>>>>>>>>>> 
>>>>>>>>>> 2. Number of CatalogProviders: is it possible to have multiple
>>>>>>>>>> catalogProvider implementations?
>>>>>>>>>> I don't have a good answer to this at the moment. Supporting multiple
>>>>>>>>>> catalogProviders would be more convenient, but there may be catalog
>>>>>>>>>> name conflicts between different catalogProviders.
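
The precedence rule from point 1 can be sketched as follows, assuming a single provider and modeling each catalog as a plain string label. `SketchCatalogProvider`, `MapBackedProvider`, and `PrecedenceCatalogManager` are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Read-only provider contract, mirroring the two methods proposed in this thread. */
interface SketchCatalogProvider {
    Optional<String> getCatalog(String catalogName);

    Set<String> listCatalogs();
}

/** Trivial provider backed by a fixed map; stands in for an external meta system. */
final class MapBackedProvider implements SketchCatalogProvider {
    private final Map<String, String> catalogs;

    MapBackedProvider(Map<String, String> catalogs) {
        this.catalogs = new LinkedHashMap<>(catalogs);
    }

    @Override
    public Optional<String> getCatalog(String catalogName) {
        return Optional.ofNullable(catalogs.get(catalogName));
    }

    @Override
    public Set<String> listCatalogs() {
        return new LinkedHashSet<>(catalogs.keySet());
    }
}

/** Manager in which environment-registered catalogs shadow provider catalogs. */
final class PrecedenceCatalogManager {
    private final Map<String, String> registered = new LinkedHashMap<>();
    private final SketchCatalogProvider provider; // may be null

    PrecedenceCatalogManager(SketchCatalogProvider provider) {
        this.provider = provider;
    }

    void registerCatalog(String catalogName, String catalog) {
        registered.put(catalogName, catalog);
    }

    void unregisterCatalog(String catalogName) {
        registered.remove(catalogName);
    }

    Optional<String> getCatalog(String catalogName) {
        // Registered catalogs win; the provider is only a fallback.
        String local = registered.get(catalogName);
        if (local != null) {
            return Optional.of(local);
        }
        return provider == null ? Optional.empty() : provider.getCatalog(catalogName);
    }

    Set<String> listCatalogs() {
        // Union of both sources; a shared name appears once.
        Set<String> names = new LinkedHashSet<>(registered.keySet());
        if (provider != null) {
            names.addAll(provider.listCatalogs());
        }
        return names;
    }
}
```

Unregistering a shadowing catalog makes the provider's catalog of the same name visible again, which matches the `unregisterCatalog` workaround described in point 1.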
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Looking forward to your reply, any feedback is appreciated!
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best.
>>>>>>>>>> 
>>>>>>>>>> Feng Jin
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>> 
