Hi Feng,

Thanks for raising this FLIP. I am still confused after completely reading
the thread with following questions:

1. Naming confusion - registerCatalog() and addCatalog() have no big
difference based on their names. One of them is responsible for data
persistence. How about persistCatalog()?
2. As you mentioned that Map<String, Catalog> catalogs is used as a cache
and catalogStore is used for data persistence. I would suggest describing
their purpose conceptually and clearly in the FLIP. Some common cache
features should be implemented, i.e. data in the cache and the store should
be consistent. Same Catalog instance should be found in the store and in
the cache(either it has been initialized or it will be lazy initialized)
for the same catalog name. The consistency will be taken care of while
updating the catalog.
3. As the above discussion moves forward, the option 2 solution looks more
like a replacement of option 1, because, afaiu, issues mentioned
previously with option 1 are not solved yet. Do you still want to propose
both options and ask for suggestions for both of them?
4. After you updated the FLIP, there are some inconsistent descriptions in
the content.  Would you like to clean them up? Thanks!

Best regards,
Jing


On Fri, Apr 7, 2023 at 9:24 AM Feng Jin <jinfeng1...@gmail.com> wrote:

> hi Shammon
>
> Thank you for your response, and I completely agree with your point of
> view.
> Initially, I may have over complicated the whole issue. First and foremost,
> we need to consider the persistence of the Catalog's Configuration.
> If we only need to provide persistence for Catalog Configuration, we can
> add a toConfiguration method to the Catalog interface.
> This method can convert a Catalog instance to a Map<String, String>
> properties, and the default implementation will throw an exception.
>
> public interface Catalog {
>    /**
>    * Returns a map containing the properties of the catalog object.
>    *
>    * @return a map containing the properties of the catalog object
>    * @throws UnsupportedOperationException if the implementing class does
> not override
>    * the default implementation of this method
>    */
>   default Map<String, String> toProperties() {
>     throw new UnsupportedOperationException("Please implement toProperties
> for this catalog");
>   }
> }
>
> The specific process is as follows:
>
> 1.  If the user has configured a CatalogStore, the toConfiguration() method
> will be called when registering a Catalog instance with
> registerCatalog(String catalogName, Catalog catalog). The Catalog instance
> will be converted to a Map<String, String> properties and saved to the
> CatalogStore. If some Catalog instances do not implement this method, an
> exception will be thrown.
>
> 2.  If the user does not use a CatalogStore, the toConfiguration() method
> will not be called, ensuring consistency with the original process.
>
> 3. Saving both the Map<String, Catalog> catalogs and the CatalogStore at
> the same time can also avoid conflicts
>
>
> For lazy initialization:
> we can start from the Catalog itself and provide a dedicated Catalog
> implementation for lazy loading, such as ConfigurationCatalog
>
> public class ConfigurationCatalog implements Catalog {
> ConfigurationCatalog(Map<String, String> properties) {
> }
> }
>
> I have added this design to the FLIP.
>
>
> Best,
> Feng
>
> On Thu, Apr 6, 2023 at 10:03 AM Shammon FY <zjur...@gmail.com> wrote:
>
> > Hi Feng
> >
> > Thanks for your answer.
> >
> > I have no questions about the functionality of `CatalogStore`, but the
> > different operations of multiple `registerCatalog` or `storeCatalog` in
> > `CatalogManager`. The implementation in Trino is also the same, the
> > `CatalogManager` in trino only has one `createCatalog`, which will save
> the
> > catalog to memory and `CatalogStore`.
> >
> > I think we don't need to add another `registerCatalog` or `addCatalog` to
> > save a catalog in `Map<String, Catalog> catalogs` or `CatalogStore
> > catalogStore`.  As you mentioned before, the `Map<String, Catalog>
> > catalogs` is a cache in `CatalogManager`. How about saving the catalog in
> > `Map<String, Catalog> catalogs` and `CatalogStore catalogStore` together
> > when it is registered or added in `CatalogManager`?
> >
> > Best,
> > Shammon FY
> >
> >
> > On Tue, Apr 4, 2023 at 5:22 PM Feng Jin <jinfeng1...@gmail.com> wrote:
> >
> > > Thank you for your reply. I am very sorry for the misunderstanding
> caused
> > > by my deviation from the original discussion.
> > >
> > > @Shammon
> > > > I found there is a pre-discussion [1] for this FLIP
> > > Yes, there was indeed such a discussion before.  However, in designing
> > the
> > > whole solution, I found that the logic of CatalogManager itself doesn't
> > > need to change much. *We cannot only persist Catalog instances
> > themselves*,
> > > so exposing only registerCatalog(String catalogName, Catalog catalog)
> > might
> > > not be enough to save Catalogs, because in the end we still need to
> save
> > > the configurations corresponding to the Catalog instances.  Therefore,
> I
> > > decided to introduce the CatalogStore interface for configuration
> > > persistence. Regarding this part, I also took inspiration from Trino's
> > > implementation[1].
> > >
> > >
> > > @yuxia
> > >
> > > > 1: The mechanism of handling catalog with the same name looks a
> little
> > of
> > > complex to me.
> > > Thank you for the suggestion. I will provide a detailed description and
> > > code examples for this part, and add it to the FLIP.
> > >
> > > > 2: TBH, the method name `addCatalog` still looks confused to me.
> IIUC,
> > > this method is for storing catalog to CatalogStore, how about renaming
> it
> > > to `storeCatalog`? It's very personal opinion, you can decide to take
> it
> > or
> > > not by your self.
> > > StoreCatalog looks more intuitive to me. I don't see any problem with
> it.
> > >
> > > > 3.For CREATE CATALOG statement, which method will be called?
> > > `registerCatalog` or `addCatalog`? I'm wondering whether user can add a
> > > catalog to store with SQL stement.
> > > For CREATE CATALOG, my original design was to add it directly to the
> > > CatalogStore, but this would disrupt the existing logic. Therefore, I
> > think
> > > we can do both: save the configuration to the CatalogStore and
> initialize
> > > the Catalog instance at the same time
> > >
> > > > 3. Is it really neccessary to provide a default implmentation for
> > > interface `CatalogStoreFactory`?
> > > I think it is necessary, otherwise we would need to introduce an
> > additional
> > > Map to store the configuration for lazy loading.
> > >
> > > > 4: About asynchronous registration for catalog.
> > > I don't think registerCatalog(String catalogName, Catalog catalog) can
> be
> > > made into an asynchronous interface because Catalog is already an
> > instance.
> > >
> > > > It looks more like lazy initialization for catalog than asynchronous
> > > registration, right?
> > > Indeed, my description was inaccurate. It should be lazy registration
> > > instead of asynchronous registration. I have already updated the title
> of
> > > the FLIP.
> > >
> > >
> > >
> > >
> > > [1].
> > >
> > >
> >
> https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/connector/CatalogStore.java
> > >
> > >
> > > On Tue, Apr 4, 2023 at 4:27 PM Shammon FY <zjur...@gmail.com> wrote:
> > >
> > > > Hi Feng
> > > >
> > > > I think if there is a `registerCatalog` method in `CatalogManager`,
> it
> > > will
> > > > confuse users whether a method named `addCatalog` or `storeCatalog`
> is
> > > > added.
> > > >
> > > > And as you mentioned, the memory catalog is a `cache`, I think the
> > > concept
> > > > of `cache` should not be exposed to users.
> > > >
> > > > I found there is a pre-discussion [1] for this FLIP. Please correct
> me
> > if
> > > > I'm wrong, IIUC, the conclusion of that discussion is to use
> > > > `CatalogManager` as an interface and implement it for different
> stores
> > > such
> > > > as memory, file and external system.
> > > >
> > > > I think there is a gap between the current FLIP design and that
> > > conclusion.
> > > > What about the proposal of the discussion in thread [1] ?
> > > >
> > > >
> > > > [1] https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70
> > > >
> > > >
> > > > Best,
> > > > Shammon FY
> > > >
> > > >
> > > > On Tue, Apr 4, 2023 at 3:41 PM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > wrote:
> > > >
> > > > > Thanks Feng for driving this FLIP.
> > > > > I have few comments:
> > > > > 1: The mechanism of handling catalog with the same name looks a
> > little
> > > of
> > > > > complex to me. I think it'll be better to explain it in the java
> doc
> > of
> > > > > these methods and give a brief example in this FLIP.
> > > > >
> > > > > 2: TBH, the method name `addCatalog` still looks confused to me.
> > IIUC,
> > > > > this method is for storing catalog to CatalogStore, how about
> > renaming
> > > it
> > > > > to `storeCatalog`? It's very personal opinion, you can decide to
> take
> > > it
> > > > or
> > > > > not by your self.
> > > > >
> > > > > 3: For CREATE CATALOG statement, which method will be called?
> > > > > `registerCatalog` or `addCatalog`? I'm wondering whether user can
> > add a
> > > > > catalog to store with SQL stement.
> > > > >
> > > > >
> > > > > 3: Is it really neccessary to provide a default implmentation for
> > > > > interface `CatalogStoreFactory`?
> > > > >
> > > > >
> > > > > 4: About asynchronous registration for catalog.
> > > > >
> > > > > > When creating a catalog with CREATE CATALOG, the asynchronous
> > > > > registration method is used by default.
> > > > > If asynchronous registration is the default behavior, it there any
> > way
> > > > > that user can switch to synchronous registration just like before?
> > > > > Will both method `addCatalog` and `registerCatalog` be asynchronous
> > > > > registration?
> > > > >
> > > > > IIUC, in asynchronous registration, it may well that CREATE CATALOG
> > > > > executes successfully, but then the following CREATE TABLE
> statement
> > > will
> > > > > fail for the catalog fail to open.
> > > > > I think it's a break change which should be highlighted in this
> FLIP,
> > > may
> > > > > be in compatibility part.
> > > > >
> > > > >
> > > > > BTW, by saying asynchronous registration, I would like to expect
> > there
> > > > > will be an executor to open or register catalog in the background,
> > but
> > > > from
> > > > > your previous comments,
> > > > > "the Catalog instance will be initialized if it has not been
> > > initialized
> > > > > yet. If the initialization process fails, these statements will not
> > be
> > > > > executed successfully."
> > > > > It looks more like lazy initialization for catalog than
> asynchronous
> > > > > registration, right?
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > > > ----- 原始邮件 -----
> > > > > 发件人: "Feng Jin" <jinfeng1...@gmail.com>
> > > > > 收件人: "dev" <dev@flink.apache.org>
> > > > > 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45
> > > > > 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog
> > > configuration
> > > > > and asynchronous registration
> > > > >
> > > > > Hi everyone, Thank you all for your interest in this DISCUSS.
> > > > >
> > > > > @Shammon
> > > > > > How to handle a catalog with the same name that exists for both
> of
> > > > them?
> > > > >
> > > > > I believe this is a crucial point. Based on my current personal
> > > > > understanding, the Map<String, Catalog> catalogs will serve as a
> > cache
> > > > > for instantiated catalogs and have the highest priority.
> > > > >
> > > > > There are three methods that can affect the Map<String, Catalog>
> > > > catalogs:
> > > > >
> > > > > 1. registerCatalog(String catalogName, Catalog catalog)
> > > > >
> > > > > This method puts the catalog instance into the Map<String, Catalog>
> > > > > catalogs.
> > > > >
> > > > > 2. unregisterCatalog(String catalogName)
> > > > >
> > > > > This method removes the catalog instance corresponding to
> catalogName
> > > > > from the Map<String, Catalog> catalogs.
> > > > >
> > > > > 3. getCatalog(String catalogName)
> > > > >
> > > > > This method first retrieves the corresponding catalog instance from
> > > > > the Map<String, Catalog> catalogs. If the catalog does not exist,
> it
> > > > > retrieves the corresponding configuration from the CatalogStore,
> > > > > initializes it, and puts the initialized Catalog instance into the
> > > > > Map<String, Catalog> catalogs.
> > > > >
> > > > > The following two methods only modify the configuration in the
> > > > > CatalogStore:
> > > > >
> > > > > 1. addCatalog(String catalogName, Map<String, String> properties)
> > > > >
> > > > > This method saves the properties to the catalogStore and checks
> > > > > whether there is a catalogName with the same name.
> > > > >
> > > > > 2. removeCatalog(String catalogName)
> > > > > This method removes the specified configuration of the specified
> > > > > catalogName in the catalogStore.
> > > > >
> > > > > The following are possible conflict scenarios:
> > > > >
> > > > > 1. When the corresponding catalogName already exists in the
> > > > > CatalogStore but not in the Map<String, Catalog>, the
> > > > > registerCatalog(String catalogName, Catalog catalog) method can
> > > > > succeed and be directly saved to the Map<String, Catalog> catalogs.
> > > > >
> > > > > 2. When the corresponding catalogName already exists in both the
> > > > > CatalogStore and the Map<String, Catalog>, the
> registerCatalog(String
> > > > > catalogName, Catalog catalog) method will fail.
> > > > >
> > > > > 3. When the corresponding catalogName already exists in the
> > > > > Map<String, Catalog>, the addCatalog(String catalogName,
> Map<String,
> > > > > String> properties) method can directly save the properties to the
> > > > > catalogStore, but the getCatalog(String catalogName) method will
> not
> > > > > use the new properties for initialization because the corresponding
> > > > > catalog instance already exists in catalogs and will be
> prioritized.
> > > > > Therefore, using the unregisterCatalog(String catalogName) method
> to
> > > > > remove the instance corresponding to the original catalogName is
> > > > > necessary.
> > > > >
> > > > >
> > > > >
> > > > > > I think it will confuse users that `registerCatalog(String
> > > > > catalogName,Catalog catalog)` in the `Map<String, Catalog>
> catalogs`
> > > and
> > > > > `registerCatalog(String catalogName, Map<String, String>
> properties)
> > > > >
> > > > > This could potentially lead to confusion. I suggest changing the
> > > > > method name, perhaps to addCatalog(String catalogName, Map<String,
> > > > > String> properties), as previously mentioned
> > > > >
> > > > >
> > > > > @Hang
> > > > > > add `registerCatalog(String catalogName,Catalog catalog,
> > > > > boolean lazyInit)` method
> > > > >
> > > > > Since a catalog is already an instance, adding the "lazyInit"
> > > > > parameter to the registerCatalog(String catalogName, Catalog
> catalog)
> > > > > method may not necessarily result in lazy initialization.
> > > > >
> > > > > > Do we need to think about encryption
> > > > >
> > > > > I think encryption is necessary, but perhaps the encryption logic
> > > > > should be implemented in the CatalogStore when the data is actually
> > > > > saved. For instance, the FileCatalogStore could encrypt the data
> when
> > > > > it is saved to a file, while the MemoryCatalogStore does not
> require
> > > > > encryption.
> > > > >
> > > > > >  Do we really need the `MemoryCatalogStore`?
> > > > >
> > > > > I think it is necessary to have a MemoryCatalogStore as the default
> > > > > implementation, which saves Catalog configurations in memory.
> > > > > Otherwise, if we want to implement asynchronous loading of Catalog,
> > we
> > > > > would need to introduce additional cache.
> > > > >
> > > > >
> > > > > @Xianxun
> > > > >
> > > > > > What if asynchronous registration failed?
> > > > >
> > > > > This is also a critical concern.  When executing DDL, DQL, or DML
> > > > > statements that reference a specific Catalog, the Catalog instance
> > > > > will be initialized if it has not been initialized yet. If the
> > > > > initialization process fails, these statements will not be executed
> > > > > successfully.
> > > > >
> > > > > > nt. The Map<String, Catalog> catalogs and CatalogStore
> catalogstore
> > > in
> > > > > the CatalogManager all have the same catalog name, but correponding
> > to
> > > > > different catalog instances.
> > > > >
> > > > > This issue can be resolved by referring to the previous responses.
> > The
> > > > > key principle is that the Catalog that has been most recently used
> > > > > will be given priority for subsequent use.
> > > > >
> > > > > On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <
> yesorno828...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi Feng,
> > > > > >
> > > > > > Thanks for driving this Flip, I do believe this Flip could be
> > helpful
> > > > > for users. The firm I work for also manages a lot of catalogs, and
> > the
> > > > > submission of tasks becomes slow because of loading a number of
> > > catalogs.
> > > > > We obtain the catalogs in the user's Flink SQL through regular
> > > expression
> > > > > to avoid loading all catalogs to improve the speed of task
> > submission.
> > > > > After reading flip-295, I have some questions:
> > > > > >
> > > > > > 1. When creating a catalog with CREATE CATALOG, the asynchronous
> > > > > registration method is used by default. What if asynchronous
> > > registration
> > > > > failed? And how to implement asynchronous registration?
> > > > > >
> > > > > > 2. I also have the same question with Shammon’s first comment.
> The
> > > > > Map<String, Catalog> catalogs and CatalogStore catalogstore in the
> > > > > CatalogManager all have the same catalog name, but correponding to
> > > > > different catalog instances. For example: catalog1-> jdbcCatalog in
> > the
> > > > > catalogs, but catalog1 -> hiveCatalog in the catalogstore. Would
> this
> > > > case
> > > > > happen? Maybe we should define `catalogs` and `catalogstore` more
> > > > clearly.
> > > > > >
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Xianxun
> > > > > >
> > > > > > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道:
> > > > > > >
> > > > > > > MemoryCatalogStore
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to