Hi Feng,

Thanks for working on the FLIP. There are still some NIT issues in the FLIP
like:

1. Optional<CatalogStore> catalogStore has been used as CatalogStore
instead of Optional in the code example. It should be fine to use it as
pseudo code for now and update it after you submit the PR.
2. addCatalog(...) is still used somewhere in the rejected section which
should be persistContext(...) to keep it consistent.

Speaking of the conflict issues in the multi-instance scenarios, I am not
sure if this is the intended behaviour. If Map<String, Catalog> catalogs is
used as a cache, it should be invalid, once the related catalog has been
removed from the CatalogStore by another instance. Did I miss something?

Best regards,
Jing

On Thu, Apr 13, 2023 at 4:40 PM Feng Jin <jinfeng1...@gmail.com> wrote:

> Hi Jing,Shammon
> Thanks for your reply.
>
> @Jing
>
> > How about persistCatalog()?
> I think this is a good function name, I have updated it in the
> documentation.
>
> >    Some common cache features should be implemented
> Thank you for the suggestion. If alternative 1 turns out to be more
> appropriate later, I will improve this part of the content.
>
> > As the above discussion moves forward, the option 2 solution looks more
> like a replacement of option 1
> Yes, after discussing with Shammon offline, we think that solution 2 might
> be more suitable and also avoid any inconsistency issues.
>
> > There are some inconsistent descriptions in the content.  Would you like
> to clean them up?
> I will do my best to improve the document and appreciate your suggestions.
>
>
>
> @Shammon
> > can you put the unselected option in `Rejected Alternatives`
> Sure, I have moved it to the `Rejected Alternatives`.
>
>
>
> Best
> Feng
>
>
>
> On Thu, Apr 13, 2023 at 8:52 AM Shammon FY <zjur...@gmail.com> wrote:
>
> > Hi Feng
> >
> > Thanks for your update.
> >
> > I found there are two options in `Proposed Changes`, can you put the
> > unselected option in `Rejected Alternatives`? I think this may help us
> > better understand your proposal
> >
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Apr 13, 2023 at 4:49 AM Jing Ge <j...@ververica.com.invalid>
> > wrote:
> >
> > > Hi Feng,
> > >
> > > Thanks for raising this FLIP. I am still confused after completely
> > reading
> > > the thread with following questions:
> > >
> > > 1. Naming confusion - registerCatalog() and addCatalog() have no big
> > > difference based on their names. One of them is responsible for data
> > > persistence. How about persistCatalog()?
> > > 2. As you mentioned that Map<String, Catalog> catalogs is used as a
> cache
> > > and catalogStore is used for data persistence. I would suggest
> describing
> > > their purpose conceptually and clearly in the FLIP. Some common cache
> > > features should be implemented, i.e. data in the cache and the store
> > should
> > > be consistent. Same Catalog instance should be found in the store and
> in
> > > the cache(either it has been initialized or it will be lazy
> initialized)
> > > for the same catalog name. The consistency will be taken care of while
> > > updating the catalog.
> > > 3. As the above discussion moves forward, the option 2 solution looks
> > more
> > > like a replacement of option 1, because, afaiu, issues mentioned
> > > previously with option 1 are not solved yet. Do you still want to
> propose
> > > both options and ask for suggestions for both of them?
> > > 4. After you updated the FLIP, there are some inconsistent descriptions
> > in
> > > the content.  Would you like to clean them up? Thanks!
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Fri, Apr 7, 2023 at 9:24 AM Feng Jin <jinfeng1...@gmail.com> wrote:
> > >
> > > > hi Shammon
> > > >
> > > > Thank you for your response, and I completely agree with your point
> of
> > > > view.
> > > > Initially, I may have over complicated the whole issue. First and
> > > foremost,
> > > > we need to consider the persistence of the Catalog's Configuration.
> > > > If we only need to provide persistence for Catalog Configuration, we
> > can
> > > > add a toConfiguration method to the Catalog interface.
> > > > This method can convert a Catalog instance to a Map<String, String>
> > > > properties, and the default implementation will throw an exception.
> > > >
> > > > public interface Catalog {
> > > >    /**
> > > >    * Returns a map containing the properties of the catalog object.
> > > >    *
> > > >    * @return a map containing the properties of the catalog object
> > > >    * @throws UnsupportedOperationException if the implementing class
> > does
> > > > not override
> > > >    * the default implementation of this method
> > > >    */
> > > >   default Map<String, String> toProperties() {
> > > >     throw new UnsupportedOperationException("Please implement
> > > toProperties
> > > > for this catalog");
> > > >   }
> > > > }
> > > >
> > > > The specific process is as follows:
> > > >
> > > > 1.  If the user has configured a CatalogStore, the toConfiguration()
> > > method
> > > > will be called when registering a Catalog instance with
> > > > registerCatalog(String catalogName, Catalog catalog). The Catalog
> > > instance
> > > > will be converted to a Map<String, String> properties and saved to
> the
> > > > CatalogStore. If some Catalog instances do not implement this method,
> > an
> > > > exception will be thrown.
> > > >
> > > > 2.  If the user does not use a CatalogStore, the toConfiguration()
> > method
> > > > will not be called, ensuring consistency with the original process.
> > > >
> > > > 3. Saving both the Map<String, Catalog> catalogs and the CatalogStore
> > at
> > > > the same time can also avoid conflicts
> > > >
> > > >
> > > > For lazy initialization:
> > > > we can start from the Catalog itself and provide a dedicated Catalog
> > > > implementation for lazy loading, such as ConfigurationCatalog
> > > >
> > > > public class ConfigurationCatalog implements Catalog {
> > > > ConfigurationCatalog(Map<String, String> properties) {
> > > > }
> > > > }
> > > >
> > > > I have added this design to the FLIP.
> > > >
> > > >
> > > > Best,
> > > > Feng
> > > >
> > > > On Thu, Apr 6, 2023 at 10:03 AM Shammon FY <zjur...@gmail.com>
> wrote:
> > > >
> > > > > Hi Feng
> > > > >
> > > > > Thanks for your answer.
> > > > >
> > > > > I have no questions about the functionality of `CatalogStore`, but
> > the
> > > > > different operations of multiple `registerCatalog` or
> `storeCatalog`
> > in
> > > > > `CatalogManager`. The implementation in Trino is also the same, the
> > > > > `CatalogManager` in trino only has one `createCatalog`, which will
> > save
> > > > the
> > > > > catalog to memory and `CatalogStore`.
> > > > >
> > > > > I think we don't need to add another `registerCatalog` or
> > `addCatalog`
> > > to
> > > > > save a catalog in `Map<String, Catalog> catalogs` or `CatalogStore
> > > > > catalogStore`.  As you mentioned before, the `Map<String, Catalog>
> > > > > catalogs` is a cache in `CatalogManager`. How about saving the
> > catalog
> > > in
> > > > > `Map<String, Catalog> catalogs` and `CatalogStore catalogStore`
> > > together
> > > > > when it is registered or added in `CatalogManager`?
> > > > >
> > > > > Best,
> > > > > Shammon FY
> > > > >
> > > > >
> > > > > On Tue, Apr 4, 2023 at 5:22 PM Feng Jin <jinfeng1...@gmail.com>
> > wrote:
> > > > >
> > > > > > Thank you for your reply. I am very sorry for the
> misunderstanding
> > > > caused
> > > > > > by my deviation from the original discussion.
> > > > > >
> > > > > > @Shammon
> > > > > > > I found there is a pre-discussion [1] for this FLIP
> > > > > > Yes, there was indeed such a discussion before.  However, in
> > > designing
> > > > > the
> > > > > > whole solution, I found that the logic of CatalogManager itself
> > > doesn't
> > > > > > need to change much. *We cannot only persist Catalog instances
> > > > > themselves*,
> > > > > > so exposing only registerCatalog(String catalogName, Catalog
> > catalog)
> > > > > might
> > > > > > not be enough to save Catalogs, because in the end we still need
> to
> > > > save
> > > > > > the configurations corresponding to the Catalog instances.
> > > Therefore,
> > > > I
> > > > > > decided to introduce the CatalogStore interface for configuration
> > > > > > persistence. Regarding this part, I also took inspiration from
> > > Trino's
> > > > > > implementation[1].
> > > > > >
> > > > > >
> > > > > > @yuxia
> > > > > >
> > > > > > > 1: The mechanism of handling catalog with the same name looks a
> > > > little
> > > > > of
> > > > > > complex to me.
> > > > > > Thank you for the suggestion. I will provide a detailed
> description
> > > and
> > > > > > code examples for this part, and add it to the FLIP.
> > > > > >
> > > > > > > 2: TBH, the method name `addCatalog` still looks confused to
> me.
> > > > IIUC,
> > > > > > this method is for storing catalog to CatalogStore, how about
> > > renaming
> > > > it
> > > > > > to `storeCatalog`? It's very personal opinion, you can decide to
> > take
> > > > it
> > > > > or
> > > > > > not by your self.
> > > > > > StoreCatalog looks more intuitive to me. I don't see any problem
> > with
> > > > it.
> > > > > >
> > > > > > > 3.For CREATE CATALOG statement, which method will be called?
> > > > > > `registerCatalog` or `addCatalog`? I'm wondering whether user can
> > > add a
> > > > > > catalog to store with SQL stement.
> > > > > > For CREATE CATALOG, my original design was to add it directly to
> > the
> > > > > > CatalogStore, but this would disrupt the existing logic.
> > Therefore, I
> > > > > think
> > > > > > we can do both: save the configuration to the CatalogStore and
> > > > initialize
> > > > > > the Catalog instance at the same time
> > > > > >
> > > > > > > 3. Is it really neccessary to provide a default implmentation
> for
> > > > > > interface `CatalogStoreFactory`?
> > > > > > I think it is necessary, otherwise we would need to introduce an
> > > > > additional
> > > > > > Map to store the configuration for lazy loading.
> > > > > >
> > > > > > > 4: About asynchronous registration for catalog.
> > > > > > I don't think registerCatalog(String catalogName, Catalog
> catalog)
> > > can
> > > > be
> > > > > > made into an asynchronous interface because Catalog is already an
> > > > > instance.
> > > > > >
> > > > > > > It looks more like lazy initialization for catalog than
> > > asynchronous
> > > > > > registration, right?
> > > > > > Indeed, my description was inaccurate. It should be lazy
> > registration
> > > > > > instead of asynchronous registration. I have already updated the
> > > title
> > > > of
> > > > > > the FLIP.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > [1].
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/connector/CatalogStore.java
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 4, 2023 at 4:27 PM Shammon FY <zjur...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hi Feng
> > > > > > >
> > > > > > > I think if there is a `registerCatalog` method in
> > `CatalogManager`,
> > > > it
> > > > > > will
> > > > > > > confuse users whether a method named `addCatalog` or
> > `storeCatalog`
> > > > is
> > > > > > > added.
> > > > > > >
> > > > > > > And as you mentioned, the memory catalog is a `cache`, I think
> > the
> > > > > > concept
> > > > > > > of `cache` should not be exposed to users.
> > > > > > >
> > > > > > > I found there is a pre-discussion [1] for this FLIP. Please
> > correct
> > > > me
> > > > > if
> > > > > > > I'm wrong, IIUC, the conclusion of that discussion is to use
> > > > > > > `CatalogManager` as an interface and implement it for different
> > > > stores
> > > > > > such
> > > > > > > as memory, file and external system.
> > > > > > >
> > > > > > > I think there is a gap between the current FLIP design and that
> > > > > > conclusion.
> > > > > > > What about the proposal of the discussion in thread [1] ?
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > https://lists.apache.org/thread/9bnjblgd9wvrl75lkm84oo654c4lqv70
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Shammon FY
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 4, 2023 at 3:41 PM yuxia <
> > luoyu...@alumni.sjtu.edu.cn>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Feng for driving this FLIP.
> > > > > > > > I have few comments:
> > > > > > > > 1: The mechanism of handling catalog with the same name
> looks a
> > > > > little
> > > > > > of
> > > > > > > > complex to me. I think it'll be better to explain it in the
> > java
> > > > doc
> > > > > of
> > > > > > > > these methods and give a brief example in this FLIP.
> > > > > > > >
> > > > > > > > 2: TBH, the method name `addCatalog` still looks confused to
> > me.
> > > > > IIUC,
> > > > > > > > this method is for storing catalog to CatalogStore, how about
> > > > > renaming
> > > > > > it
> > > > > > > > to `storeCatalog`? It's very personal opinion, you can decide
> > to
> > > > take
> > > > > > it
> > > > > > > or
> > > > > > > > not by your self.
> > > > > > > >
> > > > > > > > 3: For CREATE CATALOG statement, which method will be called?
> > > > > > > > `registerCatalog` or `addCatalog`? I'm wondering whether user
> > can
> > > > > add a
> > > > > > > > catalog to store with SQL stement.
> > > > > > > >
> > > > > > > >
> > > > > > > > 3: Is it really neccessary to provide a default implmentation
> > for
> > > > > > > > interface `CatalogStoreFactory`?
> > > > > > > >
> > > > > > > >
> > > > > > > > 4: About asynchronous registration for catalog.
> > > > > > > >
> > > > > > > > > When creating a catalog with CREATE CATALOG, the
> asynchronous
> > > > > > > > registration method is used by default.
> > > > > > > > If asynchronous registration is the default behavior, it
> there
> > > any
> > > > > way
> > > > > > > > that user can switch to synchronous registration just like
> > > before?
> > > > > > > > Will both method `addCatalog` and `registerCatalog` be
> > > asynchronous
> > > > > > > > registration?
> > > > > > > >
> > > > > > > > IIUC, in asynchronous registration, it may well that CREATE
> > > CATALOG
> > > > > > > > executes successfully, but then the following CREATE TABLE
> > > > statement
> > > > > > will
> > > > > > > > fail for the catalog fail to open.
> > > > > > > > I think it's a break change which should be highlighted in
> this
> > > > FLIP,
> > > > > > may
> > > > > > > > be in compatibility part.
> > > > > > > >
> > > > > > > >
> > > > > > > > BTW, by saying asynchronous registration, I would like to
> > expect
> > > > > there
> > > > > > > > will be an executor to open or register catalog in the
> > > background,
> > > > > but
> > > > > > > from
> > > > > > > > your previous comments,
> > > > > > > > "the Catalog instance will be initialized if it has not been
> > > > > > initialized
> > > > > > > > yet. If the initialization process fails, these statements
> will
> > > not
> > > > > be
> > > > > > > > executed successfully."
> > > > > > > > It looks more like lazy initialization for catalog than
> > > > asynchronous
> > > > > > > > registration, right?
> > > > > > > >
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Yuxia
> > > > > > > >
> > > > > > > > ----- 原始邮件 -----
> > > > > > > > 发件人: "Feng Jin" <jinfeng1...@gmail.com>
> > > > > > > > 收件人: "dev" <dev@flink.apache.org>
> > > > > > > > 发送时间: 星期一, 2023年 4 月 03日 下午 3:27:45
> > > > > > > > 主题: Re: [DISCUSS] FLIP 295: Support persistence of Catalog
> > > > > > configuration
> > > > > > > > and asynchronous registration
> > > > > > > >
> > > > > > > > Hi everyone, Thank you all for your interest in this DISCUSS.
> > > > > > > >
> > > > > > > > @Shammon
> > > > > > > > > How to handle a catalog with the same name that exists for
> > both
> > > > of
> > > > > > > them?
> > > > > > > >
> > > > > > > > I believe this is a crucial point. Based on my current
> personal
> > > > > > > > understanding, the Map<String, Catalog> catalogs will serve
> as
> > a
> > > > > cache
> > > > > > > > for instantiated catalogs and have the highest priority.
> > > > > > > >
> > > > > > > > There are three methods that can affect the Map<String,
> > Catalog>
> > > > > > > catalogs:
> > > > > > > >
> > > > > > > > 1. registerCatalog(String catalogName, Catalog catalog)
> > > > > > > >
> > > > > > > > This method puts the catalog instance into the Map<String,
> > > Catalog>
> > > > > > > > catalogs.
> > > > > > > >
> > > > > > > > 2. unregisterCatalog(String catalogName)
> > > > > > > >
> > > > > > > > This method removes the catalog instance corresponding to
> > > > catalogName
> > > > > > > > from the Map<String, Catalog> catalogs.
> > > > > > > >
> > > > > > > > 3. getCatalog(String catalogName)
> > > > > > > >
> > > > > > > > This method first retrieves the corresponding catalog
> instance
> > > from
> > > > > > > > the Map<String, Catalog> catalogs. If the catalog does not
> > exist,
> > > > it
> > > > > > > > retrieves the corresponding configuration from the
> > CatalogStore,
> > > > > > > > initializes it, and puts the initialized Catalog instance
> into
> > > the
> > > > > > > > Map<String, Catalog> catalogs.
> > > > > > > >
> > > > > > > > The following two methods only modify the configuration in
> the
> > > > > > > > CatalogStore:
> > > > > > > >
> > > > > > > > 1. addCatalog(String catalogName, Map<String, String>
> > properties)
> > > > > > > >
> > > > > > > > This method saves the properties to the catalogStore and
> checks
> > > > > > > > whether there is a catalogName with the same name.
> > > > > > > >
> > > > > > > > 2. removeCatalog(String catalogName)
> > > > > > > > This method removes the specified configuration of the
> > specified
> > > > > > > > catalogName in the catalogStore.
> > > > > > > >
> > > > > > > > The following are possible conflict scenarios:
> > > > > > > >
> > > > > > > > 1. When the corresponding catalogName already exists in the
> > > > > > > > CatalogStore but not in the Map<String, Catalog>, the
> > > > > > > > registerCatalog(String catalogName, Catalog catalog) method
> can
> > > > > > > > succeed and be directly saved to the Map<String, Catalog>
> > > catalogs.
> > > > > > > >
> > > > > > > > 2. When the corresponding catalogName already exists in both
> > the
> > > > > > > > CatalogStore and the Map<String, Catalog>, the
> > > > registerCatalog(String
> > > > > > > > catalogName, Catalog catalog) method will fail.
> > > > > > > >
> > > > > > > > 3. When the corresponding catalogName already exists in the
> > > > > > > > Map<String, Catalog>, the addCatalog(String catalogName,
> > > > Map<String,
> > > > > > > > String> properties) method can directly save the properties
> to
> > > the
> > > > > > > > catalogStore, but the getCatalog(String catalogName) method
> > will
> > > > not
> > > > > > > > use the new properties for initialization because the
> > > corresponding
> > > > > > > > catalog instance already exists in catalogs and will be
> > > > prioritized.
> > > > > > > > Therefore, using the unregisterCatalog(String catalogName)
> > method
> > > > to
> > > > > > > > remove the instance corresponding to the original catalogName
> > is
> > > > > > > > necessary.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > I think it will confuse users that `registerCatalog(String
> > > > > > > > catalogName,Catalog catalog)` in the `Map<String, Catalog>
> > > > catalogs`
> > > > > > and
> > > > > > > > `registerCatalog(String catalogName, Map<String, String>
> > > > properties)
> > > > > > > >
> > > > > > > > This could potentially lead to confusion. I suggest changing
> > the
> > > > > > > > method name, perhaps to addCatalog(String catalogName,
> > > Map<String,
> > > > > > > > String> properties), as previously mentioned
> > > > > > > >
> > > > > > > >
> > > > > > > > @Hang
> > > > > > > > > add `registerCatalog(String catalogName,Catalog catalog,
> > > > > > > > boolean lazyInit)` method
> > > > > > > >
> > > > > > > > Since a catalog is already an instance, adding the "lazyInit"
> > > > > > > > parameter to the registerCatalog(String catalogName, Catalog
> > > > catalog)
> > > > > > > > method may not necessarily result in lazy initialization.
> > > > > > > >
> > > > > > > > > Do we need to think about encryption
> > > > > > > >
> > > > > > > > I think encryption is necessary, but perhaps the encryption
> > logic
> > > > > > > > should be implemented in the CatalogStore when the data is
> > > actually
> > > > > > > > saved. For instance, the FileCatalogStore could encrypt the
> > data
> > > > when
> > > > > > > > it is saved to a file, while the MemoryCatalogStore does not
> > > > require
> > > > > > > > encryption.
> > > > > > > >
> > > > > > > > >  Do we really need the `MemoryCatalogStore`?
> > > > > > > >
> > > > > > > > I think it is necessary to have a MemoryCatalogStore as the
> > > default
> > > > > > > > implementation, which saves Catalog configurations in memory.
> > > > > > > > Otherwise, if we want to implement asynchronous loading of
> > > Catalog,
> > > > > we
> > > > > > > > would need to introduce additional cache.
> > > > > > > >
> > > > > > > >
> > > > > > > > @Xianxun
> > > > > > > >
> > > > > > > > > What if asynchronous registration failed?
> > > > > > > >
> > > > > > > > This is also a critical concern.  When executing DDL, DQL, or
> > DML
> > > > > > > > statements that reference a specific Catalog, the Catalog
> > > instance
> > > > > > > > will be initialized if it has not been initialized yet. If
> the
> > > > > > > > initialization process fails, these statements will not be
> > > executed
> > > > > > > > successfully.
> > > > > > > >
> > > > > > > > > nt. The Map<String, Catalog> catalogs and CatalogStore
> > > > catalogstore
> > > > > > in
> > > > > > > > the CatalogManager all have the same catalog name, but
> > > correponding
> > > > > to
> > > > > > > > different catalog instances.
> > > > > > > >
> > > > > > > > This issue can be resolved by referring to the previous
> > > responses.
> > > > > The
> > > > > > > > key principle is that the Catalog that has been most recently
> > > used
> > > > > > > > will be given priority for subsequent use.
> > > > > > > >
> > > > > > > > On Sun, Apr 2, 2023 at 10:58 PM Xianxun Ye <
> > > > yesorno828...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Feng,
> > > > > > > > >
> > > > > > > > > Thanks for driving this Flip, I do believe this Flip could
> be
> > > > > helpful
> > > > > > > > for users. The firm I work for also manages a lot of
> catalogs,
> > > and
> > > > > the
> > > > > > > > submission of tasks becomes slow because of loading a number
> of
> > > > > > catalogs.
> > > > > > > > We obtain the catalogs in the user's Flink SQL through
> regular
> > > > > > expression
> > > > > > > > to avoid loading all catalogs to improve the speed of task
> > > > > submission.
> > > > > > > > After reading flip-295, I have some questions:
> > > > > > > > >
> > > > > > > > > 1. When creating a catalog with CREATE CATALOG, the
> > > asynchronous
> > > > > > > > registration method is used by default. What if asynchronous
> > > > > > registration
> > > > > > > > failed? And how to implement asynchronous registration?
> > > > > > > > >
> > > > > > > > > 2. I also have the same question with Shammon’s first
> > comment.
> > > > The
> > > > > > > > Map<String, Catalog> catalogs and CatalogStore catalogstore
> in
> > > the
> > > > > > > > CatalogManager all have the same catalog name, but
> correponding
> > > to
> > > > > > > > different catalog instances. For example: catalog1->
> > jdbcCatalog
> > > in
> > > > > the
> > > > > > > > catalogs, but catalog1 -> hiveCatalog in the catalogstore.
> > Would
> > > > this
> > > > > > > case
> > > > > > > > happen? Maybe we should define `catalogs` and `catalogstore`
> > more
> > > > > > > clearly.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > >
> > > > > > > > > Xianxun
> > > > > > > > >
> > > > > > > > > > 2023年3月31日 11:13,Hang Ruan <ruanhang1...@gmail.com> 写道:
> > > > > > > > > >
> > > > > > > > > > MemoryCatalogStore
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to