Hi, Lincoln

Thanks for join this discussion.

After rethinking, I think your suggestion is make sense, although currently
deleting the workflow on the Scheduler and relying only on the
RefreshHandler is enough, if in the future we support cascading deletion,
the DeleteWorkflowOperation can provide the necessary information without
the need to provide a new interface.

I've updated the public interface section of FLIP.

Best,
Ron

Lincoln Lee <lincoln.8...@gmail.com> 于2024年4月30日周二 21:27写道:

> Thanks Ron for starting this flip! It will complete the user story for
> flip-435[1].
>
> Regarding the WorkflowOperation, I have a question about whether we
> should add Delete/DropWorkflowOperation as well for when the
> Materialized Table is dropped or refresh mode changed from full to
> continuous?
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-435%3A+Introduce+a+New+Materialized+Table+for+Simplifying+Data+Pipelines?src=contextnavpagetreemode
>
>
> Best,
> Lincoln Lee
>
>
> <lorenzo.affe...@ververica.com.invalid> 于2024年4月30日周二 15:37写道:
>
> > Hello Ron, thank you for your detailed answers!
> >
> > For the Visitor pattern, I thought about it the other way around, so that
> > operations visit the scheduler, and not vice-versa :) In this way
> > operations can get the required information in order to be executed in a
> > tailored way.
> >
> > Thank you for your effort, but, as you say:
> > > furthermore, I think the current does not see the benefits of the time,
> > simpler instead of better, similar to the design of
> > CatalogModificationEvent[2] and CatalogModificationListener[3], the
> > developer only needs instanceof judgment.
> >
> > In java, most of the times, `instanceof` is considered an anti-pattern,
> > that's why I was also thinking about a command pattern (every operations
> > defines an `execute` method). However, I also understand this part is not
> > crucial for the FLIP under discussion, and the implementation details can
> > simply wait for the PRs to come.
> >
> > > After discussing with Shengkai offline, there is no need for this REST
> > API
> > to support multiple tables to be refreshed at the same time, so it would
> be
> > more appropriate to put the materialized table identifier in the path of
> > the URL, thanks for the suggestion.
> >
> > Very good!
> >
> > Thank you!
> > On Apr 29, 2024 at 05:04 +0200, Ron Liu <ron9....@gmail.com>, wrote:
> > > Hi, Lorenzo
> > >
> > > > I have a question there: how can the gateway update the
> refreshHandler
> > in
> > > the Catalog before getting it from the scheduler?
> > >
> > > The refreshHandler in CatalogMateriazedTable is null before getting it
> > from
> > > the scheduler, you can look at the CatalogMaterializedTable.Builder[1]
> > for
> > > more details.
> > >
> > > > You have a typo here: WorkflowScheudler -> WorkflowScheduler :)
> > >
> > > Fix it now, thanks very much.
> > >
> > > > For the operations part, I still think that the FLIP would benefit
> from
> > > providing a specific pattern for operations. You could either propose a
> > > command pattern [1] or a visitor pattern (where the scheduler visits
> the
> > > operation to get relevant info) [2] for those operations at your
> choice.
> > >
> > > Thank you for your input, I find it very useful. I tried to understand
> > your
> > > thinking through code and implemented the following pseudo code using
> the
> > > visitor design pattern:
> > > 1. first defined WorkflowOperationVisitor, providing several overloaded
> > > visit methods.
> > >
> > > public interface WorkflowOperationVisitor {
> > >
> > > <T extends RefreshHandler> T visit(CreateWorkflowOperation
> > > createWorkflowOperation);
> > >
> > > void visit(ModifyWorkflowOperation operation);
> > > }
> > >
> > > 2. then in the WorkflowOperation add the accept method.
> > >
> > > @PublicEvolving
> > > public interface WorkflowOperation {
> > >
> > > void accept(WorkflowOperationVisitor visitor);
> > > }
> > >
> > >
> > > 3. in the WorkflowScheduler call the implementation class of
> > > WorkflowOperationVisitor, complete the corresponding operations.
> > >
> > > I recognize this design pattern purely from a code design point of
> view,
> > > but from the point of our specific scenario:
> > > 1. For CreateWorkflowOperation, the visit method needs to return
> > > RefreshHandler, for ModifyWorkflowOperation, such as suspend and
> resume,
> > > the visit method doesn't need to return RefreshHandler. parameter,
> > > currently for different WorkflowOperation,
> > WorkflowOperationVisitor#accept
> > > can't be unified, so I think visitor may not be applicable here.
> > >
> > > 2. In addition, I think using the visitor pattern will add complexity
> to
> > > the WorkflowScheduler implementer, which needs to implement one more
> > > interface WorkflowOperationVisitor, this interface is not for the
> engine
> > to
> > > use, so I don't see any benefit from this design at the moment.
> > >
> > > 3. furthermore, I think the current does not see the benefits of the
> > time,
> > > simpler instead of better, similar to the design of
> > > CatalogModificationEvent[2] and CatalogModificationListener[3], the
> > > developer only needs instanceof judgment.
> > >
> > > To summarize, I don't think there is a need to introduce command or
> > visitor
> > > pattern at present.
> > >
> > > > About the REST API, I will wait for your offline discussion :)
> > >
> > > After discussing with Shengkai offline, there is no need for this REST
> > API
> > > to support multiple tables to be refreshed at the same time, so it
> would
> > be
> > > more appropriate to put the materialized table identifier in the path
> of
> > > the URL, thanks for the suggestion.
> > >
> > > [1]
> > >
> >
> https://github.com/apache/flink/blob/e412402ca4dfc438e28fb990dc53ea7809430aee/flink-table/flink-table-common/src/main/java/org/apache/flink/table/catalog/CatalogMaterializedTable.java#L264
> > > [2]
> > >
> >
> https://github.com/apache/flink/blob/b1544e4e513d2b75b350c20dbb1c17a8232c22fd/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/listener/CatalogModificationEvent.java#L28
> > > [3]
> > >
> >
> https://github.com/apache/flink/blob/b1544e4e513d2b75b350c20dbb1c17a8232c22fd/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/catalog/listener/CatalogModificationListener.java#L31
> > >
> > > Best,
> > > Ron
> > >
> > > Ron Liu <ron9....@gmail.com> 于2024年4月28日周日 23:53写道:
> > >
> > > > Hi, Shengkai
> > > >
> > > > Thanks for your feedback and suggestion, it looks very useful for
> this
> > > > proposal, regarding your question I made the following optimization:
> > > >
> > > > > > *WorkflowScheduler*
> > > > > > 1. How to get the exception details if `modifyRefreshWorkflow`
> > fails?
> > > > > > 2. Could you give us an example about how to configure the
> > scheduler?
> > > >
> > > > 1. Added a new WorkflowException, WorkflowScheduler's related method
> > > > signature will throw WorkflowException, when creating or modifying
> > Workflow
> > > > encountered an exception, so that the framework will sense and deal
> > with it.
> > > >
> > > > 2. Added a new Configuration section, introduced a new Option, and
> > gave an
> > > > example of how to define the Scheduler in flink-conf.yaml.
> > > >
> > > > > > *SQL Gateway*
> > > > > > 1. SqlGatewayService requires Session as the input, but the REST
> > API
> > > > doesn't need any Session information.
> > > > > > 2. Use "-" instead of "_" in the REST URI and camel case for
> > fields in
> > > > request/response
> > > > > > 3. Do we need scheduleTime and scheduleTimeFormat together?
> > > >
> > > > 1. If it is designed as a synchronous API, it may lead to network
> > jitter,
> > > > thread resource exhaustion and other problems, which I have not
> > considered
> > > > before. The asynchronous API, although increasing the cost of use for
> > the
> > > > user, is friendly to the SqlGatewayService, as well as the Client
> > thread
> > > > resources. In summary as discussed offline, so I also tend to think
> > that
> > > > all APIs of SqlGateway should be unified, and all should be
> > asynchronous
> > > > APIs, and bound to session. I have updated the REST API section in
> > FLIP.
> > > >
> > > > 2. thanks for the reminder, it has been updated
> > > >
> > > > 3. After rethinking, I think it can indeed be simpler, there is no
> > need to
> > > > pass in a custom time format, scheduleTime can be unified to the SQL
> > > > standard timestamp format: 'yyyy-MM-dd HH:mm:ss', it is able to
> > satisfy the
> > > > time related needs of materialized table.
> > > >
> > > > Based on your feedback, I have optimized and updated the FLIP related
> > > > section.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > >
> > > > Shengkai Fang <fskm...@gmail.com> 于2024年4月28日周日 15:47写道:
> > > >
> > > > > > Hi, Liu.
> > > > > >
> > > > > > Thanks for your proposal. I have some questions about the FLIP:
> > > > > >
> > > > > > *WorkflowScheduler*
> > > > > >
> > > > > > 1. How to get the exception details if `modifyRefreshWorkflow`
> > fails?
> > > > > > 2. Could you give us an example about how to configure the
> > scheduler?
> > > > > >
> > > > > > *SQL Gateway*
> > > > > >
> > > > > > 1. SqlGatewayService requires Session as the input, but the REST
> > API
> > > > > > doesn't need any Session information.
> > > > > >
> > > > > > From the perspective of a gateway developer, I tend to unify the
> > API of
> > > > > > the
> > > > > > SQL gateway, binding all concepts to the session. On the one
> hand,
> > this
> > > > > > approach allows us to reduce maintenance and understanding costs,
> > as we
> > > > > > only need to maintain one set of architecture to complete basic
> > concepts.
> > > > > > On the other hand, the benefits of an asynchronous architecture
> are
> > > > > > evident: we maintain state on the server side. If the request is
> a
> > long
> > > > > > connection, even in the face of network layer jitter, we can
> still
> > find
> > > > > > the
> > > > > > original result through session and operation handles.
> > > > > >
> > > > > > Using asynchronous APIs may increase the development cost for
> > users, but
> > > > > > from a platform perspective, if a request remains in a blocking
> > state for
> > > > > > a
> > > > > > long time, it also becomes a burden on the platform's JVM. This
> is
> > because
> > > > > > thread switching and maintenance require certain resources.
> > > > > >
> > > > > > 2. Use "-" instead of "_" in the REST URI and camel case for
> > fields in
> > > > > > request/response
> > > > > >
> > > > > > Please follow the Flink REST Design.
> > > > > >
> > > > > > 3. Do we need scheduleTime and scheduleTimeFormat together?
> > > > > >
> > > > > > I think we can use SQL timestamp format or ISO timestamp format.
> > It is not
> > > > > > necessary to pass time in any specific format.
> > > > > >
> > > > > > https://en.wikipedia.org/wiki/ISO_8601
> > > > > >
> > > > > > Best,
> > > > > > Shengkai
> > > > > >
> > > >
> >
>

Reply via email to