Hi Jane,

Thanks for creating the FLIP. In general I'm not a fan of using the query
plan for enabling these kinds of use cases. It introduces a different way
of submitting SQL jobs in our already extensive list of possibilities,
making things complicated. I would have a preference for using hints, given
that we explicitly mention hints for "Operator resource constraints" [1].
For me, that feels like a more natural fit for this use case.

I would like to get @Timo Walther's <twal...@apache.org> opinion on this
topic too.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/

On Mon, Mar 27, 2023 at 10:22 PM Jing Ge <j...@ververica.com.invalid> wrote:

> Hi Jane,
>
> Thanks for clarifying it. As far as I am concerned, the issue is where to
> keep the user's job metadata, i.e. the SQL script (to make the discussion
> easier, let's ignore config). As long as FLIP-190 is only used for
> migration/upgrade, SQL is the single source of truth. Once the compiled
> plan has been modified, in this case the TTLs, the user's job metadata
> will be distributed across two different places. Each time the SQL needs
> changes, extra effort will be required to take care of the corresponding
> modification in the compiled plan.
>
> Examples:
>
> 1. If we try to start the same SQL on a new Flink cluster (one type of
> "restart") without knowing about the modified compiled plan, the old
> performance issue will arise again. This might happen when multiple users
> working on the same project run a working SQL job, hit performance
> issues, and have no clue since nothing has been changed. Or one user
> working on many SQL jobs might lose track of which SQL jobs have
> modified plans.
> 2. If the SQL has been changed in a backwards-compatible way and
> (re)started with a given savepoint (NO_CLAIM), the version-2 JSON plan has
> to be made based on version 1, as I mentioned previously, which means each
> time the SQL is changed, the related compiled plan needs modification too.
> Beyond that, it would also be easy to forget to do so if there were no
> connection between the SQL and the related modified compiled plan. The SQL
> job will have the performance issue again after the change.
> 3. Another scenario would be running a backwards-compatible SQL job with an
> upgraded Flink version: additional upgrade logic or guidelines should be
> developed, e.g. for TTL modification in the compiled plan, because the
> upgraded Flink engine underneath might lead to a different TTL setting.
> 4. The last scenario is, just like you described, that the SQL has been
> changed significantly so that the compiled operators change too. The easy
> way is to start a fresh tuning. But since there was a tuning for the last
> SQL, the user has to compare both compiled plans and copy/paste the TTLs
> that might still apply.
>
> A visualization tool could help but might not reduce those efforts
> significantly, since the user behaviour is changed enormously.
>
> I was aware that the JSON string might be large. Doing EXECUTE PLAN 'json
> plan as string' is intended to avoid dealing with files for the most
> common cases, where the JSON string has a moderate length.
>
> Anyway, it should be fine if it is only recommended for advanced use cases
> where users are aware of those efforts.
>
> Best regards,
> Jing
>
> On Sat, Mar 25, 2023 at 3:54 PM Jane Chan <qingyue....@gmail.com> wrote:
>
> > Hi Leonard, Jing and Shengkai,
> >
> > Thanks so much for your insightful comments. Here are my thoughts
> >
> > @Shengkai
> > > 1. How do the Gateway users use this feature? As far as I know, the
> > > EXECUTE PLAN only supports local files right now. Is it possible to
> > > extend this syntax to allow for reading plan files from remote file
> > > systems?
> >
> > Nice catch! Currently, the "COMPILE PLAN" and "EXECUTE PLAN" statements
> > only support a local file path without the scheme (see
> > TableEnvironmentImpl.java#L773
> > <
> > https://github.com/apache/flink/blob/80ee512f00a9a8873926626d66cdcc97164c4595/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java#L773
> > >).
> > It's reasonable to extend the support to Flink's FileSystem. Besides, the
> > JSON plan should also be added to the resource cleaning mechanism for the
> > Gateway mode, just like we do with the "ADD JAR" operation, cleaning it
> > up when the session ends. I will take your suggestion and make changes
> > to the FLIP.
> >
> > > 2. I would like to inquire whether there are any limitations on this
> > > feature. I have encountered several instances where the data did not
> > > expire in the upstream operator, but it expired in the downstream
> > > operator, resulting in abnormal calculation results or direct
> > > exceptions thrown by the operator (e.g. the rank operator). Can we
> > > require that the expiration time of downstream operator data be
> > > greater than or equal to the expiration time of upstream operator
> > > data?
> >
> > This is an excellent point. In fact, the current state TTL is based on
> > the initialization time of each operator, which is inherently unaligned.
> > The probability of such unalignment is magnified now that fine-grained
> > operator-level TTL is supported. On the other hand, this FLIP is not the
> > root cause of the issue. To systematically solve the problem of TTL
> > unalignment between operators, I understand that we need a larger FLIP.
> > I'll mention this point in the FLIP doc. WDYT?
> >
> > Back to your suggestion: in most scenarios, the TTL across multiple
> > stateful operators should be non-decreasing, but there may be some
> > exceptions, such as the SinkUpsertMaterializer introduced to solve the
> > changelog disorder problem. It may not be appropriate to block this at
> > the implementation level. But it does happen that users misconfigure the
> > TTL, so my idea is the following: since FLIP-280
> > <
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-280%3A+Introduce+EXPLAIN+PLAN_ADVICE+to+provide+SQL+advice
> > >
> > introduces an experimental feature "EXPLAIN PLAN_ADVICE", and FLIP-190
> > <
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI&SQLPrograms-EXPLAIN
> > >
> > also introduces a new syntax "EXPLAIN PLAN FOR '/foo/bar/sql.json'", what
> > if we add a new plan analyzer, which analyzes the compiled plan to
> > perform the detection? The analyzer gives a warning attached to the
> > optimized physical plan when the TTL of a predecessor is larger than the
> > TTL of its successor. Will it draw the user's attention and make
> > troubleshooting easier?
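The detection such a plan analyzer would perform can be sketched roughly as follows. This is a hypothetical illustration in Python, not Flink code; the plan representation (a map of node TTLs plus an edge list) and all names are made up for the example:

```python
# Hypothetical sketch of the proposed plan-analyzer check: warn when a
# downstream (successor) stateful operator has a smaller TTL than its
# upstream (predecessor), since upstream state would then outlive the
# downstream state that depends on it. The plan structure is invented
# for illustration only.

def find_ttl_violations(nodes, edges):
    """nodes: {node_id: ttl_ms or None for stateless}; edges: [(upstream, downstream)]."""
    warnings = []
    for up, down in edges:
        up_ttl, down_ttl = nodes.get(up), nodes.get(down)
        # Only compare when both operators are stateful (have a TTL).
        if up_ttl is not None and down_ttl is not None and down_ttl < up_ttl:
            warnings.append(
                f"TTL of downstream '{down}' ({down_ttl} ms) is smaller "
                f"than TTL of upstream '{up}' ({up_ttl} ms); state may "
                f"expire downstream before it expires upstream."
            )
    return warnings

# Example: a join kept for 3 days feeding a rank operator kept for 1 day
# would be flagged, matching the misconfiguration Shengkai described.
nodes = {"source": None, "join": 259200000, "rank": 86400000}
edges = [("source", "join"), ("join", "rank")]
for w in find_ttl_violations(nodes, edges):
    print(w)
```

The check only warns rather than rejects, mirroring the point above that cases like SinkUpsertMaterializer can legitimately break monotonicity.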
> >
> > @Leonard and @Jing
> > You both expressed the same concern about the high cost of understanding
> > and the change in behavior for users using SQL. IMO, as opposed to the
> > usual features, fine-grained TTL configuration is a feature for advanced
> > users. I drew a pic to illustrate this. You can use it to estimate the
> > funnel conversion rate: from SQL jobs that involve stateful and
> > TTL-controlled operators, to jobs that require only one TTL
> > configuration to meet the requirements, to jobs that eventually require
> > multiple TTL configurations, which follows a decreasing distribution.
> > The first- and second-tier users should not feel bothered by this.
> > [image: image.png]
> > We will explain in detail in the documentation how to use this feature
> > and that it is a feature that needs to be used carefully. Also, in
> > conjunction with FLIP-280 and FLIP-190, we can print out the
> > SQL-optimized physical and execution plan for the JSON file (in tree
> > style, just like the normal EXPLAIN statement). Would this help the
> > advanced users understand what the compiled JSON plan represents?
> >
> >
> > @Jing
> > > One thing I didn't fully understand. I might be wrong. Could those TTL
> > > configs survive when SQL jobs are restarted? Do I have to always call
> > > EXECUTE PLAN every time the job needs to be restarted?
> >
> > If it's a new SQL job that has never been submitted before, and users
> > want to enable fine-grained state TTL control, then they will first use
> > the COMPILE PLAN statement to generate the JSON file and modify the
> > stateful operator's state metadata as needed, then submit the job via
> > the EXECUTE PLAN statement. By the word "restarted", I assume there are
> > historical instances and users want to restore from some checkpoint or
> > savepoint. Without SQL changes, users can directly use the Flink CLI $
> > bin/flink run -s :savepointPath -restoreMode :mode -n [:runArgs]
> > <
> > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#resuming-from-savepoints
> > >
> > to resume/restart the job with the savepoint. In this situation, the
> > customized TTL is still in effect.
> >
> > > Does that mean that, once I modified the compiled SQL plan, the JSON
> > > file will become the SQL job? If I am not mistaken, the compiled SQL
> > > plan introduced by FLIP-190 is only used for SQL job migration/update.
> > > The common stages that Flink uses to produce the execution plan from
> > > SQL do not contain the compiling step.
> >
> > I want to briefly explain SQL processing and what FLIP-190 achieves. All
> > SQL jobs go through the following three steps to run, no matter
> > with/without FLIP-190:
> > <1> parsing into an AST and then an Operation by the parser;
> > <2> optimizing the original rel with rule-based and cost-based optimizers
> > into physical rel nodes and then exec nodes by the planner;
> > <3> transforming exec nodes into transformations and then generating the
> > StreamGraph and JobGraph to run.
> >
> > FLIP-190 serializes the result of step <2> as a side output in JSON
> > format and dumps it into a file. The file serves as a hook to allow you
> > to make some changes (such as performing the plan/state migration or
> > tuning state TTL for stateful operators), and then continue with step
> > <3>. From this point of view, I'd like to say FLIP-190 introduces a
> > mechanism/possibility to allow some advanced configuration to happen
> > during the intermediate step, not just a use case for migration/upgrade.
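The kind of out-of-band change this hook enables, between COMPILE PLAN and EXECUTE PLAN, can be sketched like this. The snippet is illustrative only: the plan layout and field names below are a simplified stand-in, not the exact schema defined by FLIP-190/FLIP-292:

```python
import json

# Sketch of editing a compiled JSON plan between COMPILE PLAN and
# EXECUTE PLAN. The plan layout below is a made-up simplification of
# the real schema for illustration purposes.
plan = {
    "nodes": [
        {"id": 1, "type": "StreamExecJoin",
         "state": [{"index": 0, "ttl": "0 ms", "name": "join-left-state"},
                   {"index": 1, "ttl": "0 ms", "name": "join-right-state"}]},
    ]
}

def set_state_ttl(plan, node_type, state_name, ttl_ms):
    """Set the TTL of one named state entry on all nodes of a given type."""
    for node in plan["nodes"]:
        if node.get("type") != node_type:
            continue
        for state in node.get("state", []):
            if state["name"] == state_name:
                state["ttl"] = f"{ttl_ms} ms"

set_state_ttl(plan, "StreamExecJoin", "join-left-state", 259200000)   # 3 days
set_state_ttl(plan, "StreamExecJoin", "join-right-state", 86400000)   # 1 day

# Persist the modified plan; a subsequent EXECUTE PLAN would pick it up.
print(json.dumps(plan["nodes"][0]["state"]))
```

The edit happens entirely in the serialized intermediate representation, which is the point being made above: the SQL text itself never changes.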
> >
> > > In case the original SQL script has been changed, we need to compile a
> > > version-2 SQL plan, copy the TTL configs from the version-1 SQL plan
> > > to version 2, and drop version 1. This means we have to keep the
> > > compiled JSON file and create a link with the original SQL script. I
> > > am not sure if I understood it correctly; it seems like a lot of
> > > maintenance effort.
> > > The regular working process for Flink SQL users is changed, from only
> > > dealing with SQL-like scripts to moving between SQL-like scripts and
> > > file modifications back and forth. This is a big change for user
> > > behaviours.
> >
> > In fact, it's not just a copy-paste thing. SQL changes may result in
> > more stateful operators, or existing stateful operators being deleted,
> > so the user cannot simply copy the configuration from the previous JSON
> > file. What they should do is carefully consider whether they still need
> > fine-grained state TTL configuration for the new version of the SQL,
> > for which operators they need to configure it, and how long the TTL
> > should be, and modify the new JSON file accordingly.
> >
> > > One option could be that we upgrade/extend COMPILE PLAN to allow
> > > users to update the TTL for operators at the script level. But I am
> > > not sure if it is possible to point out specific operators at this
> > > level. Another option is to print out the result of COMPILE PLAN and
> > > enable EXECUTE PLAN 'json plan as string'. A third option is to
> > > leverage a data platform to visualize the compiled SQL plan and
> > > provide related interactions for updating the TTL and submitting
> > > (executing) the modified compiled SQL plan.
> >
> > The 1st option might not be feasible: SQL syntax is not easy to extend,
> > especially for things beyond the ANSI SQL standard. As for the 2nd
> > option, in terms of practicality, given that JSON strings can be very
> > long, I don't think it's as convenient as the EXECUTE PLAN
> > '/foo/bar/compiled-plan.json' statement, which is already supported by
> > FLIP-190. I agree with the 3rd option, and just as @Yun mentioned
> > before, nothing is better than a graphical IDE. I think this would be a
> > very helpful experience improvement for advanced users who want to tune
> > fine-grained configurations (not just state TTL) based on an optimized
> > exec plan, and it deserves another FLIP. WDYT?
> >
> > Best,
> > Jane
> >
> > On Sat, Mar 25, 2023 at 7:27 AM Jing Ge <j...@ververica.com.invalid>
> > wrote:
> >
> >> Thanks Jane for driving this FLIP.
> >>
> >> The FLIP is quite interesting. Since the execution plan has finer
> >> granularity than the plain SQL script, hints at the SQL level might not
> >> be able to touch specific operators, so the idea of leveraging the
> >> compiled execution plan is brilliant.
> >>
> >> However, there are some concerns that might need to be considered.
> >>
> >> - One thing I didn't fully understand. I might be wrong. Could those TTL
> >> configs survive when SQL jobs are restarted? Does that mean that, once
> >> I modify the compiled SQL plan, the JSON file will become the SQL job?
> >> Do I have to always call EXECUTE PLAN every time the job needs to be
> >> restarted? In case the original SQL script has been changed, we need
> >> to compile a version-2 SQL plan, copy the TTL configs from the
> >> version-1 SQL plan to version 2, and drop version 1. This means we
> >> have to keep the compiled JSON file and create a link with the original
> >> SQL script. I am not sure if I understood it correctly; it seems like a
> >> lot of maintenance effort.
> >> - If I am not mistaken, the compiled SQL plan introduced by FLIP-190 is
> >> only used for SQL job migration/update. The common stages that Flink
> >> uses to produce the execution plan from SQL do not contain the
> >> compiling step. This makes one tool do two different jobs [1], upgrade
> >> + TTL tuning, and tightens the dependency on compiled SQL plans. Flink
> >> SQL users have to deal with a compiled SQL plan for a performance
> >> optimization that it is not designed for.
> >> - The regular working process for Flink SQL users is changed, from only
> >> dealing with SQL-like scripts to moving between SQL-like scripts and
> >> file modifications back and forth. This is a big change for user
> >> behaviours. One option could be that we upgrade/extend COMPILE PLAN to
> >> allow users to update the TTL for operators at the script level. But I
> >> am not sure if it is possible to point out specific operators at this
> >> level. Another option is to print out the result of COMPILE PLAN and
> >> enable EXECUTE PLAN 'json plan as string'. A third option is to
> >> leverage a data platform to visualize the compiled SQL plan and provide
> >> related interactions for updating the TTL and submitting (executing)
> >> the modified compiled SQL plan.
> >>
> >> On the other side, there is one additional benefit to this proposal: we
> >> could fine-tune SQL jobs while we migrate/upgrade them. That is nice!
> >>
> >> Best regards,
> >> Jing
> >>
> >> [1] https://en.wikipedia.org/wiki/Single-responsibility_principle
> >>
> >> On Fri, Mar 24, 2023 at 4:02 PM Leonard Xu <xbjt...@gmail.com> wrote:
> >>
> >> > Thanks Jane for the proposal.
> >> >
> >> > State TTL is an execution-phase configuration, and the serialized
> >> > JSON graph file is the graph for the execution phase, so supporting
> >> > operator-level state TTL in the execution JSON file makes sense to me.
> >> >
> >> > From the user's perspective, I have two concerns:
> >> > 1. Modifying the execution graph node configuration raises the cost
> >> > of understanding for users, especially SQL users.
> >> > 2. Submitting a SQL job through an `exec plan json file` is not as
> >> > intuitive, since users cannot see the SQL details of the job.
> >> >
> >> > Best,
> >> > Leonard
> >> >
> >> > On Fri, Mar 24, 2023 at 5:07 PM Shengkai Fang <fskm...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi, Jane.
> >> > >
> >> > > Thanks for driving this FLIP; this feature is very useful to many
> >> > > users. But I have two questions about the FLIP:
> >> > >
> >> > > 1. How do the Gateway users use this feature? As far as I know, the
> >> > > EXECUTE PLAN only supports local files right now. Is it possible to
> >> > > extend this syntax to allow for reading plan files from remote file
> >> > > systems?
> >> > >
> >> > > 2. I would like to inquire whether there are any limitations on this
> >> > > feature. I have encountered several instances where the data did not
> >> > > expire in the upstream operator, but it expired in the downstream
> >> > > operator, resulting in abnormal calculation results or direct
> >> > > exceptions thrown by the operator (e.g. the rank operator). Can we
> >> > > require that the expiration time of downstream operator data be
> >> > > greater than or equal to the expiration time of upstream operator
> >> > > data?
> >> > >
> >> > > Best,
> >> > > Shengkai
> >> > >
> >> > > Yun Tang <myas...@live.com> wrote on Friday, March 24, 2023 at 14:50:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > From my point of view, I am a bit against using a SQL hint to set
> >> > > > the state TTL, as Flink SQL could be translated into several
> >> > > > stateful operators. If we want to let different states have
> >> > > > different TTL configs within one operator, the SQL hint solution
> >> > > > cannot work. A better way is to allow a graphical IDE to display
> >> > > > the stateful operators and let users configure them. And the IDE
> >> > > > submits the JSON plan to Flink to run jobs.
> >> > > >
> >> > > > For the details of the structure of ExecNodes, since the state
> >> > > > name is unique in the underlying state layer, shall we introduce
> >> > > > the "index" tag to identify the state config?
> >> > > > What will happen with the conditions below:
> >> > > > 1st run:
> >> > > >    {
> >> > > >      "index": 0,
> >> > > >      "ttl": "259200000 ms",
> >> > > >      "name": "join-lef-state"
> >> > > >    },
> >> > > >    {
> >> > > >      "index": 1,
> >> > > >      "ttl": "86400000 ms",
> >> > > >      "name": "join-right-state"
> >> > > >    }
> >> > > >
> >> > > > 2nd run:
> >> > > >    {
> >> > > >      "index": 0,
> >> > > >      "ttl": "86400000 ms",
> >> > > >      "name": "join-right-state"
> >> > > >    },
> >> > > >    {
> >> > > >      "index": 1,
> >> > > >      "ttl": "259200000 ms",
> >> > > >      "name": "join-lef-state"
> >> > > >    }
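The hazard behind this question can be shown with a small sketch. It is hypothetical: it assumes an engine that restores state configs by the "index" field, while the user's intent follows "name"; the config entries mirror the example above, everything else is illustrative:

```python
# Sketch of why matching state configs by "index" is fragile when the
# order changes between two compilations: index-based lookup silently
# picks up the wrong TTL, while name-based lookup stays stable.
first_run = [
    {"index": 0, "ttl": "259200000 ms", "name": "join-left-state"},
    {"index": 1, "ttl": "86400000 ms", "name": "join-right-state"},
]
second_run = [
    {"index": 0, "ttl": "86400000 ms", "name": "join-right-state"},
    {"index": 1, "ttl": "259200000 ms", "name": "join-left-state"},
]

def ttl_by_index(configs, index):
    return next(c["ttl"] for c in configs if c["index"] == index)

def ttl_by_name(configs, name):
    return next(c["ttl"] for c in configs if c["name"] == name)

# Index 0 means "left" in the 1st run but "right" in the 2nd run...
print(ttl_by_index(first_run, 0), ttl_by_index(second_run, 0))
# ...while name-based lookup is stable across both runs.
print(ttl_by_name(first_run, "join-left-state"),
      ttl_by_name(second_run, "join-left-state"))
```

Since the state name is unique in the underlying state layer, keying the config on the name avoids this ambiguity entirely; the index would only add a second, order-dependent identity.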
> >> > > >
> >> > > > Best
> >> > > > Yun Tang
> >> > > > ________________________________
> >> > > > From: Jane Chan <qingyue....@gmail.com>
> >> > > > Sent: Friday, March 24, 2023 11:57
> >> > > > To: dev@flink.apache.org <dev@flink.apache.org>
> >> > > > Subject: Re: [DISCUSS] FLIP-292: Support configuring state TTL at
> >> > > operator
> >> > > > level for Table API & SQL programs
> >> > > >
> >> > > > Hi Shammon and Shuo,
> >> > > >
> >> > > > Thanks for your valuable comments!
> >> > > >
> >> > > > Some thoughts:
> >> > > >
> >> > > > @Shuo
> >> > > > > I think it's more proper to say that hints do not affect the
> >> > > > > equivalence of execution plans (hash agg vs sort agg), not the
> >> > > > > equivalence of execution results; e.g., users can set
> >> > > > > 'scan.startup.mode' for the Kafka connector via a dynamic table
> >> > > > > option, which also "intervenes in the calculation of data
> >> > > > > results".
> >> > > >
> >> > > > IMO, the statement that "hints should not interfere with the
> >> > > > calculation results" means they should not interfere with the
> >> > > > internal computation. On the other hand, 'scan.startup.mode'
> >> > > > interferes with the ingestion of the data. I think these two
> >> > > > concepts are different, but of course, this is just my opinion and
> >> > > > I welcome other views.
> >> > > >
> >> > > > > I think the final shape of state TTL configuring may look like
> >> > > > > this: users can define operator state TTL using a SQL HINT
> >> > > > > (assumption...), but it may affect more than one stateful
> >> > > > > operator inside the same query block; then users can further
> >> > > > > configure a specific one by modifying the compiled JSON plan...
> >> > > >
> >> > > > Setting aside the issue of semantics, setting the TTL from a
> >> > > > higher level seems attractive. This means that users only need to
> >> > > > configure 'table.exec.state.ttl' through the existing hint syntax
> >> > > > to achieve the effect. Everything is a familiar formula. But is it
> >> > > > really the case? Hints apply to a very broad range. Let me give an
> >> > > > example.
> >> > > >
> >> > > > Suppose a user wants to set different TTLs for the two streams in
> >> > > > a stream join query. Where should the hints be written?
> >> > > >
> >> > > > -- the original query before configuring state TTL
> >> > > > create temporary view view1 as select .... from my_table_1;
> >> > > > create temporary view view2 as select .... from my_table_2;
> >> > > > create temporary view joined_view as
> >> > > > select a.*, b.* from view1 a join view2 b on a.join_key =
> >> > > > b.join_key;
> >> > > >
> >> > > > Option 1: declaring hints at the very beginning, on the table scan
> >> > > >
> >> > > > -- should he or she write hints when declaring the temporary views?
> >> > > > create temporary view view1 as select .... from my_table_1
> >> > > > /*+ OPTIONS('table.exec.state.ttl' = 'foo') */;
> >> > > > create temporary view view2 as select .... from my_table_2
> >> > > > /*+ OPTIONS('table.exec.state.ttl' = 'bar') */;
> >> > > > create temporary view joined_view as
> >> > > > select a.*, b.* from view1 a join view2 b on a.join_key =
> >> > > > b.join_key;
> >> > > >
> >> > > > Option 2: declaring hints when performing the join
> >> > > >
> >> > > > -- or should he or she write hints when declaring the join
> >> > > > -- temporary view?
> >> > > > create temporary view view1 as select .... from my_table_1;
> >> > > > create temporary view view2 as select .... from my_table_2;
> >> > > > create temporary view joined_view as
> >> > > > select a.*, b.* from view1
> >> > > > /*+ OPTIONS('table.exec.state.ttl' = 'foo') */ a join view2
> >> > > > /*+ OPTIONS('table.exec.state.ttl' = 'bar') */ b
> >> > > > on a.join_key = b.join_key;
> >> > > > From the user's point of view, does he or she need to care about
> >> > > > the difference between these two styles? Users might think the two
> >> > > > are equivalent; but in reality, as developers, how do we define
> >> > > > the range in which the hint starts and ends taking effect?
> >> > > >
> >> > > > Consider the following two assumptions
> >> > > >
> >> > > > 1. Assume the hint takes effect from the moment it is declared
> >> > > > and applies to any subsequent stateful operators until it is
> >> > > > overridden by a new hint.
> >> > > > If this is the assumption, it's clear that Option 1 and Option 2
> >> > > > are different, because a ChangelogNormalize node can appear
> >> > > > between the scan and the join. Meanwhile, which stream's TTL
> >> > > > applies to the query following the stream join? It is unclear if
> >> > > > the user does not explicitly set it. Should the engine make a
> >> > > > random decision?
> >> > > >
> >> > > > 2. Assume that the scope of the hint only applies to the current
> >> > > > query block and does not extend to the next operator.
> >> > > > In this case, the first way of setting the hint will not work,
> >> > > > because it cannot be carried to the join operator. Users must
> >> > > > choose the second way to configure it. Are users willing to
> >> > > > remember this strange constraint on SQL writing style? Does this
> >> > > > introduce a new learning cost?
> >> > > >
> >> > > > The example above illustrates that while this approach may seem
> >> > > > simple and direct, it actually has many limitations and may
> >> > > > produce unexpected behavior. Will users still find it attractive?
> >> > > > IMO *hints only work for a very limited situation where the query
> >> > > > is very simple, and their scope is coarser and not
> >> > > > operator-level*. Maybe it deserves another FLIP to discuss whether
> >> > > > we need a multi-level state TTL configuration mechanism and how to
> >> > > > properly implement it.
> >> > > >
> >> > > > @Shammon
> >> > > > > Generally, Flink jobs support two types of submission: SQL and
> >> > > > > jar. If users want to use `TTL on Operator` for SQL jobs, they
> >> > > > > need to edit the JSON file, which is not supported by general
> >> > > > > job submission systems such as the Flink SQL Client, Apache
> >> > > > > Kyuubi, Apache StreamPark, etc. Users need to download the file
> >> > > > > and edit it manually, but they may not have the permissions for
> >> > > > > the storage system, such as HDFS, in a real production
> >> > > > > environment. From this perspective, I think it is necessary to
> >> > > > > provide a way similar to hints so that users can configure the
> >> > > > > `TTL on Operator` in their SQL, which helps users use it
> >> > > > > conveniently.
> >> > > >
> >> > > > IIUC, the SQL Client supports the statement "EXECUTE PLAN
> >> > > > 'file:/foo/bar/example.json'". I think there is not much evidence
> >> > > > to say we should choose to use hints just because users cannot
> >> > > > touch their development environment. As a reply to @Shuo, the TTL
> >> > > > set through the hint approach is not at the operator level. And
> >> > > > whether it is really "convenient" needs more discussion.
> >> > > >
> >> > > > > I agree with @Shuo's idea that for complex cases, users can
> >> > > > > combine hints and the `json plan` to configure `TTL on Operator`
> >> > > > > better.
> >> > > >
> >> > > > Suppose users can configure the TTL through
> >> > > > <1> SET 'table.exec.state.ttl' = 'foo';
> >> > > > <2> Modifying the compiled JSON plan;
> >> > > > <3> Using hints (personally I'm strongly against this way, but
> >> > > > let's take it into consideration).
> >> > > > IMO, if the user can configure the same parameter in so many
> >> > > > ways, then the complex case only makes things worse. Which has
> >> > > > the higher priority, and which overrides which?
> >> > > >
> >> > > > Best,
> >> > > > Jane
> >> > > >
> >> > > >
> >> > > > On Fri, Mar 24, 2023 at 11:00 AM Shammon FY <zjur...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Hi Jane,
> >> > > > >
> >> > > > > Thanks for initiating this discussion. Configuring TTL per
> >> > > > > operator can help users manage state more effectively.
> >> > > > >
> >> > > > > I think the `compiled json plan` proposal may need to consider
> >> > > > > the impact on the user's submission workflow. Generally, Flink
> >> > > > > jobs support two types of submission: SQL and jar. If users
> >> > > > > want to use `TTL on Operator` for SQL jobs, they need to edit
> >> > > > > the JSON file, which is not supported by general job submission
> >> > > > > systems such as the Flink SQL Client, Apache Kyuubi, Apache
> >> > > > > StreamPark, etc. Users need to download the file and edit it
> >> > > > > manually, but they may not have the permissions for the storage
> >> > > > > system, such as HDFS, in a real production environment.
> >> > > > >
> >> > > > > From this perspective, I think it is necessary to provide a way
> >> > > > > similar to hints so that users can configure the `TTL on
> >> > > > > Operator` in their SQL, which helps users use it conveniently.
> >> > > > > At the same time, I agree with @Shuo's idea that for complex
> >> > > > > cases, users can combine hints and the `json plan` to configure
> >> > > > > `TTL on Operator` better. What do you think? Thanks
> >> > > > >
> >> > > > >
> >> > > > > Best,
> >> > > > > Shammon FY
> >> > > > >
> >> > > > >
> >> > > > > On Thu, Mar 23, 2023 at 9:58 PM Shuo Cheng <njucs...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > > > Correction: “users can set 'scan.startup.mode' for kafka
> >> connector”
> >> > > ->
> >> > > > > > “users
> >> > > > > > can set 'scan.startup.mode' for kafka connector by dynamic
> table
> >> > > > option”
> >> > > > > >
> >> > > > > > Shuo Cheng <njucs...@gmail.com> wrote on Thursday, March 23, 2023 at 21:50:
> >> > > > > >
> >> > > > > > > Hi Jane,
> >> > > > > > > Thanks for driving this; operator-level state TTL is
> >> > > > > > > absolutely a desired feature. I would share my opinion as
> >> > > > > > > follows:
> >> > > > > > >
> >> > > > > > > If the scope of this proposal is limited to an enhancement
> >> > > > > > > of the compiled JSON plan, it makes sense. I think it does
> >> > > > > > > not conflict with configuring state TTL in other ways, e.g.,
> >> > > > > > > SQL HINT or something else, because they just work at
> >> > > > > > > different levels: SQL Hint works at the exact entrance of
> >> > > > > > > the SQL API, while the compiled JSON plan is the
> >> > > > > > > intermediate result for SQL.
> >> > > > > > > I think the final shape of state TTL configuring may look
> >> > > > > > > like this: users can define operator state TTL using a SQL
> >> > > > > > > HINT (assumption...), but it may affect more than one
> >> > > > > > > stateful operator inside the same query block; then users
> >> > > > > > > can further configure a specific one by modifying the
> >> > > > > > > compiled JSON plan...
> >> > > > > > >
> >> > > > > > > In a word, this proposal is in good shape as an enhancement
> >> > > > > > > of the compiled JSON plan, and it's orthogonal to other ways
> >> > > > > > > like SQL Hint, which works at a higher level.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Nits:
> >> > > > > > >
> >> > > > > > > > "From the SQL semantic perspective, hints cannot intervene
> >> > > > > > > > in the calculation of data results."
> >> > > > > > > I think it's more proper to say that hints do not affect the
> >> > > > > > > equivalence of execution plans (hash agg vs sort agg), not
> >> > > > > > > the equivalence of execution results; e.g., users can set
> >> > > > > > > 'scan.startup.mode' for the Kafka connector, which also
> >> > > > > > > "intervenes in the calculation of data results".
> >> > > > > > >
> >> > > > > > > Sincerely,
> >> > > > > > > Shuo
> >> > > > > > >
> >> > > > > > > On Tue, Mar 21, 2023 at 7:52 PM Jane Chan <
> >> qingyue....@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > >> Hi devs,
> >> > > > > > >>
> >> > > > > > >> I'd like to start a discussion on FLIP-292: Support
> >> configuring
> >> > > > state
> >> > > > > > TTL
> >> > > > > > >> at operator level for Table API & SQL programs [1].
> >> > > > > > >>
> >> > > > > > >> Currently, we only support job-level state TTL
> >> > > > > > >> configuration via 'table.exec.state.ttl'. However, users
> >> > > > > > >> may expect fine-grained state TTL control to optimize state
> >> > > > > > >> usage. Hence we propose to serialize/deserialize the state
> >> > > > > > >> TTL as metadata of the operator's state to/from the
> >> > > > > > >> compiled JSON plan, to achieve the goal of specifying
> >> > > > > > >> different state TTLs when transforming exec nodes into
> >> > > > > > >> stateful operators.
> >> > > > > > >>
> >> > > > > > >> Look forward to your opinions!
> >> > > > > > >>
> >> > > > > > >> [1]
> >> > > > > > >>
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240883951
> >> > > > > > >>
> >> > > > > > >> Best Regards,
> >> > > > > > >> Jane Chan
> >> > > > > > >>
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
