Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

Jane Chan Mon, 03 Apr 2023 07:42:57 -0700

Hi Yisha,

Thanks for your detailed explanation. Here are my thoughts.


> I'm interested in how to design a graphical interface to help users to
maintain their custom fine-grained configuration between their job versions.

Regarding the graphical IDE, this was mentioned to address the concern of
editing less human-readable compiled JSON files. As far as I know, Flink
previously provided a visualizer tool (https://flink.apache.org/visualizer/,
but now retired) that takes a JSON representation of the job execution plan
and visualizes it as a graph with complete annotations of execution
strategies. It could have been extended to support the stateful operator's
TTL edition and export to the new JSON text if it was still online.

Meanwhile, I'm unsure whether this usability issue is indispensable to the
community. I understand that many usability features, such as version
control of SQL text and graphical SQL editor, are outside the scope of
community features. Therefore, If everyone feels that visual editing of
compiled plans is an essential part of the community feature, then I
suggest that we should discuss a possible restart of this function in
another FLIP while also enhancing it, such as allowing users to edit
operators with TTL or other configs, and then exporting as a new JSON file.

By "between job versions", I assume it refers to the query evolution. I
think this is beyond the scope of this FLIP since query and schema changes
may result in entirely different jobs, while this FLIP aims to achieve the
operator-level TTL configuration per query.

Best regards,
Jane

On Thu, Mar 30, 2023 at 10:03 PM 周伊莎 <zhouyi...@bytedance.com.invalid>
wrote:

> Hi Jane,
>
> Thanks for your detailed response.
>
> You mentioned that there are 10k+ SQL jobs in your production
> > environment, but only ~100 jobs' migration involves plan editing. Is 10k+
> > the number of total jobs, or the number of jobs that use stateful
> > computation and need state migration?
> >
>
> 10k is the number of SQL jobs that enable periodic checkpoint. And
> surely if users change their sql which result in changes of the plan, they
> need to do state migration.
>
> - You mentioned that "A truth that can not be ignored is that users
> > usually tend to give up editing TTL(or operator ID in our case) instead
> of
> > migrating this configuration between their versions of one given job." So
> > what would users prefer to do if they're reluctant to edit the operator
> > ID? Would they submit the same SQL as a new job with a higher version to
> > re-accumulating the state from the earliest offset?
>
>
> You're exactly right. People will tend to re-accumulate the state from a
> given offset by changing the namespace of their checkpoint.
> Namespace is an internal concept and restarting the sql job in a new
> namespace can be simply understood as submitting a new job.
>
> Back to your suggestions, I noticed that FLIP-190 [3] proposed the
> > following syntax to perform plan migration
>
>
> The 'plan migration'  I said in my last reply may be inaccurate.  It's more
> like 'query evolution'. In other word, if a user submitted a sql job with a
> configured compiled plan, and then
> he changes the sql,  the compiled plan changes too, how to move the
> configuration in the old plan to the new plan.
> IIUC, FLIP-190 aims to solve issues in flink version upgrades and leave out
> the 'query evolution' which is a fundamental change to the query. E.g.
> adding a filter condition, a different aggregation.
> And I'm really looking forward to a solution for query evolution.
>
> And I'm also curious about how to use the hint
> > approach to cover cases like
> >
> > - configuring TTL for operators like ChangelogNormalize,
> > SinkUpsertMaterializer, etc., these operators are derived by the planner
> > implicitly
> > - cope with two/multiple input stream operator's state TTL, like join,
> > and other operations like row_number, rank, correlate, etc.
>
>
>  Actually, in our company , we make operators in the query block where the
> hint locates all affected by that hint. For example,
>
> INSERT INTO sink
> > SELECT /*+ STATE_TTL('1D') */
> >    id,
> >    name,
> >    num
> > FROM (
> >    SELECT
> >        *,
> >        ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) as row_num
> >    FROM (
> >        SELECT
> >            *
> >        FROM (
> >            SELECT
> >                id,
> >                name,
> >                max(num) as num
> >            FROM source1
> >            GROUP BY
> >                id, name, TUMBLE(proc, INTERVAL '1' MINUTE)
> >        )
> >        GROUP BY
> >            id, name, num
> >    )
> > )
> > WHERE row_num = 1
> >
>
> In the SQL above, the state TTL of Rank and Agg will be all configured as 1
> day.  If users want to set different TTL for Rank and Agg, they can just
> make these two queries located in two different query blocks.
> It looks quite rough but straightforward enough.  For each side of join
> operator, one of my users proposed a syntax like below:
>
> > /*+
> JOIN_TTL('tables'='left_talbe,right_table','left_ttl'='100000','right_ttl'='10000')
> */
> >
> > We haven't accepted this proposal now, maybe we could find some better
> design for this kind of case. Just for your information.
>
> I think if we want to utilize hints to support fine-grained configuration,
> we can open a new FLIP to discuss it.
> BTW, personally, I'm interested in how to design a graphical interface to
> help users to maintain their custom fine-grained configuration between
> their job versions.
>
> Best regards,
> Yisha
>

Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

Reply via email to