Re: [DISCUSS] Preventing Mockito usage for the new code with Checkstyle

2023-04-27 Thread weijie guo
+1 for introducing this rule for JUnit 4 and Mockito.

Best regards,

Weijie


On Wed, Apr 26, 2023 at 23:50, Alexander Fedulov wrote:

> +1 for the proposal,
>
> Best,
> Alex
>
> On Wed, 26 Apr 2023 at 15:50, Chesnay Schepler  wrote:
>
> > * adds a note to not include "import " in the regex *
> >
> > On 26/04/2023 11:22, Maximilian Michels wrote:
> > > If we ban Mockito imports, I can still write tests using the full
> > > qualifiers, right?
> > >
> > > For example:
> > >
> >
> org.mockito.Mockito.when(somethingThatShouldHappen).thenReturn(somethingThatNeverActuallyHappens)
> > >
> > > Just kidding, +1 on the proposal.
> > >
> > > -Max
> > >
> > > On Wed, Apr 26, 2023 at 9:02 AM Panagiotis Garefalakis
> > >  wrote:
> > >> Thanks for bringing this up!  +1 for the proposal
> > >>
> > >> @Jing Ge -- we don't necessarily need to completely migrate to Junit5
> > (even
> > >> though it would be ideal).
> > >> We could introduce the checkstyle rule and add suppressions for the
> > >> existing problematic paths (as we do today for other rules e.g.,
> > >> AvoidStarImport)
> > >>
> > >> Cheers,
> > >> Panagiotis
> > >>
> > >> On Tue, Apr 25, 2023 at 11:48 PM Weihua Hu 
> > wrote:
> > >>
> > >>> Thanks for driving this.
> > >>>
> > >>> +1 for Mockito and Junit4.
> > >>>
> > >>> A clear checkstyle rule will be of great help to new developers.
> > >>>
> > >>> Best,
> > >>> Weihua
> > >>>
> > >>>
> > >>> On Wed, Apr 26, 2023 at 1:47 PM Jing Ge 
> > >>> wrote:
> > >>>
> >  This is a great idea, thanks for bringing this up. +1
> > 
> >  Also +1 for Junit4. If I am not mistaken, it could only be done
> after
> > the
> >  Junit5 migration is done.
> > 
> >  @Chesnay thanks for the hint. Do we have any doc about it? If not,
> it
> > >>> might
> >  deserve one. WDYT?
> > 
> >  Best regards,
> >  Jing
> > 
> >  On Wed, Apr 26, 2023 at 5:13 AM Lijie Wang <
> wangdachui9...@gmail.com>
> >  wrote:
> > 
> > > Thanks for driving this. +1 for the proposal.
> > >
> > > Can we also prevent JUnit 4 usage in new code this way? Because
> >  currently
> > > we are aiming to migrate our codebase to JUnit 5.
> > >
> > > Best,
> > > Lijie
> > >
> > > On Tue, Apr 25, 2023 at 23:02, Piotr Nowojski wrote:
> > >
> > >> Ok, thanks for the clarification.
> > >>
> > >> Piotrek
> > >>
> > >> On Tue, Apr 25, 2023 at 16:38, Chesnay Schepler
> > > wrote:
> > >>> The checkstyle rule would just ban certain imports.
> > >>> We'd add exclusions for all existing usages as we did when
> >  introducing
> > >>> other rules.
> > >>> So far we have usually disabled checkstyle rules for specific files.
> > >>>
> > >>> On 25/04/2023 16:34, Piotr Nowojski wrote:
> >  +1 to the idea.
> > 
> >  How would this checkstyle rule work? Are you suggesting to start
> > > with a
> >  number of exclusions? On what level will those exclusions be?
> Per
> > > file?
> > >>> Per
> >  line?
> > 
> >  Best,
> >  Piotrek
> > 
> >  On Tue, Apr 25, 2023 at 13:18, David Morávek
> >  wrote:
> > > Hi Everyone,
> > >
> > > A long time ago, the community decided not to use Mockito-based
> > > tests
> > > because those are hard to maintain. This is already baked into
> our
> > > Code
> > >>> Style
> > > and Quality Guide [1].
> > >
> > > Because we still have Mockito imported into the code base, it's
> >  very
> > >>> easy
> > > for newcomers to unconsciously introduce new tests violating
> the
> > > code
> > >>> style
> > > because they're unaware of the decision.
> > >
> > > I propose to prevent Mockito usage with a Checkstyle rule for
> >  new
> > >>> code,
> > > which would eventually allow us to eliminate it. This could
> also
> > >> prevent
> > > some wasted work and unnecessary feedback cycles during
> reviews.
> > >
> > > WDYT?
> > >
> > > [1]
> > >
> > >
> > >>>
> >
> https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-mockito---use-reusable-test-implementations
> > > Best,
> > > D.
> > >
> > >>>
> >
> >
>
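For reference, such a rule could look roughly as follows in the Checkstyle
configuration. This is a minimal sketch: RegexpSingleline and SuppressionFilter
are standard Checkstyle modules, but the exact pattern, message, and
suppressions-file property are assumptions rather than Flink's final setup.

{code:xml}
<!-- Sketch only. Matching the bare package name (not "import org.mockito")
     also catches fully-qualified usages, per Chesnay's note above. -->
<module name="RegexpSingleline">
    <property name="format" value="org\.mockito"/>
    <property name="message"
              value="Avoid Mockito; use reusable test implementations instead."/>
</module>

<!-- Existing violations would go into a one-off suppression list, as is
     already done for rules such as AvoidStarImport. -->
<module name="SuppressionFilter">
    <property name="file" value="${checkstyle.suppressions.file}"/>
</module>
{code}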


[jira] [Created] (FLINK-31952) Support 'EXPLAIN' statement for CompiledPlan

2023-04-27 Thread Jane Chan (Jira)
Jane Chan created FLINK-31952:
-

 Summary: Support 'EXPLAIN' statement for CompiledPlan
 Key: FLINK-31952
 URL: https://issues.apache.org/jira/browse/FLINK-31952
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.18.0
Reporter: Jane Chan
 Fix For: 1.18.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Planning Flink 2.0

2023-04-27 Thread David Morávek
Hi,

Great to see this topic moving forward; I agree it's long overdue.

I keep thinking about 2.0 as a chance to eliminate things that didn't work,
make the feature set denser, and fix rough edges and APIs that hold us back.

Some items in the doc (Key Features section) don't tick these boxes for me,
as they could also be implemented in the 1x branch. We should consider
whether we need a backward incompatible release to introduce each feature.
This should help us to keep the discussion more focused.

Best,
D.


On Wed, Apr 26, 2023 at 2:33 PM DONG Weike  wrote:

> Hi,
>
> It is thrilling to see the foreseeable upcoming rollouts of Flink 2.x
> releases, and I believe that this roadmap can take Flink to the next stage
> of a top-notch unified streaming & batch computing engine.
>
> Given that all of the existing user programs are written and run on Flink
> 1.x versions for now, and some of them are very complex and rely on
> various third-party connectors written against legacy APIs, one concern I
> have is this: if, one day in the future, the community decides
> that new features are only given to 2.x releases, could the last release of
> Flink 1.x be converted into an LTS version (backporting severe bug fixes and
> critical security patches)? That would give existing users enough time
> to wait for third-party connectors to upgrade, test their programs against
> the new Flink APIs, and avoid a sudden loss of community support.
>
> Just my two cents : )
>
> Best,
> Weike
>
> 
> From: Xintong Song 
> Sent: April 26, 2023 20:01
> To: dev 
> Subject: Re: [DISCUSS] Planning Flink 2.0
>
> @Chesnay
>
>
> > Technically this implies that every minor release may contain breaking
> > changes, which is exactly what users don't want.
>
>
> It's not necessary to introduce the breaking changes immediately upon
> reaching the minimum guaranteed stable time. If there are multiple changes
> waiting for the stable time, we can still gather them in one minor release.
> But I see your point, from the user's perspective, the mechanism does not
> provide any guarantees for the compatibility of minor releases.
>
> What problems do you see in creating major releases every N years?
> >
>
> It might not be a concrete problem, but I'm a bit concerned by the
> uncertainty. I assume N should not be too small, e.g., at least 3. I'd
> expect the decision to ship a major release to be made based on
> comprehensive consideration of the situation at that time. Making a
> decision now that we would ship a major release 3 years later seems a bit
> aggressive to me.
>
> We need to figure out what this release means for connectors
> > compatibility-wise.
> >
>
> +1
>
>
> > What process are you thinking of for deciding what breaking changes to
> > make? The obvious choice would be FLIPs, but I'm worried that this will
> > overload the mailing list / wiki for lots of tiny changes.
> >
>
> This should be a community decision. What I have in mind would be: (1)
> collect a wish list on wiki, (2) schedule a series of online meetings (like
> the release syncs) to get an agreed set of must-have items, (3) develop and
> polish the detailed plans of items via FLIPs, and (4) if the plan for a
> must-have item does not work out then go back to (2) for an update. I'm
> also open to other opinions.
>
> Would we wait a few months for people to prepare/agree on changes so we
> > reduce the time we need to merge things into 2 branches?
> >
>
> That's what I had in mind. Hopefully after 1.18.
>
> @Max
>
> When I look at
> >
> https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit
> > , I'm a bit skeptical we will even be able to reach all these goals. I
> > think we have to prioritize and try to establish a deadline. Otherwise we
> > will end up never releasing 2.0.
>
>
> Sorry for the confusion. I should have explained this more clearly. We are
> not planning to finish all the items in the list. It's more like a
> brainstorm, a list of candidates. We are also expecting to collect more
> ideas from the community. And after collecting the ideas, we should
> prioritize them and decide on a subset of must-have items, following the
> consensus decision making.
>
> +1 on Flink 2.0 by May 2024 (not a hard deadline but I think having a
> > deadline helps).
> >
>
> I agree that having a deadline helps. I proposed mid-2024, which is similar
> to but not as explicit as what you proposed. We may start with having a
> deadline for deciding the must-have items (e.g., by the end of June). That
> should make it easier to estimate the overall time needed to prepare
> the release.
>
> Best,
>
> Xintong
>
>
>
> On Wed, Apr 26, 2023 at 6:57 PM Gyula Fóra  wrote:
>
> > +1 to everything Max said.
> >
> > Gyula
> >
> > On Wed, 26 Apr 2023 at 11:42, Maximilian Michels  wrote:
> >
> > > Thanks for starting the discussion, Jark and Xingtong!
> > >
> > > Flink 2.0 is long overdue. In the past, the expectations for such 

[jira] [Created] (FLINK-31953) FLIP-288: Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-27 Thread Hongshun Wang (Jira)
Hongshun Wang created FLINK-31953:
-

 Summary: FLIP-288: Enable Dynamic Partition Discovery by Default 
in Kafka Source
 Key: FLINK-31953
 URL: https://issues.apache.org/jira/browse/FLINK-31953
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Hongshun Wang
 Fix For: kafka-4.0.0


This improvement implements 
[FLIP-288|https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source] 
to enable partition discovery by default and set the EARLIEST offset strategy for 
later discovered partitions.
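For context, partition discovery is currently opt-in via a connector property. A 
minimal sketch of enabling it today on the KafkaSource builder (broker address, 
topic, and interval values are illustrative):
{code:java}
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

public class DiscoveryExample {
    public static void main(String[] args) {
        KafkaSource<String> source =
                KafkaSource.<String>builder()
                        .setBootstrapServers("broker:9092")
                        .setTopics("input-topic")
                        .setValueOnlyDeserializer(new SimpleStringSchema())
                        // Opt-in today; FLIP-288 proposes enabling discovery by
                        // default and using the EARLIEST offset for partitions
                        // discovered later.
                        .setProperty("partition.discovery.interval.ms", "300000")
                        .build();
    }
}
{code}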



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31954) Prevent Mockito for the new code with Checkstyle

2023-04-27 Thread Jira
David Morávek created FLINK-31954:
-

 Summary: Prevent Mockito for the new code with Checkstyle
 Key: FLINK-31954
 URL: https://issues.apache.org/jira/browse/FLINK-31954
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: David Morávek


Based on [https://lists.apache.org/thread/xl456044hmxk87mwq02p4m22yp3b04sc] 
discussion.

 

We'll set up a Checkstyle rule that disallows Mockito usage and create a 
one-off suppression list for the existing violations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31955) Prevent JUnit 4 usage for the new code with Checkstyle

2023-04-27 Thread Jira
David Morávek created FLINK-31955:
-

 Summary: Prevent JUnit 4 usage for the new code with Checkstyle
 Key: FLINK-31955
 URL: https://issues.apache.org/jira/browse/FLINK-31955
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: David Morávek


Based on 
[https://lists.apache.org/thread/xl456044hmxk87mwq02p4m22yp3b04sc] discussion.

 

We'll set up a Checkstyle rule that disallows JUnit 4 usage and create a 
one-off suppression list for the existing violations.






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Preventing Mockito usage for the new code with Checkstyle

2023-04-27 Thread David Morávek
Thanks, everyone, for participating. There seems to be a broad consensus,
so I'll move forward. I've created [1] and [2] to track this.

[1] https://issues.apache.org/jira/browse/FLINK-31954
[2] https://issues.apache.org/jira/browse/FLINK-31955

Best,
D.

On Thu, Apr 27, 2023 at 9:25 AM weijie guo 
wrote:

> +1 for introducing this rule for JUnit 4 and Mockito.
>
> Best regards,
>
> Weijie
>
>
> On Wed, Apr 26, 2023 at 23:50, Alexander Fedulov wrote:
>
> > +1 for the proposal,
> >
> > Best,
> > Alex
> >
> > On Wed, 26 Apr 2023 at 15:50, Chesnay Schepler 
> wrote:
> >
> > > * adds a note to not include "import " in the regex *
> > >
> > > On 26/04/2023 11:22, Maximilian Michels wrote:
> > > > If we ban Mockito imports, I can still write tests using the full
> > > > qualifiers, right?
> > > >
> > > > For example:
> > > >
> > >
> >
> org.mockito.Mockito.when(somethingThatShouldHappen).thenReturn(somethingThatNeverActuallyHappens)
> > > >
> > > > Just kidding, +1 on the proposal.
> > > >
> > > > -Max
> > > >
> > > > On Wed, Apr 26, 2023 at 9:02 AM Panagiotis Garefalakis
> > > >  wrote:
> > > >> Thanks for bringing this up!  +1 for the proposal
> > > >>
> > > >> @Jing Ge -- we don't necessarily need to completely migrate to
> Junit5
> > > (even
> > > >> though it would be ideal).
> > > >> We could introduce the checkstyle rule and add suppressions for the
> > > >> existing problematic paths (as we do today for other rules e.g.,
> > > >> AvoidStarImport)
> > > >>
> > > >> Cheers,
> > > >> Panagiotis
> > > >>
> > > >> On Tue, Apr 25, 2023 at 11:48 PM Weihua Hu 
> > > wrote:
> > > >>
> > > >>> Thanks for driving this.
> > > >>>
> > > >>> +1 for Mockito and Junit4.
> > > >>>
> > > >>> A clear checkstyle rule will be of great help to new developers.
> > > >>>
> > > >>> Best,
> > > >>> Weihua
> > > >>>
> > > >>>
> > > >>> On Wed, Apr 26, 2023 at 1:47 PM Jing Ge  >
> > > >>> wrote:
> > > >>>
> > >  This is a great idea, thanks for bringing this up. +1
> > > 
> > >  Also +1 for Junit4. If I am not mistaken, it could only be done
> > after
> > > the
> > >  Junit5 migration is done.
> > > 
> > >  @Chesnay thanks for the hint. Do we have any doc about it? If not,
> > it
> > > >>> might
> > >  deserve one. WDYT?
> > > 
> > >  Best regards,
> > >  Jing
> > > 
> > >  On Wed, Apr 26, 2023 at 5:13 AM Lijie Wang <
> > wangdachui9...@gmail.com>
> > >  wrote:
> > > 
> > > > Thanks for driving this. +1 for the proposal.
> > > >
> > > > Can we also prevent JUnit 4 usage in new code this way? Because
> > >  currently
> > > > we are aiming to migrate our codebase to JUnit 5.
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > On Tue, Apr 25, 2023 at 23:02, Piotr Nowojski wrote:
> > > >
> > > >> Ok, thanks for the clarification.
> > > >>
> > > >> Piotrek
> > > >>
> > > >> On Tue, Apr 25, 2023 at 16:38, Chesnay Schepler
> > > > wrote:
> > > >>> The checkstyle rule would just ban certain imports.
> > > >>> We'd add exclusions for all existing usages as we did when
> > >  introducing
> > > >>> other rules.
> > > >>> So far we have usually disabled checkstyle rules for specific
> files.
> > > >>>
> > > >>> On 25/04/2023 16:34, Piotr Nowojski wrote:
> > >  +1 to the idea.
> > > 
> > >  How would this checkstyle rule work? Are you suggesting to
> start
> > > > with a
> > >  number of exclusions? On what level will those exclusions be?
> > Per
> > > > file?
> > > >>> Per
> > >  line?
> > > 
> > >  Best,
> > >  Piotrek
> > > 
> > >  On Tue, Apr 25, 2023 at 13:18, David Morávek
> > >  wrote:
> > > > Hi Everyone,
> > > >
> > > > A long time ago, the community decided not to use
> Mockito-based
> > > > tests
> > > > because those are hard to maintain. This is already baked into
> > our
> > > > Code
> > > >>> Style
> > > > and Quality Guide [1].
> > > >
> > > > Because we still have Mockito imported into the code base,
> it's
> > >  very
> > > >>> easy
> > > > for newcomers to unconsciously introduce new tests violating
> > the
> > > > code
> > > >>> style
> > > > because they're unaware of the decision.
> > > >
> > > > I propose to prevent Mockito usage with a Checkstyle rule
> for
> > >  new
> > > >>> code,
> > > > which would eventually allow us to eliminate it. This could
> > also
> > > >> prevent
> > > > some wasted work and unnecessary feedback cycles during
> > reviews.
> > > >
> > > > WDYT?
> > > >
> > > > [1]
> > > >
> > > >
> > > >>>
> > >
> >
> https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#avoid-mockito---use-reusable-test-implementations
> > > > Best,
> > > > D.
> > > >
> > > >>>

[jira] [Created] (FLINK-31956) Extend the CompiledPlan to read from/write to Flink's FileSystem

2023-04-27 Thread Jane Chan (Jira)
Jane Chan created FLINK-31956:
-

 Summary: Extend the CompiledPlan to read from/write to Flink's 
FileSystem
 Key: FLINK-31956
 URL: https://issues.apache.org/jira/browse/FLINK-31956
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Client, Table SQL / Planner
Affects Versions: 1.18.0
Reporter: Jane Chan
 Fix For: 1.18.0


At present, COMPILE/EXECUTE PLAN FOR '${plan.json}' only supports writing 
to/reading from a local file without a scheme. We propose extending this 
support to Flink's FileSystem.
{code:java}
-- before
COMPILE PLAN FOR '/tmp/foo/bar.json' 
EXECUTE PLAN FOR '/tmp/foo/bar.json' 

-- after
COMPILE PLAN FOR 'file:///tmp/foo/bar.json' 
COMPILE PLAN FOR 'hdfs:///tmp/foo/bar.json' 
COMPILE PLAN FOR 's3:///tmp/foo/bar.json' 
COMPILE PLAN FOR 'oss:///tmp/foo/bar.json'  
EXECUTE PLAN FOR 'file:///tmp/foo/bar.json'
EXECUTE PLAN FOR 'hdfs:///tmp/foo/bar.json'
EXECUTE PLAN FOR 's3:///tmp/foo/bar.json'
EXECUTE PLAN FOR 'oss:///tmp/foo/bar.json' {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31957) Add documentation for the user story

2023-04-27 Thread Jane Chan (Jira)
Jane Chan created FLINK-31957:
-

 Summary: Add documentation for the user story
 Key: FLINK-31957
 URL: https://issues.apache.org/jira/browse/FLINK-31957
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table SQL / Planner
Affects Versions: 1.18.0
Reporter: Jane Chan
 Fix For: 1.18.0


Add documentation on how to use compiled plan to configure operator-level state 
TTL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31958) Table to DataStream allow partial fields

2023-04-27 Thread padavan (Jira)
padavan created FLINK-31958:
---

 Summary: Table to DataStream allow partial fields
 Key: FLINK-31958
 URL: https://issues.apache.org/jira/browse/FLINK-31958
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream, Table SQL / API
Reporter: padavan


Hello, I have a model with many fields, for example:

{code:java}
public class UserModel {
    public int userId;
    public int count;
    public int zip;
    public LocalDateTime dt;
    public LocalDateTime wStart;
    public LocalDateTime wEnd;
}
{code}

I work with the Table API, select fields, and convert the Table to a DataStream of 
the model. The problem is that *I have to select all fields even if I don't need 
them*, or I get this exception:
{quote}Column types of query result and sink do not match. Cause: Different 
number of columns.
{quote}
And I just have to substitute fake data as placeholders...

I want to simply use only the fields which I have selected, like:

{code:java}
.select($("userId"), $("count").sum().as("count"));
DataStream<UserModel> dataStream = te.toDataStream(win, UserModel.class);
{code}

*Expected:* 

Remove the validation rule "Different number of columns." If a column is not 
selected, it is initialized to default(T) / null.
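As a possible workaround today (a sketch reusing the names from the report above): 
converting to a Row-typed stream needs no target class, so no column-count check 
against the POJO applies.
{code:java}
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

// "te" is the StreamTableEnvironment and "win" the projected Table from above.
DataStream<Row> rows = te.toDataStream(win); // carries only the selected columns
{code}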



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31959) Correct the unaligned checkpoint type at checkpoint level

2023-04-27 Thread Rui Fan (Jira)
Rui Fan created FLINK-31959:
---

 Summary: Correct the unaligned checkpoint type at checkpoint level
 Key: FLINK-31959
 URL: https://issues.apache.org/jira/browse/FLINK-31959
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.16.1, 1.17.0, 1.18.0
Reporter: Rui Fan
Assignee: Rui Fan


FLINK-18851 added the checkpoint type to the web UI to distinguish between aligned 
checkpoints, unaligned checkpoints, savepoints, and savepoints on cancel in {*}Flink 
1.12{*}.

It distinguishes between UC and AC based on whether UC is enabled or 
disabled [1].

However, FLINK-19680, FLINK-19681 and FLINK-19682 introduced the 
alignment-timeout in {*}Flink 1.13{*}, and it has since been renamed to 
{{aligned-checkpoint-timeout}}.
{quote}
When activated, each checkpoint will still begin as an aligned checkpoint, but 
when the global checkpoint duration exceeds the aligned-checkpoint-timeout, if 
the aligned checkpoint has not completed, then the checkpoint will proceed as 
an unaligned checkpoint.
{quote}
If UC with the AC-timeout is enabled and a checkpoint completes as an aligned 
checkpoint, the web UI still reports it as an unaligned checkpoint instead of an 
aligned one.

 

[1] 
https://github.com/apache/flink/blob/a3368635e3d06f764d144f8c8e2e06e499e79665/flink-runtime-web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.ts#L118
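For reference, a minimal sketch of the configuration combination this ticket is 
about (interval and timeout values are illustrative; the methods are the standard 
CheckpointConfig API):
{code:java}
import java.time.Duration;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedTimeoutExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // illustrative interval
        CheckpointConfig conf = env.getCheckpointConfig();
        conf.enableUnalignedCheckpoints();
        // Each checkpoint begins aligned and only switches to unaligned after
        // this timeout; a checkpoint finishing within 30s is actually aligned,
        // which is what the web UI should report.
        conf.setAlignedCheckpointTimeout(Duration.ofSeconds(30));
    }
}
{code}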

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-306: Unified File Merging Mechanism for Checkpoints

2023-04-27 Thread Yuan Mei
Hey all,

Thanks @Zakelly for driving this effort and thanks everyone for the warm
discussion. Sorry for the late response.

As Zakelly and I have already discussed and reviewed the design carefully
when drafting this FLIP, I do not have additional inputs here. But I want
to highlight several points that I've been quoted and explain why I think
the current design is a reasonable and clean one.

*Why this FLIP is proposed*
File Flooding is a problem for Flink I've seen many people bring up
throughout the years, especially for large clusters. Unfortunately, there
are not yet accepted solutions for the most commonly used state backend
like RocksDB. This FLIP was originally targeted to address merging
SST(KeyedState) checkpoint files.

While we are comparing different design choices, we found that different
types of checkpoint files (OPState, Unaligned CP channel state, Changelog
incremental state) share similar considerations: for example, file
management, file merging granularity, etc. That's why we want to
abstract a unified framework for merging these different types of
checkpoint files and provide flexibility to choose between merging
efficiency, rescaling/restoring cost, file system capabilities (affecting
file visibility), etc.

*File Ownership moved from JM to TM, WHY*
One of the major differences in the proposed design is moving file
ownership from JM to TM. A lot of questions/concerns are coming from here,
let me answer them one by one:

*1. Why the current JM SharedRegistry is not enough and do we have to
introduce more complexity?*
SharedRegistry maintains the mapping between *a file -> max CP ID using the
file*
For merging files, we have to introduce another level of mapping *a file ->
checkpoint file segment (merged files)*
So yes, no matter what, the second level of mapping has to be managed
somewhere, either JM or TM.

*2. Why the **complexity (second level of mapping)** cannot be maintained
in JM?*
- As a centralized service, JM has already been complicated and overloaded.
As mentioned by @Yanfei Lei , "triggering checkpoints
can be delayed by discarding shared state when JM manages a large number of
files FLINK-26590". This ends up setting the JM thread pool to 500!
- As explained by @Zakelly in the previous thread, the contract "for
Checkpoint N, only re-use shared state handles that have been already
referenced by checkpoint N-1" is not guaranteed for the concurrent
checkpoint in the current JM-owned design.  This problem can not be
addressed without significant changes in how SharedRegistry and checkpoint
subsumption work, which I do not think is worth it, since "concurrent_CP>1" is
not used that much in prod.

*3. We have had similar discussions before about moving ownership from JM to TM;
why was it not adopted at that time?*
As mentioned by Yun and Piotr, we have had similar discussions to move
ownership from JM to TM when designing the changelog state backend. The
reason why we stuck to JM ownership at that time is mainly due to
engineering time/effort constraints.
This time, since we need an extra level of mapping, which complicates the
JM logic even further, we indeed need to shade the complexity within the TM
to avoid more communications between JM and TM.
Zakelly has already shared the code branch (about 2000 lines), and it is
simple.

*4. Cloud-Native Trend*
The current centralized file management framework contradicts the
cloud-native trend. That's also one of the reasons moving ownership from JM
to TM was first proposed. The proposed design and implementation is a
worthy try-out in this direction. I'd like to put some more effort in this
direction if this really turns out working well.

One more thing I want to mention is that the proposed design shades all the
code changes and complexities in the newly introduced file management in the
TM. That is to say, without enabling file merging, the code path of file
management remains the same as before. So it is also a safe change in this sense.

Best,
Yuan



On Wed, Apr 12, 2023 at 5:23 PM Zakelly Lan  wrote:

> Hi Yun,
>
> I reorganized our discussion and added a comparison table of state
> ownership with some previous designs. Please take a look at section
> "4.9. State ownership comparison with other designs".
>
> But I don't see them as alternatives since the design of state
> ownership is integrated with this FLIP. That is to say, we are
> providing a file merging solution including file management for new
> merged files, other ownership models are not feasible for the current
> merging plan. If the state ownership changes, the design of merging
> files at different granularities also needs to be changed accordingly.
> WDYT?
>
>
> Best regards,
> Zakelly
>
> On Tue, Apr 11, 2023 at 10:18 PM Yun Tang  wrote:
> >
> > Hi Zakelly,
> >
> > Since we already had some discussions on this topic in the doc I
> mentioned, could you please describe the difference in your FLIP?
> >
> > I think we should better have a comparing table across different opti
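To make Yuan's point 1 concrete, a minimal sketch of the two levels of mapping 
involved (class and field names here are illustrative, not actual Flink types):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; the real SharedRegistry and the FLIP's segment
// bookkeeping differ in detail.
class FileMappingSketch {

    // Level 1 (exists today in the JM's SharedRegistry):
    // physical file -> the latest checkpoint ID still using it.
    final Map<String, Long> fileToMaxCheckpointId = new HashMap<>();

    // Level 2 (new with file merging):
    // logical state handle -> segment (offset/length) inside a merged file.
    static final class Segment {
        final String physicalFile;
        final long offset;
        final long length;

        Segment(String physicalFile, long offset, long length) {
            this.physicalFile = physicalFile;
            this.offset = offset;
            this.length = length;
        }
    }

    final Map<String, Segment> handleToSegment = new HashMap<>();
}
{code}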

Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-04-27 Thread Jing Ge
Hi Feng,

Thanks for working on the FLIP. There are still some NIT issues in the FLIP
like:

1. Optional<CatalogStore> catalogStore has been used as CatalogStore
instead of Optional<CatalogStore> in the code example. It should be fine to use it as
pseudo code for now and update it after you submit the PR.
2. addCatalog(...) is still used somewhere in the rejected section, which
should be persistCatalog(...) to keep it consistent.

Speaking of the conflict issues in the multi-instance scenarios, I am not
sure if this is the intended behaviour. If Map<String, Catalog> catalogs is
used as a cache, it should be invalidated once the related catalog has been
removed from the CatalogStore by another instance. Did I miss something?

Best regards,
Jing

On Thu, Apr 13, 2023 at 4:40 PM Feng Jin  wrote:

> Hi Jing,Shammon
> Thanks for your reply.
>
> @Jing
>
> > How about persistCatalog()?
> I think this is a good function name, I have updated it in the
> documentation.
>
> >Some common cache features should be implemented
> Thank you for the suggestion. If alternative 1 turns out to be more
> appropriate later, I will improve this part of the content.
>
> > As the above discussion moves forward, the option 2 solution looks more
> like a replacement of option 1
> Yes, after discussing with Shammon offline, we think that solution 2 might
> be more suitable and also avoid any inconsistency issues.
>
> > There are some inconsistent descriptions in the content.  Would you like
> to clean them up?
> I will do my best to improve the document and appreciate your suggestions.
>
>
>
> @Shammon
> > can you put the unselected option in `Rejected Alternatives`
> Sure, I have moved it to the `Rejected Alternatives`.
>
>
>
> Best
> Feng
>
>
>
> On Thu, Apr 13, 2023 at 8:52 AM Shammon FY  wrote:
>
> > Hi Feng
> >
> > Thanks for your update.
> >
> > I found there are two options in `Proposed Changes`, can you put the
> > unselected option in `Rejected Alternatives`? I think this may help us
> > better understand your proposal
> >
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Apr 13, 2023 at 4:49 AM Jing Ge 
> > wrote:
> >
> > > Hi Feng,
> > >
> > > Thanks for raising this FLIP. I am still confused after completely
> > reading
> > > the thread with following questions:
> > >
> > > 1. Naming confusion - registerCatalog() and addCatalog() have no big
> > > difference based on their names. One of them is responsible for data
> > > persistence. How about persistCatalog()?
> > > 2. As you mentioned that Map<String, Catalog> catalogs is used as a
> cache
> > > and catalogStore is used for data persistence. I would suggest
> describing
> > > their purpose conceptually and clearly in the FLIP. Some common cache
> > > features should be implemented, i.e. data in the cache and the store
> > should
> > > be consistent. Same Catalog instance should be found in the store and
> in
> > > the cache(either it has been initialized or it will be lazy
> initialized)
> > > for the same catalog name. The consistency will be taken care of while
> > > updating the catalog.
> > > 3. As the above discussion moves forward, the option 2 solution looks
> > more
> > > like a replacement of option 1, because, afaiu, issues mentioned
> > > previously with option 1 are not solved yet. Do you still want to
> propose
> > > both options and ask for suggestions for both of them?
> > > 4. After you updated the FLIP, there are some inconsistent descriptions
> > in
> > > the content.  Would you like to clean them up? Thanks!
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Fri, Apr 7, 2023 at 9:24 AM Feng Jin  wrote:
> > >
> > > > hi Shammon
> > > >
> > > > Thank you for your response, and I completely agree with your point
> of
> > > > view.
> > > > Initially, I may have over complicated the whole issue. First and
> > > foremost,
> > > > we need to consider the persistence of the Catalog's Configuration.
> > > > If we only need to provide persistence for Catalog Configuration, we
> > can
> > > > add a toConfiguration method to the Catalog interface.
> > > > This method can convert a Catalog instance to a Map<String, String> of
> > > > properties, and the default implementation will throw an exception.
> > > >
> > > > public interface Catalog {
> > > >     /**
> > > >      * Returns a map containing the properties of the catalog object.
> > > >      *
> > > >      * @return a map containing the properties of the catalog object
> > > >      * @throws UnsupportedOperationException if the implementing class does
> > > >      *     not override the default implementation of this method
> > > >      */
> > > >     default Map<String, String> toProperties() {
> > > >         throw new UnsupportedOperationException(
> > > >                 "Please implement toProperties for this catalog");
> > > >     }
> > > > }
> > > >
> > > > The specific process is as follows:
> > > >
> > > > 1.  If the user has configured a CatalogStore, the toConfiguration()
> > > method
> > > > will be called when registering a Catalog instance with
> > > > registerCatalog(String catalogName, Catalog catalog). Th
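To illustrate the cache-consistency concern raised above, a hypothetical sketch; 
the Store interface and the lookup logic are assumptions based on this thread, not 
existing Flink API:
{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: drop the cached instance when another session
// has removed the catalog from the persistent store.
final class CachingCatalogLookup<C> {

    interface Store {
        boolean contains(String catalogName);
    }

    private final Map<String, C> catalogs = new ConcurrentHashMap<>(); // cache
    private final Store catalogStore;                                  // persistence

    CachingCatalogLookup(Store catalogStore) {
        this.catalogStore = catalogStore;
    }

    Optional<C> getCatalog(String name) {
        if (!catalogStore.contains(name)) {
            catalogs.remove(name); // removed by another instance: invalidate
            return Optional.empty();
        }
        return Optional.ofNullable(catalogs.get(name));
    }
}
{code}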

Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-27 Thread Jing Ge
Thanks Tamir for the information. According to the latest comment on
task FLINK-24998, this bug should be gone when using the latest JDK 17. I
was wondering whether that means there are no more issues stopping us from
releasing a major Flink version to support Java 17. Did I miss something?

Best regards,
Jing

On Thu, Apr 27, 2023 at 8:18 AM Tamir Sagi 
wrote:

> More details about the JDK bug here
> https://bugs.openjdk.org/browse/JDK-8277529
>
> Related Jira ticket
> https://issues.apache.org/jira/browse/FLINK-24998
>
> --
> *From:* Jing Ge via user 
> *Sent:* Monday, April 24, 2023 11:15 PM
> *To:* Chesnay Schepler 
> *Cc:* Piotr Nowojski ; Alexis Sarda-Espinosa <
> sarda.espin...@gmail.com>; Martijn Visser ;
> dev@flink.apache.org ; user 
> *Subject:* Re: [Discussion] - Release major Flink version to support JDK
> 17 (LTS)
>
>
> *EXTERNAL EMAIL*
>
>
> Thanks Chesnay for working on this. Would you like to share more info
> about the JDK bug?
>
> Best regards,
> Jing
>
> On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
> wrote:
>
> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
>
> On 31/03/2023 08:57, Chesnay Schepler wrote:
>
>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
>
Kryo themselves state that v5 likely can't read v2 data.

However, both versions can be on the classpath without conflict, as v5
offers a versioned artifact that includes the version in the package.
>
> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> purely from a read/write perspective.
>
> The bigger question is how we expose this new Kryo version in the API. If
> we stick to the versioned jar we need to either duplicate all current
> Kryo-related APIs or find a better way to integrate other serialization
> stacks.
> On 30/03/2023 17:50, Piotr Nowojski wrote:
>
> Hey,
>
> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
>
> This sounds pretty bad to me.
>
> Has anyone looked into what it would take to provide a smooth migration
> from Kryo2 -> Kryo5?
>
> Best,
> Piotrek
>
On Thu, Mar 30, 2023 at 16:54, Alexis Sarda-Espinosa
wrote:
>
> Hi Martijn,
>
> just to be sure, if all state-related classes use a POJO serializer, Kryo
> will never come into play, right? Given FLINK-16686 [1], I wonder how many
> users actually have jobs with Kryo and RocksDB, but even if there aren't
> many, that still leaves those who don't use RocksDB for
> checkpoints/savepoints.
>
> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
> users choose between v2/v5 jars by separating them like log4j2 jars?
>
> [1] https://issues.apache.org/jira/browse/FLINK-16686
>
> Regards,
> Alexis.
>
On Thu, Mar 30, 2023 at 14:26, Martijn Visser <
> martijnvis...@apache.org>:
>
> Hi all,
>
> I also saw a thread on this topic from Clayton Wohl [1] on this topic,
> which I'm including in this discussion thread to avoid that it gets lost.
>
> From my perspective, there's two main ways to get to Java 17:
>
> 1. The Flink community agrees that we upgrade Kryo to a later version,
> which means breaking all checkpoint/savepoint compatibility and releasing a
> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
> dropped. This is probably the quickest way, but would still mean that we
> expose Kryo in the Flink APIs, which is the main reason why we haven't been
> able to upgrade Kryo at all.
> 2. There's a contributor who makes a contribution that bumps Kryo, but
> either a) automagically reads in all old checkpoints/savepoints in using
> Kryo v2 and writes them to new snapshots using Kryo v5 (like is mentioned
> in the Kryo migration guide [2][3] or b) provides an offline tool that
> allows users that are interested in migrating their snapshots manually
> before starting from a newer version. That potentially could prevent the
> need to introduce a new Flink major version. In both scenarios, ideally the
> contributor would also help with avoiding the exposure of Kryo so that we
> will be in a better shape in the future.
>
> It would be good to get the opinion of the community for either of these
> two options, or potentially for another one that I haven't mentioned. If it
> appears that there's an overall agreement on the direction, I would propose
> that a FLIP gets created which describes the entire process.
>
> Looking forward to the thoughts of others, including the Users (therefore
> including the User ML).
>
> Best regards,
>
> Martijn
>
> [1]  https://lists.apache.org/thread/qcw8wy9dv8szxx9bh49nz7jnth22p1v2
> [2] https://lists.apache.org/thre
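A minimal sketch of the v2-read / v5-write idea from option 2a, relying on the 
versioned Kryo 5 artifact Chesnay mentioned (its relocated package is 
com.esotericsoftware.kryo.kryo5); the migrate() helper and its wiring into Flink's 
snapshot formats are assumptions:
{code:java}
import com.esotericsoftware.kryo.Kryo;            // Kryo v2.x
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.kryo5.io.Output; // Kryo v5.x, versioned artifact

public class KryoMigrationSketch {

    // Assumed helper: read a value with v2, rewrite it with v5. Hooking this
    // into Flink's savepoint format is the hard part and is not shown here.
    static <T> byte[] migrate(byte[] v2Bytes, Class<T> type) {
        Kryo v2 = new Kryo();
        T value = v2.readObject(new Input(v2Bytes), type);

        com.esotericsoftware.kryo.kryo5.Kryo v5 =
                new com.esotericsoftware.kryo.kryo5.Kryo();
        v5.setRegistrationRequired(false);
        Output out = new Output(4096, -1);
        v5.writeObject(out, value);
        return out.toBytes();
    }
}
{code}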

Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-04-27 Thread Feng Jin
Hi Jing


> There are still some NIT issues in the FLIP

Thank you very much for the careful review. I have already made the
relevant changes.


>  Speaking of the conflict issues in the multi-instance scenarios, I am
not
sure if this is the intended behaviour

Currently, there are conflicts in multi-instance scenarios with the current
design. I am thinking about whether we should remove 'Map<String, Catalog>' and
make caching the default behavior of InMemoryCatalogStore. This way, users
can implement their own CatalogStore to support multi-instance,
non-conflicting scenarios. What do you think?



Best,
Feng

On Thu, Apr 27, 2023 at 9:03 PM Jing Ge  wrote:

> Hi Feng,
>
> Thanks for working on the FLIP. There are still some NIT issues in the FLIP
> like:
>
> 1. Optional<CatalogStore> catalogStore has been used as CatalogStore
> instead of Optional<CatalogStore> in the code example. It should be fine to use it as
> pseudo code for now and update it after you submit the PR.
> 2. addCatalog(...) is still used somewhere in the rejected section, which
> should be persistCatalog(...) to keep it consistent.
>
> Speaking of the conflict issues in the multi-instance scenarios, I am not
> sure if this is the intended behaviour. If Map<String, Catalog> catalogs is
> used as a cache, it should be invalidated once the related catalog has been
> removed from the CatalogStore by another instance. Did I miss something?
>
> Best regards,
> Jing
>
> On Thu, Apr 13, 2023 at 4:40 PM Feng Jin  wrote:
>
> > Hi Jing,Shammon
> > Thanks for your reply.
> >
> > @Jing
> >
> > > How about persistCatalog()?
> > I think this is a good function name, I have updated it in the
> > documentation.
> >
> > >Some common cache features should be implemented
> > Thank you for the suggestion. If alternative 1 turns out to be more
> > appropriate later, I will improve this part of the content.
> >
> > > As the above discussion moves forward, the option 2 solution looks more
> > like a replacement of option 1
> > Yes, after discussing with Shammon offline, we think that solution 2
> might
> > be more suitable and also avoid any inconsistency issues.
> >
> > > There are some inconsistent descriptions in the content.  Would you
> like
> > to clean them up?
> > I will do my best to improve the document and appreciate your
> suggestions.
> >
> >
> >
> > @Shammon
> > > can you put the unselected option in `Rejected Alternatives`
> > Sure, I have moved it to the `Rejected Alternatives`.
> >
> >
> >
> > Best
> > Feng
> >
> >
> >
> > On Thu, Apr 13, 2023 at 8:52 AM Shammon FY  wrote:
> >
> > > Hi Feng
> > >
> > > Thanks for your update.
> > >
> > > I found there are two options in `Proposed Changes`, can you put the
> > > unselected option in `Rejected Alternatives`? I think this may help us
> > > better understand your proposal
> > >
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > > On Thu, Apr 13, 2023 at 4:49 AM Jing Ge 
> > > wrote:
> > >
> > > > Hi Feng,
> > > >
> > > > Thanks for raising this FLIP. I am still confused after completely
> > > reading
> > > > the thread with following questions:
> > > >
> > > > 1. Naming confusion - registerCatalog() and addCatalog() have no big
> > > > difference based on their names. One of them is responsible for data
> > > > persistence. How about persistCatalog()?
> > > > 2. As you mentioned that Map<String, Catalog> catalogs is used as a
> > cache
> > > > and catalogStore is used for data persistence. I would suggest
> > describing
> > > > their purpose conceptually and clearly in the FLIP. Some common cache
> > > > features should be implemented, i.e. data in the cache and the store
> > > should
> > > > be consistent. Same Catalog instance should be found in the store and
> > in
> > > > the cache(either it has been initialized or it will be lazy
> > initialized)
> > > > for the same catalog name. The consistency will be taken care of
> while
> > > > updating the catalog.
> > > > 3. As the above discussion moves forward, the option 2 solution looks
> > > more
> > > > like a replacement of option 1, because, afaiu, issues mentioned
> > > > previously with option 1 are not solved yet. Do you still want to
> > propose
> > > > both options and ask for suggestions for both of them?
> > > > 4. After you updated the FLIP, there are some inconsistent
> descriptions
> > > in
> > > > the content.  Would you like to clean them up? Thanks!
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > On Fri, Apr 7, 2023 at 9:24 AM Feng Jin 
> wrote:
> > > >
> > > > > hi Shammon
> > > > >
> > > > > Thank you for your response, and I completely agree with your point
> > of
> > > > > view.
> > > > > Initially, I may have over complicated the whole issue. First and
> > > > foremost,
> > > > > we need to consider the persistence of the Catalog's Configuration.
> > > > > If we only need to provide persistence for Catalog Configuration,
> we
> > > can
> > > > > add a toConfiguration method to the Catalog interface.
> > > > > This method can convert a Catalog instance to a Map<String, String> of
> > > > > properties, and the default implementation will throw an exception.

Re: [DISCUSS] FLIP-306: Unified File Merging Mechanism for Checkpoints

2023-04-27 Thread Zakelly Lan
Hi Yuan,

Thanks for sharing your thoughts. Like you said, the code changes and
complexities are shaded in the newly introduced file management in TM,
while the old file management remains the same. It is safe for us to
take a small step towards decentralized file management in this way. I
put the POC branch here[1] so everyone can check the code change.

Best regards,
Zakelly

[1] https://github.com/Zakelly/flink/tree/flip306_poc

On Thu, Apr 27, 2023 at 8:13 PM Yuan Mei  wrote:
>
> Hey all,
>
> Thanks @Zakelly for driving this effort and thanks everyone for the warm
> discussion. Sorry for the late response.
>
> As Zakelly and I have already discussed and reviewed the design carefully
> when drafting this FLIP, I do not have additional inputs here. But I want
> to highlight several points that I've been quoted and explain why I think
> the current design is a reasonable and clean one.
>
> *Why this FLIP is proposed*
> File Flooding is a problem for Flink I've seen many people bring up
> throughout the years, especially for large clusters. Unfortunately, there
> are not yet accepted solutions for the most commonly used state backend
> like RocksDB. This FLIP was originally targeted to address merging
> SST(KeyedState) checkpoint files.
>
> While we are comparing different design choices, we found that different
> types of checkpoint files (OPState, Unaligned CP channel state, Changelog
> incremental state) share similar considerations: for example, file
> management, file merging granularity, etc. That's why we want to
> abstract a unified framework for merging these different types of
> checkpoint files and provide flexibility to choose between merging
> efficiency, rescaling/restoring cost, file system capabilities (affecting
> file visibility), etc.
>
> *File Ownership moved from JM to TM, WHY*
> One of the major differences in the proposed design is moving file
> ownership from JM to TM. A lot of questions/concerns are coming from here,
> let me answer them one by one:
>
> *1. Why the current JM SharedRegistry is not enough and do we have to
> introduce more complexity?*
> SharedRegistry maintains the mapping between *a file -> max CP ID using the
> file*
> For merging files, we have to introduce another level of mapping *a file ->
> checkpoint file segment (merged files)*
> So yes, no matter what, the second level of mapping has to be managed
> somewhere, either JM or TM.
>
> *2. Why the **complexity (second level of mapping)** cannot be maintained
> in JM?*
> - As a centralized service, JM has already been complicated and overloaded.
> As mentioned by @Yanfei Lei , "triggering checkpoints
> can be delayed by discarding shared state when JM manages a large number of
> files FLINK-26590". This ends up setting the JM thread pool to 500!
> - As explained by @Zakelly in the previous thread, the contract "for
> Checkpoint N, only re-use shared state handles that have been already
> referenced by checkpoint N-1" is not guaranteed for the concurrent
> checkpoint in the current JM-owned design.  This problem can not be
> addressed without significant changes in how SharedRegistry and checkpoint
> subsumption work, which I do not think is worth it, since "concurrent_CP>1" is
> not used that much in prod.
>
> *3. We have had similar discussions before about moving ownership from JM to TM;
> why was it not adopted at that time?*
> As mentioned by Yun and Piotr, we have had similar discussions to move
> ownership from JM to TM when designing the changelog state backend. The
> reason why we stuck to JM ownership at that time is mainly due to
> engineering time/effort constraints.
> This time, since we need an extra level of mapping, which complicates the
> JM logic even further, we indeed need to shade the complexity within the TM
> to avoid more communications between JM and TM.
> Zakelly has already shared the code branch (about 2000 lines), and it is
> simple.
>
> *4. Cloud-Native Trend*
> The current centralized file management framework contradicts the
> cloud-native trend. That's also one of the reasons moving ownership from JM
> to TM was first proposed. The proposed design and implementation is a
> worthy try-out in this direction. I'd like to put some more effort in this
> direction if this really turns out working well.
>
> One more thing I want to mention is that the proposed design shades all the
> code changes and complexities in the newly introduced file management in the
> TM. That is to say, without enabling file merging, the code path of file
> management remains the same as before. So it is also a safe change in this sense.
>
> Best,
> Yuan
>
>
>
> On Wed, Apr 12, 2023 at 5:23 PM Zakelly Lan  wrote:
>
> > Hi Yun,
> >
> > I reorganized our discussion and added a comparison table of state
> > ownership with some previous designs. Please take a look at section
> > "4.9. State ownership comparison with other designs".
> >
> > But I don't see them as alternatives since the design of state
> > ownership is integrated with this FLIP.

Re: [DISCUSS] FLIP-306: Unified File Merging Mechanism for Checkpoints

2023-04-27 Thread Zakelly Lan
Hi all,

Thanks for all the feedback so far.

The discussion has been going on for some time, and all the comments
and suggestions have been addressed. So I would like to start a vote on this
FLIP, beginning a week from now (May 5th at 10:00 AM GMT).

If you have any concerns, please don't hesitate to follow up on this discussion.


Best regards,
Zakelly

On Fri, Apr 28, 2023 at 12:03 AM Zakelly Lan  wrote:
>
> Hi Yuan,
>
> Thanks for sharing your thoughts. Like you said, the code changes and
> complexities are shaded in the newly introduced file management in TM,
> while the old file management remains the same. It is safe for us to
> take a small step towards decentralized file management in this way. I
> put the POC branch here[1] so everyone can check the code change.
>
> Best regards,
> Zakelly
>
> [1] https://github.com/Zakelly/flink/tree/flip306_poc
>
> On Thu, Apr 27, 2023 at 8:13 PM Yuan Mei  wrote:
> >
> > Hey all,
> >
> > Thanks @Zakelly for driving this effort and thanks everyone for the warm
> > discussion. Sorry for the late response.
> >
> > As Zakelly and I have already discussed and reviewed the design carefully
> > when drafting this FLIP, I do not have additional inputs here. But I want
> > to highlight several points that I've been quoted and explain why I think
> > the current design is a reasonable and clean one.
> >
> > *Why this FLIP is proposed*
> > File Flooding is a problem for Flink I've seen many people bring up
> > throughout the years, especially for large clusters. Unfortunately, there
> > are not yet accepted solutions for the most commonly used state backend
> > like RocksDB. This FLIP was originally targeted to address merging
> > SST(KeyedState) checkpoint files.
> >
> > While we are comparing different design choices, we found that different
> > types of checkpoint files (OPState, Unaligned CP channel state, Changelog
> > incremental state) share similar considerations: for example, file
> > management, file merging granularity, etc. That's why we want to
> > abstract a unified framework for merging these different types of
> > checkpoint files and provide flexibility to choose between merging
> > efficiency, rescaling/restoring cost, file system capabilities (affecting
> > file visibility), etc.
> >
> > *File Ownership moved from JM to TM, WHY*
> > One of the major differences in the proposed design is moving file
> > ownership from JM to TM. A lot of questions/concerns are coming from here,
> > let me answer them one by one:
> >
> > *1. Why the current JM SharedRegistry is not enough and do we have to
> > introduce more complexity?*
> > SharedRegistry maintains the mapping between *a file -> max CP ID using the
> > file*
> > For merging files, we have to introduce another level of mapping *a file ->
> > checkpoint file segment (merged files)*
> > So yes, no matter what, the second level of mapping has to be managed
> > somewhere, either JM or TM.
> >
> > *2. Why the **complexity (second level of mapping)** cannot be maintained
> > in JM?*
> > - As a centralized service, JM has already been complicated and overloaded.
> > As mentioned by @Yanfei Lei , "triggering checkpoints
> > can be delayed by discarding shared state when JM manages a large number of
> > files FLINK-26590". This ends up setting the JM thread pool to 500!
> > - As explained by @Zakelly in the previous thread, the contract "for
> > Checkpoint N, only re-use shared state handles that have been already
> > referenced by checkpoint N-1" is not guaranteed for the concurrent
> > checkpoint in the current JM-owned design.  This problem can not be
> > addressed without significant changes in how SharedRegistry and checkpoint
> > subsumption work, which I do not think is worth it, since "concurrent_CP>1" is
> > not used that much in prod.
> >
> > *3. We have had similar discussions before about moving ownership from JM to TM;
> > why was it not adopted at that time?*
> > As mentioned by Yun and Piotr, we have had similar discussions to move
> > ownership from JM to TM when designing the changelog state backend. The
> > reason why we stuck to JM ownership at that time is mainly due to
> > engineering time/effort constraints.
> > This time, since we need an extra level of mapping, which complicates the
> > JM logic even further, we indeed need to shade the complexity within the TM
> > to avoid more communications between JM and TM.
> > Zakelly has already shared the code branch (about 2000 lines), and it is
> > simple.
> >
> > *4. Cloud-Native Trend*
> > The current centralized file management framework contradicts the
> > cloud-native trend. That's also one of the reasons moving ownership from JM
> > to TM was first proposed. The proposed design and implementation is a
> > worthy try-out in this direction. I'd like to put some more effort in this
> > direction if this really turns out working well.
> >
> > One more thing I want to mention is that the proposed design shades all the
> > code changes and complexities in the newly introduced file management in the TM.

[jira] [Created] (FLINK-31960) SQL OverBy. Error on a code that does not exist

2023-04-27 Thread padavan (Jira)
padavan created FLINK-31960:
---

 Summary: SQL OverBy. Error on a code that does not exist 
 Key: FLINK-31960
 URL: https://issues.apache.org/jira/browse/FLINK-31960
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: padavan


Hello. I use the latest Flink and want to make a query with the LEAD and LAG 
functions, but I get this error:

 
{noformat}
SQL validation failed. From line 1, column 138 to line 1, column 142: ROW/RANGE 
not allowed with RANK, DENSE_RANK or ROW_NUMBER functions{noformat}
But I don't use the RANK, DENSE_RANK, or ROW_NUMBER functions in my code:
{code:java}
        Table win = te.sqlQuery(
                "SELECT userId, " +
                "lead(`count`, 1) over w as ld, " +
                "lag(`count`, 1) over w as lg " +
                "FROM users " +
                "WINDOW w AS (" +
                "PARTITION BY userId " +
                "ORDER BY proctime " +
                "RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW)"
        );{code}
 

I found what looks like a fix for this problem from 2020, but it did not help:

https://github.com/apache/flink/pull/12868
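For context, Calcite treats LEAD/LAG like RANK-style functions that do not accept 
an explicit ROW/RANGE frame, which is where the confusing message comes from. A 
sketch of the usual validator-level workaround is to drop the frame clause 
(whether the streaming planner then supports the query is a separate question):
{code:sql}
SELECT userId,
       LEAD(`count`, 1) OVER (PARTITION BY userId ORDER BY proctime) AS ld,
       LAG(`count`, 1)  OVER (PARTITION BY userId ORDER BY proctime) AS lg
FROM users
{code}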



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31961) Implement ExpandedQuery and OriginalQuery methods for glue in catalog

2023-04-27 Thread Samrat Deb (Jira)
Samrat Deb created FLINK-31961:
--

 Summary: Implement ExpandedQuery and OriginalQuery methods for 
glue in catalog
 Key: FLINK-31961
 URL: https://issues.apache.org/jira/browse/FLINK-31961
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / AWS
Reporter: Samrat Deb






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31962) libssl not found when running CI

2023-04-27 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-31962:
--

 Summary: libssl not found when running CI
 Key: FLINK-31962
 URL: https://issues.apache.org/jira/browse/FLINK-31962
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.16.2, 1.18.0, 1.17.1
Reporter: Martijn Visser
Assignee: Martijn Visser


{code:java}
Installed Maven 3.2.5 to /home/vsts/maven_cache/apache-maven-3.2.5
Installing required software
Reading package lists...
Building dependency tree...
Reading state information...
bc is already the newest version (1.07.1-2build1).
bc set to manually installed.
libapr1 is already the newest version (1.6.5-1ubuntu1).
libapr1 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
--2023-04-27 11:42:53--  
http://security.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5.11_amd64.deb
Resolving security.ubuntu.com (security.ubuntu.com)... 91.189.91.39, 
185.125.190.36, 185.125.190.39, ...
Connecting to security.ubuntu.com (security.ubuntu.com)|91.189.91.39|:80... 
connected.
HTTP request sent, awaiting response... 404 Not Found
2023-04-27 11:42:53 ERROR 404: Not Found.
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-27 Thread Martijn Visser
Scala 2.12.7 doesn't compile on Java 17, see
https://issues.apache.org/jira/browse/FLINK-25000.

On Thu, Apr 27, 2023 at 3:11 PM Jing Ge  wrote:

> Thanks Tamir for the information. According to the latest comment on
> task FLINK-24998, this bug should be gone when using the latest JDK 17. I
> was wondering whether that means there are no more issues stopping us from
> releasing a major Flink version to support Java 17. Did I miss something?
>
> Best regards,
> Jing
>
> On Thu, Apr 27, 2023 at 8:18 AM Tamir Sagi 
> wrote:
>
>> More details about the JDK bug here
>> https://bugs.openjdk.org/browse/JDK-8277529
>>
>> Related Jira ticket
>> https://issues.apache.org/jira/browse/FLINK-24998
>>
>> --
>> *From:* Jing Ge via user 
>> *Sent:* Monday, April 24, 2023 11:15 PM
>> *To:* Chesnay Schepler 
>> *Cc:* Piotr Nowojski ; Alexis Sarda-Espinosa <
>> sarda.espin...@gmail.com>; Martijn Visser ;
>> dev@flink.apache.org ; user 
>> *Subject:* Re: [Discussion] - Release major Flink version to support JDK
>> 17 (LTS)
>>
>>
>> *EXTERNAL EMAIL*
>>
>>
>> Thanks Chesnay for working on this. Would you like to share more info
>> about the JDK bug?
>>
>> Best regards,
>> Jing
>>
>> On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
>> wrote:
>>
>> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
>>
>> On 31/03/2023 08:57, Chesnay Schepler wrote:
>>
>>
>> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
>>
>> Kryo themselves state that v5 likely can't read v2 data.
>>
>> However, both versions can be on the classpath without conflict, as v5
>> offers a versioned artifact that includes the version in the package.
>>
>> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
>> purely from a read/write perspective.
>>
>> The bigger question is how we expose this new Kryo version in the API. If
>> we stick to the versioned jar we need to either duplicate all current
>> Kryo-related APIs or find a better way to integrate other serialization
>> stacks.
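
For illustration, a minimal sketch of the coexistence idea, assuming the v5
"versioned" artifact that relocates its classes under the
com.esotericsoftware.kryo.kryo5 package; this is not Flink code, and the class
name is hypothetical:

{code:java}
// Sketch only: read a value with Kryo v2 and re-serialize it with Kryo v5.
// Both libraries can sit on the classpath because the v5 versioned artifact
// relocates its classes under com.esotericsoftware.kryo.kryo5.
import com.esotericsoftware.kryo.Kryo;            // v2
import com.esotericsoftware.kryo.io.Input;        // v2
import com.esotericsoftware.kryo.kryo5.io.Output; // v5, relocated package

public final class KryoMigrationSketch {

    public static byte[] migrate(byte[] v2Bytes, Class<?> type) {
        // Deserialize with the old (v2) runtime.
        Kryo kryoV2 = new Kryo();
        Object value = kryoV2.readObject(new Input(v2Bytes), type);

        // Re-serialize with the new (v5) runtime.
        com.esotericsoftware.kryo.kryo5.Kryo kryoV5 =
                new com.esotericsoftware.kryo.kryo5.Kryo();
        kryoV5.register(type); // v5 requires registration by default
        Output out = new Output(1024, -1);
        kryoV5.writeObject(out, value);
        return out.toBytes();
    }
}
{code}
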
>> On 30/03/2023 17:50, Piotr Nowojski wrote:
>>
>> Hey,
>>
>> > 1. The Flink community agrees that we upgrade Kryo to a later version,
>> which means breaking all checkpoint/savepoint compatibility and releasing a
>> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
>> dropped. This is probably the quickest way, but would still mean that we
>> expose Kryo in the Flink APIs, which is the main reason why we haven't been
>> able to upgrade Kryo at all.
>>
>> This sounds pretty bad to me.
>>
>> Has anyone looked into what it would take to provide a smooth migration
>> from Kryo2 -> Kryo5?
>>
>> Best,
>> Piotrek
>>
>> czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa 
>> napisał(a):
>>
>> Hi Martijn,
>>
>> just to be sure, if all state-related classes use a POJO serializer, Kryo
>> will never come into play, right? Given FLINK-16686 [1], I wonder how many
>> users actually have jobs with Kryo and RocksDB, but even if there aren't
>> many, that still leaves those who don't use RocksDB for
>> checkpoints/savepoints.
>>
>> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
>> users choose between v2/v5 jars by separating them like log4j2 jars?
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-16686
>>
>> Regards,
>> Alexis.
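
On the POJO question, one way to verify that Kryo never comes into play is to
disable generic types, which makes Flink fail fast whenever a type would fall
back to the Kryo serializer. A minimal sketch using the standard DataStream
API (the job itself is a hypothetical placeholder):

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KryoFreeSanityCheck {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Throws at job construction time if any type in the pipeline would
        // be handled by the generic (Kryo) serializer instead of a POJO or
        // basic-type serializer.
        env.getConfig().disableGenericTypes();

        env.fromElements(1, 2, 3).print();
        env.execute("kryo-free sanity check");
    }
}
{code}
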
>>
>> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
>> martijnvis...@apache.org>:
>>
>> Hi all,
>>
>> I also saw a thread from Clayton Wohl [1] on this topic, which I'm
>> including in this discussion thread so that it doesn't get lost.
>>
>> From my perspective, there are two main ways to get to Java 17:
>>
>> 1. The Flink community agrees that we upgrade Kryo to a later version,
>> which means breaking all checkpoint/savepoint compatibility and releasing a
>> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API support
>> dropped. This is probably the quickest way, but would still mean that we
>> expose Kryo in the Flink APIs, which is the main reason why we haven't been
>> able to upgrade Kryo at all.
>> 2. There's a contributor who makes a contribution that bumps Kryo, but
>> either a) automagically reads in all old checkpoints/savepoints using
>> Kryo v2 and writes them to new snapshots using Kryo v5 (as mentioned
>> in the Kryo migration guide [2][3]), or b) provides an offline tool that
>> allows users that are interested in migrating their snapshots manually
>> before starting from a newer version. That potentially could prevent the
>> need to introduce a new Flink major version. In both scenarios, ideally the
>> contributor would also help with avoiding the exposure of Kryo so that we
>> will be in a better shape in the future.
>>
>> It would be good to get the opinion of the community for either of these
>> two options, or potentially for another one that I haven't mentioned. If it
>> appears that there's an overall agreement on the direction, I would propose
>> that a FLIP gets created.

Re: [DISCUSS] Planning Flink 2.0

2023-04-27 Thread Martijn Visser
Hi all,

I think the proposal is a good starting point. We should aim to make Flink
a unified data processing, cloud friendly / cloud native technology, with
proper low-level and high-level interfaces (DataStream API, Table API,
SQL). I think it would make a lot of sense that we write down a vision for
Flink for the long term. That would also mean sharing and discussing more
insights and having conversations around some of the long-term direction
from the proposal.

In order to achieve that vision, I believe that we need a Flink 2.0 which I
consider a long overdue clean-up. That version should be the foundation for
Flink that allows the above mentioned vision to become actual proposals and
implementations.

As a foundation in Flink 2.0, I would be inclined to say it should be:

- Remove all deprecated APIs, including the DataSet API, Scala API,
Queryable State, legacy Source and Sink implementations, legacy SQL
functions etc.
- Add support for Java 17 and 21, make 17 the default (given that the next
Java LTS, 21, is released in September this year and the timeline is set for
2024)
- Drop support for Java 8 and 11
- Refactor the configuration layer
- Refactor the DataStream API, such as:
** Having a coherent and well designed API
** Decouple the API into API-only modules, so no more cyclic dependencies
and leaking of non-APIs, including Kryo
** Reorganize APIs and modules

I think these are some of the must-haves. Curious about the thoughts of the
community.

Thanks, Martijn

Op do 27 apr. 2023 om 10:16 schreef David Morávek 

> Hi,
>
> Great to see this topic moving forward; I agree it's long overdue.
>
> I keep thinking about 2.0 as a chance to eliminate things that didn't work,
> make the feature set denser, and fix rough edges and APIs that hold us
> back.
>
> Some items in the doc (Key Features section) don't tick these boxes for me,
> as they could also be implemented in the 1.x branch. We should consider
> whether we need a backward incompatible release to introduce each feature.
> This should help us to keep the discussion more focused.
>
> Best,
> D.
>
>
> On Wed, Apr 26, 2023 at 2:33 PM DONG Weike 
> wrote:
>
> > Hi,
> >
> > It is thrilling to see the foreseeable upcoming rollouts of Flink 2.x
> > releases, and I believe that this roadmap can take Flink to the next
> stage
> > of a top-notch unified streaming & batch computing engine.
> >
> > Given that all of the existing user programs are written and run in Flink
> > 1.x versions as for now, and some of them are very complex and rely on
> > various third-party connectors written with legacy APIs, one thing that I
> > have concerns about is if, one day in the future, the community decides
> > that new features are only given to 2.x releases, could the last release
> of
> > Flink 1.x be converted into an LTS version (backporting severe bug fixes
> and
> > critical security patches), so that existing users could have enough time
> > to wait for third-party connectors to upgrade, test their programs on the
> > Flink APIs, and avoid sudden loss of community support.
> >
> > Just my two cents : )
> >
> > Best,
> > Weike
> >
> > 
> > From: Xintong Song 
> > Sent: April 26, 2023, 20:01
> > To: dev 
> > Subject: Re: [DISCUSS] Planning Flink 2.0
> >
> > @Chesnay
> >
> >
> > > Technically this implies that every minor release may contain breaking
> > > changes, which is exactly what users don't want.
> >
> >
> > It's not necessary to introduce the breaking changes immediately upon
> > reaching the minimum guaranteed stable time. If there are multiple
> changes
> > waiting for the stable time, we can still gather them in 1 minor release.
> > But I see your point, from the user's perspective, the mechanism does not
> > provide any guarantees for the compatibility of minor releases.
> >
> > What problems do you see in creating major releases every N years?
> > >
> >
> > It might not be a concrete problem, but I'm a bit concerned by the
> > uncertainty. I assume N should not be too small, e.g., at least 3. I'd
> > expect the decision to ship a major release would be made based on
> > comprehensive considerations over the situations at that time. Making a
> > decision now that we would ship a major release 3 years later seems a bit
> > aggressive to me.
> >
> > We need to figure out what this release means for connectors
> > > compatibility-wise.
> > >
> >
> > +1
> >
> >
> > > What process are you thinking of for deciding what breaking changes to
> > > make? The obvious choice would be FLIPs, but I'm worried that this will
> > > overload the mailing list / wiki for lots of tiny changes.
> > >
> >
> > This should be a community decision. What I have in mind would be: (1)
> > collect a wish list on wiki, (2) schedule a series of online meetings
> (like
> > the release syncs) to get an agreed set of must-have items, (3) develop
> and
> > polish the detailed plans of items via FLIPs, and (4) if the plan for a
> > must-have item does not work out

Re: [Discussion] - Release major Flink version to support JDK 17 (LTS)

2023-04-27 Thread Thomas Weise
Is the intention to bump the Flink major version and only support Java 17+?
If so, can Scala not be upgraded at the same time?

Thanks,
Thomas


On Thu, Apr 27, 2023 at 4:53 PM Martijn Visser 
wrote:

> Scala 2.12.7 doesn't compile on Java 17, see
> https://issues.apache.org/jira/browse/FLINK-25000.
>
> On Thu, Apr 27, 2023 at 3:11 PM Jing Ge  wrote:
>
> > Thanks Tamir for the information. According to the latest comment of the
> > task FLINK-24998, this bug should be gone while using the latest JDK 17. I
> > was wondering whether it means that there are no more issues stopping us
> > from releasing a major Flink version to support Java 17? Did I miss
> > something?
> >
> > Best regards,
> > Jing
> >
> > On Thu, Apr 27, 2023 at 8:18 AM Tamir Sagi 
> > wrote:
> >
> >> More details about the JDK bug here
> >> https://bugs.openjdk.org/browse/JDK-8277529
> >>
> >> Related Jira ticket
> >> https://issues.apache.org/jira/browse/FLINK-24998
> >>
> >> --
> >> *From:* Jing Ge via user 
> >> *Sent:* Monday, April 24, 2023 11:15 PM
> >> *To:* Chesnay Schepler 
> >> *Cc:* Piotr Nowojski ; Alexis Sarda-Espinosa <
> >> sarda.espin...@gmail.com>; Martijn Visser ;
> >> dev@flink.apache.org ; user <
> u...@flink.apache.org>
> >> *Subject:* Re: [Discussion] - Release major Flink version to support JDK
> >> 17 (LTS)
> >>
> >>
> >> *EXTERNAL EMAIL*
> >>
> >>
> >> Thanks Chesnay for working on this. Would you like to share more info
> >> about the JDK bug?
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Mon, Apr 24, 2023 at 11:39 AM Chesnay Schepler 
> >> wrote:
> >>
> >> As it turns out Kryo isn't a blocker; we ran into a JDK bug.
> >>
> >> On 31/03/2023 08:57, Chesnay Schepler wrote:
> >>
> >>
> >>
> https://github.com/EsotericSoftware/kryo/wiki/Migration-to-v5#migration-guide
> >>
> >> Kryo themselves state that v5 likely can't read v2 data.
> >>
> >> However, both versions can be on the classpath without conflict, as v5
> >> offers a versioned artifact that includes the version in the package.
> >>
> >> It probably wouldn't be difficult to migrate a savepoint to Kryo v5,
> >> purely from a read/write perspective.
> >>
> >> The bigger question is how we expose this new Kryo version in the API.
> If
> >> we stick to the versioned jar we need to either duplicate all current
> >> Kryo-related APIs or find a better way to integrate other serialization
> >> stacks.
> >> On 30/03/2023 17:50, Piotr Nowojski wrote:
> >>
> >> Hey,
> >>
> >> > 1. The Flink community agrees that we upgrade Kryo to a later version,
> >> which means breaking all checkpoint/savepoint compatibility and
> releasing a
> >> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API
> support
> >> dropped. This is probably the quickest way, but would still mean that we
> >> expose Kryo in the Flink APIs, which is the main reason why we haven't
> been
> >> able to upgrade Kryo at all.
> >>
> >> This sounds pretty bad to me.
> >>
> >> Has anyone looked into what it would take to provide a smooth migration
> >> from Kryo2 -> Kryo5?
> >>
> >> Best,
> >> Piotrek
> >>
> >> czw., 30 mar 2023 o 16:54 Alexis Sarda-Espinosa <
> sarda.espin...@gmail.com>
> >> napisał(a):
> >>
> >> Hi Martijn,
> >>
> >> just to be sure, if all state-related classes use a POJO serializer,
> Kryo
> >> will never come into play, right? Given FLINK-16686 [1], I wonder how
> many
> >> users actually have jobs with Kryo and RocksDB, but even if there aren't
> >> many, that still leaves those who don't use RocksDB for
> >> checkpoints/savepoints.
> >>
> >> If Kryo were to stay in the Flink APIs in v1.X, is it impossible to let
> >> users choose between v2/v5 jars by separating them like log4j2 jars?
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-16686
> >>
> >> Regards,
> >> Alexis.
> >>
> >> Am Do., 30. März 2023 um 14:26 Uhr schrieb Martijn Visser <
> >> martijnvis...@apache.org>:
> >>
> >> Hi all,
> >>
> >> I also saw a thread from Clayton Wohl [1] on this topic, which I'm
> >> including in this discussion thread so that it doesn't get lost.
> >>
> >> From my perspective, there are two main ways to get to Java 17:
> >>
> >> 1. The Flink community agrees that we upgrade Kryo to a later version,
> >> which means breaking all checkpoint/savepoint compatibility and
> releasing a
> >> Flink 2.0 with Java 17 support added and Java 8 and Flink Scala API
> support
> >> dropped. This is probably the quickest way, but would still mean that we
> >> expose Kryo in the Flink APIs, which is the main reason why we haven't
> been
> >> able to upgrade Kryo at all.
> >> 2. There's a contributor who makes a contribution that bumps Kryo, but
> >> either a) automagically reads in all old checkpoints/savepoints using
> >> Kryo v2 and writes them to new snapshots using Kryo v5 (as mentioned
> >> in the Kryo migration guide [2][3]), or b) provides an offline tool that
> >> allows users that are interested in migrating their snapshots manually
> >> before starting from a newer version.

RE: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-27 Thread Raman Verma
Hello Hongshun Wang,

You have mentioned that the first partition discovery can be very slow 
(section: Why do we need initialDiscoveryFinished?)

Do you mean that Kafka can be slow to respond? If so, any idea under what
conditions Kafka would be slow?
Or is it just a matter of bad timing, where this call does not return before
a checkpoint?

Thanks,
Raman Verma

On 2023/03/17 10:41:40 Hongshun Wang wrote:
> Hi everyone,
> 
> I would like to start a discussion on FLIP-288:Enable Dynamic Partition
> Discovery by Default in Kafka Source[1].
> 
> As described in mail thread[2], dynamic partition discovery is disabled by
> default and users have to explicitly specify the interval of discovery in
> order to turn it on. Besides, if the initial offset strategy is LATEST,
> the same strategy is used for new partitions, which can lead to data loss
> (think of a new partition that is created and only discovered by the Kafka
> source several minutes later; messages produced into the partition within
> that gap might be dropped if, for example, "latest" is the initial offset
> strategy).
> 
> The goals of this FLIP are as follows:
> 
> 1. Enable partition discovery by default.
> 2. Use earliest as the offset strategy for new partitions after the
> first discovery.
> 
> Looking forward to hearing from you.
> 
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source
> 
> [2] 
> https://lists.apache.org/thread/d7zy46gj3sw0zwzq2rj3fmc0hx8ojtln
> 
> 
> Best,
> 
> Hongshun
>


Sent from my iPad
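
For context, a minimal sketch of how dynamic partition discovery is enabled on
the current KafkaSource; the broker, topic, and group names are hypothetical,
and note that today's builder applies the same OffsetsInitializer to newly
discovered partitions, which is the data-loss concern the FLIP addresses:

{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PartitionDiscoveryExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setGroupId("discovery-demo")
                // Today the discovery interval must be set explicitly to turn
                // discovery on; the FLIP proposes enabling it by default.
                .setProperty("partition.discovery.interval.ms", "60000")
                // This initializer currently also applies to partitions found
                // after startup; "latest" here could drop messages produced
                // before a new partition is discovered.
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka")
                .print();
        env.execute("partition discovery demo");
    }
}
{code}
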

[jira] [Created] (FLINK-31963) java.lang.ArrayIndexOutOfBoundsException when scale down via autoscaler

2023-04-27 Thread Tan Kim (Jira)
Tan Kim created FLINK-31963:
---

 Summary: java.lang.ArrayIndexOutOfBoundsException when scale down 
via autoscaler
 Key: FLINK-31963
 URL: https://issues.apache.org/jira/browse/FLINK-31963
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator, Runtime / Checkpointing
 Environment: Flink: 1.17.0
FKO: 1.4.0
StateBackend: RocksDB (Generic Incremental Checkpoint & Unaligned Checkpoint
enabled)
Reporter: Tan Kim
 Attachments: jobmanager_error.txt, taskmanager_error.txt

I'm testing the Autoscaler through the Kubernetes Operator, and I'm facing the
following issue.

As you know, when a job is scaled down through the autoscaler, the job manager 
and task manager go down and then back up again.

When this happens, an index out of bounds exception is thrown and the state is 
not restored from a checkpoint.

[~gyfora] told me via the Flink Slack troubleshooting channel that this is 
likely an issue with Unaligned Checkpoint and not an issue with the autoscaler, 
but I'm opening a ticket with Gyula for more clarification.

Please see the attached JM and TM error logs.
Thank you.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31964) Improve the document of Autoscaler as 1.17.0 is released

2023-04-27 Thread Biao Geng (Jira)
Biao Geng created FLINK-31964:
-

 Summary: Improve the document of Autoscaler as 1.17.0 is released
 Key: FLINK-31964
 URL: https://issues.apache.org/jira/browse/FLINK-31964
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Biao Geng
 Attachments: image-2023-04-28-10-21-09-935.png

Since 1.17.0 is released and the official image is 
[available|https://hub.docker.com/layers/library/flink/1.17.0-scala_2.12-java8/images/sha256-a8bbef97ec3f7ce4fa6541d48dfe16261ee7f93f93b164c0e84644605f9ea0a3?context=explore],
 we can update the image link in the Autoscaler section.
 !image-2023-04-28-10-21-09-935.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-27 Thread Hongshun Wang
Hi Raman Verma,

Generally, Kafka responds quickly. However, since this is an asynchronous
operation, we cannot guarantee that nothing abnormal happens, such as
temporary network issues. The two cases I mentioned are special situations
meant to make the concept easier to understand.


Best

Hongshun

On Fri, Apr 28, 2023 at 9:48 AM Raman Verma  wrote:

> Hello Hongshun Wang,
>
> You have mentioned that the first partition discovery can be very slow
> (section: Why do we need initialDiscoveryFinished?)
>
> Do you mean that Kafka can be slow to respond? If so, any idea under what
> conditions Kafka would be slow?
> Or is it just a matter of bad timing, where this call does not return
> before a checkpoint?
>
> Thanks,
> Raman Verma
>
> On 2023/03/17 10:41:40 Hongshun Wang wrote:
> > Hi everyone,
> >
> > I would like to start a discussion on FLIP-288:Enable Dynamic Partition
> > Discovery by Default in Kafka Source[1].
> >
> > As described in mail thread[2], dynamic partition discovery is disabled
> by
> > default and users have to explicitly specify the interval of discovery in
> > order to turn it on. Besides, if the initial offset strategy is LATEST,
> > the same strategy is used for new partitions, which can lead to data loss
> > (think of a new partition that is created and only discovered by the Kafka
> > source several minutes later; messages produced into the partition within
> > that gap might be dropped if, for example, "latest" is the initial offset
> > strategy).
> >
> > The goals of this FLIP are as follows:
> >
> > 1. Enable partition discovery by default.
> > 2. Use earliest as the offset strategy for new partitions after the
> > first discovery.
> >
> > Looking forward to hearing from you.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-288%3A+Enable+Dynamic+Partition+Discovery+by+Default+in+Kafka+Source
> >
> > [2] 
> > https://lists.apache.org/thread/d7zy46gj3sw0zwzq2rj3fmc0hx8ojtln
> >
> >
> > Best,
> >
> > Hongshun
> >
>
>
> Sent from my iPad


Re: [DISCUSS] Planning Flink 2.0

2023-04-27 Thread Saichandrasekar TM
Hi All,

Awesome... I see this as a great opportunity for newcomers like me to
contribute.

Is this discussion happening in a Slack or Discord forum too? If so, please
include me.

Thanks,
Sai

On Fri, Apr 28, 2023 at 2:55 AM Martijn Visser 
wrote:

> Hi all,
>
> I think the proposal is a good starting point. We should aim to make Flink
> a unified data processing, cloud friendly / cloud native technology, with
> proper low-level and high-level interfaces (DataStream API, Table API,
> SQL). I think it would make a lot of sense that we write down a vision for
> Flink for the long term. That would also mean sharing and discussing more
> insights and having conversations around some of the long-term direction
> from the proposal.
>
> In order to achieve that vision, I believe that we need a Flink 2.0 which I
> consider a long overdue clean-up. That version should be the foundation for
> Flink that allows the above mentioned vision to become actual proposals and
> implementations.
>
> As a foundation in Flink 2.0, I would be inclined to say it should be:
>
> - Remove all deprecated APIs, including the DataSet API, Scala API,
> Queryable State, legacy Source and Sink implementations, legacy SQL
> functions etc.
> - Add support for Java 17 and 21, make 17 the default (given that the next
> Java LTS, 21, is released in September this year and the timeline is set for
> 2024)
> - Drop support for Java 8 and 11
> - Refactor the configuration layer
> - Refactor the DataStream API, such as:
> ** Having a coherent and well designed API
> ** Decouple the API into API-only modules, so no more cyclic dependencies
> and leaking of non-APIs, including Kryo
> ** Reorganize APIs and modules
>
> I think these are some of the must-haves. Curious about the thoughts of the
> community.
>
> Thanks, Martijn
>
> Op do 27 apr. 2023 om 10:16 schreef David Morávek 
>
> > Hi,
> >
> > Great to see this topic moving forward; I agree it's long overdue.
> >
> > I keep thinking about 2.0 as a chance to eliminate things that didn't
> work,
> > make the feature set denser, and fix rough edges and APIs that hold us
> > back.
> >
> > Some items in the doc (Key Features section) don't tick these boxes for
> me,
> > as they could also be implemented in the 1.x branch. We should consider
> > whether we need a backward incompatible release to introduce each
> feature.
> > This should help us to keep the discussion more focused.
> >
> > Best,
> > D.
> >
> >
> > On Wed, Apr 26, 2023 at 2:33 PM DONG Weike 
> > wrote:
> >
> > > Hi,
> > >
> > > It is thrilling to see the foreseeable upcoming rollouts of Flink 2.x
> > > releases, and I believe that this roadmap can take Flink to the next
> > stage
> > > of a top-notch unified streaming & batch computing engine.
> > >
> > > Given that all of the existing user programs are written and run in
> Flink
> > > 1.x versions as for now, and some of them are very complex and rely on
> > > various third-party connectors written with legacy APIs, one thing
> that I
> > > have concerns about is if, one day in the future, the community decides
> > > that new features are only given to 2.x releases, could the last
> release
> > of
> > > Flink 1.x be converted into an LTS version (backporting severe bug fixes
> > and
> > > critical security patches), so that existing users could have enough
> time
> > > to wait for third-party connectors to upgrade, test their programs on
> the
> > > Flink APIs, and avoid sudden loss of community support.
> > >
> > > Just my two cents : )
> > >
> > > Best,
> > > Weike
> > >
> > > 
> > > From: Xintong Song 
> > > Sent: April 26, 2023, 20:01
> > > To: dev 
> > > Subject: Re: [DISCUSS] Planning Flink 2.0
> > >
> > > @Chesnay
> > >
> > >
> > > > Technically this implies that every minor release may contain
> breaking
> > > > changes, which is exactly what users don't want.
> > >
> > >
> > > It's not necessary to introduce the breaking changes immediately upon
> > > reaching the minimum guaranteed stable time. If there are multiple
> > changes
> > > waiting for the stable time, we can still gather them in 1 minor
> release.
> > > But I see your point, from the user's perspective, the mechanism does
> not
> > > provide any guarantees for the compatibility of minor releases.
> > >
> > > What problems do you see in creating major releases every N years?
> > > >
> > >
> > > It might not be a concrete problem, but I'm a bit concerned by the
> > > uncertainty. I assume N should not be too small, e.g., at least 3. I'd
> > > expect the decision to ship a major release would be made based on
> > > comprehensive considerations over the situations at that time. Making a
> > > decision now that we would ship a major release 3 years later seems a
> bit
> > > aggressive to me.
> > >
> > > We need to figure out what this release means for connectors
> > > > compatibility-wise.
> > > >
> > >
> > > +1
> > >
> > >
> > > > What process are you thinking of for deciding what breaking changes
> to

[jira] [Created] (FLINK-31965) Fix ClassNotFoundException in benchmarks

2023-04-27 Thread Paul Lin (Jira)
Paul Lin created FLINK-31965:


 Summary: Fix ClassNotFoundException in benchmarks
 Key: FLINK-31965
 URL: https://issues.apache.org/jira/browse/FLINK-31965
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks
Affects Versions: 1.18.0
Reporter: Paul Lin


The benchmarks rely on the test jar of `flink-streaming-java`. However, that jar
is declared with test scope and is thus not included in the packaged jar, so a
ClassNotFoundException occurs when running the benchmarks with the `java -jar
xxx` command.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)