[VOTE] FLIP-419: Optimize multi-sink query plan generation

2024-02-21 Thread Jeyhun Karimov
Hi everyone,

I'd like to start a vote on FLIP-419: Optimize multi-sink query plan
generation [1]. The discussion thread is here [2].

The vote will be open for at least 72 hours unless there is an objection or
insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-419%3A+Optimize+multi-sink+query+plan+generation
[2] https://lists.apache.org/thread/whblrt641kqd0mvpxf5gjmkc56hvo0kn


BR,
Jeyhun


Re: [DISCUSS]FLIP-420: Add API annotations for RocksDB StateBackend user-facing classes

2024-02-21 Thread Yanfei Lei
Hi Jinzhong,
Thanks for driving this!

1. I'm wondering if `ConfigurableRocksDBOptionsFactory` will be used
by users; currently it looks like only developers use it in the rocksdb
state backend module. And its only non-testing subclass
"DefaultConfigurableOptionsFactory" is marked @Deprecated.
2. Regarding @Internal, according to the comments, it is used as an
"Annotation to mark methods within stable, public APIs as an internal
developer API." So marking "SingleStateIterator" and
"RocksDBRestoreOperation" as @Internal is acceptable to me.
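For readers unfamiliar with these markers, here is a minimal self-contained sketch of what marking a public type as internal API looks like. Note that `Internal` and `SingleStateIteratorStandIn` below are simplified stand-ins for illustration, not Flink's actual `org.apache.flink.annotation` classes:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class AnnotationSketch {
    // Simplified stand-in for org.apache.flink.annotation.Internal
    // (the real annotation lives in the flink-annotations module).
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Internal {}

    // An interface with the "public" modifier that is nevertheless an
    // implementation detail, as discussed for SingleStateIterator.
    @Internal
    public interface SingleStateIteratorStandIn {}

    public static void main(String[] args) {
        // Tools and reviewers can check the marker reflectively.
        boolean marked =
                SingleStateIteratorStandIn.class.isAnnotationPresent(Internal.class);
        System.out.println("marked internal: " + marked);
    }
}
```

The annotation carries no behavior; it only documents the API stability contract for users and tooling.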

Best,
Yanfei

Jinzhong Li  于2024年1月25日周四 12:16写道:
>
> Hi Zakelly,
>
> Thanks for your comments!
>
> 1) I agree that almost no user would use "RocksDBStateUploader" and
> "RocksDBStateDownloader" to do something. It's fine for me to keep them
> unmarked.
> 2) Regarding "SingleStateIterator", I think it's acceptable to either leave
> it unmarked or mark it as @Internal. I just consider that
> SingleStateIterator is an interface with the "public" modifier, and it is
> harmless to annotate it as @Internal.
>
>
>
>
> Hi Hangxiang,
>
> Thanks for the reminder!
>
> It makes sense to mark RocksDBStateBackendFactory as Deprecated.
>
> Best,
> Jinzhong Li
>
>
> On Thu, Jan 25, 2024 at 10:22 AM Hangxiang Yu  wrote:
>
> > Hi Jinzhong.
> > Thanks for driving this!
> > Some suggestions:
> > 1. As RocksDBStateBackend is marked as Deprecated, we should also
> > mark RocksDBStateBackendFactory as Deprecated.
> > 2. Since 1.19 will be frozen on 1.26, let's adjust the target version to
> > 1.20.
> >
> >
> > On Wed, Jan 24, 2024 at 11:50 PM Zakelly Lan 
> > wrote:
> >
> > > Hi Jinzhong,
> > >
> > > Thanks for driving this! +1 for fixing the lack of annotation.
> > >
> > > I'm wondering if we really need to annotate *RocksDBStateUploader* and
> > > *RocksDBStateDownloader* with @Internal, as they seem to be ordinary
> > > classes without interacting with other modules.
> > > Also, I have reservations about annotating *SingleStateIterator*, but I'd
> > > like to hear others' opinions and won't insist on this.
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Wed, Jan 24, 2024 at 10:26 PM Jinzhong Li 
> > > wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I’m opening this thread to discuss FLIP-420: Add API annotations for
> > > > RocksDB StateBackend user-facing classes [1].
> > > >
> > > > As described in FLINK-18255 [2], several user-facing classes in the
> > > > flink-statebackend-rocksdb module don't have any API annotations, not
> > > > even @PublicEvolving. This FLIP will add annotations for them to
> > > > clarify their usage.
> > > >
> > > > Looking forward to hearing from you, thanks!
> > > >
> > > >
> > > > Best regards,
> > > > Jinzhong Li
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-420%3A+Add+API+annotations+for+RocksDB+StateBackend+user-facing+classes
> > > > [2] https://issues.apache.org/jira/browse/FLINK-18255
> > > >
> > >
> >
> >
> > --
> > Best,
> > Hangxiang.
> >


[jira] [Created] (FLINK-34495) Resuming Savepoint (rocks, scale up, heap timers) end-to-end test failure

2024-02-21 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34495:
-

 Summary: Resuming Savepoint (rocks, scale up, heap timers) 
end-to-end test failure
 Key: FLINK-34495
 URL: https://issues.apache.org/jira/browse/FLINK-34495
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57760&view=logs&j=e9d3d34f-3d15-59f4-0e3e-35067d100dfe&t=5d91035e-8022-55f2-2d4f-ab121508bf7e&l=2010

I guess the test failure occurred due to a checkpoint failure:
{code}
Feb 22 00:49:16 2024-02-22 00:49:04,305 WARN  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger or complete checkpoint 12 for job 3c9ffc670ead2cb3c4118410cbef3b72. (0 
consecutive failed attempts so far)
Feb 22 00:49:16 org.apache.flink.runtime.checkpoint.CheckpointException: 
Checkpoint Coordinator is suspending.
Feb 22 00:49:16 at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.stopCheckpointScheduler(CheckpointCoordinator.java:2056)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.scheduler.SchedulerBase.stopCheckpointScheduler(SchedulerBase.java:960)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.scheduler.SchedulerBase.stopWithSavepoint(SchedulerBase.java:1030)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.jobmaster.JobMaster.stopWithSavepoint(JobMaster.java:901)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
Feb 22 00:49:16 at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 ~[?:?]
Feb 22 00:49:16 at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:?]
Feb 22 00:49:16 at java.lang.reflect.Method.invoke(Method.java:566) 
~[?:?]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
 ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168)
 ~[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
Feb 22 00:49:16 at 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
[flink-rpc-akkad6c8f388-439d-487d-ab4d-9a34a56cbc0d.jar:1.20-SNAPSHOT]
[...]
{code}

[jira] [Created] (FLINK-34494) Migrate ReplaceIntersectWithSemiJoinRule

2024-02-21 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-34494:
-

 Summary: Migrate ReplaceIntersectWithSemiJoinRule
 Key: FLINK-34494
 URL: https://issues.apache.org/jira/browse/FLINK-34494
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Jacky Lau
 Fix For: 1.20.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34493) Migrate ReplaceMinusWithAntiJoinRule

2024-02-21 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-34493:
-

 Summary: Migrate ReplaceMinusWithAntiJoinRule
 Key: FLINK-34493
 URL: https://issues.apache.org/jira/browse/FLINK-34493
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Jacky Lau
 Fix For: 1.20.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34492) fix scala style comment link when migrate scala to java

2024-02-21 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-34492:
-

 Summary: fix scala style comment link when migrate scala to java
 Key: FLINK-34492
 URL: https://issues.apache.org/jira/browse/FLINK-34492
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Jacky Lau
 Fix For: 1.20.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-21 Thread Jingsong Li
Congratulations! Well deserved!

On Wed, Feb 21, 2024 at 4:36 PM Yuepeng Pan  wrote:
>
> Congratulations~ :)
>
> Best,
> Yuepeng Pan
>
>
>
>
>
>
>
>
>
>
> 在 2024-02-21 09:52:17,"Hongshun Wang"  写道:
> >Congratulations, Jiabao :)
> >Congratulations Jiabao!
> >
> >Best,
> >Hongshun
> >Best regards,
> >
> >Weijie
> >
> >On Tue, Feb 20, 2024 at 2:19 PM Runkang He  wrote:
> >
> >> Congratulations Jiabao!
> >>
> >> Best,
> >> Runkang He
> >>
> >> Jane Chan  于2024年2月20日周二 14:18写道:
> >>
> >> > Congrats, Jiabao!
> >> >
> >> > Best,
> >> > Jane
> >> >
> >> > On Tue, Feb 20, 2024 at 10:32 AM Paul Lam  wrote:
> >> >
> >> > > Congrats, Jiabao!
> >> > >
> >> > > Best,
> >> > > Paul Lam
> >> > >
> >> > > > 2024年2月20日 10:29,Zakelly Lan  写道:
> >> > > >
> >> > > >> Congrats! Jiabao!
> >> > >
> >> > >
> >> >
> >>


[jira] [Created] (FLINK-34491) Move from experimental support to production support for Java 17

2024-02-21 Thread Dhruv Patel (Jira)
Dhruv Patel created FLINK-34491:
---

 Summary: Move from experimental support to production support for 
Java 17
 Key: FLINK-34491
 URL: https://issues.apache.org/jira/browse/FLINK-34491
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.18.1
Reporter: Dhruv Patel


Tracking task to move away from experimental support for Java 17 to production 
support, so that teams running Flink in production can migrate to Java 17 
successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34490) flink-connector-kinesis not correctly supporting credential chaining

2024-02-21 Thread Eddie Ramirez (Jira)
Eddie Ramirez created FLINK-34490:
-

 Summary: flink-connector-kinesis not correctly supporting 
credential chaining
 Key: FLINK-34490
 URL: https://issues.apache.org/jira/browse/FLINK-34490
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.17.2, aws-connector-4.2.0
Reporter: Eddie Ramirez
 Attachments: Flink Credential Chaining.png

When using AWS credential chaining, {{flink-connector-kinesis}} does not 
correctly follow the chain of credentials.


*Expected Result*

{{flink-connector-kinesis}} should follow the {{source_profile}} for each 
respective profile in {{~/.aws/config}} to ultimately determine credentials.


*Observed Result*

{{flink-connector-kinesis}} only follows the first matching 
{{source_profile}} specified in {{~/.aws/config}} and then errors out because 
there are no credentials for that profile.


{code:java}
org.apache.flink.kinesis.shaded.com.amazonaws.SdkClientException: Unable to 
load credentials into profile [profile intermediate-role]: AWS Access Key ID is 
not specified
{code}

*Configuration*

connector config

 
{code:java}
aws.credentials.provider: PROFILE
aws.credentials.profile.name: flink-access-role{code}

AWS {{~/.aws/config}} file

 
{code:java}
[profile flink-access-role]
role_arn = arn:aws:iam::x:role/flink-access-role
source_profile = intermediate-role
[profile intermediate-role]
role_arn = arn:aws:iam::x:role/intermediate-role
source_profile = aws-sso-role
[profile aws-sso-role]
sso_session = idc
sso_role_name = x
sso_account_id = x
credential_process = aws configure export-credentials --profile=aws-sso-role
[sso-session idc]
sso_start_url = x
sso_region = x
sso_registration_scopes = sso:account:access
{code}
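The expected resolution described above — starting from the configured profile and following {{source_profile}} links until reaching a profile that actually holds credentials — can be sketched with a small self-contained helper. This is a hypothetical illustration of the chain-walking logic, not the connector's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProfileChainSketch {
    // Follow source_profile links from the starting profile; stop when a
    // profile has no source_profile (i.e. it holds the real credentials).
    static List<String> resolveChain(Map<String, String> sourceProfiles, String start) {
        List<String> chain = new ArrayList<>();
        String current = start;
        while (current != null && !chain.contains(current)) { // guard against cycles
            chain.add(current);
            current = sourceProfiles.get(current);
        }
        return chain;
    }

    public static void main(String[] args) {
        // source_profile links as in the ~/.aws/config above
        Map<String, String> links = new HashMap<>();
        links.put("flink-access-role", "intermediate-role");
        links.put("intermediate-role", "aws-sso-role");
        // The reported bug stops after the first link; the expected
        // resolution walks the whole chain:
        System.out.println(resolveChain(links, "flink-access-role"));
    }
}
```

The bug report corresponds to the chain being cut off after {{intermediate-role}} instead of continuing to {{aws-sso-role}}.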
 




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-21 Thread Kevin Lam
I would love to get some feedback from the community on this JIRA issue:
https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440

I am looking into creating a PR and would appreciate some review on the
approach.

In terms of design I think we can mirror the `debezium-avro-confluent` and
`avro-confluent` formats already available in Flink:

   1. `protobuf-confluent` format which uses DynamicMessage for encoding and
   decoding.
  - For encoding the Flink RowType will be used to dynamically create a
  Protobuf Schema and register it with the Confluent Schema
Registry. It will
  use the same schema to construct a DynamicMessage and serialize it.
  - For decoding, the schema will be fetched from the registry and use
  DynamicMessage to deserialize and convert the Protobuf object to a Flink
  RowData.
  - Note: here there is no external .proto file
   2. `debezium-protobuf-confluent` format which unpacks the Debezium Envelope
   and collects the appropriate UPDATE_BEFORE, UPDATE_AFTER, INSERT, DELETE
   events.
  - We may be able to refactor and reuse code from the existing
  DebeziumAvroDeserializationSchema + DebeziumAvroSerializationSchema, since
  the deser logic is largely delegated and these Schemas are mainly concerned
  with handling the Debezium envelope.
   3. Move the Confluent Schema Registry Client code to a separate maven
   module, flink-formats/flink-confluent-common, and extend it to support
   ProtobufSchemaProvider.
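As a rough sketch of the envelope unpacking in point 2, the `op` field of a Debezium record maps to change events as below. The types here are hypothetical stand-ins for illustration, not Flink's actual RowData/RowKind classes:

```java
import java.util.List;
import java.util.Map;

public class DebeziumEnvelopeSketch {
    // Stand-in for Flink's RowKind change-event marker.
    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // Map a Debezium envelope (op + before/after payloads) to the change
    // events a deserialization schema would emit; "r" (snapshot read) is
    // treated like an insert, mirroring the debezium-avro-confluent format.
    static List<Map.Entry<RowKind, Object>> unpack(String op, Object before, Object after) {
        switch (op) {
            case "c":
            case "r":
                return List.of(Map.entry(RowKind.INSERT, after));
            case "u":
                return List.of(Map.entry(RowKind.UPDATE_BEFORE, before),
                               Map.entry(RowKind.UPDATE_AFTER, after));
            case "d":
                return List.of(Map.entry(RowKind.DELETE, before));
            default:
                throw new IllegalArgumentException("Unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        // An update envelope produces a retraction followed by the new row.
        System.out.println(unpack("u", "oldRow", "newRow"));
    }
}
```

This is the envelope handling that could plausibly be shared with the existing Debezium Avro schemas mentioned above.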


Does anyone have any feedback or objections to this approach?


[jira] [Created] (FLINK-34489) New File Sink end-to-end test timed out

2024-02-21 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34489:
-

 Summary: New File Sink end-to-end test timed out
 Key: FLINK-34489
 URL: https://issues.apache.org/jira/browse/FLINK-34489
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.19.0, 1.20.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57707&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3726

{code}
eb 21 07:26:03 Number of produced values 10770/6
Feb 21 07:39:50 Test (pid: 151375) did not finish after 900 seconds.
Feb 21 07:39:50 Printing Flink logs and killing it:
[...]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34488) Integrate snapshot deployment into GHA nightly workflow

2024-02-21 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34488:
-

 Summary: Integrate snapshot deployment into GHA nightly workflow
 Key: FLINK-34488
 URL: https://issues.apache.org/jira/browse/FLINK-34488
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


Analogously to the [Azure Pipelines nightly 
config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L103]
 we want to deploy the snapshot artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34487) Integrate tools/azure-pipelines/build-python-wheels.yml into GHA nightly workflow

2024-02-21 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34487:
-

 Summary: Integrate tools/azure-pipelines/build-python-wheels.yml 
into GHA nightly workflow
 Key: FLINK-34487
 URL: https://issues.apache.org/jira/browse/FLINK-34487
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl


Analogously to the [Azure Pipelines nightly 
config|https://github.com/apache/flink/blob/e923d4060b6dabe650a8950774d176d3e92437c2/tools/azure-pipelines/build-apache-repo.yml#L183]
 we want to generate the wheels artifacts in the GHA nightly workflow as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34486) Add documentation on how to add the shared utils as a submodule to the connector repo

2024-02-21 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34486:
-

 Summary: Add documentation on how to add the shared utils as a 
submodule to the connector repo
 Key: FLINK-34486
 URL: https://issues.apache.org/jira/browse/FLINK-34486
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Affects Versions: connector-parent-1.1.0
Reporter: Matthias Pohl


[apache/flink-connector-shared-utils:README.md|https://github.com/apache/flink-connector-shared-utils/blob/release_utils/README.md]
 doesn't state how the shared utils shall be added as a submodule to a 
connector repository. But this is expected by the [connector release 
documentation|https://cwiki.apache.org/confluence/display/FLINK/Creating+a+flink-connector+release#Creatingaflinkconnectorrelease-Buildareleasecandidate]:
{quote}
The following sections assume that the release_utils branch from 
flink-connector-shared-utils is mounted as a git submodule under 
tools/releasing/shared, you can update the submodule by running  git submodule 
update --remote (or git submodule update --init --recursive if the submodule 
wasn't initialized, yet) to use latest release utils, you need to mount the  
flink-connector-shared-utils  as a submodule under the tools/releasing/shared 
if it hasn't been mounted in the connector repository. See the README for 
details.
{quote}

Let's update the README accordingly and add a link to the {{README}} in the 
connector release documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34485) Token delegation doesn't work with Presto S3 filesystem

2024-02-21 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-34485:


 Summary: Token delegation doesn't work with Presto S3 filesystem
 Key: FLINK-34485
 URL: https://issues.apache.org/jira/browse/FLINK-34485
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.18.1
Reporter: Chesnay Schepler
 Fix For: 1.20.0


AFAICT it's not possible to use token delegation with the Presto filesystem.
Token delegation relies on the {{DynamicTemporaryAWSCredentialsProvider}}, 
but it doesn't have the constructor that Presto requires (ruling out 
{{presto.s3.credentials-provider}}), and other providers can't be used due to 
FLINK-13602.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-414: Support Retry Mechanism in RocksDBStateDataTransfer

2024-02-21 Thread xiangyu feng
Hi Yue, Hangxiang and Piotr,

Sorry for the late reply, and thanks a lot for your feedback. After careful
thought, I agree with your opinions in the following ways:

1. Instead of introducing a specific retry mechanism for RocksDB data
transfer, we should introduce a more fine-grained retry mechanism for
checkpointing when interacting with external storage, to improve the overall
success rate;

2. This fine-grained retry mechanism should work with all kinds of state
implementations (OperatorStateBackend, KeyedStateBackend);

3. Currently, there are several retry implementations in the Flink code base:

   - `RetryStrategy` and `FutureUtils` under `flink-core` module for common
   usage
   - `AsyncRetryStrategy` and `RetryableResultHandlerDelegator` under
   datastream-api and used for async operators
   - `RetryPolicy` and `RetryingExecutor` under `flink-dstl` module and
   used for ChangelogStateBackend

IMHO, these implementations are almost interchangeable, with few
variations. Unifying all the retry implementations might be another big
topic to discuss, but I do agree that the retry tools used by all
StateBackends should be consistent.

4. The fine-grained retry mechanism should also take different external
storage exceptions into account; the retry action should not be performed
for a permanent failure. AFAIK, this can be achieved by a customized retry
predicate.
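A customized retry predicate of this kind could be sketched as follows. This is a hypothetical illustration under the assumptions above (`callWithRetry`, `isTransient` are made-up names), not an actual Flink API:

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class RetrySketch {
    // Retry only while the failure is classified as transient by the
    // predicate; permanent failures are rethrown immediately.
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts,
                               Predicate<RuntimeException> isTransient) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!isTransient.test(e)) {
                    throw e; // permanent failure (e.g. authorization): no retry
                }
                last = e; // transient failure (e.g. connection loss): try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated upload that fails twice with a transient error,
        // then succeeds on the third attempt.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection reset");
            return "uploaded";
        }, 5, e -> e.getMessage().contains("connection"));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real implementation would also wait between attempts (the retry interval) and classify exceptions by type rather than by message.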

According to the suggestions mentioned above, I will rework FLIP-414[1] and
continue this discussion after that.

Thanks again for your valuable opinions~

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-414%3A+Support+Retry+Mechanism+in+RocksDBStateDataTransfer

Best Regards,
Xiangyu Feng


Piotr Nowojski  于2024年1月12日周五 01:39写道:

> Hi,
>
> Thanks for the proposal. I second the Hangxiang's suggestions.
>
> I think this might be valuable. Instead of retrying the whole checkpoint,
> it will be more resource efficient
> to retry upload of a single file.
>
> Regarding re-using configuration options, a while back we introduced
> `taskmanager.network.retries`
> config option. It was hoped to eventually encompass things like this.
>
> My own concern is if we should retry regardless of the exception type, or
> should we focus on things like
> connection loss/host unreachable? All in all, it would be better to not
> retry upload if the failure was:
> - `FileSystem` for given schema not found
> - authorisation failed
> - lack of write rights
> - ...
>
> Best,
> Piotrek
>
>
>
>
> czw., 11 sty 2024 o 10:35 Hangxiang Yu  napisał(a):
>
> > Thanks for driving this.
> > A retry mechanism is common when we want to get or put data over the network.
> > So I think it will help when checkpoints fail due to temporary network
> > problems, though it may add a bit of overhead for some other reasons.
> >
> > Some comments and suggestions:
> > 1. Since Flink has a checkpoint mechanism to retry failed checkpoints
> > coarsely, I think it looks good if this fine-grained retry could be
> > configurable and doesn't change the current default mechanism.
> > 2. This should work with the checkpoint procedure of all state backends.
> > Could we make this config unrelated to a specific state backend (maybe
> > execution.checkpointing.xxx)? Then it could be supported by all state
> > backends.
> > 3. We may not need to re-implement it. There are some tools supporting the
> > retry mechanism (see RetryingExecutor and RetryPolicy in the changelog dstl
> > module); it's better to make them more common tools and reuse them.
> >
> > On Thu, Jan 11, 2024 at 3:09 PM yue ma  wrote:
> >
> > > Thanks for driving this effort, xiangyu!
> > > The proposal overall LGTM.
> > > I just have a small question. There are other places in Flink that
> > interact
> > > with external storage. Should we consider adding a general retry
> > mechanism
> > > to them?
> > >
> > > xiangyu feng  于2024年1月8日周一 11:31写道:
> > >
> > > > Hi devs,
> > > >
> > > > I'm opening this thread to discuss FLIP-414: Support Retry Mechanism
> in
> > > > RocksDBStateDataTransfer[1].
> > > >
> > > > Currently, there is no retry mechanism for downloading and uploading
> > > > RocksDB state files. Any jittering of the remote filesystem might lead
> > > > to a checkpoint failure. By supporting a retry mechanism in
> > > > `RocksDBStateDataTransfer`, we can significantly reduce the failure rate
> > > > of checkpointing during the asynchronous phase.
> > > >
> > > > To make this retry mechanism configurable, we have introduced two options
> > > > in this FLIP: `state.backend.rocksdb.checkpoint.transfer.retry.times` and
> > > > `state.backend.rocksdb.checkpoint.transfer.retry.interval`. The default
> > > > behavior remains that no retry will be performed, in order to stay
> > > > consistent with the original behavior.
> > > >
> > > > Looking forward to your feedback, thanks.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-414%3A+Support+Retry+Mechanism+in+RocksDBStateDataTransfer

[jira] [Created] (FLINK-34484) Split 'state.backend.local-recovery' into two options for checkpointing and recovery

2024-02-21 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-34484:
---

 Summary: Split 'state.backend.local-recovery' into two options for 
checkpointing and recovery
 Key: FLINK-34484
 URL: https://issues.apache.org/jira/browse/FLINK-34484
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Zakelly Lan
 Fix For: 1.20.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34483) Merge 'state.checkpoints.dir' and 'state.checkpoints-storage' into one

2024-02-21 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-34483:
---

 Summary: Merge 'state.checkpoints.dir' and 
'state.checkpoints-storage' into one
 Key: FLINK-34483
 URL: https://issues.apache.org/jira/browse/FLINK-34483
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Zakelly Lan
 Fix For: 1.20.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34482) Rename options for checkpointing

2024-02-21 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-34482:
---

 Summary: Rename options for checkpointing
 Key: FLINK-34482
 URL: https://issues.apache.org/jira/browse/FLINK-34482
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Zakelly Lan
 Fix For: 1.20.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34481) Migrate SetOpRewriteUtil

2024-02-21 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-34481:
-

 Summary: Migrate SetOpRewriteUtil
 Key: FLINK-34481
 URL: https://issues.apache.org/jira/browse/FLINK-34481
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.20.0
Reporter: Jacky Lau
 Fix For: 1.20.0


We should migrate SetOpRewriteUtil for:
ReplaceIntersectWithSemiJoinRule
ReplaceMinusWithAntiJoinRule
RewriteIntersectAllRule
RewriteMinusAllRule



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34480) Add method to support user jar overwrite flink inner jar class when same class

2024-02-21 Thread JinxinTang (Jira)
JinxinTang created FLINK-34480:
--

 Summary: Add method to support user jar overwrite flink inner jar 
class when same class
 Key: FLINK-34480
 URL: https://issues.apache.org/jira/browse/FLINK-34480
 Project: Flink
  Issue Type: Improvement
Reporter: JinxinTang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New Apache Flink Committer - Jiabao Sun

2024-02-21 Thread Yuepeng Pan
Congratulations~ :)

Best,
Yuepeng Pan










在 2024-02-21 09:52:17,"Hongshun Wang"  写道:
>Congratulations, Jiabao :)
>Congratulations Jiabao!
>
>Best,
>Hongshun
>Best regards,
>
>Weijie
>
>On Tue, Feb 20, 2024 at 2:19 PM Runkang He  wrote:
>
>> Congratulations Jiabao!
>>
>> Best,
>> Runkang He
>>
>> Jane Chan  于2024年2月20日周二 14:18写道:
>>
>> > Congrats, Jiabao!
>> >
>> > Best,
>> > Jane
>> >
>> > On Tue, Feb 20, 2024 at 10:32 AM Paul Lam  wrote:
>> >
>> > > Congrats, Jiabao!
>> > >
>> > > Best,
>> > > Paul Lam
>> > >
>> > > > 2024年2月20日 10:29,Zakelly Lan  写道:
>> > > >
>> > > >> Congrats! Jiabao!
>> > >
>> > >
>> >
>>