[ANNOUNCE] Flink 1.20 feature freeze

2024-06-14 Thread weijie guo
Hi everyone,


The feature freeze for 1.20 has now started. That means no new features or
improvements should be merged into the master branch unless you ask the
release managers first; PRs for which this has already been done, or which are
only waiting on CI to pass, can still go in. Bug fixes and documentation PRs
can still be merged.



- *Cutting release branch*


Currently we have no blocker issues (besides the tickets used for
release testing).

We are planning to cut the release branch next Friday (June 21) if no new
test instabilities show up, and we'll make another announcement in the
dev mailing list then.



- *Cross-team testing*


The release testing will start right after we cut the release branch, which
is expected to happen next week. As a prerequisite, we have created the
corresponding instruction tickets under FLINK-35602 [1]. Please check and
complete the documentation and test instructions for your new feature, and
mark the related JIRA issue in the 1.20 release wiki page [2] before we start
testing; this will be quite helpful for the other developers validating your
features.



Best regards,

Robert, Rui, Ufuk and Weijie


[1] https://issues.apache.org/jira/browse/FLINK-35602

[2] https://cwiki.apache.org/confluence/display/FLINK/1.20+Release


[jira] [Created] (FLINK-35621) Release Testing Instructions: Verify FLIP-436: Introduce Catalog-related Syntax

2024-06-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35621:
--

 Summary: Release Testing Instructions: Verify FLIP-436: Introduce 
Catalog-related Syntax
 Key: FLINK-35621
 URL: https://issues.apache.org/jira/browse/FLINK-35621
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Common
Reporter: Weijie Guo
Assignee: Ahmed Hamdy
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-35435



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35620) Parquet writer creates wrong file for nested fields

2024-06-14 Thread Vicky Papavasileiou (Jira)
Vicky Papavasileiou created FLINK-35620:
---

 Summary: Parquet writer creates wrong file for nested fields
 Key: FLINK-35620
 URL: https://issues.apache.org/jira/browse/FLINK-35620
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.19.0
Reporter: Vicky Papavasileiou


After PR [https://github.com/apache/flink/pull/24795], which added support for
nested arrays, got merged, the Parquet writer produces wrong Parquet files that
cannot be read. Note that the readers (both Flink and Iceberg) don't throw an
exception, but return `null` for the nested field.

The error is in how the field `max_definition_level` is populated for nested
fields.

Consider the following Avro schema:

```
{
  "namespace": "com.test",
  "type": "record",
  "name": "RecordData",
  "fields": [
    {
      "name": "Field1",
      "type": {
        "type": "array",
        "items": {
          "type": "record",
          "name": "NestedField2",
          "fields": [
            { "name": "NestedField3", "type": "double" }
          ]
        }
      }
    }
  ]
}
```

Consider the excerpt below of a parquet file produced by Flink for the above 
schema:

```

 Column(SegmentStartTime) 
name: NestedField3
path: Field1.list.element.NestedField3
max_definition_level: 1
max_repetition_level: 1
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE
compression: SNAPPY (space_saved: 7%)

```

The max_definition_level should be 4, but is 1.
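
For reference, here is a small illustrative snippet (plain parquet-mr, not Flink's writer) showing how the max definition level follows from the optional/repeated groups on the column path. The schema string assumes the standard 3-level LIST layout with optional levels, so it is only an approximation of what the writer should produce:

```
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class DefinitionLevelProbe {
    public static void main(String[] args) {
        // Hypothetical 3-level LIST layout for the Avro schema above; the exact
        // optional/required flags depend on how the writer maps Avro nullability.
        MessageType expected =
                MessageTypeParser.parseMessageType(
                        "message RecordData {\n"
                                + "  optional group Field1 (LIST) {\n"
                                + "    repeated group list {\n"
                                + "      optional group element {\n"
                                + "        optional double NestedField3;\n"
                                + "      }\n"
                                + "    }\n"
                                + "  }\n"
                                + "}");
        // Every optional or repeated ancestor on the path adds one definition level,
        // so this layout yields 4 for Field1.list.element.NestedField3.
        System.out.println(
                expected.getMaxDefinitionLevel("Field1", "list", "element", "NestedField3"));
    }
}
```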



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35619) Window rank query fails with "must call validate first"

2024-06-14 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-35619:


 Summary: Window rank query fails with "must call validate first"
 Key: FLINK-35619
 URL: https://issues.apache.org/jira/browse/FLINK-35619
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.20.0


A program:

{code}
static final TableTestProgram WINDOW_RANK_HOP_TVF_NAMED_MIN_TOP_1 =
        TableTestProgram.of(
                        "window-rank-hop-tvf-named-min-top-n",
                        "validates window min top-n follows after hop window")
                .setupTableSource(
                        SourceTestStep.newBuilder("bid_t")
                                .addSchema(
                                        "ts STRING",
                                        "price DECIMAL(10,2)",
                                        "supplier_id STRING",
                                        "`bid_time` AS TO_TIMESTAMP(`ts`)",
                                        "WATERMARK for `bid_time` AS `bid_time` - INTERVAL '1' SECOND")
                                .producedValues(
                                        Row.of(
                                                "2020-04-15 08:00:05",
                                                new BigDecimal(4.00),
                                                "supplier1"))
                                .build())
                .setupTableSink(
                        SinkTestStep.newBuilder("sink_t")
                                .addSchema("bid_time TIMESTAMP(3)", "supplier_id STRING")
                                .consumedValues(
                                        "+I[2020-04-15T08:00:05, supplier1]",
                                        "+I[2020-04-15T08:00:05, supplier1]")
                                .build())
                .runSql(
                        "INSERT INTO sink_t(bid_time, supplier_id) "
                                + "SELECT bid_time, supplier_id\n"
                                + "  FROM (\n"
                                + "SELECT\n"
                                + " bid_time,\n"
                                + " supplier_id,\n"
                                + " ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY price ASC) AS row_num\n"
                                + "FROM TABLE(HOP(\n"
                                + "  DATA => TABLE bid_t,\n"
                                + "  TIMECOL => DESCRIPTOR(`bid_time`),\n"
                                + "  SLIDE => INTERVAL '5' SECOND,\n"
                                + "  SIZE => INTERVAL '10' SECOND))\n"
                                + "  ) WHERE row_num <= 3")
                .build();
{code}

fails with:
{code}
java.lang.AssertionError: must call validate first

    at org.apache.calcite.sql.validate.IdentifierNamespace.resolve(IdentifierNamespace.java:256)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2871)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2464)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2378)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2323)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:730)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:716)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3880)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryOrInList(SqlToRelConverter.java:1912)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertExists(SqlToRelConverter.java:1895)
    at org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1421)
    at org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1161)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertCollectionTable(SqlToRelConverter.java:2928)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2511)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2378)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2323)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:730)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:716)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3880)
    at org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2490)
    at

Re: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0

2024-06-14 Thread Xiqian YU
Thanks Qingsheng for driving this release!

+1 (non-binding)


  *   Verified tarball checksum
  *   Compiled CDC from source
  *   Confirmed jars were compiled with JDK 8
  *   Ran savepoint migration test from CDC 3.0.x
  *   Ran Pipeline E2e tests with Flink 1.18.1 / 1.19.0

Regards,
Xiqian

From: Qingsheng Ren
Date: Friday, 14 June 2024 at 15:06
To: dev
Subject: [VOTE] Apache Flink CDC Release 3.1.1, release candidate #0
Hi everyone,

Please review and vote on the release candidate #0 for the version 3.1.1 of
Apache Flink CDC, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Flink CDC source release to be deployed to dist.apache.org
b) Maven artifacts to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above mentioned artifacts are as follows,
for your review:
* All artifacts for a) can be found in the corresponding dev repository at
dist.apache.org [1], which are signed with the key with fingerprint
A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2]
* All artifacts for b) can be found at the Apache Nexus Repository [3]

Other links for your review:
* JIRA release notes [4]
* Source code tag "release-3.1.1-rc0" with commit hash
b163b8e1589184bd7716cf6d002f3472766eb5db [5]
* PR for release announcement blog post of Flink CDC 3.1.1 in flink-web [6]

**Vote Duration**

The voting time will run for at least 72 hours.
It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Qingsheng Ren

[1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.1.1-rc0
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1739
[4]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354763
[5] https://github.com/apache/flink-cdc/releases/tag/release-3.1.1-rc0
[6] https://github.com/apache/flink-web/pull/746


[jira] [Created] (FLINK-35618) Flink CDC add MongoDB pipeline data sink connector

2024-06-14 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-35618:
--

 Summary: Flink CDC add MongoDB pipeline data sink connector
 Key: FLINK-35618
 URL: https://issues.apache.org/jira/browse/FLINK-35618
 Project: Flink
  Issue Type: New Feature
  Components: Flink CDC
Affects Versions: cdc-3.2.0
Reporter: Jiabao Sun
Assignee: Jiabao Sun


Flink CDC add MongoDB pipeline data sink connector



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35617) Support object reuse in async state execution

2024-06-14 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-35617:
---

 Summary: Support object reuse in async state execution
 Key: FLINK-35617
 URL: https://issues.apache.org/jira/browse/FLINK-35617
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: Zakelly Lan
Assignee: Zakelly Lan


The record processor of the {{AEC}} in the async state execution model should take 
object reuse into account and copy records if needed.
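
A minimal sketch of the idea (not the actual AEC code; the helper, its parameters, and the flag wiring are assumptions):

{code}
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Sketch only, not the actual AEC code: defensively copy a record before it is
// buffered for asynchronous state access, so that a reused object cannot be
// mutated underneath the pending request when object reuse is enabled.
final class RecordCopyHelper {

    static <T> StreamRecord<T> maybeCopy(
            StreamRecord<T> record, TypeSerializer<T> serializer, boolean objectReuseEnabled) {
        if (!objectReuseEnabled) {
            // Without object reuse each record is a fresh instance, so no copy is needed.
            return record;
        }
        return record.copy(serializer.copy(record.getValue()));
    }
}
{code}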



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35616) Support upsert into sharded collections

2024-06-14 Thread Jiabao Sun (Jira)
Jiabao Sun created FLINK-35616:
--

 Summary: Support upsert into sharded collections
 Key: FLINK-35616
 URL: https://issues.apache.org/jira/browse/FLINK-35616
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / MongoDB
Affects Versions: mongodb-1.2.0
Reporter: Jiabao Sun
Assignee: Jiabao Sun


{panel:}
For a db.collection.update() operation that includes upsert: true and is on a 
sharded collection, the full sharded key must be included in the filter:

* For an update operation.
* For a replace document operation (starting in MongoDB 4.2).
{panel}

https://www.mongodb.com/docs/manual/reference/method/db.collection.update/#upsert-on-a-sharded-collection

We need to allow users to configure the full shard key field names so that upserts 
into a sharded collection can include them in the filter.
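
For illustration, a minimal sketch (not the connector's actual code) of how the configured shard key fields could be folded into the upsert filter with the MongoDB Java driver; the helper and its wiring into the sink are assumptions:

{code}
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOneModel;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.BsonDocument;
import org.bson.conversions.Bson;

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: build an upsert filter that contains the full shard key,
// as MongoDB requires for upserts on sharded collections.
final class ShardedUpsertFilter {

    static ReplaceOneModel<BsonDocument> upsertModel(BsonDocument doc, List<String> shardKeyFields) {
        List<Bson> conditions = new ArrayList<>();
        conditions.add(Filters.eq("_id", doc.get("_id")));
        for (String field : shardKeyFields) {
            // User-configured shard key fields are copied from the document into the filter.
            conditions.add(Filters.eq(field, doc.get(field)));
        }
        return new ReplaceOneModel<>(Filters.and(conditions), doc, new ReplaceOptions().upsert(true));
    }
}
{code}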



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-06-14 Thread David Radley
Hi everyone,
I have talked with Chesnay and Danny offline. Danny and I were not very happy 
with passing Maps around, and were looking for a neater design. Chesnay 
suggested that we could move the new format to the Kafka connector, then pass 
the Kafka record down to the deserialize logic so it can make use of the 
headers during deserialization and serialization.

I think this is a neat idea. This would mean:
- the Kafka connector code would need to be updated to pass down the Kafka 
record
- the Avro Apicurio format and its SQL variant would be in the Kafka repository. We 
feel it is unlikely that anyone would want to use the Apicurio registry with files, 
as the plain Avro format could be used instead.

Unfortunately I have found that this is not so straightforward to implement, as 
the Avro Apicurio format uses the Avro format, which is tied to 
DeserializationSchema. We were hoping to have a new decoding implementation 
that would pass down the Kafka record rather than just the payload. This does not 
appear possible without an Avro format change.


Inspired by this idea, I notice that KafkaValueOnlyRecordDeserializerWrapper 
extends KafkaValueOnlyDeserializerWrapper and does:

deserializer.deserialize(record.topic(), record.value())



I am investigating whether I can add a factory/reflection mechanism to provide an 
alternative implementation that passes the record (the Kafka record is not 
serializable, so I will pick out what we need and deserialize it) as a byte array.

I would need to do this 4 times (value and key, for deserialisation and 
serialisation). To do this I would need to convert the record into a byte array 
so that it fits into the existing interface (DeserializationSchema). I think this 
could be a way through: it avoids using maps, avoids changing the existing 
Avro format, and avoids changing any core Flink interfaces.
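
To make this more concrete, here is a rough sketch of the kind of packing I have in mind; it is illustrative only, and the layout and class name are not part of the FLIP:

```
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

// Illustrative sketch only, not the FLIP design: pack the parts of the Kafka record
// that the format needs (headers + value) into a single byte[], so it can still flow
// through the existing DeserializationSchema interface unchanged.
final class RecordBytesPacker {

    static byte[] packValueSide(ConsumerRecord<byte[], byte[]> record) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);

        Header[] headers = record.headers().toArray();
        out.writeInt(headers.length);
        for (Header header : headers) {
            out.writeUTF(header.key());
            byte[] headerValue = header.value();
            out.writeInt(headerValue == null ? -1 : headerValue.length);
            if (headerValue != null) {
                out.write(headerValue);
            }
        }

        byte[] value = record.value();
        out.writeInt(value == null ? -1 : value.length);
        if (value != null) {
            out.write(value);
        }
        return bytes.toByteArray();
    }
}
```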

I am going to prototype this idea. WDYT?

My thanks go to Chesnay and Danny for their support and insight around this 
FLIP.
   Kind regards, David.






From: David Radley 
Date: Wednesday, 29 May 2024 at 11:39
To: dev@flink.apache.org 
Subject: [EXTERNAL] RE: [DISCUSS] FLIP-XXX Apicurio-avro format
Hi Danny,
Thank you for your feedback on this.

I agree that using maps has pros and cons. The maps are flexible, but do 
require the sender and receiver to know what is in the map.

When you say “That sounds like it would fit in better, I assume we cannot just 
take that approach?”: the motivation behind this FLIP is to support the headers, 
which is the usual way that Apicurio runs. We will support the “schema id in 
the payload” as well.

I agree with you when you say “ I am not 100% happy with the solution but I
cannot offer a better option.” – this is a pragmatic way we have found to solve 
this issue. I am open to any suggestions to improve this as well.

If we are going with the maps design (which is the best we have at the moment), 
it would be good to have the Flink core changes in base Flink version 2.0, as 
this would mean we do not need to use reflection in a Flink Kafka version 2 
connector to work out whether the runtime Flink has the new methods.

At this stage we only have one committer (yourself) backing this. Do you know 
of 2 other committers who would support this FLIP?

 Kind regards, David.



From: Danny Cranmer 
Date: Friday, 24 May 2024 at 19:32
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-XXX Apicurio-avro format
Hello,

> I am curious what you mean by abused.

I just meant we will end up adding more and more fields to this map over
time, and it may be hard to undo.

> For Apicurio it can be sent at the start of the payload like Confluent
Avro does. Confluent Avro has a magic byte followed by 4 bytes of schema
id at the start of the payload. Apicurio clients and SerDe libraries can
be configured to not put the schema id in the headers, in which case there
is a magic byte followed by an 8-byte schema id at the start of the payload.
In the deserialization case, we would not need to look at the headers –
though the encoding is also in the headers.
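
(As an aside, a minimal sketch of reading the id under those two payload conventions; the magic-byte constant and class below are illustrative, not taken from either SerDe library:)

```
import java.nio.ByteBuffer;

/** Illustrative only: reading a schema id from the start of the payload. */
final class SchemaIdFromPayload {

    private static final byte MAGIC_BYTE = 0x0;

    /** Confluent convention: magic byte + 4-byte schema id. */
    static int readConfluentSchemaId(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getInt();
    }

    /** Apicurio "id in payload" convention: magic byte + 8-byte global id. */
    static long readApicurioGlobalId(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getLong();
    }
}
```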

That sounds like it would fit in better, I assume we cannot just take that
approach?

Thanks for the discussion. I am not 100% happy with the solution but I
cannot offer a better option. I would be interested to hear if others have
any suggestions. Playing devil's advocate against myself, we pass maps
around to configure connectors so it is not too far away from that.

Thanks,
Danny


On Fri, May 24, 2024 at 2:23 PM David Radley 
wrote:

> Hi Danny,
> No worries, thanks for replying. I have working prototype code that is
> being reviewed. It needs some cleaning up and more complete testing before
> it is ready, but will give you the general idea [1][2] to help to assess
> this approach.
>
>
> I am curious what you mean by abused. I guess the approaches are between
> generic map, mechanism vs a more particular more granular things being
> passed that might be used by another connector.
>
> Your first question:
> “how would this work if the schema 

[jira] [Created] (FLINK-35615) CDC 3.0: the mysql-to-doris pipeline does not support the MySQL field type timestamp(6)

2024-06-14 Thread Dylan Zhao (Jira)
Dylan Zhao created FLINK-35615:
--

 Summary: CDC 3.0: the mysql-to-doris pipeline does not support the 
MySQL field type timestamp(6)
 Key: FLINK-35615
 URL: https://issues.apache.org/jira/browse/FLINK-35615
 Project: Flink
  Issue Type: Bug
Reporter: Dylan Zhao


@吕宴全I困溺: Currently, with the CDC 3.0 Doris pipeline sync, a field whose MySQL type is timestamp(6) (a timestamp type with precision) causes an array index out of bounds error.

!https://static.dingtalk.com/media/lQLPJw-K4zrLMjnNA6LNB-Swvk-76DoWxAUGVxnmewT2AA_2020_930.png|width=2020,height=930!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35614) Release Testing Instructions: Verify FLIP-443: Interruptible timers firing

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35614:
---

 Summary: Release Testing Instructions: Verify  FLIP-443: 
Interruptible timers firing 
 Key: FLINK-35614
 URL: https://issues.apache.org/jira/browse/FLINK-35614
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35613) Release Testing Instructions: Verify [FLIP-451] Introduce timeout configuration to AsyncSink

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35613:
---

 Summary: Release Testing Instructions: Verify [FLIP-451] Introduce 
timeout configuration to AsyncSink
 Key: FLINK-35613
 URL: https://issues.apache.org/jira/browse/FLINK-35613
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35612) Release Testing Instructions: Verify FLIP-445: Support dynamic parallelism inference for HiveSource

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35612:
---

 Summary: Release Testing Instructions: Verify FLIP-445: Support 
dynamic parallelism inference for HiveSource
 Key: FLINK-35612
 URL: https://issues.apache.org/jira/browse/FLINK-35612
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35611) Release Testing Instructions: Verify [FLIP-453] Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35611:
---

 Summary: Release Testing Instructions: Verify [FLIP-453] Promote 
Unified Sink API V2 to Public and Deprecate SinkFunction
 Key: FLINK-35611
 URL: https://issues.apache.org/jira/browse/FLINK-35611
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35610) Release Testing Instructions: Verify FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35610:
---

 Summary: Release Testing Instructions: Verify FLIP-448: Introduce 
Pluggable Workflow Scheduler Interface for Materialized Table
 Key: FLINK-35610
 URL: https://issues.apache.org/jira/browse/FLINK-35610
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35609) Release Testing Instructions: Verify FLIP-435: Introduce a New Materialized Table for Simplifying Data Pipelines

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35609:
---

 Summary: Release Testing Instructions: Verify FLIP-435: Introduce 
a New Materialized Table for Simplifying Data Pipelines
 Key: FLINK-35609
 URL: https://issues.apache.org/jira/browse/FLINK-35609
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35608) CLONE - Release Testing Instructions: Verify FLIP-376: Add DISTRIBUTED BY clause

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35608:
---

 Summary: CLONE - Release Testing Instructions: Verify FLIP-376: 
Add DISTRIBUTED BY clause
 Key: FLINK-35608
 URL: https://issues.apache.org/jira/browse/FLINK-35608
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35607) Release Testing Instructions: Verify FLIP-441: Show the JobType and remove Execution Mode on Flink WebUI

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35607:
---

 Summary: Release Testing Instructions: Verify  FLIP-441: Show the 
JobType and remove Execution Mode on Flink WebUI 
 Key: FLINK-35607
 URL: https://issues.apache.org/jira/browse/FLINK-35607
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-29481



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35606) Release Testing Instructions: Verify FLINK-26050 Too many small sst files in rocksdb state backend when using time window created in ascending order

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35606:
---

 Summary: Release Testing Instructions: Verify FLINK-26050 Too many 
small sst files in rocksdb state backend when using time window created in 
ascending order
 Key: FLINK-35606
 URL: https://issues.apache.org/jira/browse/FLINK-35606
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: Rui Fan
Assignee: Roman Khachatryan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-26050



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35605) Release Testing Instructions: Verify FLIP-306 Unified File Merging Mechanism for Checkpoints

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35605:
---

 Summary: Release Testing Instructions: Verify FLIP-306 Unified 
File Merging Mechanism for Checkpoints
 Key: FLINK-35605
 URL: https://issues.apache.org/jira/browse/FLINK-35605
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Rui Fan
Assignee: Zakelly Lan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-32070



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35604) Release Testing Instructions: Verify FLIP-383: Support Job Recovery from JobMaster Failures for Batch Jobs

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35604:
---

 Summary: Release Testing Instructions: Verify FLIP-383: Support 
Job Recovery from JobMaster Failures for Batch Jobs
 Key: FLINK-35604
 URL: https://issues.apache.org/jira/browse/FLINK-35604
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Rui Fan
Assignee: Yuxin Tan
 Fix For: 1.20.0


Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35603) Release Testing Instructions: Verify FLINK-35533(FLIP-459): Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-14 Thread Rui Fan (Jira)
Rui Fan created FLINK-35603:
---

 Summary: Release Testing Instructions: Verify 
FLINK-35533(FLIP-459): Support Flink hybrid shuffle integration with Apache 
Celeborn
 Key: FLINK-35603
 URL: https://issues.apache.org/jira/browse/FLINK-35603
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Rui Fan
Assignee: Yuxin Tan


Follow up the test for https://issues.apache.org/jira/browse/FLINK-35533



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35602) [Umbrella] Test Flink Release 1.20

2024-06-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35602:
--

 Summary: [Umbrella] Test Flink Release 1.20
 Key: FLINK-35602
 URL: https://issues.apache.org/jira/browse/FLINK-35602
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-14 Thread weijie guo
Hi all,

Thanks for all the feedback and suggestions so far.

If there is no further comment, we will open the voting thread next Monday.

Best regards,

Weijie


weijie guo wrote on Fri, Jun 14, 2024 at 15:49:

> Thanks Lincoln for the quick response.
>
> > Since we've decided to extend a new hint option 'shuffle' to the current
> `LOOKUP` join hint, do we support hash shuffle as well? (It seems like it
> shouldn't require a lot of extra work, right?) This will deliver a
> complete new feature to users, also because FLIP-204 is stale for now and
> this new extension will give users a simpler way to achieve the goal, WDYT?
>
> Yes, I think this makes more sense.
>
> In a word: if the target dim table does not implement
> SupportsLookupCustomShuffle, the planner will try its best to apply a hash
> partitioning to the input stream. Otherwise, the planner will try its best
> to apply the custom partitioning.
>
> As for FLIP-204, I think we can discuss whether it should be discarded or
> refactored in a separate thread. TBH, I think the current approach can
> completely cover it and be much easier to use.
> > "upsert mode" should be "updating stream" or "non-insert-only stream".
>
> Thanks, updated the FLIP.
>
>
>
> Best regards,
>
> Weijie
>
>
Lincoln Lee wrote on Thu, Jun 13, 2024 at 23:08:
>
>> Thanks Weijie & Wencong for your update including the conclusions of
>> the offline discussion.
>>
>> There's one thing need to be confirmed in the FLIP:
>> > The hint only provides a suggestion to the optimizer, it is not an
>> enforcer. As a result, If the target dim table not implements
>> SupportsLookupCustomShuffle, planner will ignore this newly introduced
>> shuffle option.
>>
>> Since we've decided to extend a new hint option 'shuffle' to the current
>> `LOOKUP` join hint, do we support hash shuffle as well?(It seems like it
>> shouldn't require a lot of extra work, right?)
>> This will deliver a complete new feature to users,  also because
>> FLIP-204 is stale for now and this new extension will give user a more
>> simpler way to achieve the goal, WDYT?
>>
>> Another small comment for the new interface:
>> > "... planner may not apply this partitioner in upsert mode ..."
>> > default boolean isDeterministic()
>> "upsert mode" should be "updating stream" or "non-insert-only stream".
>>
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Wencong Liu wrote on Wed, Jun 12, 2024 at 21:43:
>>
>> > Hi Jingsong,
>> >
>> >
>> > Some of the points you mentioned are currently clarified in
>> > the updated FLIP. Please check it out.
>> >
>> >
>> > 1. Enabling custom data distribution can be done through the
>> > LOOKUP SQL Hint. There are detailed examples provided in the FLIP.
>> >
>> >
>> > 2. We will add the isDeterministic method to the `InputDataPartitioner`
>> > interface, which will return true by default. If the
>> > `InputDataPartitioner`
>> > is not deterministic, the connector developer need to override the
>> > isDeterministic method to return false. If the connector developer
>> > cannot ensure this protocol, they will need to bear the correctness
>> > issues that arise.
>> >
>> >
>> > 3. Yes, this feature will work in batch mode as well.
>> >
>> >
>> > Best regards,
>> > Wencong
>> >
>> >
>> >
>> >
>> >
>> > At 2024-06-11 23:47:40, "Jingsong Li"  wrote:
>> > >Hi all,
>> > >
>> > >+1 to this FLIP, very thanks all for your proposal.
>> > >
>> > >isDeterministic looks good to me too.
>> > >
>> > >We can consider stating the following points:
>> > >
>> > >1. How to enable custom data distribution? Is it a dynamic hint? Can
>> > >you provide an SQL example.
>> > >
>> > >2. What impact will it have when the mainstream is changelog? Causing
>> > >disorder? This may need to be emphasized.
>> > >
>> > >3. Does this feature work in batch mode too?
>> > >
>> > >Best,
>> > >Jingsong
>> > >
>> > >On Tue, Jun 11, 2024 at 8:22 PM Wencong Liu 
>> wrote:
>> > >>
>> > >> Hi Lincoln,
>> > >>
>> > >>
>> > >> Thanks for your reply. Weijie and I discussed these two issues
>> offline,
>> > >> and here are the results of our discussion:
>> > >> 1. When the user utilizes the hash lookup join hint introduced by
>> > FLIP-204[1],
>> > >> the `SupportsLookupCustomShuffle` interface should be ignored. This
>> is
>> > because
>> > >> the hash lookup join hint is directly specified by the user through a
>> > SQL HINT,
>> > >> which is more in line with user intuition. WDYT?
>> > >> 2. We agree with the introduction of the `isDeterministic` method.
>> The
>> > >> `SupportsLookupCustomShuffle` interface introduces a custom shuffle,
>> > which
>> > >> can cause ADD/UPDATE_AFTER events (+I, +U) to appear
>> > >> after UPDATE_BEFORE/DELETE events (-D, -U), thus breaking the current
>> > >> limitations of the Flink Sink Operator[2]. If `isDeterministic`
>> returns
>> > false and the
>> > >> changelog event type is not insert-only, the Planner should not apply
>> > the shuffle
>> > >> provided by `SupportsLookupCustomShuffle`.
>> > >>
>> > >>
>> > >> [1]
>> >
>> 

Re: Re: Re: [DISCUSS] FLIP-462: Support Custom Data Distribution for Input Stream of Lookup Join

2024-06-14 Thread weijie guo
Thanks Lincoln for the quick response.

> Since we've decided to extend a new hint option 'shuffle' to the current
`LOOKUP` join hint, do we support hash shuffle as well? (It seems like it
shouldn't require a lot of extra work, right?) This will deliver a complete
new feature to users, also because FLIP-204 is stale for now and this new
extension will give users a simpler way to achieve the goal, WDYT?

Yes, I think this makes more sense.

In a word: if the target dim table does not implement
SupportsLookupCustomShuffle, the planner will try its best to apply a hash
partitioning to the input stream. Otherwise, the planner will try its best to
apply the custom partitioning.
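
For illustration only, a rough sketch of what the interfaces discussed here could look like; apart from SupportsLookupCustomShuffle, InputDataPartitioner and isDeterministic(), the names and signatures are assumptions rather than the final FLIP API:

```
import org.apache.flink.table.data.RowData;

// Rough sketch only: the shape of the ability interface discussed in this thread.
public interface SupportsLookupCustomShuffle {

    /** The partitioner the planner may apply to the lookup join's input stream. */
    InputDataPartitioner getPartitioner();

    interface InputDataPartitioner {

        /** Picks the target subtask for a record, based on its join key. */
        int partition(RowData joinKey, int numPartitions);

        /**
         * If this returns false and the input stream is not insert-only, the planner
         * should not apply this shuffle, to avoid out-of-order changelog events.
         */
        default boolean isDeterministic() {
            return true;
        }
    }
}
```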

As for FLIP-204, I think we can discuss whether it should be discarded or
refactored in a separate thread. TBH, I think the current approach can
completely cover it and be much easier to use.
> "upsert mode" should be "updating stream" or "non-insert-only stream".

Thanks, updated the FLIP.



Best regards,

Weijie


Lincoln Lee wrote on Thu, Jun 13, 2024 at 23:08:

> Thanks Weijie & Wencong for your update including the conclusions of
> the offline discussion.
>
> There's one thing need to be confirmed in the FLIP:
> > The hint only provides a suggestion to the optimizer, it is not an
> enforcer. As a result, If the target dim table not implements
> SupportsLookupCustomShuffle, planner will ignore this newly introduced
> shuffle option.
>
> Since we've decided to extend a new hint option 'shuffle' to the current
> `LOOKUP` join hint, do we support hash shuffle as well?(It seems like it
> shouldn't require a lot of extra work, right?)
> This will deliver a complete new feature to users,  also because
> FLIP-204 is stale for now and this new extension will give user a more
> simpler way to achieve the goal, WDYT?
>
> Another small comment for the new interface:
> > "... planner may not apply this partitioner in upsert mode ..."
> > default boolean isDeterministic()
> "upsert mode" should be "updating stream" or "non-insert-only stream".
>
>
> Best,
> Lincoln Lee
>
>
> Wencong Liu wrote on Wed, Jun 12, 2024 at 21:43:
>
> > Hi Jingsong,
> >
> >
> > Some of the points you mentioned are currently clarified in
> > the updated FLIP. Please check it out.
> >
> >
> > 1. Enabling custom data distribution can be done through the
> > LOOKUP SQL Hint. There are detailed examples provided in the FLIP.
> >
> >
> > 2. We will add the isDeterministic method to the `InputDataPartitioner`
> > interface, which will return true by default. If the
> > `InputDataPartitioner`
> > is not deterministic, the connector developer need to override the
> > isDeterministic method to return false. If the connector developer
> > cannot ensure this protocol, they will need to bear the correctness
> > issues that arise.
> >
> >
> > 3. Yes, this feature will work in batch mode as well.
> >
> >
> > Best regards,
> > Wencong
> >
> >
> >
> >
> >
> > At 2024-06-11 23:47:40, "Jingsong Li"  wrote:
> > >Hi all,
> > >
> > >+1 to this FLIP, very thanks all for your proposal.
> > >
> > >isDeterministic looks good to me too.
> > >
> > >We can consider stating the following points:
> > >
> > >1. How to enable custom data distribution? Is it a dynamic hint? Can
> > >you provide an SQL example.
> > >
> > >2. What impact will it have when the mainstream is changelog? Causing
> > >disorder? This may need to be emphasized.
> > >
> > >3. Does this feature work in batch mode too?
> > >
> > >Best,
> > >Jingsong
> > >
> > >On Tue, Jun 11, 2024 at 8:22 PM Wencong Liu 
> wrote:
> > >>
> > >> Hi Lincoln,
> > >>
> > >>
> > >> Thanks for your reply. Weijie and I discussed these two issues
> offline,
> > >> and here are the results of our discussion:
> > >> 1. When the user utilizes the hash lookup join hint introduced by
> > FLIP-204[1],
> > >> the `SupportsLookupCustomShuffle` interface should be ignored. This is
> > because
> > >> the hash lookup join hint is directly specified by the user through a
> > SQL HINT,
> > >> which is more in line with user intuition. WDYT?
> > >> 2. We agree with the introduction of the `isDeterministic` method. The
> > >> `SupportsLookupCustomShuffle` interface introduces a custom shuffle,
> > which
> > >> can cause ADD/UPDATE_AFTER events (+I, +U) to appear
> > >> after UPDATE_BEFORE/DELETE events (-D, -U), thus breaking the current
> > >> limitations of the Flink Sink Operator[2]. If `isDeterministic`
> returns
> > false and the
> > >> changelog event type is not insert-only, the Planner should not apply
> > the shuffle
> > >> provided by `SupportsLookupCustomShuffle`.
> > >>
> > >>
> > >> [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >> [2]
> >
> https://www.ververica.com/blog/flink-sql-secrets-mastering-the-art-of-changelog-event-out-of-orderness
> > >>
> > >>
> > >> Best,
> > >> Wencong
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> At 2024-06-11 00:02:57, "Lincoln Lee"  wrote:
> > >> >Hi Weijie,
> > >> >
> > >> 

Re: [DISCUSS] Add a JDBC Sink Plugin to Flink-CDC-Pipeline

2024-06-14 Thread Jerry
I have written the content in FLIP format, but I don't have wiki
permissions and can't create a wiki page.

https://docs.google.com/document/d/1bgKV9Teq8ktHZOAJ7EKzgSqP11Q4ZT2EM8BndOtDgy0/edit

Leonard Xu wrote on Tue, May 21, 2024 at 22:50:

> Thanks Jerry for kicking off this thread. The idea makes sense to me: a JDBC
> sink is something users need, and the Flink CDC project should support it soon.
>
> Could you share your design doc (FLIP) first [1]? Then we can
> continue the design discussion.
>
> Please feel free to ping me if you have any concerns about FLIP process or
> Flink CDC design part.
>
> Best,
> Leonard
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>
>
> > On May 15, 2024, at 3:06 PM, Jerry wrote:
> >
> > Hi all
> > My name is ZhengjunZhou, a user and developer of Flink CDC. In my recent
> > projects, I realized that we could enhance the capabilities of
> > Flink-CDC-Pipeline by introducing a JDBC Sink plugin, enabling Flink CDC to
> > directly output change data capture (CDC) records to various JDBC-supported
> > database systems.
> >
> > Currently, while FlinkCDC offers support for a wide range of data
> sources,
> > there is no direct solution for sinks, especially for common relational
> > databases. I believe that adding a JDBC Sink plugin will significantly
> > boost its applicability in data integration scenarios.
> >
> > Specifically, this plugin would allow users to configure database
> > connections and stream data directly to SQL databases via the standard
> JDBC
> > interface. This could be used for data migration tasks as well as
> real-time
> > data synchronization.
> >
> > To further discuss this proposal and gather feedback from the community,
> I
> > have prepared a preliminary design draft and hope to discuss it in detail
> > in the upcoming community meeting. Please consider the potential value of
> > this feature and provide your insights and guidance.
> >
> > Thank you for your time and consideration. I look forward to your active
> > feedback and further discussion.
> >
> > [1] https://github.com/apache/flink-connector-jdbc
>
>


[jira] [Created] (FLINK-35601) InitOutputPathTest.testErrorOccursUnSynchronized failed due to NoSuchFieldException

2024-06-14 Thread Weijie Guo (Jira)
Weijie Guo created FLINK-35601:
--

 Summary: InitOutputPathTest.testErrorOccursUnSynchronized failed 
due to NoSuchFieldException
 Key: FLINK-35601
 URL: https://issues.apache.org/jira/browse/FLINK-35601
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.20.0
Reporter: Weijie Guo






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] Apache Flink CDC Release 3.1.1, release candidate #0

2024-06-14 Thread Qingsheng Ren
Hi everyone,

Please review and vote on the release candidate #0 for the version 3.1.1 of
Apache Flink CDC, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

**Release Overview**

As an overview, the release consists of the following:
a) Flink CDC source release to be deployed to dist.apache.org
b) Maven artifacts to be deployed to the Maven Central Repository

**Staging Areas to Review**

The staging areas containing the above mentioned artifacts are as follows,
for your review:
* All artifacts for a) can be found in the corresponding dev repository at
dist.apache.org [1], which are signed with the key with fingerprint
A1BD477F79D036D2C30CA7DBCA8AEEC2F6EB040B [2]
* All artifacts for b) can be found at the Apache Nexus Repository [3]

Other links for your review:
* JIRA release notes [4]
* Source code tag "release-3.1.1-rc0" with commit hash
b163b8e1589184bd7716cf6d002f3472766eb5db [5]
* PR for release announcement blog post of Flink CDC 3.1.1 in flink-web [6]

**Vote Duration**

The voting time will run for at least 72 hours.
It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Qingsheng Ren

[1] https://dist.apache.org/repos/dist/dev/flink/flink-cdc-3.1.1-rc0
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://repository.apache.org/content/repositories/orgapacheflink-1739
[4]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354763
[5] https://github.com/apache/flink-cdc/releases/tag/release-3.1.1-rc0
[6] https://github.com/apache/flink-web/pull/746


[DISCUSS] FLINK-35571: ProfilingServiceTest improper test isolation

2024-06-14 Thread Grace Grimwood
Hi devs,

Yesterday I created a ticket, FLINK-35571 - ProfilingServiceTest.testRollingDeletion
intermittently fails due to improper test isolation [1], and my colleagues and I
were hoping to get some feedback on how best to fix it. We'd like to raise a PR
for it; we just want to be sure we understand what we're doing and aren't
breaking anything in doing so.

To summarise the issue briefly: this test fails sometimes with too many
files in the tempDir used by the ProfilingService instance. This happens
because the instance retrieved by ProfilingServiceTest.setUp is sometimes
an old instance with different configuration that was created as part of
other tests outside this test class. There's more detail on the ticket, but
that's the gist.

My colleagues and I have been looking into this, and we've got a handful of
ideas for things we could change here.
In test code:
  1. Add a call to close any existing instances of ProfilingService in
ProfilingServiceTest.setUp before instantiating its own (a rough sketch of
this pattern is included after these lists)
  2. Modify ProfilingServiceTest.testRollingDeletion so that it is able to
handle being given a previously-used instance of ProfilingService and any
files it might leave laying around
  3. Modify
ProfilingServiceTest.testProfilingConfigurationWorkingAsExpected (or add
another test) so that it covers cases where there is an existing instance
with different config
  4. Add a close method to TestingMiniCluster which calls close on any
existing ProfilingService (prevents tests using TaskExecutor via
TestingMiniCluster from leaking unclosed instances of ProfilingService)

In application code:
  1. Add a check in ProfilingService.getInstance to ensure the instance
returned is correctly configured
  2. Make ProfilingService.close synchronize on ProfilingService.class to
ensure it cannot race with getInstance
  3. Modify ProfilingService so it is no longer a global singleton
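
To make test-code idea (1) concrete, here is a self-contained sketch of the isolation pattern we have in mind, using a stand-in singleton; the real ProfilingService API will of course differ:

```
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

/**
 * Illustration of test-code idea (1) with a stand-in singleton; only the
 * isolation pattern is the point, not the ProfilingService API itself.
 */
class SingletonIsolationPatternTest {

    /** Stand-in for a lazily created, config-carrying singleton like ProfilingService. */
    static final class FakeService implements AutoCloseable {
        private static FakeService instance;
        final String config;

        private FakeService(String config) {
            this.config = config;
        }

        static synchronized FakeService getInstance(String config) {
            if (instance == null) {
                instance = new FakeService(config);
            }
            return instance; // may carry a config from an earlier, unrelated test
        }

        static synchronized FakeService getCurrentInstance() {
            return instance;
        }

        @Override
        public synchronized void close() {
            if (instance == this) {
                instance = null;
            }
        }
    }

    FakeService service;

    @BeforeEach
    void setUp() {
        FakeService leaked = FakeService.getCurrentInstance();
        if (leaked != null) {
            leaked.close(); // drop anything left over from other tests before configuring our own
        }
        service = FakeService.getInstance("this-test-config");
    }

    @Test
    void usesFreshlyConfiguredInstance() {
        assertEquals("this-test-config", service.config);
    }
}
```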

We aren't familiar enough with these components of Flink to know what the
implications are for the issue or our suggested fixes. For example, we
don't think that allowing multiple instances of ProfilingService to exist
would cause issues with the application code (especially if AsyncProfiler
was kept as a singleton), but we don't know for certain because we're not
very familiar with how this all fits together. We would appreciate any
input anyone has on this.

[1] https://issues.apache.org/jira/browse/FLINK-35571

Thanks,
Grace

-- 

Grace Grimwood

She/They

Senior Software Engineer

Red Hat