Re: [PROPOSAL] Contribute Flink CDC Connectors project to Apache Flink

2023-12-10 Thread Xin Gong


Good news.

+1

Best,
gongxin
On 2023/12/07 03:24:59 Leonard Xu wrote:
> Dear Flink devs,
> 
> As you may have heard, we at Alibaba (Ververica) are planning to donate CDC 
> Connectors for the Apache Flink project[1] to the Apache Flink community.
> 
> CDC Connectors for Apache Flink is a collection of source connectors 
> designed specifically for Apache Flink. These connectors[2] enable the 
> ingestion of changes from various databases using Change Data Capture (CDC); 
> most of them are powered by Debezium[3]. They support both the DataStream 
> API and the Table/SQL API, allowing database snapshots to be read and 
> transaction logs to be read continuously with exactly-once processing, even 
> in the event of failures.
> 
> 
> Additionally, in the latest version 3.0, we have introduced many long-awaited 
> features. Starting from CDC version 3.0, we've built a Streaming ELT 
> framework for streaming data integration. This framework allows users to 
> write their data synchronization logic in a simple YAML file, which is 
> automatically translated into a Flink DataStream job. It streamlines the 
> task submission process and offers advanced functionality such as whole 
> database synchronization, merging of sharded tables, and schema evolution[4].
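> 
> For illustration only, a pipeline definition in this framework is a short 
> YAML file along the following lines (a sketch based on the project 
> documentation[4]; hostnames, credentials, and table patterns are 
> placeholders):
> 
> ```yaml
> # Synchronize all tables of a MySQL database into Doris (placeholder values).
> source:
>   type: mysql
>   hostname: localhost
>   port: 3306
>   username: root
>   password: secret
>   tables: app_db.\.*   # regular expression: every table in app_db
> 
> sink:
>   type: doris
>   fenodes: 127.0.0.1:8030
>   username: root
>   password: ""
> 
> pipeline:
>   name: Sync MySQL Database to Doris
>   parallelism: 2
> ```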
> 
> 
> I believe this initiative is a perfect match for both sides. For the Flink 
> community, it presents an opportunity to enhance Flink's competitive 
> advantage in streaming data integration, promoting the healthy growth and 
> prosperity of the Apache Flink ecosystem. For the CDC Connectors project, 
> becoming a sub-project of Apache Flink means being part of a neutral 
> open-source community, which can attract a more diverse pool of contributors.
> 
> Please note that the aforementioned points represent only some of our 
> motivations and vision for this donation. Specific future operations need to 
> be discussed further in this thread. For example, the sub-project name after 
> the donation: we hope to name it Flink-CDC, aiming at streaming data 
> integration through Apache Flink and following the naming convention of 
> Flink-ML. The project is currently managed by a total of 8 maintainers, 
> including 3 Flink PMC members and 1 Flink Committer. The remaining 4 
> maintainers are also highly active contributors to the Flink community, and 
> donating this project to the Flink community implies that their permissions 
> might be reduced. Therefore, we may need to bring up this topic for further 
> discussion within the Flink PMC. Additionally, we need to discuss how to 
> migrate existing users and documents: we have a user group of nearly 10,000 
> people and a multi-version documentation site that need to be migrated. We 
> also need to plan the migration of CI/CD processes and other specifics. 
> 
> 
> While there are many intricate details that require implementation, we are 
> committed to progressing and finalizing this donation process.
> 
> 
> The project is Flink's most active ecosystem project (as measured by GitHub 
> metrics) and also has a significant user base. However, I believe it's 
> essential to commence discussions on future operations only after the 
> community reaches a consensus on whether it desires this donation.
> 
> 
> Really looking forward to hearing what you think! 
> 
> 
> Best,
> Leonard (on behalf of the Flink CDC Connectors project maintainers)
> 
> [1] https://github.com/ververica/flink-cdc-connectors
> [2] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-connectors.html
> [3] https://debezium.io
> [4] 
> https://ververica.github.io/flink-cdc-connectors/master/content/overview/cdc-pipeline.html


[jira] [Created] (FLINK-34908) mysql pipeline to doris and starrocks will lose precision for timestamp

2024-03-21 Thread Xin Gong (Jira)
Xin Gong created FLINK-34908:


 Summary: mysql pipeline to doris and starrocks will lose precision 
for timestamp
 Key: FLINK-34908
 URL: https://issues.apache.org/jira/browse/FLINK-34908
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Xin Gong
 Fix For: cdc-3.1.0


The Flink CDC pipeline decides the timestamp zone based on the pipeline 
configuration. I found that mysql2doris and mysql2starrocks specify the 
datetime format yyyy-MM-dd HH:mm:ss, which causes a loss of datetime 
precision. I think we should not specify a datetime format and instead just 
return the LocalDateTime object.
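The loss can be reproduced with plain java.time, independent of the connector code (the timestamp value below is made up for illustration):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PrecisionLoss {
    public static void main(String[] args) {
        // A MySQL TIMESTAMP(6) value carrying microsecond precision.
        LocalDateTime ts = LocalDateTime.of(2024, 3, 21, 12, 34, 56, 123_456_000);

        // Formatting with a seconds-only pattern silently drops the fraction.
        DateTimeFormatter secondsOnly =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(ts.format(secondsOnly)); // 2024-03-21 12:34:56

        // Passing the LocalDateTime through unformatted keeps full precision.
        System.out.println(ts); // 2024-03-21T12:34:56.123456
    }
}
```

Handing the sink the LocalDateTime itself, as proposed, would let the sink-side serializer decide how many fractional digits to keep.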



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34990) [feature][cdc-connector][oracle] Oracle CDC support for newly added tables

2024-04-02 Thread Xin Gong (Jira)
Xin Gong created FLINK-34990:


 Summary: [feature][cdc-connector][oracle] Oracle CDC support for newly 
added tables
 Key: FLINK-34990
 URL: https://issues.apache.org/jira/browse/FLINK-34990
 Project: Flink
  Issue Type: New Feature
  Components: Flink CDC
Reporter: Xin Gong
 Fix For: cdc-3.1.0


Support capturing newly added tables in the Oracle CDC connector.





[jira] [Created] (FLINK-35151) Flink MySQL CDC will get stuck when the binlog split is suspended and the ChangeEventQueue is full

2024-04-17 Thread Xin Gong (Jira)
Xin Gong created FLINK-35151:


 Summary: Flink MySQL CDC will get stuck when the binlog split is suspended 
and the ChangeEventQueue is full
 Key: FLINK-35151
 URL: https://issues.apache.org/jira/browse/FLINK-35151
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
 Environment: Reproduced on the master branch.

The cause is that binlog production is too fast. 
MySqlSplitReader#suspendBinlogReaderIfNeed executes 
BinlogSplitReader#stopBinlogReadTask, which sets currentTaskRunning to false 
after MySqlSourceReader receives the binlog split update event. 
MySqlSplitReader#pollSplitRecords is then executed; since dataIt is null, it 
calls closeBinlogReader when the currentReader is a BinlogSplitReader. 
closeBinlogReader executes 
statefulTaskContext.getBinaryLogClient().disconnect(), which can deadlock 
because BinaryLogClient#connectLock is not released while 
MySqlStreamingChangeEventSource is blocked adding an element to the full 
queue.
Reporter: Xin Gong
 Attachments: dumpstack.txt

Flink MySQL CDC will get stuck when the binlog split is suspended and the 
ChangeEventQueue is full.
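The lock-versus-full-queue pattern can be sketched outside Flink with a ReentrantLock and a bounded queue (class and variable names below are illustrative stand-ins, not the real BinaryLogClient internals):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    /** Returns whether the "disconnect" side could take the lock while the
     *  producer was parked on a full queue (expected: false). */
    static boolean lockAvailableWhileQueueFull() throws InterruptedException {
        ReentrantLock connectLock = new ReentrantLock(); // stand-in for BinaryLogClient#connectLock
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1); // stand-in for the ChangeEventQueue
        queue.put("event-0"); // the queue is now full

        // Streaming-source thread: holds the lock while blocked on put().
        Thread source = new Thread(() -> {
            connectLock.lock();
            try {
                queue.put("event-1"); // blocks until someone drains the queue
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                connectLock.unlock();
            }
        });
        source.start();
        Thread.sleep(300); // let the source grab the lock and park on put()

        // disconnect() needs the same lock; while nothing drains the queue it
        // can never acquire it -- the deadlock described above.
        boolean acquired = connectLock.tryLock(200, TimeUnit.MILLISECONDS);

        queue.take(); // draining unblocks the producer, which releases the lock
        source.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock available while queue full: " + lockAvailableWhileQueueFull());
    }
}
```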

 





[jira] [Created] (FLINK-35274) Occasional failure issue with Flink CDC Db2 UT

2024-04-30 Thread Xin Gong (Jira)
Xin Gong created FLINK-35274:


 Summary: Occasional failure issue with Flink CDC Db2 UT
 Key: FLINK-35274
 URL: https://issues.apache.org/jira/browse/FLINK-35274
 Project: Flink
  Issue Type: Bug
Reporter: Xin Gong


Occasional failure of the Flink CDC Db2 unit tests. Because the Db2 redo-log 
data tableId does not contain the database name, the table schema occasionally 
cannot be found when a task restarts after an exception. I will fix it by 
supplementing the database name.
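The intended fix can be sketched as a small helper (hypothetical names and string-based ids for illustration; the real connector works with Debezium's TableId):

```java
public class TableIdFix {
    /**
     * Prefix the database name when a redo-log table id lacks it.
     * Hypothetical sketch of the fix described above: assume registry keys
     * look like "DATABASE.SCHEMA.TABLE" while Db2 redo-log ids arrive as
     * "SCHEMA.TABLE", so schema lookups miss after a restart.
     */
    static String supplementDatabase(String tableId, String database) {
        // Two dots already present -> the id already carries the database name.
        return tableId.chars().filter(c -> c == '.').count() >= 2
                ? tableId
                : database + "." + tableId;
    }

    public static void main(String[] args) {
        System.out.println(supplementDatabase("DB2INST1.PRODUCTS", "TESTDB"));        // TESTDB.DB2INST1.PRODUCTS
        System.out.println(supplementDatabase("TESTDB.DB2INST1.PRODUCTS", "TESTDB")); // unchanged
    }
}
```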





[jira] [Created] (FLINK-34715) Fix mysql ut about closing BinlogSplitReader

2024-03-18 Thread Xin Gong (Jira)
Xin Gong created FLINK-34715:


 Summary: Fix mysql ut about closing BinlogSplitReader
 Key: FLINK-34715
 URL: https://issues.apache.org/jira/browse/FLINK-34715
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Xin Gong
 Fix For: cdc-3.1.0


BinlogSplitReaderTest#readBinlogSplitsFromSnapshotSplits should verify that 
the binlog reader is closed after calling binlogReader.close(). But the code 
always tests that the snapshot split reader is closed.

```java
binlogReader.close();

assertNotNull(snapshotSplitReader.getExecutorService());
assertTrue(snapshotSplitReader.getExecutorService().isTerminated());
```

We should change the code to:

```java
binlogReader.close();

assertNotNull(binlogReader.getExecutorService());
assertTrue(binlogReader.getExecutorService().isTerminated());
```


