[jira] [Commented] (KUDU-3662) Flink based continuous replication

ASF subversion and git services (Jira) Wed, 04 Jun 2025 09:55:06 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17956158#comment-17956158
 ]


ASF subversion and git services commented on KUDU-3662:
-------------------------------------------------------

Commit d1e316417c079af6e35140f44a83292080b6e63f in kudu's branch 
refs/heads/master from Zoltan Chovan
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=d1e316417 ]

KUDU-3662 [2/n] Add job config parsing

This patch introduces a configuration model and CLI parser for job-level
replication settings.
- Introduced ReplicationJobConfig to encapsulate parameters like
source/sink Kudu master addresses, table name, etc.
- Added ReplicationConfigParser with a static parseJobConfig method
for extracting job.* parameters from ParameterTool.
- CLI argument parsing follows a naming convention with job.*, reader.*,
and writer.* prefixes. This patch adds support for job.* only.
- The parsed job config is passed to ReplicationEnvProvider and will be
used in follow-up patches.
- Added unit tests for config parsing in TestReplicationConfigParser.
- Added AssertJ as a test dependency for more expressive assertions.
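
To illustrate the prefix convention described above, here is a minimal
sketch of job.* parsing. It is not the code from the patch: the class
JobConfigSketch, the key names (job.sourceMasterAddresses,
job.sinkMasterAddresses, job.tableName) and the plain Map standing in for
Flink's ParameterTool are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class names and parameter keys are assumptions,
// not the ones introduced by the patch.
final class JobConfigSketch {

    /** Immutable holder for job-level settings (stand-in for ReplicationJobConfig). */
    static final class JobConfig {
        final String sourceMasterAddresses;
        final String sinkMasterAddresses;
        final String tableName;

        JobConfig(String sourceMasterAddresses, String sinkMasterAddresses, String tableName) {
            this.sourceMasterAddresses = sourceMasterAddresses;
            this.sinkMasterAddresses = sinkMasterAddresses;
            this.tableName = tableName;
        }
    }

    /** Turns "--key value" pairs into a map (stand-in for ParameterTool). */
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected --key value pairs, got: " + args[i]);
            }
            params.put(args[i].substring(2), args[i + 1]);
        }
        return params;
    }

    /** Reads only job.* keys; reader.* and writer.* keys are ignored here. */
    static JobConfig parseJobConfig(Map<String, String> params) {
        return new JobConfig(
                required(params, "job.sourceMasterAddresses"),
                required(params, "job.sinkMasterAddresses"),
                required(params, "job.tableName"));
    }

    /** Fails fast when a mandatory parameter is absent or empty. */
    private static String required(Map<String, String> params, String key) {
        String value = params.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required parameter: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        JobConfig config = parseJobConfig(parseArgs(new String[] {
                "--job.sourceMasterAddresses", "source-master:7051",
                "--job.sinkMasterAddresses", "sink-master:7051",
                "--job.tableName", "example_table",
                "--reader.someOption", "ignored-by-this-sketch"}));
        System.out.println(config.tableName);
    }
}
```

Failing fast on a missing job.* key keeps misconfiguration errors at
startup rather than at the first read or write.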

Note: added a temporary exclusion rule to suppress the "URF_UNREAD_FIELD"
SpotBugs error, since the jobConfig field will only be used in following
commits. This exclusion should be removed when no longer relevant.

Change-Id: Ic7229e11baa6a03c8986f206f456725acda00774
Reviewed-on: http://gerrit.cloudera.org:8080/22902
Tested-by: Marton Greber <[email protected]>
Reviewed-by: Marton Greber <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>


> Flink based continuous replication
> ----------------------------------
>
>                 Key: KUDU-3662
>                 URL: https://issues.apache.org/jira/browse/KUDU-3662
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: Marton Greber
>            Priority: Major
>
> Goal:
> Implement a Flink job that continuously reads from one Kudu cluster and 
> writes to a sink Kudu cluster. 
> Prerequisites:
> Previously, only a Flink Kudu sink connector existed. With the release 
> of flink-connector-kudu 2.0, we developed a source connector with a 
> continuous unbounded mode that utilises diff scans to read from Kudu. 
> (https://github.com/apache/flink-connector-kudu/pull/8)
> The above prerequisite is now available.
> Design:
> A high-level design doc has already been sent out to the mailing list:
> https://docs.google.com/document/d/1oaAn_cOY7aKth0C6MbNXgKU3R-PYols-V4got-_gpDk/edit?usp=sharing
> Development:
> - The Flink based Kudu replication job would live in the Kudu java project 
> (similar to the backup and restore Spark job). 
> - We need to create a Flink job that utilises the Flink Kudu source and sink 
> implementations.
> - Provide a CLI interface to pass down all the necessary reader and 
> writer configs. 
> - Create a table initialiser that can re-create the source table schema and 
> partitioning schema on the sink cluster, if desired. (This is a convenience 
> feature to provide easier setup.)
> - The Flink Kudu source currently lacks metrics. To avoid waiting another 
> release cycle, we can create a wrapped source inside our project. We will 
> contribute those metrics back to the flink-connector-kudu repo, and then 
> remove this intermediary logic from the job.
> - Write unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
