Description

We need to support data consistency tests in the connector-v2 e2e tests. I
have some ideas about it, and everyone is welcome to discuss them.
Test Sink Connector

Fake Source Connector

If we want to test the data consistency of a sink connector, we can use the
Fake Source connector. The Fake Source connector supports defining the
number of rows, and it can support defining primary key fields in the
future. Defining primary key fields is useful for testing sinks that
implement exactly-once through idempotent writes. If we can simulate a task
failure and then restore the task, we can complete the data consistency
test.
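A minimal sketch of why primary key fields matter for idempotent sinks,
assuming a hypothetical in-memory upsert sink keyed by an "id" field (none
of these names come from SeaTunnel). When the restored task replays rows
that were already written, the upsert overwrites them instead of
duplicating them, so the final row count still equals the configured row
number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a fake source with a row count and a primary key field,
// and an in-memory sink that writes idempotently (upsert keyed by primary key).
public class IdempotentWriteDemo {

    // One generated row; "id" plays the role of the primary key field.
    record Row(long id, String name) {}

    // Generates rowNum deterministic rows, like a Fake Source with a row count.
    static List<Row> generateRows(int rowNum) {
        List<Row> rows = new ArrayList<>();
        for (long i = 1; i <= rowNum; i++) {
            rows.add(new Row(i, "name_" + i));
        }
        return rows;
    }

    // Upsert sink: writing the same primary key twice overwrites the old row.
    static class UpsertSink {
        private final Map<Long, Row> table = new LinkedHashMap<>();

        void write(Row row) {
            table.put(row.id(), row);
        }

        int count() {
            return table.size();
        }
    }

    public static void main(String[] args) {
        List<Row> rows = generateRows(100);
        UpsertSink sink = new UpsertSink();
        // First attempt writes rows 1..60, then the task "fails".
        rows.subList(0, 60).forEach(sink::write);
        // The restored task replays from row 41, so rows 41..60 are written twice.
        rows.subList(40, 100).forEach(sink::write);
        System.out.println(sink.count()); // prints 100
    }
}
```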
How to simulate a task failure and restore the task

I think we can use the Fake Source connector to simulate task failures too.
We can add a function to Fake Source that actively triggers a failure. To
ensure Fake Source can replay the data it has already read, Fake Source
also needs to support snapshots.
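The failure trigger and snapshot support could look roughly like the sketch
below. It is not SeaTunnel's source API; the class, the failAfterRows
parameter and the run() driver are hypothetical. The source throws once at
a configured offset, its state is just the read offset, and after the
failure it is restored from the last snapshot, so rows between that
snapshot and the failure are emitted again (at-least-once replay).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not SeaTunnel's real source API: a fake source that
// throws an injected failure once, after emitting failAfterRows rows, and
// whose read offset can be snapshotted and restored so reading can replay.
public class FailingFakeSource {

    private final int rowNum;        // total rows to emit
    private final int failAfterRows; // offset at which to fail; -1 disables it
    private boolean failed = false;  // fail only once so the restored run can finish
    private int offset = 0;          // index of the next row to emit

    FailingFakeSource(int rowNum, int failAfterRows) {
        this.rowNum = rowNum;
        this.failAfterRows = failAfterRows;
    }

    boolean hasNext() {
        return offset < rowNum;
    }

    // Emits the next row (here just its index), or throws the injected failure.
    int next() {
        if (!failed && offset == failAfterRows) {
            failed = true;
            throw new IllegalStateException("injected failure at offset " + offset);
        }
        return offset++;
    }

    // The snapshot is just the read offset.
    int snapshotState() {
        return offset;
    }

    void restoreState(int savedOffset) {
        offset = savedOffset;
    }

    // Drives the source, snapshotting every checkpointInterval rows and
    // restoring from the last snapshot when the injected failure fires.
    static List<Integer> run(FailingFakeSource source, int checkpointInterval) {
        List<Integer> emitted = new ArrayList<>();
        int checkpoint = source.snapshotState();
        while (source.hasNext()) {
            try {
                emitted.add(source.next());
                if (source.snapshotState() % checkpointInterval == 0) {
                    checkpoint = source.snapshotState();
                }
            } catch (IllegalStateException e) {
                source.restoreState(checkpoint); // replay from the last snapshot
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> emitted = run(new FailingFakeSource(100, 55), 10);
        System.out.println(emitted.size());                      // 105: rows 50..54 were replayed
        System.out.println(emitted.stream().distinct().count()); // 100
    }
}
```

Because the replayed rows reach the sink twice, this setup exercises
exactly the case an idempotent or transactional sink must handle.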
How to check data

We can check the rows written to the sink.
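One way to check the rows, sketched under the assumption that every row
carries a unique primary key: compare the keys the source was expected to
produce with the keys read back from the sink, and report missing,
duplicated and unexpected keys. The class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: compares the primary keys the source was expected to
// produce with the primary keys read back from the sink after the job finishes.
public class SinkRowChecker {

    // Returns the problems found; an empty list means no loss and no duplication.
    // Assumes expectedKeys contains each key once.
    static List<String> check(List<Long> expectedKeys, List<Long> sinkKeys) {
        // Count how many times each key was written to the sink.
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long key : sinkKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        List<String> problems = new ArrayList<>();
        for (Long key : expectedKeys) {
            Integer count = counts.remove(key);
            if (count == null) {
                problems.add("missing key " + key);
            } else if (count > 1) {
                problems.add("duplicated key " + key + " (written " + count + " times)");
            }
        }
        // Whatever is left was written but never produced by the source.
        for (Long key : counts.keySet()) {
            problems.add("unexpected key " + key);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(List.of(1L, 2L, 3L, 4L), List.of(1L, 2L, 2L, 4L, 5L));
        problems.forEach(System.out::println);
    }
}
```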
Test Source Connector

Test JDBC Sink Connector

If we want to test the data consistency of a source connector, we can add a
Test JDBC Sink connector. It needs to support exactly-once.
How to simulate a task failure and restore the task

We can add a function to the Test JDBC Sink connector that actively
triggers a failure.
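A rough sketch of an exactly-once test sink with a failure trigger,
assuming a transactional design (an assumption, not something the proposal
fixes): rows go into an open transaction, become visible only when a
checkpoint commits them, and are rolled back on failure while the source
replays from the last checkpoint. An in-memory list stands in for the JDBC
table, commit is simplified to happen at checkpoint time, and all names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not an existing SeaTunnel class: a test sink that only
// makes rows visible when a checkpoint commits them, and that injects one
// failure after a configured number of successful writes.
public class TestExactlyOnceSink {

    private final List<Integer> committed = new ArrayList<>(); // visible "table"
    private final List<Integer> pending = new ArrayList<>();   // open transaction
    private final int failAfterWrites;
    private int writes = 0;
    private boolean failed = false;

    TestExactlyOnceSink(int failAfterWrites) {
        this.failAfterWrites = failAfterWrites;
    }

    void write(int row) {
        if (!failed && writes == failAfterWrites) {
            failed = true;
            throw new IllegalStateException("injected sink failure");
        }
        writes++;
        pending.add(row);
    }

    // Called on checkpoint: make the open transaction visible.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Called on failure: roll back rows that were never committed.
    void abort() {
        pending.clear();
    }

    // Simulates source -> sink with a checkpoint every checkpointInterval rows.
    static List<Integer> run(int rowNum, int checkpointInterval, int failAfterWrites) {
        TestExactlyOnceSink sink = new TestExactlyOnceSink(failAfterWrites);
        int offset = 0;
        int checkpointOffset = 0;
        while (offset < rowNum) {
            try {
                sink.write(offset);
                offset++;
                if (offset % checkpointInterval == 0 || offset == rowNum) {
                    sink.commit();
                    checkpointOffset = offset;
                }
            } catch (IllegalStateException e) {
                sink.abort();
                offset = checkpointOffset; // source replays from the last checkpoint
            }
        }
        return sink.committed;
    }

    public static void main(String[] args) {
        System.out.println(run(100, 10, 55).size()); // 100: each row committed once
    }
}
```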
How to check data

There are two ways to do it.

First: after the xxxSource -> TestJDBCSink job finishes, we can
automatically create a JDBCSource -> AssertSink job and use AssertSink to
check the data.

Shortcoming: this approach needs to run two jobs.

Advantage: this approach standardizes the tests; people only need to
configure the check rules in the AssertSink connector.
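To illustrate what "only configure the check rules" buys, here is a sketch
of declarative rules evaluated against the rows a JDBCSource reads back.
The rule names (exactRowCount, notNull, uniqueField) are hypothetical
illustrations and are not the real AssertSink configuration keys; the point
is that each connector test only declares rules, while the evaluation code
is shared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: declarative check rules that a verification job
// (JDBCSource -> AssertSink) could evaluate against the rows it reads back.
// Rule names are illustrative, not the real AssertSink configuration.
public class CheckRules {

    interface Rule {
        // Returns an error message, or null if the rule passes.
        String check(List<Map<String, Object>> rows);
    }

    static Rule exactRowCount(int expected) {
        return rows -> rows.size() == expected
                ? null
                : "expected " + expected + " rows but got " + rows.size();
    }

    static Rule notNull(String field) {
        return rows -> rows.stream().anyMatch(r -> r.get(field) == null)
                ? "field " + field + " contains null"
                : null;
    }

    static Rule uniqueField(String field) {
        return rows -> rows.stream().map(r -> r.get(field)).distinct().count() == rows.size()
                ? null
                : "field " + field + " has duplicate values";
    }

    // Shared evaluation code: runs every configured rule and collects failures.
    static List<String> evaluate(List<Rule> rules, List<Map<String, Object>> rows) {
        List<String> errors = new ArrayList<>();
        for (Rule rule : rules) {
            String error = rule.check(rows);
            if (error != null) {
                errors.add(error);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.<String, Object>of("id", 1L, "name", "a"),
                Map.<String, Object>of("id", 2L, "name", "b"));
        List<Rule> rules = List.of(exactRowCount(2), notNull("name"), uniqueField("id"));
        System.out.println(evaluate(rules, rows)); // []
    }
}
```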

Second: add a Java program to check the data in MySQL/PG.

Shortcoming: this approach cannot standardize the tests; every source
connector's e2e test needs to add its own check program and define its own
check rules.

Advantage: only one job needs to run.
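For comparison, a per-connector check program could be as small as the
sketch below, which uses only the standard java.sql API. The table and
primary key column are placeholders each connector's e2e test would choose;
a MySQL or PostgreSQL JDBC driver must be on the classpath; and the
duplicate check is only meaningful if the target table has no primary key
constraint. The table and column names are concatenated into SQL, which is
acceptable only because they come from trusted test code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-connector check program: it queries the
// MySQL/PostgreSQL table written by the test job and compares the results
// with what the source was expected to produce.
public class JdbcDataChecker {

    // Pure comparison logic, kept separate so it can be tested without a database.
    static List<String> compare(long expectedRows, long actualRows, long distinctKeys) {
        List<String> errors = new ArrayList<>();
        if (actualRows != expectedRows) {
            errors.add("row count: expected " + expectedRows + ", got " + actualRows);
        }
        if (distinctKeys != actualRows) {
            errors.add("duplicate primary keys: " + (actualRows - distinctKeys));
        }
        return errors;
    }

    // Runs a query that returns a single numeric value, such as COUNT(*).
    static long queryLong(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(sql)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // Usage: java JdbcDataChecker <jdbcUrl> <user> <password> <table> <pkColumn> <expectedRows>
    public static void main(String[] args) throws SQLException {
        String table = args[3];
        String pk = args[4];
        long expected = Long.parseLong(args[5]);
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            long rows = queryLong(conn, "SELECT COUNT(*) FROM " + table);
            long distinct = queryLong(conn, "SELECT COUNT(DISTINCT " + pk + ") FROM " + table);
            List<String> errors = compare(expected, rows, distinct);
            if (!errors.isEmpty()) {
                throw new IllegalStateException(String.join("; ", errors));
            }
        }
    }
}
```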

-- 

Best Regards

------------

EricJoy2048