It is necessary to test whether the connector's state recovery is correct;
the problem is how to implement the test logic uniformly across all
engines.

Best,
Zongwen Li

JUN GAO <[email protected]> wrote on Tue, Sep 20, 2022 at 13:37:

> Description
>
> We need to support data consistency tests in the connector v2 e2e tests. I
> have some ideas about it and welcome everyone to discuss them.
> Test Sink Connector
>
> Fake Source Connector
>
> If we want to test the data consistency of a sink connector, we can use the
> Fake Source connector. In the future, the Fake Source connector should
> support defining the number of rows and the primary key fields. Defining
> primary key fields is useful for testing exactly-once sinks that implement
> exactly-once through idempotent writes. If we can simulate a task failure
> and then restore the task, we can complete the data consistency test.
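A minimal sketch of why deterministic primary keys plus idempotent writes make a replay after restore harmless, assuming the sink upserts by primary key. FakeRows and IdempotentSink are hypothetical names, not SeaTunnel classes, and an in-memory map stands in for the sink table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fake source: row i always has primary key i, so replaying
// rows after a restore produces exactly the same data.
final class FakeRows {
    static List<String[]> generate(int rowCount) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add(new String[] {String.valueOf(i), "name_" + i});
        }
        return rows;
    }
}

// Hypothetical idempotent sink: a map keyed by primary key stands in for a
// table written with upserts, so rewriting a key overwrites the old row
// instead of adding a duplicate.
final class IdempotentSink {
    final Map<String, String> table = new LinkedHashMap<>();

    void write(String[] row) {
        table.put(row[0], row[1]);
    }
}
```

Writing all 10 generated rows and then replaying the first 4 (as after a restore from an older checkpoint) still leaves exactly 10 rows in the table.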
> How to simulate a task failure and restore the task
>
> I think we can use the Fake Source connector to simulate task failure too.
> We can add a function to Fake Source that actively triggers a failure. To
> ensure Fake Source can replay reads after recovery, Fake Source also needs
> to support snapshots.
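A sketch of a fake source with an injected failure and a snapshotted read offset, assuming the engine checkpoints the offset and hands it back to a freshly created reader on restore. The class and method names are hypothetical and are not the SeaTunnel SourceReader API. One design point: after a real failure the engine builds a new reader, so an in-memory "already failed" flag would be lost and the job would fail forever; here the trigger is disabled by the restore itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fake source with a failure trigger and snapshot support.
final class FailingFakeSource {
    private final int rowCount;
    private final int failAtRow;   // -1 disables the failure trigger
    private int offset;            // next row to emit; this is the snapshot state
    private boolean restored;      // the trigger fires only before any restore

    FailingFakeSource(int rowCount, int failAtRow) {
        this.rowCount = rowCount;
        this.failAtRow = failAtRow;
    }

    boolean hasNext() {
        return offset < rowCount;
    }

    void pollNext(List<Integer> out) {
        if (!restored && offset == failAtRow) {
            throw new IllegalStateException("injected failure at row " + offset);
        }
        out.add(offset);
        offset++;
    }

    int snapshotState() {
        return offset;
    }

    void restoreState(int savedOffset) {
        offset = savedOffset;
        restored = true;
    }

    // Runs one attempt, checkpointing every `interval` emitted rows; on the
    // injected failure, restores a new reader from the last checkpoint and
    // runs it to the end. Returns everything emitted by both attempts.
    static List<Integer> runWithOneFailure(int rowCount, int failAtRow, int interval) {
        List<Integer> emitted = new ArrayList<>();
        int checkpoint = 0;
        FailingFakeSource first = new FailingFakeSource(rowCount, failAtRow);
        try {
            while (first.hasNext()) {
                first.pollNext(emitted);
                if (emitted.size() % interval == 0) {
                    checkpoint = first.snapshotState();
                }
            }
        } catch (IllegalStateException injected) {
            FailingFakeSource second = new FailingFakeSource(rowCount, failAtRow);
            second.restoreState(checkpoint);
            while (second.hasNext()) {
                second.pollNext(emitted);
            }
        }
        return emitted;
    }
}
```

With runWithOneFailure(10, 7, 3), the first attempt emits rows 0 to 6 and fails at row 7; the restored reader resumes from offset 6, so row 6 is emitted twice. That replay is exactly what the sink under test has to absorb.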
> How to check data
>
> We can check the rows that were written to the sink.
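One way the check could look, assuming the fake source generates primary keys 0 to N-1 and the test can read back the keys that landed in the sink. ConsistencyReport is a hypothetical helper: exactly-once holds when no key is missing, duplicated, or unexpected.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checker comparing keys found in the sink with the keys the
// fake source was configured to generate (0 .. expectedRowCount - 1).
final class ConsistencyReport {
    final Set<Long> missing = new TreeSet<>();
    final Set<Long> duplicated = new TreeSet<>();
    final Set<Long> unexpected = new TreeSet<>();

    boolean isExactlyOnce() {
        return missing.isEmpty() && duplicated.isEmpty() && unexpected.isEmpty();
    }

    static ConsistencyReport check(long expectedRowCount, List<Long> writtenKeys) {
        ConsistencyReport report = new ConsistencyReport();
        Set<Long> seen = new HashSet<>();
        for (long key : writtenKeys) {
            if (key < 0 || key >= expectedRowCount) {
                report.unexpected.add(key);   // key the source never produced
            } else if (!seen.add(key)) {
                report.duplicated.add(key);   // written more than once
            }
        }
        for (long key = 0; key < expectedRowCount; key++) {
            if (!seen.contains(key)) {
                report.missing.add(key);      // lost during failure/restore
            }
        }
        return report;
    }
}
```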
> Test Source Connector
>
> Test JDBC Sink Connector
>
> If we want to test the data consistency of a source connector, we can add
> a Test JDBC Sink connector. It needs to support exactly-once.
> How to simulate a task failure and restore the task
>
> We can add a function to the Test JDBC Sink connector that actively
> triggers a failure.
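A sketch of a sink that is exactly-once through transactional commits and can fail on demand. It assumes rows are buffered in an open transaction and only become visible when a checkpoint completes; in a real Test JDBC Sink the buffer would presumably be a database transaction (for example an XA transaction), which is an assumption here. An in-memory list stands in for the committed table, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical transactional test sink with an injected failure.
final class FailingTransactionalSink {
    private final List<Integer> table;                       // committed rows; survives failures
    private final List<Integer> pending = new ArrayList<>(); // open, uncommitted transaction
    private final int failAfterWrites;                       // -1 disables the trigger
    private int writes;

    FailingTransactionalSink(List<Integer> table, int failAfterWrites) {
        this.table = table;
        this.failAfterWrites = failAfterWrites;
    }

    void write(int row) {
        if (writes == failAfterWrites) {
            throw new IllegalStateException("injected failure after " + writes + " writes");
        }
        pending.add(row);
        writes++;
    }

    // Called when a checkpoint completes: buffered rows become visible.
    void commit() {
        table.addAll(pending);
        pending.clear();
    }

    // Writes rows [0, rowCount), committing every `interval` rows. On the
    // injected failure the open transaction is dropped and a new sink
    // resumes from the last committed checkpoint.
    static List<Integer> runWithOneFailure(int rowCount, int failAfterWrites, int interval) {
        List<Integer> table = new ArrayList<>();
        int checkpoint = 0;
        try {
            FailingTransactionalSink sink = new FailingTransactionalSink(table, failAfterWrites);
            for (int i = 0; i < rowCount; i++) {
                sink.write(i);
                if ((i + 1) % interval == 0) {
                    sink.commit();
                    checkpoint = i + 1;
                }
            }
            sink.commit();
        } catch (IllegalStateException injected) {
            FailingTransactionalSink sink = new FailingTransactionalSink(table, -1);
            for (int i = checkpoint; i < rowCount; i++) {
                sink.write(i);
                if ((i + 1) % interval == 0) {
                    sink.commit();
                }
            }
            sink.commit();
        }
        return table;
    }
}
```

With runWithOneFailure(10, 6, 4), rows 0 to 3 are committed, rows 4 and 5 sit in the open transaction when the failure fires and are discarded, and the restarted sink rewrites from row 4, so the table ends with rows 0 to 9 exactly once.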
> How to check data
>
> There are two ways to do it.
>
> First option: after the job xxxSource -> TestJDBCSink finishes, we can
> automatically create a job JDBCSource -> AssertSink and use AssertSink to
> check the data.
>
> Shortcoming: this way needs to run two jobs.
>
> Advantage: this way standardizes the tests; people only need to configure
> the check rules in the AssertSink connector.
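To illustrate the standardization argument, a sketch of declarative check rules applied to the rows the second job reads back. The rule set here (expected row count, unique key field, non-null fields) is a hypothetical example and is not the actual AssertSink configuration schema; each connector's e2e test would only supply the constructor arguments.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical rule set evaluated over rows read back by JDBCSource.
final class AssertRules {
    private final long expectedRows;
    private final String keyField;
    private final List<String> notNullFields;

    AssertRules(long expectedRows, String keyField, List<String> notNullFields) {
        this.expectedRows = expectedRows;
        this.keyField = keyField;
        this.notNullFields = notNullFields;
    }

    // Returns human-readable violations; an empty list means the check passed.
    List<String> check(List<Map<String, Object>> rows) {
        List<String> violations = new ArrayList<>();
        if (rows.size() != expectedRows) {
            violations.add("expected " + expectedRows + " rows but found " + rows.size());
        }
        Set<Object> seenKeys = new HashSet<>();
        for (Map<String, Object> row : rows) {
            Object key = row.get(keyField);
            if (!seenKeys.add(key)) {
                violations.add("duplicate " + keyField + " = " + key);
            }
            for (String field : notNullFields) {
                if (row.get(field) == null) {
                    violations.add("null " + field + " where " + keyField + " = " + key);
                }
            }
        }
        return violations;
    }
}
```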
>
> Second option: add a Java program that checks the data in MySQL/PG.
>
> Shortcoming: this way cannot standardize the tests; every source connector
> e2e test needs to add its own check program and define the check rules
> itself.
>
> Advantage: only one job needs to run.
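A sketch of such a per-connector check program using plain JDBC (java.sql), assuming the MySQL or PostgreSQL driver is on the classpath. The JDBC URL, credentials, table name and key column are placeholders, and JdbcConsistencyCheck is a hypothetical name. It only compares the total row count and the distinct key count against the expected number of rows, which catches loss and duplication but not wrong column values; the SQL is built by concatenation only because the identifiers come from the test's own configuration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical check run after xxxSource -> TestJDBCSink finishes.
final class JdbcConsistencyCheck {
    // One query returning total rows and distinct primary keys.
    static String countQuery(String table, String keyColumn) {
        return "SELECT COUNT(*), COUNT(DISTINCT " + keyColumn + ") FROM " + table;
    }

    // Every expected row present, and present once.
    static boolean isExactlyOnce(long expectedRows, long totalRows, long distinctKeys) {
        return totalRows == expectedRows && distinctKeys == expectedRows;
    }

    static boolean check(String jdbcUrl, String user, String password,
                         String table, String keyColumn, long expectedRows) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(countQuery(table, keyColumn))) {
            rs.next(); // an aggregate query always returns exactly one row
            return isExactlyOnce(expectedRows, rs.getLong(1), rs.getLong(2));
        }
    }
}
```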
>
> --
>
> Best Regards
>
> ------------
>
> EricJoy2048
> [email protected]
>
