Description

We need to support data consistency tests in the connector v2 e2e. I have some ideas about it and welcome everyone to discuss.

Test Sink Connector

Fake Source Connector
If we want to test the data consistency of a sink connector, we can use the Fake Source connector. In the future, the Fake Source connector will support defining the number of rows and the primary key fields. Defining primary key fields is useful for testing exactly-once sinks that implement exactly-once through idempotent writes. If we can simulate a task failure and then restore the task, we can complete the data consistency test.

How to simulate task failure and restore task

I think we can use the Fake Source connector to simulate task failure too. We can add a function to Fake Source that actively triggers a failure. To ensure Fake Source can replay reads after a restore, it needs to support snapshots too.

How to check data

We can check the rows that were written to the sink.

Test Source Connector

Test JDBC Sink Connector

If we want to test the data consistency of a source connector, we can add a Test JDBC Sink connector. It needs to support exactly-once.

How to simulate task failure and restore task

We can add a function to the Test JDBC Sink connector that actively triggers a failure.

How to check data

There are two ways to do it.

The first one: after the job xxxSource -> TestJDBCSink has finished, we can automatically create a job JDBCSource -> AssertSink and use AssertSink to check the data.

Shortcoming: this way needs to run two jobs.
Advantage: this way standardizes the test; people only need to configure the check rules in the AssertSink connector.

The second one: add a Java program to check the data in MySQL/PG.

Shortcoming: this way cannot standardize the test; every source connector e2e needs to add its own check program and define the check rules itself.
Advantage: only one job needs to run.

--
Best Regards
------------
EricJoy2048
[email protected]
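
The sink-side flow proposed above (a fake source with a fixed row count and a primary key, an actively triggered failure, a restore from a snapshot, and a check of the rows in the sink) could be sketched roughly as follows. This is a plain Python simulation, not connector v2 code: every name in it (`FakeSource`, `fail_at`, `IdempotentSink`, `run_job`, `check_sink`) is hypothetical, and the "sink" is just an in-memory dict keyed by the primary key.

```python
class InjectedFailure(Exception):
    """Raised by the fake source to simulate a task failure."""


class FakeSource:
    """Hypothetical fake source: emits `row_num` rows with primary key `id`."""

    def __init__(self, row_num, fail_at=None):
        self.row_num = row_num
        self.fail_at = fail_at  # offset at which to fail once; None disables it
        self.offset = 0         # next row to emit

    def snapshot(self):
        # The snapshot state is simply the read offset.
        return self.offset

    def restore(self, state):
        self.offset = state

    def read(self):
        while self.offset < self.row_num:
            if self.fail_at is not None and self.offset == self.fail_at:
                self.fail_at = None  # trigger the failure only once
                raise InjectedFailure(f"injected failure at offset {self.offset}")
            row = {"id": self.offset, "name": f"row-{self.offset}"}
            self.offset += 1
            yield row


class IdempotentSink:
    """Hypothetical sink: upserts by primary key, so replayed rows do not duplicate."""

    def __init__(self):
        self.table = {}
        self.write_calls = 0

    def write(self, row):
        self.write_calls += 1
        self.table[row["id"]] = row


def run_job(source, sink, checkpoint_interval):
    """Run source -> sink, restoring from the last snapshot after each failure."""
    state = source.snapshot()
    restarts = 0
    while True:
        try:
            for row in source.read():
                sink.write(row)
                if source.offset % checkpoint_interval == 0:
                    state = source.snapshot()
            return restarts
        except InjectedFailure:
            restarts += 1
            source.restore(state)  # rows after the snapshot are read again


def check_sink(sink, expected_rows):
    """Data check: every primary key 0..expected_rows-1 is present exactly once."""
    return sorted(sink.table) == list(range(expected_rows))


if __name__ == "__main__":
    source = FakeSource(row_num=10, fail_at=7)
    sink = IdempotentSink()
    restarts = run_job(source, sink, checkpoint_interval=5)
    print(restarts, sink.write_calls, check_sink(sink, 10))
```

In this sketch the rows read between the last snapshot and the failure are written twice, and the final check still passes only because the sink writes are idempotent on the primary key. A sink that appended rows instead would fail the same check, which is the kind of inconsistency the proposed e2e test is meant to catch.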
