We do need to test whether a connector's state recovery is correct. The open question is how to implement the test logic uniformly across all engines.
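One engine-independent option is to keep both the failure injection and the verification inside the test connectors, so the same check applies whichever engine runs the job. Below is a minimal sketch of the idea from the proposal quoted below (a replayable fake source plus an idempotent sink keyed by primary key). The names `ConsistencySketch`, `FakeSource`, `IdempotentSink`, and `runWithOneFailure` are hypothetical and are not the actual connector API; the loop only simulates what an engine would do.

```java
import java.util.Map;
import java.util.TreeMap;

public class ConsistencySketch {

    /** Hypothetical fake source: emits ids 0..total-1 and can snapshot/restore its offset. */
    static final class FakeSource {
        private final int total;
        private int offset;

        FakeSource(int total) { this.total = total; }

        boolean hasNext() { return offset < total; }
        int next() { return offset++; }
        int snapshot() { return offset; }
        void restore(int snapshotOffset) { offset = snapshotOffset; }
    }

    /** Hypothetical idempotent sink: an upsert keyed by primary key, so replays do not duplicate rows. */
    static final class IdempotentSink {
        final Map<Integer, String> rows = new TreeMap<>();
        int writes; // counts every write attempt, including replayed ones

        void write(int id) {
            rows.put(id, "row-" + id);
            writes++;
        }
    }

    /** Simulates one task failure at offset failAt, then restores from the last checkpoint. */
    static IdempotentSink runWithOneFailure(int total, int failAt, int checkpointEvery) {
        FakeSource source = new FakeSource(total);
        IdempotentSink sink = new IdempotentSink();
        int lastCheckpoint = source.snapshot();
        boolean failed = false;

        while (source.hasNext()) {
            if (!failed && source.snapshot() == failAt) {
                // Actively triggered failure: roll the source back to the last snapshot.
                failed = true;
                source.restore(lastCheckpoint);
                continue;
            }
            sink.write(source.next());
            if (source.snapshot() % checkpointEvery == 0) {
                lastCheckpoint = source.snapshot();
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        IdempotentSink sink = runWithOneFailure(100, 37, 10);
        System.out.println("distinct rows = " + sink.rows.size() + ", writes = " + sink.writes);
    }
}
```

With 100 rows, a failure at offset 37, and a checkpoint every 10 rows, rows 30 to 36 are written twice (107 writes in total), but the sink still holds exactly 100 distinct rows. That final row count is the kind of engine-agnostic assertion the e2e test could make.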
Best,
Zongwen Li

JUN GAO <[email protected]> wrote on Tue, Sep 20, 2022 at 13:37:

> Description
>
> We need to support data consistency tests in the connector v2 e2e. I have
> some ideas about this and welcome everyone to discuss.
>
> Test Sink Connector
>
> Fake Source Connector
>
> If we want to test the data consistency of a sink connector, we can use
> the Fake Source connector. The Fake Source connector will support defining
> the number of rows and the primary key fields in the future. Defining
> primary key fields is useful for testing sinks that implement exactly-once
> through idempotent writes. If we can simulate a task failure and then
> restore the task, we can complete the data consistency test.
>
> How to simulate task failure and restore task
>
> I think we can use the Fake Source connector to simulate task failure too.
> We can add a function to Fake Source that actively triggers a failure. To
> ensure that Fake Source can replay reads, it needs to support snapshots
> too.
>
> How to check data
>
> We can check the rows that were written to the sink.
>
> Test Source Connector
>
> Test JDBC Sink Connector
>
> If we want to test the data consistency of a source connector, we can add
> a Test JDBC Sink connector. It needs to support exactly-once.
>
> How to simulate task failure and restore task
>
> We can add a function to the Test JDBC Sink connector that actively
> triggers a failure.
>
> How to check data
>
> There are two ways to do it.
>
> First: after the job xxxSource -> TestJDBCSink has finished, we can
> automatically create a job JDBCSource -> AssertSink and use AssertSink to
> check the data.
>
> Shortcoming: this approach needs to run two jobs.
>
> Advantage: this approach standardizes the test; people only need to
> configure the check rules in the AssertSink connector.
>
> Second: add a Java program to check the data in MySQL/PG.
>
> Shortcoming: this approach cannot standardize the test; every source
> connector e2e needs to add its own check program and define its own check
> rules.
>
> Advantage: only one job needs to run.
>
> --
> Best Regards
> ------------
> EricJoy2048
> [email protected]
