alamb commented on PR #9450: URL: https://github.com/apache/arrow-rs/pull/9450#issuecomment-4085357250
For this PR, I think we need an end-to-end test that demonstrates the use case the CDC code is intended to solve.

For example, such a test could write two Parquet files containing the same data except for a few chosen rows in the middle, and then verify that most of the pages are identical.

It is also not entirely clear to me how a "content addressable filesystem" works (that is, how it knows where the Parquet pages start and end), so having that documented or mocked out would also be nice.
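To make the request concrete, here is a rough, std-only sketch of what a mocked-out content-addressable store and the "same data except in the middle" check could look like. It is only an illustration under stated assumptions: it does not use the `parquet` crate or any API from this PR, the two byte buffers merely stand in for two written files, and every name in it (`cdc_chunks`, `ChunkStore`, `dedup_stats`, `flip_bytes`, the gear-style hash and its constants) is hypothetical. It also assumes the store chunks raw bytes by content and knows nothing about page boundaries, which may or may not match the filesystem this PR targets.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical toy parameters, not taken from the PR.
const MASK_BITS: u32 = 8; // cut when the low 8 hash bits are zero
const MIN_LEN: usize = 64; // never cut before this many bytes
const MAX_LEN: usize = 1024; // always cut once a chunk reaches this size

/// Maps a byte to a pseudo-random 64-bit value (splitmix64-style mixing).
fn gear(b: u8) -> u64 {
    let mut z = (b as u64).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Toy content-defined chunker: boundaries depend on the bytes themselves,
/// not on absolute offsets or on any knowledge of the file format.
fn cdc_chunks(data: &[u8]) -> Vec<&[u8]> {
    let mask = (1u64 << MASK_BITS) - 1;
    let mut chunks = Vec::new();
    let (mut start, mut hash) = (0usize, 0u64);
    for (i, &b) in data.iter().enumerate() {
        // Gear-style rolling hash: the left shift ages out older bytes.
        hash = (hash << 1).wrapping_add(gear(b));
        let len = i + 1 - start;
        if (len >= MIN_LEN && (hash & mask) == 0) || len >= MAX_LEN {
            chunks.push(&data[start..=i]);
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}

/// Mock content-addressable store: a chunk is identified only by its digest.
#[derive(Default)]
struct ChunkStore {
    digests: HashSet<u64>,
}

impl ChunkStore {
    /// Records a chunk; returns true if it was not already present.
    fn put(&mut self, chunk: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        chunk.hash(&mut hasher);
        self.digests.insert(hasher.finish())
    }
}

/// "Uploads" `old`, then `new`; returns (chunks in `new`, chunks already stored).
fn dedup_stats(old: &[u8], new: &[u8]) -> (usize, usize) {
    let mut store = ChunkStore::default();
    for chunk in cdc_chunks(old) {
        store.put(chunk);
    }
    let new_chunks = cdc_chunks(new);
    let mut reused = 0;
    for chunk in &new_chunks {
        if !store.put(chunk) {
            reused += 1;
        }
    }
    (new_chunks.len(), reused)
}

/// Deterministic pseudo-random bytes (xorshift64); `seed` must be non-zero.
fn pseudo_random_bytes(len: usize, mut seed: u64) -> Vec<u8> {
    (0..len)
        .map(|_| {
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            (seed >> 32) as u8
        })
        .collect()
}

/// Returns a copy of `data` with `len` bytes inverted starting at `offset`.
fn flip_bytes(data: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = data.to_vec();
    for b in &mut out[offset..offset + len] {
        *b ^= 0xFF;
    }
    out
}
```

A real end-to-end test would presumably replace the two buffers with the bytes of two Parquet files written with CDC enabled, then assert that `reused` is close to the total chunk count. In this toy version, every chunk that ends before the edited region is guaranteed to be reused, because the chunker's decisions depend only on the bytes seen so far; chunks after the edit are reused only once the boundaries happen to line up again.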
