alamb commented on PR #9450: URL: https://github.com/apache/arrow-rs/pull/9450#issuecomment-4085342902
I think there is something wrong with the tests in this PR. Specifically, I did an ablation study (what a fancy word!) to verify the test coverage in this PR. I disconnected all the CDC code from the writer like this - https://github.com/kszucs/arrow-rs/pull/1 Here is how I ran the tests ```shell cargo test -p parquet --features=arrow -- cdc ``` Almost all of them still pass (only 3 fail). which suggests they aren't actually testing the CDC code ``` running 35 tests test column::chunker::cdc::arrow_tests::test_cdc_find_differences ... ok test column::chunker::cdc::arrow_tests::test_cdc_empty_table ... ok test column::chunker::cdc::arrow_tests::test_cdc_f64_column ... ok test column::chunker::cdc::arrow_tests::test_cdc_array_offsets ... FAILED test column::chunker::cdc::arrow_tests::test_cdc_prepend ... ok test column::chunker::cdc::arrow_tests::test_cdc_append ... ok test column::chunker::cdc::arrow_tests::test_cdc_insert_once ... ok test column::chunker::cdc::arrow_tests::test_cdc_delete_once ... ok test column::chunker::cdc::arrow_tests::test_cdc_delete_twice ... FAILED test column::chunker::cdc::arrow_tests::test_cdc_insert_twice ... FAILED test column::chunker::cdc::arrow_tests::test_cdc_multiple_row_groups_insert ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_dictionary ... ok test column::chunker::cdc::arrow_tests::test_cdc_multiple_row_groups_append ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_large_binary ... ok test column::chunker::cdc::arrow_tests::test_cdc_deterministic ... ok test column::chunker::cdc::tests::test_calculate_mask_defaults ... ok test column::chunker::cdc::tests::test_calculate_mask_invalid ... ok test column::chunker::cdc::tests::test_calculate_mask_with_norm_level ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_string ... ok test column::chunker::cdc::tests::test_non_nested_non_null_single_chunk ... ok test column::chunker::cdc::tests::test_nullable_non_nested ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_nullable ... ok test column::chunker::cdc::tests::test_deterministic_chunks ... ok test column::chunker::cdc::tests::test_max_chunk_size_forces_boundary ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_i32 ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_list ... ok test column::chunker::cdc::arrow_tests::test_cdc_update_once ... ok test column::chunker::cdc::arrow_tests::test_cdc_nullable_column ... ok test column::chunker::cdc::arrow_tests::test_cdc_string_column ... ok test column::chunker::cdc::arrow_tests::test_cdc_update_twice ... ok test column::chunker::cdc::arrow_tests::test_cdc_array_offsets_direct ... ok test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_multiple_columns ... ok test column::chunker::cdc::arrow_tests::test_cdc_produces_multiple_pages ... ok test column::chunker::cdc::arrow_tests::test_cdc_page_reuse_on_append ... ok test column::chunker::cdc::arrow_tests::test_cdc_state_persists_across_row_groups ... ok failures: ---- column::chunker::cdc::arrow_tests::test_cdc_array_offsets stdout ---- thread 'column::chunker::cdc::arrow_tests::test_cdc_array_offsets' (9668022) panicked at parquet/src/column/chunker/cdc.rs:1668:9: assertion `left == right` failed: sliced first page should be 10 values shorter: full=[20480, 12288] sliced=[20480, 12278] left: 0 right: 10 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ---- column::chunker::cdc::arrow_tests::test_cdc_delete_twice stdout ---- thread 'column::chunker::cdc::arrow_tests::test_cdc_delete_twice' (9668025) panicked at parquet/src/column/chunker/cdc.rs:1574:9: assertion `left == right` failed: Expected 2 diffs for double delete, got [([16448], [16384])] left: 1 right: 2 ---- column::chunker::cdc::arrow_tests::test_cdc_insert_twice stdout ---- thread 'column::chunker::cdc::arrow_tests::test_cdc_insert_twice' (9668031) panicked at parquet/src/column/chunker/cdc.rs:1613:9: assertion `left == right` failed: Expected 2 diffs for double insert, got [([16384], [16448])] left: 1 right: 2 failures: column::chunker::cdc::arrow_tests::test_cdc_array_offsets column::chunker::cdc::arrow_tests::test_cdc_delete_twice column::chunker::cdc::arrow_tests::test_cdc_insert_twice test result: FAILED. 32 passed; 3 failed; 0 ignored; 0 measured; 885 filtered out; finished in 0.66s ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
