alamb commented on PR #9450:
URL: https://github.com/apache/arrow-rs/pull/9450#issuecomment-4085342902

   I think there is something wrong with the tests in this PR.
   
   Specifically, I did an ablation study (what a fancy word!) to verify the 
test coverage in this PR. 
   
   I  disconnected all the CDC code from the writer like this
   - https://github.com/kszucs/arrow-rs/pull/1
   
   Here is how I ran the tests
   
   ```shell
   cargo test -p parquet --features=arrow -- cdc
   ```
   
   Almost all of them still pass (only 3 fail). which suggests they aren't 
actually testing the CDC code
   ```
   running 35 tests
   test column::chunker::cdc::arrow_tests::test_cdc_find_differences ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_empty_table ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_f64_column ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_array_offsets ... FAILED
   test column::chunker::cdc::arrow_tests::test_cdc_prepend ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_append ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_insert_once ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_delete_once ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_delete_twice ... FAILED
   test column::chunker::cdc::arrow_tests::test_cdc_insert_twice ... FAILED
   test column::chunker::cdc::arrow_tests::test_cdc_multiple_row_groups_insert 
... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_dictionary ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_multiple_row_groups_append 
... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_large_binary ... 
ok
   test column::chunker::cdc::arrow_tests::test_cdc_deterministic ... ok
   test column::chunker::cdc::tests::test_calculate_mask_defaults ... ok
   test column::chunker::cdc::tests::test_calculate_mask_invalid ... ok
   test column::chunker::cdc::tests::test_calculate_mask_with_norm_level ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_string ... ok
   test column::chunker::cdc::tests::test_non_nested_non_null_single_chunk ... 
ok
   test column::chunker::cdc::tests::test_nullable_non_nested ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_nullable ... ok
   test column::chunker::cdc::tests::test_deterministic_chunks ... ok
   test column::chunker::cdc::tests::test_max_chunk_size_forces_boundary ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_i32 ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_list ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_update_once ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_nullable_column ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_string_column ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_update_twice ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_array_offsets_direct ... ok
   test column::chunker::cdc::arrow_tests::test_cdc_roundtrip_multiple_columns 
... ok
   test column::chunker::cdc::arrow_tests::test_cdc_produces_multiple_pages ... 
ok
   test column::chunker::cdc::arrow_tests::test_cdc_page_reuse_on_append ... ok
   test 
column::chunker::cdc::arrow_tests::test_cdc_state_persists_across_row_groups 
... ok
   
   failures:
   
   ---- column::chunker::cdc::arrow_tests::test_cdc_array_offsets stdout ----
   
   thread 'column::chunker::cdc::arrow_tests::test_cdc_array_offsets' (9668022) 
panicked at parquet/src/column/chunker/cdc.rs:1668:9:
   assertion `left == right` failed: sliced first page should be 10 values 
shorter: full=[20480, 12288] sliced=[20480, 12278]
     left: 0
    right: 10
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   
   ---- column::chunker::cdc::arrow_tests::test_cdc_delete_twice stdout ----
   
   thread 'column::chunker::cdc::arrow_tests::test_cdc_delete_twice' (9668025) 
panicked at parquet/src/column/chunker/cdc.rs:1574:9:
   assertion `left == right` failed: Expected 2 diffs for double delete, got 
[([16448], [16384])]
     left: 1
    right: 2
   
   ---- column::chunker::cdc::arrow_tests::test_cdc_insert_twice stdout ----
   
   thread 'column::chunker::cdc::arrow_tests::test_cdc_insert_twice' (9668031) 
panicked at parquet/src/column/chunker/cdc.rs:1613:9:
   assertion `left == right` failed: Expected 2 diffs for double insert, got 
[([16384], [16448])]
     left: 1
    right: 2
   
   
   failures:
       column::chunker::cdc::arrow_tests::test_cdc_array_offsets
       column::chunker::cdc::arrow_tests::test_cdc_delete_twice
       column::chunker::cdc::arrow_tests::test_cdc_insert_twice
   
   test result: FAILED. 32 passed; 3 failed; 0 ignored; 0 measured; 885 
filtered out; finished in 0.66s
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to