scovich opened a new issue, #8549:
URL: https://github.com/apache/arrow-rs/issues/8549

   **Describe the bug**
   
   The parquet reader always widens `INT32 (Decimal)` and `INT64 (Decimal)` 
columns to `Decimal128`. It might be possible for a reader to specifically 
request the narrower types in its read schema, but a reader that simply accepts 
the schema inferred from the file gets back a different type than the one that 
was written.
   
   **To Reproduce**
   
   Several unit tests _expect_ the current behavior, and started failing when I 
changed the parquet reader to preserve types:
   ```
   test arrow::array_reader::primitive_array::tests::test_primitive_array_reader_decimal_types ... FAILED
   test arrow::arrow_reader::tests::test_decimal ... FAILED
   test arrow::arrow_reader::tests::test_arbitrary_decimal ... FAILED
   test arrow::arrow_reader::tests::test_read_decimal_file ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal128_dictionary ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal256_dictionary ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal64_dictionary ... FAILED
   test arrow::schema::tests::test_arrow_schema_roundtrip ... FAILED
   test arrow::schema::tests::test_decimal_fields ... FAILED
   test arrow::schema::tests::test_column_desc_to_field ... FAILED
   test statistics::test_decimal128 ... FAILED
   test statistics::test_decimal_256 ... FAILED
   test statistics::test_decimal64 ... FAILED
   test statistics::test_data_page_stats_with_all_null_page ... FAILED
   ```
   
   **Expected behavior**
   
   If it was written as `INT32 (Decimal)` it should come back as `Decimal32` 
unless the user requested something else via the read schema. Ditto for `INT64 
(Decimal)` as `Decimal64`.
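
   As an untested sketch of that "unless the user requested something else" 
escape hatch: the reader builder accepts a schema hint, though whether such a 
hint can currently name a narrower decimal for an `INT32 (Decimal)` column is 
part of the "might be possible" above. This reuses the hypothetical `d` column 
and in-memory buffer from the earlier sketch:

   ```rust
   use std::sync::Arc;

   use arrow::datatypes::{DataType, Field, Schema};
   use bytes::Bytes;
   use parquet::arrow::arrow_reader::{ArrowReaderOptions, ParquetRecordBatchReaderBuilder};

   /// `data` holds the bytes of the parquet file written in the earlier sketch.
   fn read_as_decimal32(data: Bytes) -> Result<(), Box<dyn std::error::Error>> {
       // Ask for Decimal32 explicitly rather than accepting the inferred Decimal128.
       let requested = Arc::new(Schema::new(vec![Field::new("d", DataType::Decimal32(9, 2), false)]));
       let options = ArrowReaderOptions::new().with_schema(requested);
       let mut reader = ParquetRecordBatchReaderBuilder::try_new_with_options(data, options)?.build()?;
       let batch = reader.next().unwrap()?;
       println!("{:?}", batch.schema().field(0).data_type());
       Ok(())
   }
   ```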
   
   **Additional context**
   
   See https://github.com/apache/arrow-rs/pull/8540#discussion_r2399703980
   
   The variant shredding decimal integration tests fail by default because 
they expect 32- and 64-bit decimals to faithfully round trip through parquet; I 
had to add a manual casting operation to the `VariantArray` constructor to 
compensate.
   
   For the fix I tried (but eventually reverted in favor of the above-mentioned 
casting), see
   
https://github.com/apache/arrow-rs/pull/8540/files/cd978f5374503fff3a265bc06ad6ce143a4110f3..5b55f121d7461bd31dfc2d204f8fc303523a3988
   
   The idea was to split the `decimal_type` helper (which only understood 128- 
and 256-bit decimals) into `decimal_[32|64|128|256]_type` helpers, where each 
helper's name indicates a lower bound on the resulting decimal's width. 
Too-high precision can still force a wider decimal, though it's not clear to me 
that this is actually correct -- is e.g. `INT32 (Decimal(15, 2))` legal?
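
   For illustration only, a minimal sketch of what one such helper could look 
like (the name `decimal_32_type`, its signature, and the 9/18/38-digit cutoffs 
are my assumptions, not the actual code from the PR):

   ```rust
   use arrow::datatypes::DataType;

   /// Hypothetical sketch: the `32` in the name is only a lower bound on the width,
   /// and a precision too large for a 32-bit decimal still promotes to a wider type.
   fn decimal_32_type(precision: u8, scale: i8) -> DataType {
       match precision {
           // Decimal32 holds up to 9 decimal digits, Decimal64 up to 18, and
           // Decimal128 up to 38; anything larger falls back to Decimal256.
           0..=9 => DataType::Decimal32(precision, scale),
           10..=18 => DataType::Decimal64(precision, scale),
           19..=38 => DataType::Decimal128(precision, scale),
           _ => DataType::Decimal256(precision, scale),
       }
   }
   ```

   Under that sketch, `INT32 (Decimal(15, 2))` would come back as 
`Decimal64(15, 2)`, which is exactly the case whose legality I'm unsure about.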

