scovich opened a new pull request, #8540:
URL: https://github.com/apache/arrow-rs/pull/8540

   # Which issue does this PR close?
   
   - Closes https://github.com/apache/arrow-rs/issues/8332
   
   # Rationale for this change
   
   Missing feature
   
   # What changes are included in this PR?
   
   Add decimal unshredding support, which _should_ have been straightforward except:
   1. The variant decimal types are not generic and do not implement any common trait that would let us generalize the logic easily. As a workaround, I added a custom trait in the unshredding module, but we should probably look at something similar to arrow's `DecimalType` trait for the `VariantDecimalXX` types to implement.
   2. The parquet reader seems to have a bug that forces 32- and 64-bit decimal columns to Decimal128 unless the reader specifically requests a narrower type. I think I fixed that bug, but naturally several existing parquet unit tests started to fail. I don't know yet whether those tests simply expect the buggy behavior, or whether my fix is wrong, misguided, or misplaced.
   
   <details>
   <summary>Test failures caused by the parquet decimal fix</summary>
   
   ```
   test arrow::array_reader::primitive_array::tests::test_primitive_array_reader_decimal_types ... FAILED
   test arrow::arrow_reader::tests::test_arbitrary_decimal ... FAILED
   test arrow::arrow_reader::tests::test_decimal ... FAILED
   test arrow::arrow_reader::tests::test_read_decimal_file ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal128_dictionary ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal256_dictionary ... FAILED
   test arrow::arrow_writer::tests::arrow_writer_decimal64_dictionary ... FAILED
   test arrow::schema::tests::test_arrow_schema_roundtrip ... FAILED
   test arrow::schema::tests::test_column_desc_to_field ... FAILED
   test arrow::schema::tests::test_decimal_fields ... FAILED
   test statistics::test_decimal128 ... FAILED
   test statistics::test_decimal64 ... FAILED
   test statistics::test_decimal_256 ... FAILED
   test statistics::test_data_page_stats_with_all_null_page ... FAILED
   ```
   
   For example:
   ```
   ---- arrow::arrow_reader::tests::test_decimal stdout ----
   
   thread 'arrow::arrow_reader::tests::test_decimal' panicked at parquet/src/arrow/arrow_reader/mod.rs:4570:9:
   assertion `left == right` failed
     left: Schema { fields: [Field { name: "d1", data_type: Decimal64(9, 2) }, Field { name: "d2", data_type: Decimal64(10, 2) }, Field { name: "d3", data_type: Decimal64(18, 2) }], metadata: {} }
    right: Schema { fields: [Field { name: "d1", data_type: Decimal32(9, 2) }, Field { name: "d2", data_type: Decimal64(10, 2) }, Field { name: "d3", data_type: Decimal64(18, 2) }], metadata: {} }
   
   ---- arrow::arrow_reader::tests::test_read_decimal_file stdout ----
   
   thread 'arrow::arrow_reader::tests::test_read_decimal_file' panicked at parquet/src/arrow/arrow_reader/mod.rs:2101:81:
   called `Result::unwrap()` on an `Err` value: General("invalid data type for byte array reader - Decimal32(4, 2)")
   
   ---- arrow::arrow_writer::tests::arrow_writer_decimal stdout ----
   
   thread 'arrow::arrow_writer::tests::arrow_writer_decimal' panicked at parquet/src/arrow/arrow_writer/mod.rs:2326:9:
   assertion `left == right` failed
     left: Schema { fields: [Field { name: "a", data_type: Decimal128(5, 2) }], metadata: {} }
    right: Schema { fields: [Field { name: "a", data_type: Decimal32(5, 2) }], metadata: {} }
   
   ```
   or
   ```
   ---- arrow::array_reader::primitive_array::tests::test_primitive_array_reader_decimal_types stdout ----
   
   thread 'arrow::array_reader::primitive_array::tests::test_primitive_array_reader_decimal_types' panicked at parquet/src/arrow/array_reader/primitive_array.rs:916:13:
   assertion `left == right` failed
     left: Decimal32(8, 2)
    right: Decimal128(8, 2)
   ```
   or
   ```
   ---- arrow::arrow_reader::tests::test_arbitrary_decimal stdout ----
   
   thread 'arrow::arrow_reader::tests::test_arbitrary_decimal' panicked at parquet/src/arrow/arrow_reader/mod.rs:4473:9:
   assertion `left == right` failed
     left: RecordBatch { schema: Schema { fields: [Field { name: "decimal_values_19_0", data_type: Decimal128(19, 0) }, Field { name: "decimal_values_12_0", data_type: Decimal128(12, 0) }, Field { name: "decimal_values_17_10", data_type: Decimal128(17, 10) }], metadata: {} }, columns: [PrimitiveArray<Decimal128(19, 0)>
   [
     1,
     2,
     3,
     4,
     5,
     6,
     7,
     8,
   ], PrimitiveArray<Decimal128(12, 0)>
   [
     1,
     2,
     3,
     4,
     5,
     6,
     7,
     8,
   ], PrimitiveArray<Decimal128(17, 10)>
   [
     1,
     2,
     3,
     4,
     5,
     6,
     7,
     8,
   ]], row_count: 8 }
    right: RecordBatch { schema: Schema { fields: [Field { name: "decimal_values_19_0", data_type: Decimal128(19, 0) }, Field { name: "decimal_values_12_0", data_type: Decimal64(12, 0) }, Field { name: "decimal_values_17_10", data_type: Decimal64(17, 10) }], metadata: {} }, columns: [PrimitiveArray<Decimal128(19, 0)>
   [
     1,
     2,
     3,
     4,
     5,
     6,
     7,
     8,
   ], PrimitiveArray<Decimal64(12, 0)>
   [
     1,
     2,
     3,
     4,
     5,
     6,
     7,
     8,
   ], PrimitiveArray<Decimal64(17, 10)>
   [
     1,
     2,
     3,
     4,
     5,
     6,
     7,
     8,
   ]], row_count: 8 }
   ```
   
   </details>
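
   The open question in item 2 boils down to which Arrow decimal width the reader should pick for a given column. As a hedged illustration only (not the parquet reader's actual code), the sketch below picks the narrowest width whose maximum precision (9, 18, 38, or 76 digits) can hold the column's precision, and lets an explicitly requested width take priority when it is wide enough. `DecimalKind` and `choose_decimal` are hypothetical stand-ins for Arrow's `Decimal32`/`Decimal64`/`Decimal128`/`Decimal256` data types and the reader's type-selection logic.

   ```rust
   /// Hypothetical stand-in for Arrow's four decimal data types.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum DecimalKind {
       Decimal32,
       Decimal64,
       Decimal128,
       Decimal256,
   }

   impl DecimalKind {
       /// Maximum precision (decimal digits) each width can represent.
       pub fn max_precision(self) -> u8 {
           match self {
               DecimalKind::Decimal32 => 9,
               DecimalKind::Decimal64 => 18,
               DecimalKind::Decimal128 => 38,
               DecimalKind::Decimal256 => 76,
           }
       }
   }

   /// Sketch of a width-selection rule: an explicitly requested width wins if it
   /// is wide enough; otherwise pick the narrowest width that fits `precision`.
   pub fn choose_decimal(
       precision: u8,
       requested: Option<DecimalKind>,
   ) -> Result<DecimalKind, String> {
       if precision == 0 {
           return Err("precision must be at least 1".to_string());
       }
       if let Some(kind) = requested {
           return if precision <= kind.max_precision() {
               Ok(kind)
           } else {
               Err(format!("{:?} cannot hold precision {}", kind, precision))
           };
       }
       // Candidates ordered from narrowest to widest.
       [
           DecimalKind::Decimal32,
           DecimalKind::Decimal64,
           DecimalKind::Decimal128,
           DecimalKind::Decimal256,
       ]
       .iter()
       .copied()
       .find(|k| precision <= k.max_precision())
       .ok_or_else(|| format!("precision {} exceeds 76", precision))
   }
   ```

   Under this rule, precisions 9, 10, 12, 17, 18, and 19 map to `Decimal32`, `Decimal64`, `Decimal64`, `Decimal64`, `Decimal64`, and `Decimal128`, matching the narrower types on one side of the `test_decimal` and `test_arbitrary_decimal` assertions above. Whether the reader should apply such a rule by default, rather than only on request, is exactly what those failing tests call into question.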
   
   # Are these changes tested?
   
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   
   # Are there any user-facing changes?
   
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   
   If there are any breaking changes to public APIs, please call them out.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
