scovich commented on issue #8549:
URL: https://github.com/apache/arrow-rs/issues/8549#issuecomment-3367125440

   > FWIW I would expect a Decimal32Array, i.e. the arrow version, to use the 
embedded arrow schema to roundtrip. But I suspect you are referring to 
roundtripping outside arrow.
   
   You're right, the parquet files used by the variant shredding integration 
tests are written by parquet-mr, not arrow:
   ```
   File path:  parquet-testing/shredded_variant/case-025.parquet
   Created by: parquet-mr version 1.16.0-SNAPSHOT (build ee34713e4d906d61f95d2b09145945638b2e2296)
   Properties:
     parquet.avro.schema: {"type":"record","name":"table","fields":[{"name":"id","type":"int"},{"name":"var","type":["null",{"type":"record","name":"var","fields":[{"name":"metadata","type":"bytes"},{"name":"value","type":["null","bytes"],"default":null},{"name":"typed_value","type":["null","int"],"default":null}]}],"default":null}]}
       writer.model.name: avro
   Schema:
   message table {
     required int32 id = 1;
     optional group var (VARIANT(1)) = 2 {
       required binary metadata;
       optional binary value;
       optional int32 typed_value (DECIMAL(9,4));
     }
   }
   
   
   Row group 0:  count: 1  127.00 B records  start: 4  total(compressed): 127 B total(uncompressed):127 B
   --------------------------------------------------------------------------------
                    type      encodings count     avg size   nulls   min / max
   id               INT32     _   _     1         27.00 B    0       "1" / "1"
   var.metadata     BINARY    _   _     1         36.00 B    0       "0x010000" / "0x010000"
   var.value        BINARY    _   _     1         30.00 B    1
   var.typed_value  INT32     _   _     1         34.00 B    0       "-12345.6789" / "-12345.6789"
   ```
   
   ... though even lacking arrow info, it seems like an `INT32 (Decimal(9, 4))` 
column really is decimal32? 
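
To make that reading concrete: with a DECIMAL annotation, the stored INT32 is the unscaled integer and the scale says where the decimal point goes, so the `-12345.6789` shown in the stats would correspond to an unscaled value of `-123456789` at scale 4. A minimal Rust sketch of that interpretation follows. `format_decimal` is a hypothetical helper written for illustration, not an arrow-rs or parquet API.

```rust
/// Hypothetical helper (not part of arrow-rs or parquet): render the unscaled
/// integer stored in an INT32 DECIMAL column as a decimal string.
fn format_decimal(unscaled: i32, scale: u32) -> String {
    // Widen to i64 so that taking the absolute value of i32::MIN cannot overflow.
    let value = unscaled as i64;
    let sign = if value < 0 { "-" } else { "" };
    let abs = value.abs();
    if scale == 0 {
        return format!("{}{}", sign, abs);
    }
    let divisor = 10i64.pow(scale);
    // The fractional part is zero-padded on the left to exactly `scale` digits.
    format!(
        "{}{}.{:0width$}",
        sign,
        abs / divisor,
        abs % divisor,
        width = scale as usize
    )
}
```

Calling `format_decimal(-123_456_789, 4)` gives `-12345.6789`, matching the min/max reported for `var.typed_value`.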
   
   > IIRC there may even be some smarts on the write path to use smaller 
representations when writing Decimal128Array based on the precision, which 
would further complicate any changes here.
   
   You mean, the parquet writer might emit an `INT32 (Decimal(9, 4))` column 
when asked to write out a `Decimal128(9, 4)` column? Or it might emit an `INT32 
(Decimal(30, 4))` column if it notices that the values all happen to fit?
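
For reference, as I read the Parquet logical types spec, DECIMAL may annotate INT32 only for precision 1 through 9 and INT64 only for precision 1 through 18, so the first scenario would be a legal narrowing while an `INT32 (Decimal(30, 4))` column would not be spec-conformant. A rough sketch of a precision-based choice, assuming a hypothetical `narrowest_physical` helper and `Physical` enum (this is not the arrow-rs writer's actual code):

```rust
/// Hypothetical enum for illustration; not a type from the parquet crate.
#[derive(Debug, PartialEq)]
enum Physical {
    Int32,
    Int64,
    FixedLenByteArray,
}

/// Hypothetical helper: pick the narrowest physical type that the DECIMAL
/// annotation permits for a given precision. Returns None for precision 0,
/// which is not a valid decimal precision.
fn narrowest_physical(precision: u8) -> Option<Physical> {
    match precision {
        0 => None,
        1..=9 => Some(Physical::Int32),   // at most 9 digits always fit in an i32
        10..=18 => Some(Physical::Int64), // at most 18 digits always fit in an i64
        _ => Some(Physical::FixedLenByteArray),
    }
}
```

Note that the choice depends only on the declared precision, never on the values that happen to be present in the column.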
   
   Either way, I thought the parquet writer's job was to emit what it was told 
to emit, not get fancy with data types and automatic conversions? That came up 
_several_ times in variant shredding discussions...

