scovich commented on issue #8420:
URL: https://github.com/apache/arrow-rs/issues/8420#issuecomment-3349303723

   > The arrow variant extension type docs don't seem to mandate (or prevent) using a canonical arrow extension type
   
   That's arguably a bug/oversight.
   
   The [variant shredding spec](https://github.com/apache/parquet-format/blob/master/VariantShredding.md#shredded-value-types) mandates that variant type `uuid` should have parquet physical type `FIXED_LEN_BYTE_ARRAY[len=16]` and parquet logical type `UUID`.
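
   As a rough illustration of what that physical layout means for a reader: the Parquet `UUID` logical type stores the 16 bytes in big-endian order, so the bytes map directly onto the canonical hex string. The sketch below uses only the Rust standard library; the helper names `uuid_bytes_to_string` and `uuid_string_to_bytes` are hypothetical and are not part of arrow-rs or parquet.

   ```rust
   /// Formats the 16 bytes of a `FIXED_LEN_BYTE_ARRAY(16)` UUID value as a
   /// canonical 8-4-4-4-12 hex string. (Hypothetical helper, not an arrow-rs API.)
   fn uuid_bytes_to_string(bytes: &[u8; 16]) -> String {
       let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
       format!(
           "{}-{}-{}-{}-{}",
           &hex[0..8],
           &hex[8..12],
           &hex[12..16],
           &hex[16..20],
           &hex[20..32]
       )
   }

   /// Parses a UUID string into 16 big-endian bytes. Dashes are ignored
   /// wherever they appear; anything that is not exactly 32 hex digits
   /// after removing dashes yields `None`. (Hypothetical helper.)
   fn uuid_string_to_bytes(s: &str) -> Option<[u8; 16]> {
       let hex: String = s.chars().filter(|c| *c != '-').collect();
       if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
           return None;
       }
       let mut out = [0u8; 16];
       for (i, byte) in out.iter_mut().enumerate() {
           // Each output byte comes from two consecutive hex digits.
           *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
       }
       Some(out)
   }
   ```

   Round-tripping the UUID used by the integration test cases (`f24f9b64-81fa-49d1-b74e-8c09a6e31c56`) through these two helpers should give back the same string, with `0xf2` as the first stored byte.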
   
   We have three shredding integration test cases that work with UUID values: 37, 81, 123. Of those, case 37 is the most relevant because it actually shreds as UUID (rather than storing a UUID value in an unshredded or differently shredded column):
   ```json
   {
     "case_number" : 37,
     "test" : "testShreddedVariantPrimitives",
     "parquet_file" : "case-037.parquet",
     "variant_file" : "case-037_row-0.variant.bin",
     "variant" : "Variant(metadata=VariantMetadata(dict={}), value=Variant(type=UUID, value=f24f9b64-81fa-49d1-b74e-8c09a6e31c56))"
   }, {
     "case_number" : 81,
     "test" : "testUnshreddedVariants",
     "parquet_file" : "case-081.parquet",
     "variant_file" : "case-081_row-0.variant.bin",
     "variant" : "Variant(metadata=VariantMetadata(dict={}), value=Variant(type=UUID, value=f24f9b64-81fa-49d1-b74e-8c09a6e31c56))"
   }, {
     "case_number" : 123,
     "test" : "testUnshreddedVariantsWithShreddedSchema",
     "parquet_file" : "case-123.parquet",
     "variant_file" : "case-123_row-0.variant.bin",
     "variant" : "Variant(metadata=VariantMetadata(dict={}), value=Variant(type=UUID, value=f24f9b64-81fa-49d1-b74e-8c09a6e31c56))"
   }
   ```
   The corresponding parquet files have the following schemas:
   
   Case 37 (shredded as UUID):
   
   ```
   message table {
     required int32 id = 1;
     optional group var (VARIANT(1)) = 2 {
       required binary metadata;
       optional binary value;
       optional fixed_len_byte_array(16) typed_value (UUID);
     }
   }
   ```
   
   <details>
   <summary>Case 81 (unshredded)</summary>
   
   ```
   message table {
     required int32 id = 1;
     required group var (VARIANT(1)) = 2 {
       required binary metadata;
       required binary value;
     }
   }
   ```
   
   </details>
   
   <details>
   <summary>Case 123 (shredded as string)</summary>
   
   ```
   message table {
     required int32 id = 1;
     optional group var (VARIANT(1)) = 2 {
       required binary metadata;
       optional binary value;
       optional binary typed_value (STRING);
     }
   }
   ```
   
   </details>
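
   To make the difference between the three schemas concrete, here is a minimal sketch that models only the `typed_value` column of each case and checks it against the UUID rule from the shredding spec. All of the types and the `shreds_as_uuid` function are hypothetical and exist only for illustration; they do not correspond to any arrow-rs or parquet API.

   ```rust
   /// Simplified stand-ins for the Parquet physical types seen in the three schemas.
   #[derive(Debug)]
   enum Physical {
       Binary,
       FixedLenByteArray(usize),
   }

   /// Simplified stand-ins for the Parquet logical type annotations.
   #[derive(Debug)]
   enum Logical {
       Uuid,
       String,
   }

   /// The `typed_value` column of a variant group, if the schema has one.
   #[derive(Debug)]
   struct TypedValue {
       physical: Physical,
       logical: Logical,
   }

   /// Returns true only when `typed_value` exists and is a
   /// `FIXED_LEN_BYTE_ARRAY(16)` annotated as `UUID`.
   fn shreds_as_uuid(typed_value: Option<&TypedValue>) -> bool {
       matches!(
           typed_value,
           Some(TypedValue {
               physical: Physical::FixedLenByteArray(16),
               logical: Logical::Uuid,
           })
       )
   }
   ```

   Under this model, only case 37 (`fixed_len_byte_array(16) typed_value (UUID)`) passes the check; case 81 has no `typed_value` column at all, and case 123 has `binary typed_value (STRING)`.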
   
   

