alamb opened a new pull request, #85:
URL: https://github.com/apache/parquet-testing/pull/85

   - Closes https://github.com/apache/parquet-testing/issues/82
   
   As @mapleFU pointed out, the binary value of `primitive_int64` actually 
contains an int32, because Spark appears to truncate variant values to the 
smallest type that will fit them.
   
   # Changes
   1. Update the regeneration script
   2. Rerun the script
   3. Check in the results
   
   I also manually verified the output binary is correct:
   
   ```shell
   $ xxd primitive_int64.value
   00000000: 1815 81e9 7df4 1022 11                   ....}..".
   ```
   
   The first byte `0x18` is `0b00011000`:
   * low 2 bits are `0b00` => `Primitive type`
   * high 6 bits are `0b000110` => 6
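
The bit split above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: `split_header` is a hypothetical helper name, not something from the regeneration script or any Variant library.

```python
# Hypothetical helper (not part of this PR): split the first byte of a
# Variant value into its two fields, as described in the bullets above.
def split_header(byte: int) -> tuple[int, int]:
    basic_type = byte & 0b11  # low 2 bits
    type_info = byte >> 2     # high 6 bits
    return basic_type, type_info


basic_type, type_info = split_header(0x18)
print(basic_type, type_info)  # prints: 0 6
```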
   
   Per the [Variant basic 
types](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#encoding-types) 
table in the encoding spec, primitive_type = `6` corresponds to `int64`:
   
   
   Equivalence Class | Variant Physical Type | Type ID | Equivalent Parquet Type | Binary format
   -- | -- | -- | -- | --
   Exact Numeric | int8 | 3 | INT(8, signed) | 1 byte
   Exact Numeric | int16 | 4 | INT(16, signed) | 2 byte little-endian
   Exact Numeric | int32 | 5 | INT(32, signed) | 4 byte little-endian
   Exact Numeric | int64 | 6 | INT(64, signed) | 8 byte little-endian
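
As a cross-check of the whole value, the sketch below decodes the nine bytes from the `xxd` output using the widths in the table rows above. It assumes only the four integer type IDs listed here. `INT_FORMATS` and `decode_primitive_int` are illustrative names, not part of the regeneration script or of any Variant implementation.

```python
import struct

# Bytes copied from the xxd output of primitive_int64.value.
raw = bytes.fromhex("181581e97df4102211")

# Type ID -> struct format, taken from the table rows above
# (signed, little-endian). Other primitive types are out of scope here.
INT_FORMATS = {3: "<b", 4: "<h", 5: "<i", 6: "<q"}


def decode_primitive_int(data: bytes) -> tuple[int, int]:
    """Return (type_id, value) for a primitive integer Variant value."""
    header = data[0]
    if header & 0b11 != 0:
        raise ValueError("not a primitive value")
    type_id = header >> 2
    fmt = INT_FORMATS[type_id]
    payload = data[1:]
    if len(payload) != struct.calcsize(fmt):
        raise ValueError("payload length does not match the type ID")
    (value,) = struct.unpack(fmt, payload)
    return type_id, value


type_id, value = decode_primitive_int(raw)
print(type_id, hex(value))
```

An 8-byte payload combined with type ID 6 is what distinguishes a real `int64` encoding from the earlier file, where the value had been stored as an int32.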
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

