alamb opened a new pull request, #85: URL: https://github.com/apache/parquet-testing/pull/85
- Closes https://github.com/apache/parquet-testing/issues/82

As @mapleFU pointed out, the binary value of `primitive_int64` actually contains an int32, because Spark appears to truncate variant values to the smallest type that fits them.

# Changes
1. Update the regeneration script
2. Rerun the script
3. Check in the results

I also manually verified that the output binary is correct:

```shell
$ xxd primitive_int64.value
00000000: 1815 81e9 7df4 1022 11                   ....}..".
```

The first byte, `0x18`, is `0b00011000`:
* low 2 bits are `0b00` => `Primitive type`
* high 6 bits are `0b000110` => 6

Per the [Variant basic types](https://github.com/apache/parquet-format/blob/master/VariantEncoding.md#encoding-types) table in the encoding grammar, primitive_type = `6` corresponds to `int64`:

Equivalence Class | Variant Physical Type | Type ID | Equivalent Parquet Type | Binary format
-- | -- | -- | -- | --
Exact Numeric | int8 | 3 | INT(8, signed) | 1 byte
Exact Numeric | int16 | 4 | INT(16, signed) | 2 byte little-endian
Exact Numeric | int32 | 5 | INT(32, signed) | 4 byte little-endian
Exact Numeric | int64 | 6 | INT(64, signed) | 8 byte little-endian
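The manual check above can also be reproduced programmatically. The following is a minimal sketch, not part of the regeneration script: `decode_primitive_int` and `INT_TYPES` are hypothetical names introduced here for illustration, and only the four signed integer rows from the table are handled. It assumes the input starts with a one-byte header followed by a little-endian payload, as in the `xxd` dump.

```python
import struct

# Type ID -> (name, struct format) for the four signed integer rows in the table.
# "<" selects little-endian; b/h/i/q are 1/2/4/8-byte signed integers.
INT_TYPES = {
    3: ("int8", "<b"),
    4: ("int16", "<h"),
    5: ("int32", "<i"),
    6: ("int64", "<q"),
}


def decode_primitive_int(data: bytes) -> tuple[str, int]:
    """Decode a variant value holding a primitive signed integer (hypothetical helper)."""
    header = data[0]
    basic_type = header & 0b11    # low 2 bits
    primitive_type = header >> 2  # high 6 bits
    if basic_type != 0:
        raise ValueError(f"not a primitive value: basic_type={basic_type}")
    if primitive_type not in INT_TYPES:
        raise ValueError(f"unhandled primitive type id: {primitive_type}")
    name, fmt = INT_TYPES[primitive_type]
    # unpack_from reads exactly the payload width, starting after the header byte.
    (value,) = struct.unpack_from(fmt, data, 1)
    return name, value


# Bytes copied from the xxd output of primitive_int64.value.
raw = bytes.fromhex("181581e97df4102211")
print(decode_primitive_int(raw))  # ('int64', 1234567890123456789)
```

Running the same function over a file that was narrowed to int32 would report `int32` (type ID 5) with a 4-byte payload, which is a quick way to spot the problem described in the linked issue.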
