[ https://issues.apache.org/jira/browse/PARQUET-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shreyas B updated PARQUET-2420:
-------------------------------
Description:
When a Parquet schema is derived from a Thrift definition, Thrift byte fields
are converted to INT32 without the LogicalType metadata that records their
width. The 8-bit width is therefore lost in the Parquet file, which is
inconsistent with i16 fields, which are annotated as INTEGER(16,true). The
correct conversion should result in INT32 with LogicalType metadata indicating
a bit width of 8 and signed as true.
Thrift Definition
```
struct TestLogicalType {
  1: required i16 test_i16,
  2: required byte test_i8
}
```
Current Parquet Schema
```
message ParquetSchema {
  required int32 test_i16 (INTEGER(16,true)) = 1;
  required int32 test_i8 = 2;
}
```
Expected Parquet Schema
```
message ParquetSchema {
  required int32 test_i16 (INTEGER(16,true)) = 1;
  required int32 test_i8 (INTEGER(8,true)) = 2;
}
```
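The difference between the two schemas can be sketched with a small standalone model. This is only an illustration of the mapping described above, not the parquet-thrift implementation: the tables `EXPECTED` and `CURRENT` and the helpers `render_field`, `render_schema` and `unannotated_fields` are hypothetical names made up for this example.
```python
# Hypothetical model of the Thrift -> Parquet primitive mapping in this issue.
# It does not call parquet-thrift; every name below is illustrative only.

# Thrift type -> (Parquet physical type, bit width of the signed INTEGER
# annotation, or None when no LogicalType annotation is written).
EXPECTED = {
    "byte": ("int32", 8),
    "i16": ("int32", 16),
}

# Behaviour reported in this issue: byte loses its annotation.
CURRENT = dict(EXPECTED, byte=("int32", None))

# (field id, field name, Thrift type), taken from the struct TestLogicalType.
FIELDS = [(1, "test_i16", "i16"), (2, "test_i8", "byte")]


def render_field(name, thrift_type, field_id, mapping):
    """Render one required field in Parquet schema syntax."""
    physical, bit_width = mapping[thrift_type]
    annotation = f" (INTEGER({bit_width},true))" if bit_width is not None else ""
    return f"required {physical} {name}{annotation} = {field_id};"


def render_schema(fields, mapping):
    """Render the whole message for a list of (id, name, thrift_type)."""
    lines = ["message ParquetSchema {"]
    lines += ["  " + render_field(n, t, i, mapping) for i, n, t in fields]
    lines.append("}")
    return "\n".join(lines)


def unannotated_fields(fields, mapping):
    """Names of fields whose width information would be lost."""
    return [n for _, n, t in fields if mapping[t][1] is None]


if __name__ == "__main__":
    print(render_schema(FIELDS, CURRENT))
    print(render_schema(FIELDS, EXPECTED))
    print("fields losing width info:", unannotated_fields(FIELDS, CURRENT))
```
Under these assumptions, a regression test for a fix could assert that no field is reported by a check like `unannotated_fields` once the byte mapping carries the 8-bit signed annotation.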
was: The current implementation of Parquet serialisation from Thrift
Definitions results in the incorrect conversion of Thrift byte fields into
INT32 without preserving the required LogicalType Metadata in the Parquet file.
This behaviour leads to a loss of information and is inconsistent with the
expected behaviour. The correct conversion should result in INT32 with
LogicalType metadata indicating a bit width of 8 and signed as true.
> ThriftParquetWriter converts thrift byte to int32 without adding logical type
> ------------------------------------------------------------------------------
>
> Key: PARQUET-2420
> URL: https://issues.apache.org/jira/browse/PARQUET-2420
> Project: Parquet
> Issue Type: Bug
> Components: parquet-thrift
> Reporter: Shreyas B
> Priority: Major