[ https://issues.apache.org/jira/browse/PARQUET-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shreyas B updated PARQUET-2420:
-------------------------------
    Description: 
When Parquet schemas are generated from Thrift definitions, the current 
implementation converts Thrift byte fields to INT32 without attaching the 
required LogicalType metadata in the Parquet file. The 8-bit width of the 
field is therefore lost, which is inconsistent with i16 fields, whose 
annotation is preserved. The correct conversion should produce INT32 with 
LogicalType metadata indicating a bit width of 8 and signed set to true.

 

Thrift Definition

```
struct TestLogicalType {
1: required i16 test_i16,
2: required byte test_i8
}
``` 

Current Parquet Schema 

```
message ParquetSchema {
  required int32 test_i16 (INTEGER(16,true)) = 1;
  required int32 test_i8 = 2;
}
```

 

Expected Parquet Schema 

```
message ParquetSchema {
  required int32 test_i16 (INTEGER(16,true)) = 1;
  required int32 test_i8 (INTEGER(8,true)) = 2;
}
```
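
To make the expected mapping concrete, here is a minimal, self-contained Java sketch. It is an illustrative model only and not the parquet-thrift converter: the class `ThriftByteMappingSketch`, the enum `ThriftType`, and the helpers `bitWidth` and `renderField` are hypothetical names introduced for this example. It simply renders field lines in the schema notation used above, giving byte a bit width of 8 in the same way that i16 gets 16.

```java
// Illustrative model only: this is NOT parquet-thrift's converter code.
// All names below are hypothetical and exist only for this example.
final class ThriftByteMappingSketch {

    // The two Thrift types that appear in the struct from this issue.
    enum ThriftType { BYTE, I16 }

    // Bit width that the INTEGER logical type should record for each type.
    static int bitWidth(ThriftType type) {
        switch (type) {
            case BYTE: return 8;   // the annotation this issue reports as missing
            case I16:  return 16;  // already annotated in the current schema
            default:   throw new IllegalArgumentException("unsupported type: " + type);
        }
    }

    // Renders one required field in the schema notation used in this issue.
    static String renderField(String name, ThriftType type, int fieldId) {
        return "required int32 " + name
                + " (INTEGER(" + bitWidth(type) + ",true)) = " + fieldId + ";";
    }

    public static void main(String[] args) {
        System.out.println(renderField("test_i16", ThriftType.I16, 1));
        System.out.println(renderField("test_i8", ThriftType.BYTE, 2));
    }
}
```

Running `main` prints the two field lines of the expected schema. An actual fix would presumably attach the equivalent annotation wherever parquet-thrift builds the INT32 type for byte fields.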



> ThriftParquetWriter converts thrift byte to int32 without adding logical type 
> ------------------------------------------------------------------------------
>
>                 Key: PARQUET-2420
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2420
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-thrift
>            Reporter: Shreyas B
>            Priority: Major


