[ 
https://issues.apache.org/jira/browse/PARQUET-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093470#comment-17093470
 ] 

ASF GitHub Bot commented on PARQUET-1850:
-----------------------------------------

srinivasst opened a new pull request #789:
URL: https://github.com/apache/parquet-mr/pull/789


   ### Issue
   
   The toParquetMetadata method converts 
org.apache.parquet.hadoop.metadata.ParquetMetadata to 
org.apache.parquet.format.FileMetaData, but it does not set the dictionary 
page offset bit in FileMetaData.
   
   When a FileMetaData object is serialized while writing the footer and 
then deserialized, the dictionary page offset is lost because the dictionary 
page offset bit was never set.
   
   ### Fix
   
   The flag is set to true when a dictionary page is used for encoding.
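
The following is a minimal sketch of why an unset bit loses the offset, and of the shape of the fix. It assumes a Thrift-style optional field that is written only when its "isset" bit is true. `SketchColumnMetaData`, `IssetBitSketch`, the `dict=` wire format, and all method names are hypothetical stand-ins for illustration; they are not the parquet-mr or parquet-format API.

```java
// Hypothetical stand-in for a Thrift-generated struct with one optional field.
// This is NOT the real org.apache.parquet.format class.
final class SketchColumnMetaData {
    private long dictionaryPageOffset;          // raw field value
    private boolean dictionaryPageOffsetIsSet;  // "isset" bit for the optional field

    // Thrift-style setter: stores the value and flips the isset bit.
    void setDictionaryPageOffset(long offset) {
        this.dictionaryPageOffset = offset;
        this.dictionaryPageOffsetIsSet = true;
    }

    long getDictionaryPageOffset() {
        return dictionaryPageOffset;
    }

    boolean isSetDictionaryPageOffset() {
        return dictionaryPageOffsetIsSet;
    }

    // Optional fields reach the wire only when their isset bit is true.
    String serialize() {
        return dictionaryPageOffsetIsSet ? "dict=" + dictionaryPageOffset : "";
    }

    static SketchColumnMetaData deserialize(String wire) {
        SketchColumnMetaData meta = new SketchColumnMetaData();
        if (wire.startsWith("dict=")) {
            meta.setDictionaryPageOffset(Long.parseLong(wire.substring("dict=".length())));
        }
        return meta;
    }
}

final class IssetBitSketch {
    // Behaviour described in the issue: the conversion never marks the field
    // as set, so the offset is dropped during serialization.
    static SketchColumnMetaData convertWithoutBit(long offset) {
        return new SketchColumnMetaData();
    }

    // Behaviour described in the fix: the field is set (and with it the bit)
    // when a dictionary page is used for encoding.
    static SketchColumnMetaData convertWithBit(long offset, boolean usesDictionaryPage) {
        SketchColumnMetaData meta = new SketchColumnMetaData();
        if (usesDictionaryPage) {
            meta.setDictionaryPageOffset(offset);
        }
        return meta;
    }

    // Serialize, deserialize, and read the offset back.
    static long roundTrip(SketchColumnMetaData meta) {
        return SketchColumnMetaData.deserialize(meta.serialize()).getDictionaryPageOffset();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(convertWithoutBit(1234L)));    // prints 0
        System.out.println(roundTrip(convertWithBit(1234L, true))); // prints 1234
    }
}
```

In this sketch the offset survives the round trip only on the path that goes through the setter, which is the behaviour the fix aims for.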
   
   ### Tests
   
   A ParquetMetadata object is created with PLAIN_DICTIONARY encoding, and 
dictionaryPageOffset is set to a non-zero value. 
   
   The ParquetMetadata object is converted to FileMetaData using the 
toParquetMetadata method.
   The FileMetaData object is then serialized, deserialized back to 
FileMetaData, and converted back to ParquetMetadata using the 
fromParquetMetadata method. 
   
   The new ParquetMetadata should have the same dictionaryPageOffset as the 
original ParquetMetadata object.
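
The test steps above can be sketched as a self-contained round trip. This is an illustration under stated assumptions, not the test added in the pull request: `InMemoryColumn` and `FooterColumn` are hypothetical stand-ins for the ParquetMetadata and FileMetaData sides, `toFooter` and `fromFooter` stand in for the two conversion methods, and the byte layout is invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class RoundTripTestSketch {
    // Hypothetical in-memory column metadata (the ParquetMetadata side).
    static final class InMemoryColumn {
        final String encoding;
        final long dictionaryPageOffset;

        InMemoryColumn(String encoding, long dictionaryPageOffset) {
            this.encoding = encoding;
            this.dictionaryPageOffset = dictionaryPageOffset;
        }
    }

    // Hypothetical footer struct (the FileMetaData side) with an optional offset.
    static final class FooterColumn {
        String encoding;
        long dictionaryPageOffset;
        boolean hasDictionaryPageOffset;
    }

    // Step 2: convert to the footer form, marking the offset as present
    // when a dictionary encoding is used.
    static FooterColumn toFooter(InMemoryColumn column) {
        FooterColumn footer = new FooterColumn();
        footer.encoding = column.encoding;
        if (column.encoding.contains("DICTIONARY")) {
            footer.dictionaryPageOffset = column.dictionaryPageOffset;
            footer.hasDictionaryPageOffset = true;
        }
        return footer;
    }

    // Step 3a: serialize; the offset is written only when marked present.
    static byte[] serialize(FooterColumn footer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(footer.encoding);
            out.writeBoolean(footer.hasDictionaryPageOffset);
            if (footer.hasDictionaryPageOffset) {
                out.writeLong(footer.dictionaryPageOffset);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 3b: deserialize back to the footer form.
    static FooterColumn deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            FooterColumn footer = new FooterColumn();
            footer.encoding = in.readUTF();
            footer.hasDictionaryPageOffset = in.readBoolean();
            if (footer.hasDictionaryPageOffset) {
                footer.dictionaryPageOffset = in.readLong();
            }
            return footer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 3c: convert back to the in-memory form; an absent offset reads as 0.
    static InMemoryColumn fromFooter(FooterColumn footer) {
        long offset = footer.hasDictionaryPageOffset ? footer.dictionaryPageOffset : 0L;
        return new InMemoryColumn(footer.encoding, offset);
    }

    static InMemoryColumn roundTrip(InMemoryColumn original) {
        return fromFooter(deserialize(serialize(toFooter(original))));
    }

    public static void main(String[] args) {
        // Step 1: dictionary encoding with a non-zero offset.
        InMemoryColumn original = new InMemoryColumn("PLAIN_DICTIONARY", 4L);
        InMemoryColumn restored = roundTrip(original);
        // Step 4: the offset should match the original.
        if (restored.dictionaryPageOffset != original.dictionaryPageOffset) {
            throw new AssertionError("dictionaryPageOffset was lost in the round trip");
        }
        System.out.println("dictionaryPageOffset preserved");
    }
}
```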


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> toParquetMetadata method in ParquetMetadataConverter does not set dictionary 
> page offset bit
> --------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1850
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1850
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.10.1, 1.12.0
>            Reporter: Srinivas S T
>            Priority: Major
>             Fix For: 1.12.0
>
>
> toParquetMetadata method converts 
> org.apache.parquet.hadoop.metadata.ParquetMetadata to 
> org.apache.parquet.format.FileMetaData but this does not set the dictionary 
> page offset bit in FileMetaData.
> When a FileMetaData object is serialized while writing to the footer and then 
> deserialized, the dictionary offset is lost as the dictionary page offset bit 
> was never set. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
