[ 
https://issues.apache.org/jira/browse/PARQUET-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318547#comment-16318547
 ] 

Zoltan Ivanfi edited comment on PARQUET-1065 at 2/13/18 10:50 AM:
------------------------------------------------------------------

[~lv], we cannot change the ordering for the existing {{min}} and {{max}} 
fields for int96 timestamps, because statistics were already written for them 
according to the wrong byte order.

We do not want to define a new int96 ordering for the new {{min-value}} and 
{{max-value}} fields either, because:
 # We cannot distinguish between an int96 number and a timestamp stored in an 
int96, although they would require different endianness.
 # Introducing reverse endian ordering would put an unnecessary burden on the 
implementors for the sake of a legacy type that we would like to get rid of.
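
To illustrate the endianness point, here is a minimal Python sketch. It assumes the layout commonly used by Impala and Hive for int96 timestamps (8 bytes of nanoseconds-of-day followed by 4 bytes of Julian day, both little-endian); that layout is not spelled out in this thread, and the helper names and sample values are hypothetical.

```python
import struct


def encode_int96_timestamp(julian_day, nanos_of_day):
    # Assumed layout (not stated in this thread): 8-byte little-endian
    # nanoseconds of day, then 4-byte little-endian Julian day number.
    return struct.pack("<qi", nanos_of_day, julian_day)


def as_signed_int(raw, byteorder):
    # Interpret the 12 raw bytes as one signed 96-bit integer.
    return int.from_bytes(raw, byteorder=byteorder, signed=True)


# Two timestamps on consecutive (arbitrary) Julian days; `earlier` really
# does come before `later` in time.
earlier = encode_int96_timestamp(2458163, 2)
later = encode_int96_timestamp(2458164, 1)

# Unsigned lexicographic byte comparison looks at the low-order bytes of the
# nanoseconds field first, so it gets the order wrong.
bytewise_says_earlier_is_greater = earlier > later

# Reading the bytes as a big-endian signed number has the same problem.
big_endian_says_earlier_is_greater = (
    as_signed_int(earlier, "big") > as_signed_int(later, "big")
)

# Reading them as a little-endian number puts the Julian day in the most
# significant position, which matches chronological order for these values.
little_endian_says_earlier_is_smaller = (
    as_signed_int(earlier, "little") < as_signed_int(later, "little")
)

print(bytewise_says_earlier_is_greater)
print(big_endian_says_earlier_is_greater)
print(little_endian_says_earlier_is_smaller)
```

Under these assumptions, a reader that only sees the physical type int96 has no way to tell which of the two interpretations a given column needs, which is the problem described in the first point above.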


was (Author: zi):
[~lv], we can not change the ordering for the existing {{min}} and {{max}} 
fields for int96 timestamps, because statistics were already written for them 
according to the wrong byte order.

We do not want to define a new int96 ordering for the new {{min-value}} and 
{{max-value}} fields either, because:

# We can distinguish between a timestamp stored in an int96 that requires 
little-endian ordering and an actual int96 that requires big-endian ordering.
# Introducing little-endian ordering would put an unnecessary burden on the 
implementors for the sake of a legacy type that we would like to get rid of.

> Deprecate type-defined sort ordering for INT96 type
> ---------------------------------------------------
>
>                 Key: PARQUET-1065
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1065
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Zoltan Ivanfi
>            Assignee: Zoltan Ivanfi
>            Priority: Major
>             Fix For: 1.10.0
>
>
> [parquet.thrift in 
> parquet-format|https://github.com/apache/parquet-format/blob/041708da1af52e7cb9288c331b542aa25b68a2b6/src/main/thrift/parquet.thrift#L37]
>  defines the sort order for INT96 to be signed. 
> [ParquetMetadataConverter.java in 
> parquet-mr|https://github.com/apache/parquet-mr/blob/352b906996f392030bfd53b93e3cf4adb78d1a55/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L422]
>  uses unsigned ordering instead. In practice, INT96 is only used for 
> timestamps and neither signed nor unsigned ordering of the numeric values is 
> correct for this purpose. For this reason, the INT96 sort order should be 
> specified as undefined.
> (As a special case, min == max signifies that all values are the same, and 
> can be considered valid even for undefined orderings.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)