[GitHub] [iceberg] findepi commented on a change in pull request #2055: Spec: add sort order to spec

GitBox Thu, 29 Jul 2021 05:05:51 -0700


findepi commented on a change in pull request #2055:
URL: https://github.com/apache/iceberg/pull/2055#discussion_r679088821




##########
File path: site/docs/spec.md
##########
@@ -254,6 +254,24 @@ Notes:
 2. The width, `W`, used to truncate decimal values is applied using the scale 
of the decimal column to avoid additional (and potentially conflicting) 
parameters.
 
 
+### Sorting
+
+Users can sort their data within partitions by columns to gain performance. 
The information on how the data is sorted can be declared per data or delete 
file, by a **sort order**.
+
+A sort order is defined by an sort order id and a list of sort fields. The 
order of the sort fields within the list defines the order in which the sort is 
applied to the data. Each sort field consists of:
+
+*   A **source column id** from the table's schema
+*   A **transform** that is used to produce values to be sorted on from the 
source column. This is the same transform as described in [partition 
transforms](#partition-transforms).
+*   A **sort direction**, that can only be either `asc` or `desc`
+*   A **null order** that describes the order of null values when sorted. Can 
only be either `nulls-first` or `nulls-last`
+
+Order id `0` is reserved for the unsorted order. 
+
+Sorting floating-point numbers should produce the following behavior: `-NaN` < 
`-Infinity` < `-value` < `-0` < `0` < `value` < `Infinity` < `NaN`. This aligns 
with the implementation of Java floating-point types comparisons. 

Review comment:
   The `-NaN` is a bit ambiguous:
   
   - It can be read as the result of applying unary minus to a NaN value. In Java, `-Double.NaN` is not distinguishable from `Double.NaN` (it is the exact same value bitwise).
     - My JVM returns true from `Double.doubleToRawLongBits(-Double.NaN) == Double.doubleToRawLongBits(Double.NaN)`.
   - It can be read as "an IEEE 754 NaN value that has the sign bit set".
     - For example, in my JVM, `Double.longBitsToDouble(0xfff8000000000000L)` is such a value, if I read this correctly, and so `Double.isNaN(Double.longBitsToDouble(0xfff8000000000000L))` is true. At the same time, `Double.doubleToRawLongBits(Double.longBitsToDouble(0xfff8000000000000L)) != Double.doubleToRawLongBits(Double.NaN)` is also true, i.e. this expression constructs a NaN value that's bitwise distinguishable from `Double.NaN`.
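
To make the two readings concrete, here is a minimal sketch of how NaN bit patterns can be inspected. The class name `NanBits` and the helpers `hasSignBit` and `canonicalBits` are hypothetical names used only for illustration. The raw bits of `-Double.NaN` may depend on the compiler and JVM, so that case is only printed, not asserted.

```java
public class NanBits {
    // Hypothetical helper: true when the IEEE 754 sign bit is set in the raw encoding.
    static boolean hasSignBit(double d) {
        return (Double.doubleToRawLongBits(d) & 0x8000000000000000L) != 0;
    }

    // Unlike doubleToRawLongBits, doubleToLongBits collapses every NaN
    // to the canonical pattern 0x7ff8000000000000L.
    static long canonicalBits(double d) {
        return Double.doubleToLongBits(d);
    }

    public static void main(String[] args) {
        double signedNan = Double.longBitsToDouble(0xfff8000000000000L);

        System.out.println(Double.isNaN(signedNan));      // still a NaN
        System.out.println(hasSignBit(Double.NaN));       // canonical NaN has no sign bit
        System.out.println(hasSignBit(signedNan));        // raw bits; may vary by platform
        System.out.println(Long.toHexString(canonicalBits(signedNan)));
        // Raw bits of "unary minus applied to NaN"; printed for inspection only.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(-Double.NaN)));
    }
}
```

The distinction between `doubleToRawLongBits` and `doubleToLongBits` matters here: only the raw variant can tell two NaN encodings apart.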
   
   In both cases however, "Java floating-point types comparisons" seems to sort 
all NaN values as peers, and "greater" than positive infinity:
   
   ```
   System.out.println(Double.compare(Double.POSITIVE_INFINITY, Double.NaN)); // -1
   System.out.println(Double.compare(Double.NaN, -Double.NaN)); // 0
   System.out.println(Double.compare(Double.NaN, Double.longBitsToDouble(0xfff8000000000000L))); // 0
   System.out.println(Double.compare(Double.NaN, Double.longBitsToDouble(0xfff8000012340000L))); // 0
   ```
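
As a further illustration, the sketch below sorts a mixed array that includes a sign-bit-set NaN. `NanSortDemo` and `sortedCopy` are hypothetical names. It relies on the documented behavior of `Arrays.sort(double[])`, which uses the total order of `Double.compareTo`: `-0.0` sorts before `0.0`, and all NaN values are treated as equal and greater than any other value.

```java
import java.util.Arrays;

public class NanSortDemo {
    // Hypothetical helper: returns a sorted copy, leaving the input untouched.
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy); // total order of Double.compareTo
        return copy;
    }

    public static void main(String[] args) {
        // A NaN whose sign bit is set, built from its raw bit pattern.
        double signedNan = Double.longBitsToDouble(0xfff8000000000000L);
        double[] values = {
            Double.NaN, 1.0, 0.0, signedNan, -0.0,
            Double.NEGATIVE_INFINITY, -1.0, Double.POSITIVE_INFINITY
        };
        // Prints: [-Infinity, -1.0, -0.0, 0.0, 1.0, Infinity, NaN, NaN]
        System.out.println(Arrays.toString(sortedCopy(values)));
    }
}
```

Under this ordering no NaN, whatever its sign bit, ends up below `-Infinity`, which is why the proposed `-NaN` < `-Infinity` wording does not match Java comparisons.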




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
