I ran into a few issues with the rust sort kernels and would like to
discuss possible solutions.

1. When sorting by multiple columns (lexsort_to_indices) the Float32
and Float64 data types are not supported because the implementation
relies on the OrdArray trait. This trait is not implemented because
f64/f32 only implements PartialOrd. The sort function for a single
column (sort_to_indices) has some special logic which looks like it
wants to treats NaN the same as null, but I'm also not convinced this
is the correct way. For example postgres does the following
(https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-FLOAT)

"In order to allow floating-point values to be sorted and used in
tree-based indexes, PostgreSQL treats NaN values as equal, and greater
than all non-NaN values."

I propose to do the same in an OrdArray impl for
Float64Array/Float32Array and then simplifying the sort_to_indices
function accordingly.

2. Sorting for dictionary encoded strings. The problem here is that
DictionaryArray does not have a generic parameter for the value type
so it is not currently possible to only implement OrdArray for string
dictionaries. Again for the single column case, the value data type
could be checked and a sort could be implemented by looking up each
key in the dictionary. An optimization could be to check the is_sorted
flag of DictionaryArray (which does not seem to be used really) and
then directly sort by the keys. For the general case I see roughly to
options

- Somehow implement an OrdArray view of the dictionary array. This
could be easier if OrdArray did not extend Array but was a completely
separate trait.
- Change the lexicographic sort impl to not use dynamic calls but
instead sort multiple times. So for a query `ORDER BY a, b`, first
sort by b and afterwards sort again by a. With a stable sort
implementation this should result in the same ordering. I'm curious
about the performance, it could avoid dynamic method calls for each
comparison, but it would process the indices vector multiple times.

Looking forward to any other suggestions or feedback.

-- 
Jörn Horstmann | Senior Backend Engineer

www.signavio.com
Kurfürstenstraße 111, 10787 Berlin, Germany

Reply via email to