Yibo Cai created ARROW-8129:
-------------------------------

             Summary: [C++][Compute] Refine compare sorting kernel
                 Key: ARROW-8129
                 URL: https://issues.apache.org/jira/browse/ARROW-8129
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Yibo Cai
            Assignee: Yibo Cai


The sorting kernel implements two comparison functions:
[CompareValues|https://github.com/apache/arrow/blob/ab21f0ee429c2a2c82e4dbc5d216ab1da74221a2/cpp/src/arrow/compute/kernels/sort_to_indices.cc#L67]
uses array.Value() for numeric data, and
[CompareViews|https://github.com/apache/arrow/blob/ab21f0ee429c2a2c82e4dbc5d216ab1da74221a2/cpp/src/arrow/compute/kernels/sort_to_indices.cc#L72]
uses array.GetView() for non-numeric data. This can be simplified by using
only GetView(), since all data types support GetView().
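The idea of a single GetView()-based comparison path can be sketched as below. This is a minimal illustration under stated assumptions, not the Arrow implementation: `Int64ArrayLike`, `StringArrayLike`, `CompareViewsOnly` and `SortToIndicesSketch` are hypothetical stand-ins, and the only property borrowed from the issue is that every array type exposes a `GetView(i)` accessor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-ins for Arrow arrays (NOT the real Arrow classes).
// The numeric one has both Value() and GetView(); the string one only GetView().
struct Int64ArrayLike {
  std::vector<int64_t> values;
  int64_t length() const { return static_cast<int64_t>(values.size()); }
  int64_t Value(int64_t i) const { return values[i]; }
  int64_t GetView(int64_t i) const { return Value(i); }  // a numeric view is just the value
};

struct StringArrayLike {
  std::vector<std::string> values;
  int64_t length() const { return static_cast<int64_t>(values.size()); }
  std::string_view GetView(int64_t i) const { return values[i]; }
};

// One comparison path for every array type, using GetView() only.
template <typename ArrayType>
bool CompareViewsOnly(const ArrayType& array, int64_t lhs, int64_t rhs) {
  // Debug-only bounds check; compiled out when NDEBUG is defined.
  assert(lhs >= 0 && lhs < array.length() && rhs >= 0 && rhs < array.length());
  return array.GetView(lhs) < array.GetView(rhs);
}

// Returns the indices that sort `array` in ascending order (stable for ties).
template <typename ArrayType>
std::vector<int64_t> SortToIndicesSketch(const ArrayType& array) {
  std::vector<int64_t> indices(static_cast<std::size_t>(array.length()));
  std::iota(indices.begin(), indices.end(), int64_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&array](int64_t lhs, int64_t rhs) {
                     return CompareViewsOnly(array, lhs, rhs);
                   });
  return indices;
}
```

Because both stand-in types answer `GetView()`, one template serves numeric and string data, so there is no second code path to keep in sync.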

To my surprise, the benchmark shows about a 40% performance improvement after
the change.

After some digging, I found that in the current code the [comparison 
callback|https://github.com/apache/arrow/blob/ab21f0ee429c2a2c82e4dbc5d216ab1da74221a2/cpp/src/arrow/compute/kernels/sort_to_indices.cc#L94]
is not inlined (as the disassembled code shows), so every comparison incurs a
function call. This is very bad for such a hot loop. Using only GetView() fixes
the issue: the comparison is then inlined as expected.
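The issue does not say exactly why the callback was not inlined, so the following is only a general illustration of the effect, not a diagnosis of the Arrow code. It assumes one common cause: the comparator reaches the sort through an indirection (here `std::function`), which turns each comparison into an indirect call. The function names `SortIndicesTypeErased` and `SortIndicesInlinable` are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Comparator hidden behind std::function: each comparison is an indirect
// call through the type-erased wrapper, which compilers often do not inline.
std::vector<int64_t> SortIndicesTypeErased(const std::vector<double>& values) {
  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  std::function<bool(int64_t, int64_t)> cmp = [&values](int64_t l, int64_t r) {
    return values[l] < values[r];
  };
  std::stable_sort(indices.begin(), indices.end(), cmp);
  return indices;
}

// Same ordering, but the lambda's unique closure type becomes a template
// argument of std::stable_sort, so its body is visible at the call site and
// is a straightforward inlining candidate.
std::vector<int64_t> SortIndicesInlinable(const std::vector<double>& values) {
  std::vector<int64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), int64_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&values](int64_t l, int64_t r) { return values[l] < values[r]; });
  return indices;
}
```

Both functions return the same indices; whether the comparison was actually inlined can be checked the same way the issue describes, by inspecting the disassembly of an optimized build.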



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
