Yibo Cai created ARROW-8129:
-------------------------------

Summary: [C++][Compute] Refine compare sorting kernel
Key: ARROW-8129
URL: https://issues.apache.org/jira/browse/ARROW-8129
Project: Apache Arrow
Issue Type: Improvement
Reporter: Yibo Cai
Assignee: Yibo Cai
The sorting kernel implements two comparison functions: [CompareValues|https://github.com/apache/arrow/blob/ab21f0ee429c2a2c82e4dbc5d216ab1da74221a2/cpp/src/arrow/compute/kernels/sort_to_indices.cc#L67] uses array.Value() for numeric data, and [CompareViews|https://github.com/apache/arrow/blob/ab21f0ee429c2a2c82e4dbc5d216ab1da74221a2/cpp/src/arrow/compute/kernels/sort_to_indices.cc#L72] uses array.GetView() for non-numeric data. This can be simplified to use GetView() only, since all data types support GetView().

To my surprise, the benchmark shows about a 40% performance improvement after the change. After some digging, I found that in the current code the [comparison callback|https://github.com/apache/arrow/blob/ab21f0ee429c2a2c82e4dbc5d216ab1da74221a2/cpp/src/arrow/compute/kernels/sort_to_indices.cc#L94] is not inlined (as the disassembled code shows), so every comparison becomes a function call. That is very bad for this hot loop. Using only GetView() fixes the issue: the callback is inlined as expected.
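To illustrate the unified comparison path, here is a minimal, self-contained C++ sketch. It does not use Arrow: Int32ArrayLike and StringArrayLike are hypothetical stand-ins that, like Arrow arrays, expose GetView(), and SortToIndices is a hypothetical simplified kernel, not the code in sort_to_indices.cc. A single comparator lambda serves both numeric and string data. Because the lambda's type is visible to std::stable_sort at compile time, the compiler is able to inline it. Whether it actually does is the compiler's decision and should still be confirmed in the disassembly, as in the analysis above.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a numeric array: GetView() returns the value itself.
struct Int32ArrayLike {
  std::vector<int32_t> values;
  int32_t GetView(int64_t i) const { return values[i]; }
  int64_t length() const { return static_cast<int64_t>(values.size()); }
};

// Hypothetical stand-in for a string array: GetView() returns a non-owning view.
struct StringArrayLike {
  std::vector<std::string> values;
  std::string_view GetView(int64_t i) const { return values[i]; }
  int64_t length() const { return static_cast<int64_t>(values.size()); }
};

// One comparison path for every array type: compare GetView() results.
// The comparator is a lambda with a concrete type, so std::stable_sort is
// instantiated with it directly rather than calling through a function pointer.
template <typename ArrayType>
std::vector<uint64_t> SortToIndices(const ArrayType& array) {
  std::vector<uint64_t> indices(static_cast<size_t>(array.length()));
  std::iota(indices.begin(), indices.end(), 0);  // 0, 1, ..., length - 1
  std::stable_sort(indices.begin(), indices.end(),
                   [&array](uint64_t left, uint64_t right) {
                     return array.GetView(static_cast<int64_t>(left)) <
                            array.GetView(static_cast<int64_t>(right));
                   });
  return indices;
}
```

For example, sorting the values {5, -1, 3, -1} yields the indices {1, 3, 2, 0}; the stable sort keeps the two -1 entries in their original order. The same template handles {"pear", "apple", "fig"}, yielding {1, 2, 0}, without a separate Value()-based comparator.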