[ https://issues.apache.org/jira/browse/ARROW-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114893#comment-17114893 ]
Yordan Pavlov commented on ARROW-8907:
--------------------------------------

Sounds good [~andygrove]. I think it makes sense to have efficient comparison to scalar values, as they are often used in real-world queries. I already have some work in progress on adding scalar comparison functions to Arrow's comparison kernel and hope to submit a pull request within the next few days. Hopefully this can later be used to improve DataFusion performance for queries involving scalar values.

> [Rust] implement scalar comparison operations
> ---------------------------------------------
>
>                 Key: ARROW-8907
>                 URL: https://issues.apache.org/jira/browse/ARROW-8907
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Yordan Pavlov
>            Priority: Major
>
> Currently, comparing an array to a scalar / literal value using the comparison
> operations defined in the comparison kernel here:
> https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
> is very inefficient because:
> (1) an array with the scalar value repeated has to be created, which takes time
> and wastes memory;
> (2) time is spent during the comparison loading the same literal value over and
> over.
> Initial benchmarking of a specialized scalar comparison function indicates
> good performance gains:
> eq Float32                time: [938.54 us 950.28 us 962.65 us]
> eq scalar Float32         time: [836.47 us 838.47 us 840.78 us]
> eq Float32 simd           time: [75.836 us 76.389 us 77.185 us]
> eq scalar Float32 simd    time: [61.551 us 61.605 us 61.671 us]
> The benchmark results above show that the scalar comparison function is about
> 12% faster for non-SIMD and about 20% faster for SIMD comparison operations,
> and this is before accounting for the cost of creating the literal array.
> In a more complex benchmark, the scalar comparison version is about 40%
> faster overall once we account for not having to create arrays of scalar /
> literal values.
> Here are the benchmark results:
> filter/filter with arrow SIMD (array)     time: [647.77 us 675.12 us 706.69 us]
> filter/filter with arrow SIMD (scalar)    time: [402.19 us 404.23 us 407.22 us]
> And here is the code for the benchmark:
> https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230
> My only concern is that I can't see an easy way to use scalar comparison
> operations in DataFusion, as it is currently designed to work only on arrays.
> [~paddyhoran] [~andygrove] let me know what you think: would there be value
> in implementing scalar comparison operations?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
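To make the two inefficiencies listed in the issue concrete, here is a minimal sketch in plain Rust. It assumes ordinary `f32` slices with `Vec<bool>` / `Vec<u8>` outputs instead of Arrow arrays, and the function names (`eq_array_broadcast`, `eq_scalar`, `eq_scalar_bitmap`) are hypothetical illustrations, not the Arrow crate's API. The first function materializes the literal as a repeated array, as an array-to-array kernel requires. The other two compare each element against the scalar directly.

```rust
// Hypothetical helpers for illustration only; these are NOT the Arrow crate's API.

/// Baseline: materialize the literal as a full-length array, then compare
/// element-wise. This mirrors point (1): an extra allocation the size of the input.
fn eq_array_broadcast(values: &[f32], literal: f32) -> Vec<bool> {
    let repeated: Vec<f32> = vec![literal; values.len()];
    values.iter().zip(repeated.iter()).map(|(a, b)| a == b).collect()
}

/// Scalar variant: compare every element against the same value directly,
/// so no literal array is allocated and only `values` is read from memory.
fn eq_scalar(values: &[f32], literal: f32) -> Vec<bool> {
    values.iter().map(|&v| v == literal).collect()
}

/// Same comparison, but packing results into a bitmap (one bit per value,
/// least-significant bit first), the bit order Arrow uses for boolean buffers.
fn eq_scalar_bitmap(values: &[f32], literal: f32) -> Vec<u8> {
    let mut bitmap = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v == literal {
            bitmap[i / 8] |= 1u8 << (i % 8);
        }
    }
    bitmap
}

fn main() {
    let values = [1.0f32, 2.5, 2.5, 3.0, 2.5, 0.0, 7.0, 2.5, 2.5];

    // Both approaches must agree; only their cost differs.
    let broadcast = eq_array_broadcast(&values, 2.5);
    let scalar = eq_scalar(&values, 2.5);
    assert_eq!(broadcast, scalar);

    let bitmap = eq_scalar_bitmap(&values, 2.5);
    println!("{:?}", scalar);
    println!("{:08b} {:08b}", bitmap[0], bitmap[1]);
}
```

The broadcast version allocates and then reads a second buffer as long as the input, while the scalar versions read only the input; that is the saving described in points (1) and (2). The loops here are plain scalar loops, not a SIMD implementation, so they illustrate the shape of the API rather than reproduce the benchmark numbers above.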