[jira] [Commented] (ARROW-8907) [Rust] implement scalar comparison operations

2020-06-03 Thread Neville Dipale (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124921#comment-17124921
 ] 

Neville Dipale commented on ARROW-8907:
---

I don't have permission on Jira to assign this to [~yordan-pavlov].

> [Rust] implement scalar comparison operations
> -
>
> Key: ARROW-8907
> URL: https://issues.apache.org/jira/browse/ARROW-8907
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Yordan Pavlov
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently comparing an array to a scalar / literal value using the comparison 
> operations defined in the comparison kernel here:
> https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
> is very inefficient because:
> (1) an array with the scalar value repeated has to be created, taking time 
> and wasting memory
> (2) time is spent during comparison to load the same literal values over and 
> over
> Initial benchmarking of a specialized scalar comparison function indicates 
> good performance gains:
>   eq Float32              time: [938.54 us 950.28 us 962.65 us]
>   eq scalar Float32       time: [836.47 us 838.47 us 840.78 us]
>   eq Float32 simd         time: [75.836 us 76.389 us 77.185 us]
>   eq scalar Float32 simd  time: [61.551 us 61.605 us 61.671 us]
> The benchmark results above show that the scalar comparison function is about 
> 12% faster for non-SIMD and about 20% faster for SIMD comparison operations.
> And this is before accounting for creating the literal array. 
> In a more complex benchmark, the scalar comparison version is about 40% 
> faster overall when we account for not having to create arrays of scalar / 
> literal values.
> Here are the benchmark results:
>   filter/filter with arrow SIMD (array)   time: [647.77 us 675.12 us 706.69 us]
>   filter/filter with arrow SIMD (scalar)  time: [402.19 us 404.23 us 407.22 us]
> And here is the code for the benchmark:
> https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230
> My only concern is that I can't see an easy way to use scalar comparison 
> operations in DataFusion, as it is currently designed to work only on arrays.
> [~paddyhoran] [~andygrove] let me know what you think: would there be value 
> in implementing scalar comparison operations?
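
To make the two code paths in the description concrete, here is a minimal sketch on plain slices. It does not use the arrow crate, and the function names `eq_arrays` and `eq_scalar` are hypothetical, chosen only for illustration. The array path has to build a repeated-literal buffer before comparing, while the scalar path compares each element directly against one value.

```rust
/// Element-wise equality of two equal-length slices (the "array vs array" path).
/// Hypothetical helper, not the arrow kernel API.
fn eq_arrays(left: &[f32], right: &[f32]) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    left.iter().zip(right.iter()).map(|(l, r)| l == r).collect()
}

/// Equality of every element against a single value (the "array vs scalar" path).
/// No second buffer is allocated and the literal is read once.
fn eq_scalar(left: &[f32], right: f32) -> Vec<bool> {
    left.iter().map(|l| *l == right).collect()
}

fn main() {
    let values = [1.0_f32, 2.0, 3.0, 2.0];

    // Array path: the literal must first be repeated into a full-length buffer.
    let literal = vec![2.0_f32; values.len()];
    let via_array = eq_arrays(&values, &literal);

    // Scalar path: compare directly against the literal.
    let via_scalar = eq_scalar(&values, 2.0);

    // Both paths produce the same mask.
    assert_eq!(via_array, via_scalar);
    println!("{:?}", via_scalar); // prints [false, true, false, true]
}
```

The sketch only shows why the scalar path avoids the extra allocation; it says nothing about the SIMD variants measured above.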



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8907) [Rust] implement scalar comparison operations

2020-05-23 Thread Yordan Pavlov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114893#comment-17114893
 ] 

Yordan Pavlov commented on ARROW-8907:
--

Sounds good, [~andygrove]. I think it makes sense to have efficient comparison 
to scalar values, as they are often used in real-world queries. I already have 
some work in progress on adding scalar comparison functions to the comparison 
kernel of arrow, and I hope to submit a pull request within the next few days. 
Hopefully this can later be used to improve DataFusion performance with 
scalar values.



[jira] [Commented] (ARROW-8907) [Rust] implement scalar comparison operations

2020-05-23 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114741#comment-17114741
 ] 

Andy Grove commented on ARROW-8907:
---

Thanks, [~yordan-pavlov]. What I would really like is for DataFusion to use a 
specialized version of RecordBatch that can contain both Arrays and Scalar 
values, something like this:

 
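{code:java}
// A column is either a full Arrow array or a single scalar value.
enum ColumnarValue {
  Array(ArrayRef),
  Scalar(ScalarValue)
}

// A batch whose columns may mix arrays and scalars.
struct ColumnarBatch {
  columns: Vec<ColumnarValue>
}
{code}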
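
As a rough illustration of how a comparison could dispatch over such a type, here is a self-contained sketch. It is only an assumption about how the idea might be used, not DataFusion code: `ArrayRef` and `ScalarValue` are simplified stand-ins defined locally (an `Arc<Vec<i64>>` and a single-variant enum), and the `eq` function is hypothetical.

```rust
use std::sync::Arc;

// Simplified stand-ins so the sketch compiles without the arrow crate.
// In Arrow, ArrayRef is a reference-counted dynamic array; here it is just i64 values.
type ArrayRef = Arc<Vec<i64>>;

#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(i64),
}

#[derive(Debug, Clone)]
enum ColumnarValue {
    Array(ArrayRef),
    Scalar(ScalarValue),
}

struct ColumnarBatch {
    columns: Vec<ColumnarValue>,
}

/// Hypothetical equality that picks a code path based on the operand kinds,
/// so a scalar operand never has to be expanded into an array.
fn eq(left: &ColumnarValue, right: &ColumnarValue) -> Result<Vec<bool>, String> {
    match (left, right) {
        (ColumnarValue::Array(l), ColumnarValue::Array(r)) => {
            if l.len() != r.len() {
                return Err("length mismatch".to_string());
            }
            Ok(l.iter().zip(r.iter()).map(|(a, b)| a == b).collect())
        }
        // Equality is symmetric, so both operand orders share one arm.
        (ColumnarValue::Array(a), ColumnarValue::Scalar(ScalarValue::Int64(s)))
        | (ColumnarValue::Scalar(ScalarValue::Int64(s)), ColumnarValue::Array(a)) => {
            Ok(a.iter().map(|v| v == s).collect())
        }
        // Two scalars compare to a single boolean.
        (ColumnarValue::Scalar(l), ColumnarValue::Scalar(r)) => Ok(vec![l == r]),
    }
}

fn main() {
    let batch = ColumnarBatch {
        columns: vec![
            ColumnarValue::Array(Arc::new(vec![1, 2, 3, 2])),
            ColumnarValue::Scalar(ScalarValue::Int64(2)),
        ],
    };
    let mask = eq(&batch.columns[0], &batch.columns[1]).unwrap();
    assert_eq!(mask, vec![false, true, false, true]);
    println!("{:?}", mask);
}
```

One design consequence worth noting: a non-symmetric operator such as less-than would need separate arms for the array-scalar and scalar-array orders.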
 

 
