[jira] [Created] (ARROW-11897) [Rust][Parquet] Use iterators to increase performance of creating Arrow arrays

2021-03-06 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-11897:
-

 Summary: [Rust][Parquet] Use iterators to increase performance of 
creating Arrow arrays
 Key: ARROW-11897
 URL: https://issues.apache.org/jira/browse/ARROW-11897
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Yordan Pavlov


The overall goal is to create an efficient pipeline from Parquet page data into 
Arrow arrays, with as little intermediate conversion and memory allocation as 
possible. It is assumed that, for best performance, fewer but larger copy 
operations are preferable to more numerous smaller ones. 

Such a pipeline would need to be flexible in order to enable high performance 
implementations in several different cases:
(1) In some cases, such as a plain-encoded numeric array, it might even be 
possible to copy / create the array from a single contiguous section of a 
page buffer. 
(2) In other cases, such as a plain-encoded string array, values are encoded 
in non-contiguous slices of a page buffer (value bytes are separated by length 
bytes), so individual values have to be copied separately and it's not obvious 
how this can be avoided.
(3) Finally, in the case of bit-packed encoding of smaller numeric values, 
page buffer data has to be decoded / expanded before it is ready to copy into 
an Arrow array, so a `Vec<u8>` will have to be returned instead of a slice 
pointing into a page buffer.

A decoder output abstraction that enables all of the above cases and minimizes 
intermediate memory allocation is `Iterator<Item = (usize, AsRef<[u8]>)>`, i.e. 
a sequence of (value count, value bytes) pairs.
Then in case (1) above, where a numeric array could be created from a single 
contiguous byte slice, such an iterator could return a single item such as 
`(1024, &[u8])`. 
In case (2) above, where each string value is encoded as an individual byte 
slice, but it is still possible to copy directly from a page buffer, a decoder 
iterator could return a sequence of items such as `(1, &[u8])`. 
And finally in case (3) above, where bit-packed values have to be 
unpacked/expanded, and it's NOT possible to copy value bytes directly from a 
page buffer, a decoder iterator could return items representing chunks of 
values such as `(32, Vec<u8>)`, where bit-packed values have been unpacked and 
the chunk size is configured for best performance.
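The three cases above could be sketched with plain std types, using `Cow<[u8]>` to unify borrowed page slices and owned `Vec`s (an illustrative sketch only; none of these names come from the parquet crate):

```rust
use std::borrow::Cow;

// Hypothetical decoder output: each item pairs a value count with value bytes.
// Borrowed slices cover cases (1) and (2); owned Vecs cover case (3).
type DecodedChunk<'a> = (usize, Cow<'a, [u8]>);

// Case (1): a plain-encoded i32 page exposed as one borrowed contiguous chunk.
fn plain_numeric_chunks(page: &[u8]) -> impl Iterator<Item = DecodedChunk<'_>> {
    std::iter::once((page.len() / 4, Cow::Borrowed(page)))
}

// Case (3): bit-packed values must be expanded first, so chunks own their bytes.
fn unpacked_chunks(unpacked: Vec<Vec<u8>>) -> impl Iterator<Item = DecodedChunk<'static>> {
    unpacked.into_iter().map(|v| (v.len(), Cow::Owned(v)))
}

fn main() {
    let page = [0u8; 4096]; // pretend this is 1024 plain-encoded i32 values
    let chunks: Vec<_> = plain_numeric_chunks(&page).collect();
    assert_eq!(chunks.len(), 1);
    assert_eq!(chunks[0].0, 1024);
    let owned: Vec<_> = unpacked_chunks(vec![vec![0u8; 32]]).collect();
    assert_eq!(owned[0].0, 32);
}
```

A consumer of such an iterator can copy bytes into the target Arrow buffer without caring whether each chunk is borrowed or owned.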

Another benefit of an `Iterator`-based abstraction is that it would prepare the 
parquet crate for migration to `async` `Stream`s (my understanding is that a 
`Stream` is effectively an async `Iterator`).

Then a higher-level iterator could combine a value iterator and a (definition) 
level iterator to produce a sequence of `ValueSequence(count, AsRef<[u8]>)` and 
`NullSequence(count)` items from which an arrow array can be created 
efficiently.

In the future, a higher-level iterator (for the keys) could be combined with a 
dictionary value iterator to create a dictionary array.

Finally, Arrow arrays would be created from a (generic) higher-level iterator, 
using a layer of array converters that know what the value bytes and nulls mean 
for each type of array.

[~nevime], [~Dandandan], [~jorgecarleitao] let me know what you think

Next steps:
* split work into smaller tasks that could be done over time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11799) [Rust] String and Binary arrays created with incorrect length from unbound iterator

2021-02-26 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-11799:
-

 Summary: [Rust] String and Binary arrays created with incorrect 
length from unbound iterator
 Key: ARROW-11799
 URL: https://issues.apache.org/jira/browse/ARROW-11799
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 3.0.0
Reporter: Yordan Pavlov
Assignee: Yordan Pavlov


While looking for a way to make loading array data from parquet files faster, I 
stumbled on an edge case where string and binary arrays are created with an 
incorrect length from an iterator with no upper bound.

Here is a simple example:

```
// iterator that doesn't declare an (upper) size bound on its own
let string_iter = (0..)
    .scan(0usize, |pos, i| {
        if *pos < 10 {
            *pos += 1;
            // actually produces only up to 10 values
            Some(Some(format!("value {}", i)))
        } else {
            None
        }
    })
    // limited to 100 items using take()
    .take(100);

let (lower_size_bound, upper_size_bound) = string_iter.size_hint();
assert_eq!(lower_size_bound, 0);
// the upper bound, set by take() above, is 100
assert_eq!(upper_size_bound, Some(100));
let string_array: StringArray = string_iter.collect();
// but the actual number of items in the array is 10
assert_eq!(string_array.len(), 10);
```

Fortunately this is easy to fix by using the length of the child offset array 
and I will be creating a PR for this shortly.
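The idea behind the fix can be sketched with plain std types: a String/Binary array stores one offset per value plus a final end offset, so the value count is always `offsets.len() - 1`, regardless of what the iterator's `size_hint` claimed (`len_from_offsets` is an illustrative name, not the crate's API):

```rust
// Sketch: derive the array length from the offsets buffer, not the size hint.
fn len_from_offsets(offsets: &[i32]) -> usize {
    offsets.len().saturating_sub(1)
}

fn main() {
    // Build offsets for the 10 values the iterator actually produced,
    // even though its size_hint upper bound was Some(100).
    let values: Vec<String> = (0..10).map(|i| format!("value {}", i)).collect();
    let mut offsets = vec![0i32];
    let mut end = 0i32;
    for v in &values {
        end += v.len() as i32;
        offsets.push(end);
    }
    assert_eq!(len_from_offsets(&offsets), 10);
}
```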



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11410) [Rust][Parquet] Implement returning dictionary arrays from parquet reader

2021-01-27 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-11410:
-

 Summary: [Rust][Parquet] Implement returning dictionary arrays 
from parquet reader
 Key: ARROW-11410
 URL: https://issues.apache.org/jira/browse/ARROW-11410
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Yordan Pavlov


Currently the Rust parquet reader returns a regular array (e.g. string array) 
even when the column is dictionary encoded in the parquet file.

If the parquet reader had the ability to return dictionary arrays for 
dictionary encoded columns this would bring many benefits such as:
 * faster reading of dictionary encoded columns from parquet (as no 
conversion/expansion into a regular array would be necessary)
 * more efficient memory use, as a dictionary array takes up less memory than 
the equivalent expanded regular array
 * faster filtering operations as SIMD can be used to filter over the numeric 
keys of a dictionary string array instead of comparing string values in a 
string array
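The filtering benefit can be illustrated with a simplified dictionary representation (plain std types, not the arrow crate's actual `DictionaryArray`; `DictArray` and `filter_eq` are illustrative names): equality against a string needs only one string comparison to find the key, after which every row is a fixed-width integer comparison.

```rust
// Illustrative dictionary representation: numeric keys index into a
// small set of distinct values.
struct DictArray {
    keys: Vec<u16>,
    values: Vec<String>,
}

// Filtering compares fixed-width keys instead of variable-length strings,
// which is cheaper per row and amenable to SIMD.
fn filter_eq(dict: &DictArray, needle: &str) -> Vec<bool> {
    match dict.values.iter().position(|v| v.as_str() == needle) {
        Some(key) => dict.keys.iter().map(|&k| k as usize == key).collect(),
        None => vec![false; dict.keys.len()],
    }
}

fn main() {
    let dict = DictArray {
        keys: vec![0, 1, 0, 2, 1],
        values: vec!["a".into(), "b".into(), "c".into()],
    };
    assert_eq!(filter_eq(&dict, "b"), vec![false, true, false, false, true]);
}
```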

[~nevime], [~alamb] let me know what you think



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-11074) [Rust][DataFusion] Implement predicate push-down for parquet tables

2020-12-30 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-11074:
-

 Summary: [Rust][DataFusion] Implement predicate push-down for 
parquet tables
 Key: ARROW-11074
 URL: https://issues.apache.org/jira/browse/ARROW-11074
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Yordan Pavlov
Assignee: Yordan Pavlov


While profiling a DataFusion query I found that the code spends a lot of time 
in reading data from parquet files. Predicate / filter push-down is a commonly 
used performance optimization, where statistics data stored in parquet files 
(such as min / max values for columns in a parquet row group) is evaluated 
against query filters to determine which row groups could contain data 
requested by a query. In this way, by pushing down query filters all the way to 
the parquet data source, entire row groups or even whole parquet files can be 
skipped, often resulting in significant performance improvements.

 

I have been working on an implementation for a few weeks and initial results 
look promising - with predicate push-down, DataFusion is now faster than Apache 
Spark (140ms for DataFusion vs 200ms for Spark) for the same query against the 
same parquet files. And I suspect with the latest improvements to the filter 
kernel, DataFusion performance will be even better.

 

My work is based on the following key ideas:
 * it's best to reuse the existing code for evaluating physical expressions 
already implemented in DataFusion
 * filter expressions pushed down to a parquet table are rewritten to use 
parquet statistics, for example `(column / 2) = 4`  becomes  `(column_min / 2) 
<= 4 && 4 <= (column_max / 2)` - this is done once for all files in a parquet 
table
 * for each parquet file, a RecordBatch containing all required statistics 
columns is produced, and the predicate expression from the previous step is 
evaluated, producing a binary array which is finally used to filter the row 
groups in each parquet file
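The rewrite step above can be sketched with plain std types (a hypothetical sketch: `RowGroupStats` and `may_contain` are illustrative names, and the real implementation evaluates DataFusion physical expressions over a RecordBatch of statistics columns rather than a closure per row group):

```rust
// Hypothetical per-row-group statistics for one column.
struct RowGroupStats {
    column_min: i64,
    column_max: i64,
}

// The filter `(column / 2) = 4` rewritten against statistics:
// a row group may contain matching rows only if
// (column_min / 2) <= 4 && 4 <= (column_max / 2).
fn may_contain(stats: &RowGroupStats) -> bool {
    stats.column_min / 2 <= 4 && 4 <= stats.column_max / 2
}

fn main() {
    let groups = [
        RowGroupStats { column_min: 0, column_max: 5 },   // max/2 = 2 < 4: pruned
        RowGroupStats { column_min: 6, column_max: 20 },  // 4 in [3, 10]: kept
        RowGroupStats { column_min: 30, column_max: 40 }, // min/2 = 15 > 4: pruned
    ];
    let keep: Vec<bool> = groups.iter().map(may_contain).collect();
    assert_eq!(keep, vec![false, true, false]);
}
```

The resulting boolean vector plays the role of the binary array used to filter row groups.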

Next steps are: integrate this work with the latest changes from the master 
branch, publish a WIP PR, and implement more unit tests.

[~andygrove], [~alamb] let me know what you think



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10759) [Rust][DataFusion] Implement support for casting string to date in sql expressions

2020-11-28 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-10759:
-

 Summary: [Rust][DataFusion] Implement support for casting string 
to date in sql expressions
 Key: ARROW-10759
 URL: https://issues.apache.org/jira/browse/ARROW-10759
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Affects Versions: 2.0.0
Reporter: Yordan Pavlov


If DataFusion had support for creating date literals using a cast expression 
such as `CAST('2019-01-01' AS DATE)`, this would allow direct (and therefore 
more efficient) comparison of date columns to scalar values, compared to 
representing dates as strings and then resorting to string comparison.
I already have a basic implementation that works; I just have to add some more 
tests.
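The core of such a cast is converting an ISO date string to a Date32-style value, i.e. days since the Unix epoch. A self-contained sketch using the standard civil-calendar formula (Howard Hinnant's days-from-civil algorithm; `date32_from_str` is an illustrative name, not DataFusion's actual implementation, which would also need proper validation):

```rust
// Sketch: parse "YYYY-MM-DD" into days since 1970-01-01 (Date32 semantics).
fn date32_from_str(s: &str) -> Option<i32> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    // days_from_civil: treat March as the first month so leap days fall last
    let y_adj = if m <= 2 { y - 1 } else { y };
    let era = (if y_adj >= 0 { y_adj } else { y_adj - 399 }) / 400;
    let yoe = y_adj - era * 400;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    Some((era * 146097 + doe - 719468) as i32)
}

fn main() {
    assert_eq!(date32_from_str("1970-01-01"), Some(0));
    assert_eq!(date32_from_str("2019-01-01"), Some(17897));
}
```

With the column stored as days-since-epoch, the comparison against the literal becomes a plain integer comparison per row.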



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9523) [Rust] improve performance of filter kernel

2020-07-19 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-9523:


 Summary: [Rust] improve performance of filter kernel
 Key: ARROW-9523
 URL: https://issues.apache.org/jira/browse/ARROW-9523
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.17.1
Reporter: Yordan Pavlov
 Fix For: 1.0.0


The filter kernel located here 
[https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/filter.rs]

currently has the following performance:

filter old u8 low selectivity time: [1.7782 ms 1.7790 ms 1.7801 ms]
filter old u8 high selectivity time: [815.58 us 816.58 us 817.57 us]
filter old u8 w NULLs low selectivity time: [1.8131 ms 1.8231 ms 1.8336 ms]
filter old u8 w NULLs high selectivity time: [817.41 us 820.01 us 823.05 us]

I have been working on a new implementation which performs between 
approximately 14 and 480 times faster depending mostly on filter selectivity. 
Here are the benchmark results:

filter u8 low selectivity time: [127.30 us 128.06 us 128.88 us]
filter u8 high selectivity time: [5.4215 us 5.5778 us 5.7335 us]
filter context u8 low selectivity time: [124.21 us 126.21 us 128.38 us]
filter context u8 high selectivity time: [1.6707 us 1.7052 us 1.7476 us]
filter context u8 w NULLs low selectivity time: [142.40 us 142.83 us 143.37 us]
filter context u8 w NULLs high selectivity time: [2.3338 us 2.3788 us 2.4304 us]
filter context f32 low selectivity time: [132.59 us 132.91 us 133.29 us]
filter context f32 high selectivity time: [1.6864 us 1.7026 us 1.7212 us]

This new implementation is based on a few key ideas:

(1) if the data array being filtered doesn't have a null bitmap, no time should 
be wasted copying or creating a null bitmap in the resulting filtered data array 
- this is implemented using the CopyNullBit trait, which has a no-op 
implementation and an actual implementation

(2) when the filter is highly selective, e.g. only a small number of values 
from the data array are selected, the filter implementation should efficiently 
skip entire batches of 0s in the filter array - this is implemented by 
transmuting the filter array to u64, which makes it possible to quickly check 
and skip entire batches of 64 bits 

(3) when an entire record batch is filtered, any computation which only depends 
on the filter array is done once and then shared for filtering all the data 
arrays in the record batch - this is implemented using the FilterContext struct
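Idea (2) can be sketched with plain std types (a simplified illustration; the actual kernel works on Arrow buffers and `selected_indices` is an illustrative name): viewing the filter bitmap as u64 words lets a single comparison skip 64 zero bits at once.

```rust
// Collect the indices of set bits, skipping all-zero 64-bit words cheaply.
fn selected_indices(filter_words: &[u64]) -> Vec<usize> {
    let mut out = Vec::new();
    for (w_idx, &word) in filter_words.iter().enumerate() {
        if word == 0 {
            continue; // skip a whole batch of 64 zero bits with one compare
        }
        for bit in 0..64 {
            if word & (1u64 << bit) != 0 {
                out.push(w_idx * 64 + bit);
            }
        }
    }
    out
}

fn main() {
    // bits 1 and 64 are set; the trailing all-zero words are skipped outright
    let words = [0b10u64, 1u64, 0u64, 0u64];
    assert_eq!(selected_indices(&words), vec![1, 64]);
}
```

For a highly selective filter most words are zero, so the per-word fast path dominates.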

 

[~paddyhoran], [~andygrove] let me know what you think



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8907) [Rust] implement scalar comparison operations

2020-05-23 Thread Yordan Pavlov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114893#comment-17114893
 ] 

Yordan Pavlov commented on ARROW-8907:
--

Sounds good [~andygrove], I think it makes sense to have efficient comparison 
to scalar values as they are often used in real-world queries; I already have 
some work in progress for adding scalar comparison functions to the comparison 
kernel of arrow and hope to submit a pull request within the next few days. 
Hopefully this can later be used to increase DataFusion performance with 
scalar values.

> [Rust] implement scalar comparison operations
> -
>
> Key: ARROW-8907
> URL: https://issues.apache.org/jira/browse/ARROW-8907
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Yordan Pavlov
>Priority: Major
>
> Currently comparing an array to a scalar / literal value using the comparison 
> operations defined in the comparison kernel here:
> https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
> is very inefficient because:
> (1) an array with the scalar value repeated has to be created, taking time 
> and wasting memory
> (2) time is spent during comparison to load the same literal values over and 
> over
> Initial benchmarking of a specialized scalar comparison function indicates 
> good performance gains:
> eq Float32 time: [938.54 us 950.28 us 962.65 us]
> eq scalar Float32 time: [836.47 us 838.47 us 840.78 us]
> eq Float32 simd time: [75.836 us 76.389 us 77.185 us]
> eq scalar Float32 simd time: [61.551 us 61.605 us 61.671 us]
> The benchmark results above show that the scalar comparison function is about 
> 12% faster for non-SIMD and about 20% faster for SIMD comparison operations.
> And this is before accounting for creating the literal array. 
> In a more complex benchmark, the scalar comparison version is about 40% 
> faster overall when we account for not having to create arrays of scalar / 
> literal values.
> Here are the benchmark results:
> filter/filter with arrow SIMD (array) time: [647.77 us 675.12 us 706.69 us]
> filter/filter with arrow SIMD (scalar) time: [402.19 us 404.23 us 407.22 us]
> And here is the code for the benchmark:
> https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230
> My only concern is that I can't see an easy way to use scalar comparison 
> operations in Data Fusion as it is currently designed to only work on arrays.
> [~paddyhoran] [~andygrove]  let me know what you think, would there be value 
> in implementing scalar comparison operations?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8908) [Rust][DataFusion] improve performance of building literal arrays

2020-05-23 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-8908:


 Summary: [Rust][DataFusion] improve performance of building 
literal arrays
 Key: ARROW-8908
 URL: https://issues.apache.org/jira/browse/ARROW-8908
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Yordan Pavlov


[~andygrove] I was doing some profiling and noticed a potential performance 
improvement described below


NOTE: The issue described below would be irrelevant if it was possible to use 
scalar comparison operations in DataFusion as described here:
https://issues.apache.org/jira/browse/ARROW-8907


the `build_literal_array` function defined here 
https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/expressions.rs#L1204
creates an array of literal values using a loop, but benchmarks show that 
creating an array from a vec is much faster 
(about 58 times faster when building an array with 10 values).
Here are the benchmark results:

array builder/array from vec: time: [25.644 us 25.883 us 26.214 us]
array builder/array from values: time: [1.4985 ms 1.5090 ms 1.5213 ms]

here is the benchmark code:
```
fn bench_array_builder(c: &mut Criterion) {
    let array_len = 10;
    let mut count = 0;
    let mut group = c.benchmark_group("array builder");

    group.bench_function("array from vec", |b| b.iter(|| {
        let float_array: PrimitiveArray<Float64Type> = vec![1.0; array_len].into();
        count = float_array.len();
    }));
    println!("built array with {} values", count);

    group.bench_function("array from values", |b| b.iter(|| {
        // let float_array: PrimitiveArray<Float64Type> = build_literal_array(1.0, array_len);
        let mut builder = PrimitiveBuilder::<Float64Type>::new(array_len);
        for _ in 0..count {
            builder.append_value(1.0).unwrap();
        }
        let float_array = builder.finish();
        count = float_array.len();
    }));
    println!("built array with {} values", count);
}
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8907) [Rust] implement scalar comparison operations

2020-05-23 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-8907:


 Summary: [Rust] implement scalar comparison operations
 Key: ARROW-8907
 URL: https://issues.apache.org/jira/browse/ARROW-8907
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Yordan Pavlov


Currently comparing an array to a scalar / literal value using the comparison 
operations defined in the comparison kernel here:
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
is very inefficient because:
(1) an array with the scalar value repeated has to be created, taking time and 
wasting memory
(2) time is spent during comparison to load the same literal values over and 
over
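A scalar kernel avoids both costs. As a sketch of the proposed kernel shape (plain std Rust rather than the arrow crate's actual API; `eq_scalar` and `eq_array` are illustrative names):

```rust
// Scalar path: compare every element to a single scalar; no literal array
// is allocated and the scalar stays in a register.
fn eq_scalar(values: &[f32], scalar: f32) -> Vec<bool> {
    values.iter().map(|&v| v == scalar).collect()
}

// Array-vs-array baseline: the caller must first materialize an array
// filled with the repeated literal value.
fn eq_array(values: &[f32], literals: &[f32]) -> Vec<bool> {
    values.iter().zip(literals).map(|(a, b)| a == b).collect()
}

fn main() {
    let values = vec![1.0f32, 2.0, 1.0];
    assert_eq!(eq_scalar(&values, 1.0), vec![true, false, true]);

    // the baseline pays for this allocation before comparing anything
    let literals = vec![1.0f32; values.len()];
    assert_eq!(eq_array(&values, &literals), vec![true, false, true]);
}
```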

Initial benchmarking of a specialized scalar comparison function indicates good 
performance gains:

eq Float32 time: [938.54 us 950.28 us 962.65 us]
eq scalar Float32 time: [836.47 us 838.47 us 840.78 us]
eq Float32 simd time: [75.836 us 76.389 us 77.185 us]
eq scalar Float32 simd time: [61.551 us 61.605 us 61.671 us]

The benchmark results above show that the scalar comparison function is about 
12% faster for non-SIMD and about 20% faster for SIMD comparison operations.
And this is before accounting for creating the literal array. 
In a more complex benchmark, the scalar comparison version is about 40% faster 
overall when we account for not having to create arrays of scalar / literal 
values.
Here are the benchmark results:

filter/filter with arrow SIMD (array) time: [647.77 us 675.12 us 706.69 us]
filter/filter with arrow SIMD (scalar) time: [402.19 us 404.23 us 407.22 us]

And here is the code for the benchmark:
https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230

My only concern is that I can't see an easy way to use scalar comparison 
operations in DataFusion as it is currently designed to only work on arrays.

[~paddyhoran], [~andygrove] let me know what you think: would there be value in 
implementing scalar comparison operations?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8831) [Rust] incomplete SIMD implementation in simd_compare_op

2020-05-17 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-8831:


 Summary: [Rust] incomplete SIMD implementation in simd_compare_op
 Key: ARROW-8831
 URL: https://issues.apache.org/jira/browse/ARROW-8831
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Affects Versions: 0.17.0
Reporter: Yordan Pavlov


Currently the simd_compare_op function defined here 
[https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L204]
 is only about 10% faster than the non-SIMD implementation and takes 
approximately the same time for types of different widths (which indicates 
that the SIMD implementation is not complete). Below are results from 
benchmarks with Int8 and Float32 types:

eq Int8 time: [947.53 us 947.81 us 948.05 us]
eq Int8 simd time: [855.02 us 858.26 us 862.48 us]
neq Int8 time: [904.09 us 907.34 us 911.44 us]
neq Int8 simd time: [848.49 us 849.28 us 850.28 us]
lt Int8 time: [900.87 us 902.65 us 904.86 us]
lt Int8 simd time: [850.32 us 850.96 us 851.90 us]
lt_eq Int8 time: [974.68 us 983.03 us 991.98 us]
lt_eq Int8 simd time: [851.83 us 852.22 us 852.74 us]
gt Int8 time: [908.48 us 911.76 us 914.72 us]
gt Int8 simd time: [851.93 us 852.43 us 853.04 us]
gt_eq Int8 time: [981.53 us 983.37 us 986.31 us]
gt_eq Int8 simd time: [855.59 us 856.83 us 858.61 us]

eq Float32 time: [911.46 us 911.70 us 912.01 us]
eq Float32 simd time: [884.74 us 885.97 us 887.74 us]
neq Float32 time: [904.26 us 904.73 us 905.27 us]
neq Float32 simd time: [884.40 us 892.32 us 901.98 us]
lt Float32 time: [907.90 us 908.54 us 909.34 us]
lt Float32 simd time: [883.23 us 886.05 us 889.31 us]
lt_eq Float32 time: [911.44 us 911.62 us 911.82 us]
lt_eq Float32 simd time: [882.78 us 886.78 us 891.05 us]
gt Float32 time: [906.88 us 907.96 us 909.32 us]
gt Float32 simd time: [879.78 us 883.03 us 886.63 us]
gt_eq Float32 time: [924.72 us 926.03 us 928.29 us]
gt_eq Float32 simd time: [884.80 us 885.93 us 887.35 us]

In the benchmark results above, notice how both the SIMD and non-SIMD 
operations take similar amounts of time for types of different size (Int8 and 
Float32). This is normal for a non-SIMD implementation but not for a SIMD 
implementation, since SIMD registers can process more values of a smaller type 
in a single operation.

 

This pull request attempts to fix that: 
[https://github.com/apache/arrow/pull/7204]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-8598) [Rust] simd_compare_op creates buffer of incorrect length when item count is not a multiple of T::lanes()

2020-04-26 Thread Yordan Pavlov (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yordan Pavlov updated ARROW-8598:
-
Description: 
the simd_compare_op function defined here 
[https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L229]

appears to only work correctly when item count is a multiple of T::lanes().

Otherwise the resulting boolean array is created with a buffer of incorrect 
length and subsequent binary operations such as compute::and return an error.

The no_simd_compare_op function defined in the same module does not appear to 
have this problem.

This bug can be reproduced with the following code:

 
{code:java}
fn main() {
    let lanes = Int8Type::lanes();
    println!("i8 lanes: {}", lanes); // 64
    // let item_count = 128; // works because 128 divides by 64 (lanes) without remainder
    let item_count = 130; // fails because 130 divides by 64 (lanes) with remainder
    // item_count = 130 should return error:
    // ComputeError("Buffers must be the same size to apply Bitwise AND.")
    // create boolean array
    let mut select_mask: BooleanArray = vec![true; item_count].into();
    // create arrays with i8 values
    let value_array: PrimitiveArray<Int8Type> = vec![1; item_count].into();
    let filter_array: PrimitiveArray<Int8Type> = vec![2; item_count].into();
    // compare i8 arrays and produce a boolean array
    let result_mask = compute::gt_eq(&value_array, &filter_array).unwrap();
    // compare boolean arrays
    select_mask = compute::and(&select_mask, &result_mask).unwrap();
    // print result, should be all false
    println!("select mask: {:?}", select_mask);
}
{code}
 

 

> [Rust] simd_compare_op creates buffer of incorrect length when item count is 
> not a multiple of T::lanes()
> -
>
> Key: ARROW-8598
> URL: https://issues.apache.org/jira/browse/ARROW-8598
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Affects Versions: 0.17.0
>Reporter: Yordan Pavlov
>Priority: Major
>
> the simd_compare_op function defined here 
> [https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L229]
> appears to only work correctly when item count is a multiple of T::lanes().
> Otherwise the resulting boolean array is created with a buffer of incorrect 
> length and subsequent binary operations such as compute::and return an error.
> The no_simd_compare_op function defined in the same module does not appear to 
> have this problem.
> This bug can be reproduced with the following code:
>  
> {code:java}
> fn main() {
>     let lanes = Int8Type::lanes();
>     println!("i8 lanes: {}", lanes); // 64
>     // let item_count = 128; // works because 128 divides by 64 (lanes) without remainder
>     let item_count = 130; // fails because 130 divides by 64 (lanes) with remainder
>     // item_count = 130 should return error:
>     // ComputeError("Buffers must be the same size to apply Bitwise AND.")
>     // create boolean array
>     let mut select_mask: BooleanArray = vec![true; item_count].into();
>     // create arrays with i8 values
>     let value_array: PrimitiveArray<Int8Type> = vec![1; item_count].into();
>     let filter_array: PrimitiveArray<Int8Type> = vec![2; item_count].into();
>     // compare i8 arrays and produce a boolean array
>     let result_mask = compute::gt_eq(&value_array, &filter_array).unwrap();
>     // compare boolean arrays
>     select_mask = compute::and(&select_mask, &result_mask).unwrap();
>     // print result, should be all false
>     println!("select mask: {:?}", select_mask);
> }
> {code}

[jira] [Created] (ARROW-8598) [Rust] simd_compare_op creates buffer of incorrect length when item count is not a multiple of T::lanes()

2020-04-26 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-8598:


 Summary: [Rust] simd_compare_op creates buffer of incorrect length 
when item count is not a multiple of T::lanes()
 Key: ARROW-8598
 URL: https://issues.apache.org/jira/browse/ARROW-8598
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Affects Versions: 0.17.0
Reporter: Yordan Pavlov


the simd_compare_op function defined here 
[https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs#L229]

appears to only work correctly when item count is a multiple of T::lanes().

Otherwise the resulting boolean array is created with a buffer of incorrect 
length and subsequent binary operations such as compute::and return an error.

This bug can be reproduced with the following code:

 

 
{code:java}
fn main() {
    let lanes = Int8Type::lanes();
    println!("i8 lanes: {}", lanes); // 64
    // let item_count = 128; // works because 128 divides by 64 (lanes) without remainder
    let item_count = 130; // fails because 130 divides by 64 (lanes) with remainder
    // item_count = 130 should return error:
    // ComputeError("Buffers must be the same size to apply Bitwise AND.")
    // create boolean array
    let mut select_mask: BooleanArray = vec![true; item_count].into();
    // create arrays with i8 values
    let value_array: PrimitiveArray<Int8Type> = vec![1; item_count].into();
    let filter_array: PrimitiveArray<Int8Type> = vec![2; item_count].into();
    // compare i8 arrays and produce a boolean array
    let result_mask = compute::gt_eq(&value_array, &filter_array).unwrap();
    // compare boolean arrays
    select_mask = compute::and(&select_mask, &result_mask).unwrap();
    // print result, should be all false
    println!("select mask: {:?}", select_mask);
}
{code}
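A likely shape of the fix, sketched here with plain slices and `chunks_exact` instead of the actual packed_simd types (`gt_eq_chunked` is an illustrative name): process full lanes-sized chunks, then handle the remainder separately so the output always covers all item_count values.

```rust
// Compare a >= b element-wise, in lanes-sized chunks plus a scalar tail,
// so the output length equals the input length even when the item count
// is not a multiple of `lanes`.
fn gt_eq_chunked(a: &[i8], b: &[i8], lanes: usize) -> Vec<bool> {
    let mut out = Vec::with_capacity(a.len());
    for (ca, cb) in a.chunks_exact(lanes).zip(b.chunks_exact(lanes)) {
        // in the real kernel these chunks would be processed with SIMD
        out.extend(ca.iter().zip(cb).map(|(x, y)| x >= y));
    }
    let done = (a.len() / lanes) * lanes;
    // scalar tail: the values the buggy buffer-length computation dropped
    out.extend(a[done..].iter().zip(&b[done..]).map(|(x, y)| x >= y));
    out
}

fn main() {
    // 130 items with lanes = 64 leaves a remainder of 2
    let a = vec![1i8; 130];
    let b = vec![2i8; 130];
    let mask = gt_eq_chunked(&a, &b, 64);
    assert_eq!(mask.len(), 130); // full length, so compute::and would succeed
    assert!(mask.iter().all(|&m| !m));
}
```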
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)