Yordan Pavlov created ARROW-8908:
------------------------------------

Summary: [Rust][DataFusion] improve performance of building literal arrays
Key: ARROW-8908
URL: https://issues.apache.org/jira/browse/ARROW-8908
Project: Apache Arrow
Issue Type: Improvement
Components: Rust - DataFusion
Reporter: Yordan Pavlov
[~andygrove] While profiling DataFusion I noticed a potential performance improvement, described below.

NOTE: This issue would become irrelevant if DataFusion could use scalar comparison operations, as proposed in https://issues.apache.org/jira/browse/ARROW-8907

The `build_literal_array` function defined in https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/expressions.rs#L1204 creates an array of literal values by appending them one at a time in a loop. Benchmarks suggest that creating the array from a `Vec` is much faster: about 58 times faster when building an array with 100,000 values (point estimates of 1.5090 ms vs. 25.883 us).

Benchmark results:

array builder/array from vec:    time: [25.644 us 25.883 us 26.214 us]
array builder/array from values: time: [1.4985 ms 1.5090 ms 1.5213 ms]

Here is the benchmark code:

```
use arrow::array::{Array, PrimitiveArray, PrimitiveBuilder};
use arrow::datatypes::Float32Type;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_array_builder(c: &mut Criterion) {
    let array_len = 100000;
    let mut count = 0;
    let mut group = c.benchmark_group("array builder");

    // Build the array in one step from a pre-filled Vec<f32>.
    group.bench_function("array from vec", |b| {
        b.iter(|| {
            let float_array: PrimitiveArray<Float32Type> = vec![1.0f32; array_len].into();
            count = float_array.len();
        })
    });
    println!("built array with {} values", count);

    // Build the array by appending one value at a time in a loop,
    // the same approach build_literal_array uses.
    group.bench_function("array from values", |b| {
        b.iter(|| {
            // let float_array: PrimitiveArray<Float32Type> = build_literal_array(1.0, array_len);
            let mut builder = PrimitiveBuilder::<Float32Type>::new(array_len);
            // Loop over array_len (not count) so this benchmark does not depend on the previous one.
            for _ in 0..array_len {
                builder.append_value(1.0).unwrap();
            }
            let float_array = builder.finish();
            count = float_array.len();
        })
    });
    println!("built array with {} values", count);

    group.finish();
}

criterion_group!(benches, bench_array_builder);
criterion_main!(benches);
```
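To illustrate where the gap could come from, here is a minimal, std-only sketch that does not use the Arrow crate. `ToyBuilder`, `literal_from_builder` and `literal_from_vec` are hypothetical stand-ins, not Arrow or DataFusion APIs. The sketch assumes the builder's extra per-element cost comes from bookkeeping on every append (pushing the value and setting a validity bit), while the bulk path fills the values buffer with a single `vec![value; len]` and records no validity bitmap, since every slot of a literal array is non-null. It models the two construction strategies only; it does not reproduce Arrow's implementation or the timings above.

```rust
/// Hypothetical model of an append-one-at-a-time builder: every value is
/// pushed into the values buffer and also sets one bit in a validity bitmap.
struct ToyBuilder {
    values: Vec<f32>,
    validity: Vec<u8>, // one bit per slot, 1 = valid
    len: usize,
}

impl ToyBuilder {
    fn with_capacity(capacity: usize) -> Self {
        ToyBuilder {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity((capacity + 7) / 8),
            len: 0,
        }
    }

    fn append_value(&mut self, v: f32) {
        if self.len % 8 == 0 {
            self.validity.push(0); // start a new bitmap byte
        }
        self.validity[self.len / 8] |= 1u8 << (self.len % 8); // mark slot valid
        self.values.push(v);
        self.len += 1;
    }

    fn finish(self) -> (Vec<f32>, Option<Vec<u8>>) {
        (self.values, Some(self.validity))
    }
}

/// Loop-based construction: per-element appends, as in the "array from values" benchmark.
fn literal_from_builder(value: f32, len: usize) -> (Vec<f32>, Option<Vec<u8>>) {
    let mut builder = ToyBuilder::with_capacity(len);
    for _ in 0..len {
        builder.append_value(value);
    }
    builder.finish()
}

/// Bulk construction: one fill of the values buffer and no validity bitmap,
/// because a literal array contains no nulls.
fn literal_from_vec(value: f32, len: usize) -> (Vec<f32>, Option<Vec<u8>>) {
    (vec![value; len], None)
}

fn main() {
    let len = 100_000;
    let (a, a_validity) = literal_from_builder(1.0, len);
    let (b, b_validity) = literal_from_vec(1.0, len);
    // Both paths hold the same values.
    assert_eq!(a, b);
    // The builder marks every slot valid; the bulk path omits the bitmap.
    assert!(a_validity.unwrap().iter().all(|&byte| byte == 0xFF));
    assert!(b_validity.is_none());
    println!("both paths built {} values", a.len());
}
```

Both functions produce identical values; the difference is that the loop does per-element work that carries no information when every slot holds the same non-null literal, which is the work a `Vec`-based constructor can skip.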