[jira] [Created] (ARROW-10995) [Rust] [DataFusion] Improve parallelism when reading Parquet files

2020-12-20 Thread Andy Grove (Jira)
Andy Grove created ARROW-10995:
--

 Summary: [Rust] [DataFusion] Improve parallelism when reading 
Parquet files
 Key: ARROW-10995
 URL: https://issues.apache.org/jira/browse/ARROW-10995
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Reporter: Andy Grove


Currently the degree of parallelism equals the number of Parquet files being 
read.

For example, if we run a query against a Parquet table that consists of 8 
partitions, we attempt to run 8 async tasks in parallel, but if there is a 
single Parquet file we only run 1 async task, so this does not scale well.

A better approach would be to have one parallel task per "chunk" in each 
Parquet file. This would involve an upfront step in the planner that scans the 
Parquet metadata to get a list of chunks and then splits them between the 
configured number of parallel tasks.
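
The splitting step could look roughly like the sketch below. It is only an 
illustration and does not use the parquet crate: the {{Chunk}} struct and the 
{{list_chunks}} and {{assign_chunks}} functions are hypothetical names, a 
"chunk" is assumed to correspond to a row group, and the per-file chunk counts 
are assumed to have already been read from the Parquet metadata.

```rust
/// Hypothetical identifier for one "chunk" of a Parquet file
/// (assumed here to be a row group).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub file: String,
    pub row_group: usize,
}

/// Expand (file name, number of row groups) pairs into a flat list of chunks.
/// In a real planner the counts would come from the Parquet metadata.
pub fn list_chunks(files: &[(&str, usize)]) -> Vec<Chunk> {
    files
        .iter()
        .flat_map(|(file, n)| {
            (0..*n).map(move |rg| Chunk {
                file: file.to_string(),
                row_group: rg,
            })
        })
        .collect()
}

/// Distribute chunks round-robin over the configured number of tasks.
/// Round-robin is an assumption; a planner could also balance by row count.
pub fn assign_chunks(chunks: Vec<Chunk>, num_tasks: usize) -> Vec<Vec<Chunk>> {
    assert!(num_tasks > 0, "need at least one task");
    let mut tasks = vec![Vec::new(); num_tasks];
    for (i, chunk) in chunks.into_iter().enumerate() {
        tasks[i % num_tasks].push(chunk);
    }
    tasks
}
```

With this scheme a single file with 5 chunks and 4 configured tasks keeps all 
4 tasks busy instead of just 1.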

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10993) [CI][macOS] Fix Python 3.9 installation by Homebrew

2020-12-20 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-10993:


 Summary: [CI][macOS] Fix Python 3.9 installation by Homebrew
 Key: ARROW-10993
 URL: https://issues.apache.org/jira/browse/ARROW-10993
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


https://github.com/apache/arrow/runs/1579780011#step:4:520

{noformat}
==> Pouring python@3.9-3.9.1_1.catalina.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink bin/2to3
Target /usr/local/bin/2to3
already exists. You may want to remove it:
  rm '/usr/local/bin/2to3'

To force the link and overwrite all conflicting files:
  brew link --overwrite python@3.9

To list all files that would be deleted:
  brew link --overwrite --dry-run python@3.9

Possible conflicting files are:
/usr/local/bin/2to3 -> 
/Library/Frameworks/Python.framework/Versions/2.7/bin/2to3
{noformat}





[jira] [Created] (ARROW-10992) [C++] Arrow Cmake/-march compile flags conflict with Intel compiler (icc/icpc)

2020-12-20 Thread Daniel Jewell (Jira)
Daniel Jewell created ARROW-10992:
-

 Summary: [C++] Arrow Cmake/-march compile flags conflict with 
Intel compiler (icc/icpc)
 Key: ARROW-10992
 URL: https://issues.apache.org/jira/browse/ARROW-10992
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 2.0.0
Reporter: Daniel Jewell


Compiler support for Intel ICC/ICPC was added in ARROW-10849.

However, there are still a few cases where {{-march=}} is being added to 
C/CXXFLAGS. While this kinda-sorta works, icc/icpc has a different set of 
optimization options (I typically use "-xCORE-AVX2"). The list of optimization 
options is quite verbose (well beyond what just the flags show), and this 
really needs the eyes of someone who is an expert on exactly what happens to 
the generated code with the specific flags.

 

Specific warnings:  

 
{code:java}
icpc: command line warning #10121: overriding '-xCORE-AVX2' with '-march=skylake-avx512'
icpc: command line warning #10121: overriding '-xCORE-AVX2' with '-march=skylake-avx512'
icpc: command line warning #10006: ignoring unknown option '-mbmi2'
icpc: command line warning #10121: overriding '-xCORE-AVX2' with '-march=haswell'
icpc: command line warning #10121: overriding '-xCORE-AVX2' with '-march=haswell'
{code}
 

The haswell warning appears to come from the compilation of 
cpp/src/arrow/util/bpacking_avx2.cc

 

See: 
[https://software.intel.com/content/www/us/en/develop/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations.html]

 





[jira] [Created] (ARROW-10991) [Rust]: Valgrind warnings in filter kernel

2020-12-20 Thread Jörn Horstmann (Jira)
Jörn Horstmann created ARROW-10991:
--

 Summary: [Rust]: Valgrind warnings in filter kernel
 Key: ARROW-10991
 URL: https://issues.apache.org/jira/browse/ARROW-10991
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Jörn Horstmann


{code}
==42927== Conditional jump or move depends on uninitialised value(s)
==42927==    at 0x384DE0: arrow::compute::kernels::filter::FilterContext::filter (in /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/arrow-421088df9db9800e)
==42927==    by 0x396394: arrow::compute::kernels::filter::filter (in /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/arrow-421088df9db9800e)
==42927==    by 0x34B786: core::ops::function::FnOnce::call_once (in /home/jhorstmann/Source/github/apache/arrow/rust/target/release/deps/arrow-421088df9db9800e)
{code}





[jira] [Created] (ARROW-10990) [Rust]: SIMD implementation of compare kernels reads out of bounds

2020-12-20 Thread Jörn Horstmann (Jira)
Jörn Horstmann created ARROW-10990:
--

 Summary: [Rust]: SIMD implementation of compare kernels reads out 
of bounds
 Key: ARROW-10990
 URL: https://issues.apache.org/jira/browse/ARROW-10990
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Jörn Horstmann


The SIMD compare kernels use the following pattern to handle the remainder 
when the array length is not a multiple of the number of vector lanes:

{code}
if rem > 0 {
    let simd_left = T::load(left.value_slice(len - rem, lanes));
    let simd_right = T::load(right.value_slice(len - rem, lanes));
    let simd_result = op(simd_left, simd_right);
    let rem_buffer_size = (rem as f32 / 8f32).ceil() as usize;
    T::bitmask(&simd_result, |b| {
        result.extend_from_slice(&b[0..rem_buffer_size]);
    });
}
{code}

While this avoids writing out of bounds into {{result}}, it still reads from 
the {{left}} and {{right}} arrays at out-of-bounds indices, and valgrind 
complains about that. I propose rewriting the logic to use chunked iteration, 
with a scalar loop for the remainder, similar to the change for the arithmetic 
kernels in ARROW-10914.
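
A minimal sketch of the proposed shape, using only the standard library, is 
shown below. It is not the actual kernel code: {{compare_packed}} and {{LANES}} 
are hypothetical names, plain slices stand in for Arrow arrays, and the inner 
loop over each full chunk stands in for the SIMD load, compare and bitmask 
steps. The point is that {{chunks_exact}} never yields a partial chunk, so the 
remainder is only ever read element by element and stays in bounds.

```rust
/// Elements handled per full chunk; a stand-in for the SIMD lane count.
const LANES: usize = 8;

/// Compare two equal-length slices element-wise and pack the results into a
/// bitmap, least significant bit first (one byte per 8 elements).
pub fn compare_packed<T: Copy, F: Fn(T, T) -> bool>(
    left: &[T],
    right: &[T],
    op: F,
) -> Vec<u8> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    let mut result = Vec::with_capacity((left.len() + 7) / 8);

    let left_chunks = left.chunks_exact(LANES);
    let right_chunks = right.chunks_exact(LANES);
    // The remainders borrow from the original slices, so they stay valid
    // after the chunk iterators are consumed below.
    let left_rem = left_chunks.remainder();
    let right_rem = right_chunks.remainder();

    // Full chunks: this is where the vectorized compare would go.
    for (l, r) in left_chunks.zip(right_chunks) {
        let mut byte = 0u8;
        for i in 0..LANES {
            if op(l[i], r[i]) {
                byte |= 1u8 << i;
            }
        }
        result.push(byte);
    }

    // Remainder: scalar loop over the trailing elements only.
    if !left_rem.is_empty() {
        let mut byte = 0u8;
        for (i, (a, b)) in left_rem.iter().zip(right_rem).enumerate() {
            if op(*a, *b) {
                byte |= 1u8 << i;
            }
        }
        result.push(byte);
    }
    result
}
```

For two inputs of length 10, the first 8 elements go through the chunked path 
and the last 2 through the scalar loop, producing 2 bytes of bitmap.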


