Re: Arrow compute/dataset design doc missing

2022-07-06 Thread Wes McKinney
I definitely think that adding runtime SIMD dispatching for arithmetic functions (and using xsimd to write generic kernels that can be cross-compiled for different SIMD targets, i.e. AVX2, AVX512, NEON) is a good idea, and hopefully it will be pretty low-hanging fruit (~a day or two of work) to
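
To illustrate the runtime dispatching idea mentioned in this message, here is a minimal sketch. It is not Arrow's implementation and does not use xsimd; it swaps in GCC/Clang builtins (`__builtin_cpu_supports`, the `target` attribute) to stay self-contained. The names `AddFn`, `AddScalar`, `AddAvx2`, and `ResolveAdd` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kernel signature: out[i] = a[i] + b[i].
using AddFn = void (*)(const int32_t*, const int32_t*, int32_t*, std::size_t);

// Baseline kernel, compiled with the default target flags.
void AddScalar(const int32_t* a, const int32_t* b, int32_t* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Same loop, but the compiler is allowed to emit AVX2 instructions here.
// Whether it actually vectorizes depends on the optimization level.
__attribute__((target("avx2")))
void AddAvx2(const int32_t* a, const int32_t* b, int32_t* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

// Picks a kernel once, at runtime, based on what the CPU reports.
AddFn ResolveAdd() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return &AddAvx2;
#endif
  return &AddScalar;
}
```

A caller would resolve the function pointer once and reuse it for every batch, so the CPU feature check stays out of the hot loop.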

Re: Arrow compute/dataset design doc missing

2022-06-09 Thread Sasha Krassovsky
To add on to Weston’s response, the only SIMD that compilers will currently generate for the kernels is SSE4.2; they will not generate AVX2 because we have not set up the compiler flags to do that. Also, the way the code is written doesn’t seem super easy to vectorize for the

Re: Arrow compute/dataset design doc missing

2022-06-09 Thread Shawn Yang
I see, thanks. I'll do more tests and dig further into the arrow compute code.

Re: Arrow compute/dataset design doc missing

2022-06-09 Thread Weston Pace
> Hi, do you guys know which functions support vectorized SIMD in arrow compute?

I don't know that anyone has done a fully systematic analysis of which kernels support and do not support SIMD at the moment. The kernels are still in flux. There is an active effort to reduce overhead[1] which is

Re: Arrow compute/dataset design doc missing

2022-06-09 Thread Shawn Yang
Hi, do you guys know which functions support vectorized SIMD in arrow compute? After a quick look at the arrow compute cpp code, I found only a few functions that support vectorized SIMD:
● bloom filter: avx2
● key compare: avx2
● key hash: avx2
● key map: avx2
Does scalar operation support

Re: Arrow compute/dataset design doc missing

2022-05-25 Thread Shawn Yang
I see. The key to using multiple loops is to ensure that each batch of data can be held in the L2 cache, so that later calculations can process the batch without reading from main memory. We can also record the exec stats for every batch and do better local task scheduling based on those stats. Thanks a lot.
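
The batching idea summarized above can be sketched as follows. This is only an illustration, not Arrow's execution engine: the batch size, the `BatchStats` struct, and the `ProcessInBatches` function are hypothetical, and 32Ki values per batch is an assumed size rather than a tuned one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed batch size: 32Ki int64 values (256 KiB), picked only as an example
// of a working set small enough to stay in a typical L2 cache.
constexpr std::size_t kBatchSize = 32 * 1024;

// Hypothetical per-batch statistics a scheduler could inspect later.
struct BatchStats {
  std::size_t rows = 0;
  std::size_t rows_selected = 0;
};

// Sums (2 * x + 1) over all inputs whose projected value exceeds `threshold`.
// The outer loop walks batches; the two inner loops run back to back on the
// same batch, so the second loop reads the scratch buffer while it is still
// cache resident instead of re-reading a fully materialized intermediate.
int64_t ProcessInBatches(const std::vector<int64_t>& input, int64_t threshold,
                         std::vector<BatchStats>* stats) {
  std::vector<int64_t> scratch(kBatchSize);
  int64_t total = 0;
  for (std::size_t start = 0; start < input.size(); start += kBatchSize) {
    const std::size_t len = std::min(kBatchSize, input.size() - start);
    // Inner loop 1: projection into the reusable scratch buffer.
    for (std::size_t i = 0; i < len; ++i) scratch[i] = input[start + i] * 2 + 1;
    // Inner loop 2: filter and aggregate over the same batch.
    BatchStats s;
    s.rows = len;
    for (std::size_t i = 0; i < len; ++i) {
      if (scratch[i] > threshold) {
        total += scratch[i];
        ++s.rows_selected;
      }
    }
    if (stats != nullptr) stats->push_back(s);
  }
  return total;
}
```

The per-batch stats are collected at batch boundaries, which is also the natural point where a scheduler could decide what to run next.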

Re: Arrow compute/dataset design doc missing

2022-05-24 Thread Weston Pace
There are a few levels of loops. Two at the moment and three in the future. Some are fused and some are not. What we have right now is in the early stages and is not ideal, and there are people investigating and working on improvements. I can speak a little bit about where we want to go. An example may

Re: Arrow compute/dataset design doc missing

2022-05-24 Thread Shawn Yang
Hi Ian, thank you for your reply, which recaps the history of arrow compute. Those links are very valuable for helping me understand arrow compute internals. I took a quick look at those documents and will take a deeper look at them later. I have another question: does arrow compute support loop fusion,

Re: Arrow compute/dataset design doc missing

2022-05-23 Thread Ian Cook
Hi Shawn, In March of 2021, when major work on the C++ query execution machinery in Arrow was beginning, Wes sent a message [1] to the dev list and linked to a doc [2] with some details about the planned design. A few months later, Neal sent an update [3] about this work. However, those documents

Arrow compute/dataset design doc missing

2022-05-23 Thread Shawn Yang
Hi, I'm considering using arrow compute as an execution kernel for our distributed dataframe framework. I have already read the great doc at https://arrow.apache.org/docs/cpp/compute.html, but it is a usage doc. Is there any design doc, internals introduction, or benchmarks for arrow compute so I can