wesm opened a new pull request #9280: URL: https://github.com/apache/arrow/pull/9280
These are only preliminary benchmarks, but they may help in examining the microperformance overhead related to `ExecBatch` and its implementation (as a `vector<Datum>`). It may be desirable to devise an "array reference" data structure with few or no heap-allocated members and no `shared_ptr` interactions required to obtain memory addresses and other array information. On my test machine (macOS i9-9880H 2.3GHz), I see about 472 CPU cycles of per-field overhead for each ExecBatch produced.

These benchmarks take a record batch with 1M rows and 10 columns/fields and iterate through the rows in smaller ExecBatches of the indicated sizes:

```
BM_ExecBatchIterator/256      8207877 ns   8204914 ns     81   items_per_second=121.878/s
BM_ExecBatchIterator/512      4421049 ns   4419958 ns    166   items_per_second=226.247/s
BM_ExecBatchIterator/1024     2056636 ns   2055369 ns    333   items_per_second=486.531/s
BM_ExecBatchIterator/2048     1056415 ns   1056264 ns    682   items_per_second=946.733/s
BM_ExecBatchIterator/4096      514276 ns    514136 ns   1246   items_per_second=1.94501k/s
BM_ExecBatchIterator/8192      262539 ns    262391 ns   2736   items_per_second=3.81111k/s
BM_ExecBatchIterator/16384     128995 ns    128974 ns   5398   items_per_second=7.75351k/s
BM_ExecBatchIterator/32768      64987 ns     64970 ns  10811   items_per_second=15.3917k/s
```

So for the 1024 case, it takes 2,055,369 ns to iterate through all 1024 batches. That seems a bit expensive to me (?). I suspect we can do better, while also improving compilation times and reducing generated code size, by using simpler data structures in our compute internals.
