wesm opened a new pull request #9280: URL: https://github.com/apache/arrow/pull/9280
These are only preliminary benchmarks, but they may help in examining the microperformance overhead related to `ExecBatch` and its implementation (as a `vector<Datum>`). It may be desirable to devise an "array reference" data structure with few or no heap-allocated members and no `shared_ptr` interactions required to obtain memory addresses and other array information. On my test machine (macOS i9-9880H 2.3GHz), I see about 472 CPU cycles of per-field overhead for each ExecBatch produced.

These benchmarks take a record batch with 1M rows and 10 columns/fields and iterate through the rows in smaller ExecBatches of the indicated sizes:

```
BM_ExecBatchIterator/256      8207877 ns   8204914 ns     81   items_per_second=121.878/s
BM_ExecBatchIterator/512      4421049 ns   4419958 ns    166   items_per_second=226.247/s
BM_ExecBatchIterator/1024     2056636 ns   2055369 ns    333   items_per_second=486.531/s
BM_ExecBatchIterator/2048     1056415 ns   1056264 ns    682   items_per_second=946.733/s
BM_ExecBatchIterator/4096      514276 ns    514136 ns   1246   items_per_second=1.94501k/s
BM_ExecBatchIterator/8192      262539 ns    262391 ns   2736   items_per_second=3.81111k/s
BM_ExecBatchIterator/16384     128995 ns    128974 ns   5398   items_per_second=7.75351k/s
BM_ExecBatchIterator/32768      64987 ns     64970 ns  10811   items_per_second=15.3917k/s
```

So for the 1024 case, it takes 2,055,369 ns to iterate through all 1024 batches. That seems a bit expensive to me (?). I suspect we can do better, while also improving compilation times and reducing generated code size, by using simpler data structures in our compute internals.
