Hi Ivan,

Earlier we added some instructions for profiling memory allocations on the memory pools ("big" allocations, as described by Sasha below). Docs are here [1]. If you come up with some other method, it would be great to document it in an adjacent section :)
Another suggestion I heard a while ago was to use OpenTelemetry to collect memory usage / allocation metrics. I'm not super close to those efforts, but I believe there's already been some work to integrate OTel and Acero. I recorded the issue here [2].

I hope that's helpful info!

Best,

Will Jones

[1] https://arrow.apache.org/docs/cpp/memory.html#memory-profiling
[2] https://issues.apache.org/jira/browse/ARROW-15512

On Wed, Jul 6, 2022 at 1:17 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:

> Hi Ivan,
>
> Inside of Acero, we can think of allocations as coming in two classes:
>
> - "Big" allocations, which go through `MemoryPool`, using `Buffer`. These are used for representing columns of input data and hash tables.
> - "Small" allocations, which are usually small, local STL containers like std::vector and std::unordered_map. These go through `operator new` (in more detail: they are templated to use `std::allocator::allocate`, which ends up calling `operator new`).
>
> You'll need to do different things to track these two classes.
>
> For big allocations, you can make your own implementation of the `MemoryPool` interface which performs all of the statistics gathering you'd need to do (you can see `LoggingMemoryPool` as an example, which just prints to stdout every time there is an allocation). You can then pass this memory pool in via the `ExecPlan`'s `ExecContext`.
>
> For small allocations, I think you should just be able to implement your own `operator new` and `operator delete` inside of your own benchmark file. This will replace the default `operator new` and `operator delete` and let you gather statistics. One note is that you'll have to call `malloc` and `free` in your implementations, as the default `operator new` and `operator delete` will be inaccessible.
>
> Sasha
>
> > On Jul 6, 2022, at 12:42 PM, Ivan Chau <ivan.m.c...@gmail.com> wrote:
> >
> > Hi all,
> >
> > My name is Ivan -- some of you may know me from some of my contributions benchmarking node performance on Acero. Thank you for all the help so far!
> >
> > In addition to my runtime benchmarking, I am interested in pursuing some method of memory profiling to further assess our streaming capabilities. I've taken a short look at Google Benchmark's memory profiling, for which the most salient example usage I could find is https://github.com/google/benchmark/issues/1217. It allows you to plug in your own MemoryManager and specify what to report at the beginning and end of every benchmark.
> >
> > To my understanding, we would need to rework our existing memory pool / execution context to aggregate the number_of_allocs and bytes_used that are reported by Google Benchmark, but I'd imagine there could be better tools for the job which might yield more interesting information (line-by-line analysis, time plots, peak stats, and other metrics).
> >
> > Do you have any advice on what direction I should take for this, or know someone who does? I've run some one-off tests using valgrind, but I am wondering if I could help implement something more general (and helpful) for the Arrow C++ codebase.
> >
> > Best,
> >
> > Ivan