Hi Ivan,
Inside of Acero, we can think of allocations as coming in two classes:
- "Big" allocations, which go through `MemoryPool`, using `Buffer`. These are
used for representing columns of input data and hash tables.
- "Small" allocations, typically local STL containers like `std::vector` and
`std::unordered_map`. These go through `operator new` (in more detail: they
are templated to use `std::allocator::allocate`, which ends up calling
`operator new`).

You’ll need to do different things to track these two classes. 

For big allocations, you can make your own implementation of the MemoryPool 
interface which performs all of the statistics gathering you’d need to do (you 
can see `LoggingMemoryPool` as an example, which just prints to stdout every 
time there is an allocation). You can then pass this memory pool in via the 
`ExecPlan`’s `ExecContext`. 
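
Here is a minimal sketch of what such a pool could look like. The class and
counter names are just placeholders, and the exact virtual signatures vary
between Arrow versions (some add an alignment parameter), so check
memory_pool.h for the version you're building against. `arrow::ProxyMemoryPool`
is also a ready-made wrapper along these lines if it already tracks what you
need.

#include <atomic>
#include <cstdint>
#include <string>

#include "arrow/memory_pool.h"
#include "arrow/status.h"

// Wraps another pool, counts calls, and delegates the actual work.
class CountingMemoryPool : public arrow::MemoryPool {
 public:
  explicit CountingMemoryPool(arrow::MemoryPool* wrapped) : wrapped_(wrapped) {}

  arrow::Status Allocate(int64_t size, uint8_t** out) override {
    num_allocations_.fetch_add(1, std::memory_order_relaxed);
    return wrapped_->Allocate(size, out);
  }

  arrow::Status Reallocate(int64_t old_size, int64_t new_size,
                           uint8_t** ptr) override {
    num_allocations_.fetch_add(1, std::memory_order_relaxed);
    return wrapped_->Reallocate(old_size, new_size, ptr);
  }

  void Free(uint8_t* buffer, int64_t size) override {
    wrapped_->Free(buffer, size);
  }

  // The built-in statistics can simply be delegated to the wrapped pool.
  int64_t bytes_allocated() const override { return wrapped_->bytes_allocated(); }
  int64_t max_memory() const override { return wrapped_->max_memory(); }
  std::string backend_name() const override { return wrapped_->backend_name(); }

  int64_t num_allocations() const {
    return num_allocations_.load(std::memory_order_relaxed);
  }

 private:
  arrow::MemoryPool* wrapped_;
  std::atomic<int64_t> num_allocations_{0};
};

You would then construct it around the default pool, e.g.
`CountingMemoryPool pool(arrow::default_memory_pool());`, and pass `&pool`
into the `ExecContext` you hand to `ExecPlan::Make`.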

For small allocations, I think you should just be able to implement your own 
`operator new` and `operator delete` inside of your own benchmark file. This 
will replace the default `operator new` and `operator delete` and let you 
gather statistics. One note is that you’ll have to call `malloc` and `free` in 
your implementations as the default `operator new` and `operator delete` will 
be inaccessible. 
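
For example, a minimal version could look like the following (the global
counters are just illustrative; a complete replacement would also cover the
array and aligned overloads):

#include <atomic>
#include <cstdlib>
#include <new>

// Process-wide counters your benchmark can read and reset.
static std::atomic<std::size_t> g_num_allocations{0};
static std::atomic<std::size_t> g_bytes_allocated{0};

void* operator new(std::size_t size) {
  g_num_allocations.fetch_add(1, std::memory_order_relaxed);
  g_bytes_allocated.fetch_add(size, std::memory_order_relaxed);
  // The default operator new has been replaced, so go straight to malloc.
  if (size == 0) size = 1;  // malloc(0) may return nullptr; operator new must not.
  if (void* ptr = std::malloc(size)) {
    return ptr;
  }
  throw std::bad_alloc();
}

void operator delete(void* ptr) noexcept { std::free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { std::free(ptr); }

Keep in mind this intercepts every small allocation in the process, not just
Acero's, so you'd want to snapshot the counters right before and after the
plan runs.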

Sasha

> On Jul 6, 2022, at 12:42 PM, Ivan Chau <ivan.m.c...@gmail.com> wrote:
> 
> Hi all,
> 
> My name is Ivan -- some of you may know me from some of my contributions
> benchmarking node performance on Acero. Thank you for all the help so far!
> 
> In addition to my runtime benchmarking, I am interested in pursuing some
> method of memory profiling to further assess our streaming capabilities.
> I’ve taken a short look at Google Benchmark’s memory profiling, for which
> the most salient example usage I could find is
> https://github.com/google/benchmark/issues/1217. It allows you to plug in
> your own Memory Manager and specify what to report at the beginning and
> end of every benchmark.
> 
> To my understanding, we would need to rework our existing memory pool /
> execution context to aggregate the number_of_allocs and bytes_used that are
> reported by Google Benchmark, but I’d imagine there could be better tools
> for the job which might yield more interesting information (line-by-line
> analysis, time plots, peak stats, and other metrics).
> 
> Do you have any advice on what direction I should take for this or know
> someone who does? I’ve run some one-off tests using valgrind but I am
> wondering if I could help implement something more general (and helpful)
> for the cpp arrow codebase.
> 
> Best,
> 
> Ivan
