[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

GitBox Fri, 17 May 2019 19:27:54 -0700

ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493641565
 
 
   @anirudh2290 Thanks for your comment. Below are my thoughts from 
developing this memory profiler.
   
   1. I have included the high-level design ideas in one of the examples, 
`./example/gpu_memory_profiler/README.md`. Please let me know if you think more 
details are needed.
   2. I am not using the existing Python profiler API because memory usage 
profiling is very different from the VTune-based CPU profiling or NVTX-based 
GPU profiling. For instance, it does not really make sense to `pause` or 
`resume` memory profiling, because you always care about the total memory 
consumption rather than that of some portion of your application. Furthermore, 
since the formats of the dump files (`.csv` vs. `.json`) and the visualization 
tools (`matplotlib` vs. chrome tracing) are completely different, I do not see 
a clean way of integrating the GPU memory profiler into the current profiler API.
   3. Those environment variables are added to give users fine-grained 
control over the path and name of the output files. Since they both have 
default values, users do not necessarily need to set them explicitly.
   4. To the best of my knowledge, we cannot avoid adding those APIs, because 
otherwise there is no *path* to propagate the name tags from the Python 
frontend to the C++ backend.
   5. Those build flags are needed to switch the GPU memory profiler on and 
off. They are different from the current `USE_PROFILER` build flag since they 
target GPU memory consumption instead of performance. In my experiments with 
the memory profiler so far, I **do not see any performance degradation** 
(after all, in most cases the memory allocations are done only once for the 
entire training process). It is really hard to reason about the exact runtime 
overhead since the GPU memory profiling requires changes on both the Python 
frontend and the C++ backend. However, considering that there might be people 
who just want to work with a "clean" version of MXNet, I added those 
compilation flags and made them default to off.
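
To illustrate point 3, here is a minimal sketch of how output-file settings can fall back to defaults when the environment variables are unset. The variable names and default values below are hypothetical placeholders chosen for illustration only; they are not the names used in this PR.

```python
import os

# Hypothetical variable names and defaults (assumptions for illustration,
# not the actual names introduced by the PR).
DIR_VAR = "EXAMPLE_GPU_MEM_PROFILE_DIR"
NAME_VAR = "EXAMPLE_GPU_MEM_PROFILE_NAME"
DEFAULT_DIR = "."
DEFAULT_NAME = "gpu_memory_profile.csv"


def resolve_output_path(env=None):
    """Build the dump-file path, using a default for any unset variable."""
    # Read from the real process environment unless a mapping is supplied.
    env = os.environ if env is None else env
    directory = env.get(DIR_VAR, DEFAULT_DIR)
    name = env.get(NAME_VAR, DEFAULT_NAME)
    return os.path.join(directory, name)
```

With nothing set, `resolve_output_path({})` yields the default file in the current directory; setting only the directory variable changes the location while keeping the default file name.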


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
