[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-16 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493317611
 
 
   @szha Thanks. Any feedback or suggestions are welcome.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-17 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493641565
 
 
   @anirudh2290 Thanks for your comment. Given below is my thinking when 
developing this memory profiler.
   
   1. I have included the high-level design ideas as one of the examples 
`./example/gpu_memory_profiler/README.md`. Please let me know if you think more 
details are needed.
   2. I am not using the existing Python profiler API because memory usage 
profiling is very different from the VTune-based CPU profiling or NVTX-based 
GPU profiling. For instance, `pause` and `resume` do not really make sense for 
memory profiling, because you always care about the total memory consumption 
rather than that of some portion of your application. Furthermore, since the 
dump-file formats (`.csv` vs. `.json`) and the visualization tools 
(`matplotlib` vs. Chrome tracing) are completely different, I do not see a 
clean way of integrating the GPU memory profiler into the current profiler API.
   3. Those environment variables are added to give users fine-grained 
control over the path and name of the output files. Since they both have 
default values, users do not need to set them explicitly.
   4. To the best of my knowledge, we cannot avoid adding those APIs, because 
otherwise there is no *path* to propagate the name tags from the Python 
frontend to the C++ backend.
   5. Those build flags are needed to switch the GPU memory profiler on and 
off. They are different from the current `USE_PROFILER` build flag since they 
target GPU memory consumption instead of performance. I have been doing 
experiments using the memory profiler and so far **do not see any performance 
degradation** (after all, in most cases the memory allocations are only done 
once for the entire training process). It is hard to reason about the exact 
runtime overhead since the GPU memory profiling requires changes to both the 
Python frontend and the C++ backend. However, considering that there might be 
people who just want to work with a "clean" version of MXNet, I added those 
compilation flags and defaulted them to off.
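As a rough sketch of point 3 above, the output path and name can each fall back to a default when the corresponding environment variable is unset. The variable names below are hypothetical, chosen only for illustration; the actual names and defaults are the ones defined in the PR:

```python
import os

# Hypothetical variable names -- the real ones are defined in the PR.
# The point is simply that both have defaults, so users need not set them.
profile_dir = os.environ.get("MXNET_GPU_MEM_PROFILE_DIR", ".")
profile_name = os.environ.get("MXNET_GPU_MEM_PROFILE_NAME",
                              "gpu_memory_profile.csv")
output_path = os.path.join(profile_dir, profile_name)
```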
   




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-18 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493714021
 
 
   @larroy That would be great. I took out the large `.gif` file from the pull 
request, but I would really appreciate it if we could have it in the MXNet 
channel. The `.gif` file was compressed and converted from the original `.mp4` 
video and hence is low-resolution and does not let people pause to learn more 
details. Please let us know if there is any way we could forward the original 
video to you.
   
   Also, thank you so much for your valuable feedback. I think I have 
addressed most of it.




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-18 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493720472
 
 
   @eric-haibin-lin An excellent suggestion. I have opened the pull request at 
the `dmlc/web-data` repository (https://github.com/dmlc/web-data/pull/175). 
Please kindly accept it. Thanks.




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-18 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493720629
 
 
   @szha Wow. That's quick. Thanks a lot.




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-23 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-495481289
 
 
   Hi @anirudh2290 ,
   
   Thanks for your valuable feedback. Let me record a list of Discussion and 
TODO items:
   
   ## Discussion
   
   - [ ] Preprocessor Directives (1. 2.). 
   - [ ]
   - [ ] Imperative Support (4.). We should **NOT** drop the support for 
imperative mode, as doing so would significantly increase the amount of GPU 
memory allocated under the unknown tag. For instance, most optimizer states 
are currently allocated using the pure imperative approach. In fact, almost 
all the current optimizer implementations (e.g., SGD, Adam) initialize the 
optimizer states with `mx.nd.zeros`. If we drop the imperative support, then 
all of those allocations will fall into the `unknown` category, which can be 
`1 GB` for large models.
   - [ ] Profiler API Integration (5. 6. 7. 8.). The GPU memory profiler is 
different from the existing profilers in many ways: (1) It does not use the 
`chrome://tracing` visualization backend, because it needs to accept the 
users' input on defining the **keyword dictionaries for grouping storage 
tags** (also, I do not see a good way of visualizing percentages using 
`chrome://tracing`). (2) Because it 
   
   ## TODO 
   
   - [ ] Add a minimal working example in the `example` directory to show how 
`SETME`, the analyzer, and the plotter work.
   - [ ]




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-26 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496024566
 
 
   @szha The problem with this proposed approach is that it might work well 
for the pure imperative programming paradigm, but **NOT** for symbolic graphs 
or Gluon.
   
   The reason is that in those paradigms memory allocations do not happen 
immediately; they happen only when the computation graphs are materialized. 
Consider the same piece of code, but with the symbolic graph approach:
   
   ```Python
   def function1(sym1, sym2):
       with mx.profiler.scope('function1'):
           r1 = mx.sym.op1(sym1)
           r2 = mx.sym.op1(sym2)
           r3 = mx.sym.op2(r1, r2)
       return r3
   ```
   
   Because `mx.sym` only builds up the graph and does not do any actual 
memory allocation or computation, by the time the actual GPU memory 
allocations happen (i.e., when the computation graphs are materialized using 
`bind()` or `simple_bind()`), the profiler scope information is already lost, 
and I do not think there is a good way of recovering it.
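A toy sketch in plain Python (deliberately not the MXNet API; all names here are stand-ins) of why the scope is lost under deferred allocation: the `with` block has already exited by the time the allocation actually runs.

```python
import contextlib

current_scope = ["unknown"]

@contextlib.contextmanager
def scope(name):
    # The scope name is only recorded while the `with` block is active.
    current_scope.append(name)
    try:
        yield
    finally:
        current_scope.pop()

class Symbol:
    """Toy stand-in: constructing a symbol only records a graph edge."""
    def __init__(self, op, *args):
        self.op, self.args = op, args

def op1(s):
    return Symbol("op1", s)   # builds the graph; allocates nothing

allocation_scopes = []

def bind(sym):
    # Allocation happens here, long after the `with` block has exited,
    # so the scope observed at allocation time is back to "unknown".
    allocation_scopes.append(current_scope[-1])

with scope("function1"):
    g = op1(Symbol("var"))    # no allocation yet

bind(g)                        # allocates now; the scope info is gone
```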
   
   Besides, with your proposed approach, all the existing MXNet 
implementations need to have `mx.profiler.scope('...')` injected if they want 
to use the memory profiler. Given that frontend programmers always need to 
provide extra annotations, why not put them all inside one file rather than 
letting them scatter around the source code?




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-26 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496053934
 
 
   @szha But this means that we need to change the `nnvm` codebase, since 
`nnvm::Graph` is the one responsible for managing those symbols. We also need 
to somehow propagate this scope information from each forward computation 
node to its corresponding backward one. Such things require changes that are 
completely different from what we are doing now.
   
   Let me summarize my primary concerns with the proposed solution.
   
   1. It requires **more code changes**, both in terms of **breadth and depth**.
 - In terms of breadth, **NONE** of the existing Python frontend APIs (e.g., 
[`LSTMCell`](https://github.com/apache/incubator-mxnet/blob/01cf29d0aaf8af7424fa82d8100cbb967860c712/python/mxnet/rnn/rnn_cell.py#L408))
 and applications (e.g., [**Sockeye**](https://github.com/awslabs/sockeye), 
[**MXNet 
Examples**](https://github.com/apache/incubator-mxnet/tree/master/example)) 
carry this profiler scope information, which means that they ALL need to be 
modified to add it.
 - In terms of depth, because GPU memory allocations for symbolic graphs are 
delayed until the graphs are materialized, `nnvm::Graph` now needs to be 
aware of the profiler scope information, which means that we need to modify 
the *nnvm* codebase.
   2. **Automatic aggregation** can be **undesirable**.
 - I do admit that automatic aggregation seems really cool, and I think it is 
the approach currently adopted by the *TensorFlow* GPU memory profiler. 
However, there are problems with it:
   - "*The GPU memory profiler tells me that the attention layers consume 
50% of the memory footprint. Now what?*" As we group symbols together into a 
profiler scope, we lose the memory consumption of each individual symbol. 
Although this information can be recovered by gradually refining the profiler 
scope, doing so requires multiple runs of the application to find the 
appropriate granularity, which can be really tedious for applications such as 
[**Sockeye**](https://github.com/awslabs/sockeye) that spend a large amount 
of time in data preprocessing.
   - This is why I suggest we instead go for the "*measure once, aggregate 
multiple times*" approach: we start with the big picture of where most of the 
memory consumption goes, and then gradually narrow in on the exact symbol(s) 
that are causing trouble.
   
   To conclude, the table below summarizes the comparison:
   
   |  | Proposed Solution | Current Solution |
   | --- | --- | --- |
   | Code Changes | All existing Application Codebase | `SETME.py` (usually 3~5 
lines of code) |
   | Profiler Scope | Fixed. Need to rerun the application for a more coarse or 
fine-grained profiler scope.   |  Flexible, keyword dictionary can be redefined 
multiple times in `SETME.py`. No need to rerun the application since the 
profiler log is present. |
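A minimal sketch of the "measure once, aggregate multiple times" workflow described above, using toy data and a hypothetical in-memory log format (the real profiler's CSV schema and `SETME.py` conventions are the ones defined in the PR):

```python
# Toy log of (storage tag, bytes allocated); the real profiler dumps a CSV.
log = [
    ("encoder_rnn_l0_weight", 4_000_000),
    ("decoder_att_score",     2_000_000),
    ("src_embed_weight",      1_000_000),
    ("untagged_temp",           500_000),
]

def aggregate(log, layer_kw_dict):
    """Group tags by the first matching keyword; unmatched go to 'unknown'."""
    totals = {name: 0 for name in layer_kw_dict}
    totals["unknown"] = 0
    for tag, nbytes in log:
        for name, keywords in layer_kw_dict.items():
            if any(kw in tag for kw in keywords):
                totals[name] += nbytes
                break
        else:
            totals["unknown"] += nbytes
    return totals

# Coarse view first ...
coarse = aggregate(log, {"RNN": ["rnn"], "Attention": ["att"],
                         "Embed": ["embed"]})
# ... then refine WITHOUT rerunning the application: the same log is simply
# re-aggregated under a redefined keyword dictionary.
fine = aggregate(log, {"EncoderRNN": ["encoder_rnn"], "Attention": ["att"]})
```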




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2019-05-27 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496244249
 
 
   @szha I am afraid that there is some serious misunderstanding going on. 
For all the MXNet benchmarks we have examined so far, none of them require 
*O*(Number of Arrays) lines of code changes to work.
   
   For the LSTM-based machine translation and BERT benchmarks, we made no 
changes to the original source code. Simply adding the following lines to 
`SETME.py` suffices to complete the GPU memory profiling job.
   
   ```Python
   # LSTM-based Machine Translation
   layer_kw_dict = {
       "RNN"       : [ "rnn" ],
       "Attention" : [ "att" ],
       "Embed"     : [ "embed" ],
       "Output"    : [ "softmax", "logits" ]
   }
   ```
   
   ```Python
   # BERT
   layer_kw_dict = {
       "Embed"           : [ "embed" ],
       "AttentionCell"   : [ "multiheadattentioncell",
                             "dotproductattentioncell" ],
       "PositionwiseFFN" : [ "bertpositionwiseffn" ]
   }
   ```
   
   I do not see why you keep saying it requires changing "number of arrays" 
lines of code for the GPU memory profiler to work.




[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler

2020-02-21 Thread GitBox
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory 
profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-589827768
 
 
   Thanks to all for the valuable feedback. I have opened [**another 
PR**](https://github.com/apache/incubator-mxnet/pull/17376) to address it. 
Closing this PR now.

