[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493317611

@szha Thanks. Any feedback or suggestions are welcome.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493641565

@anirudh2290 Thanks for your comment. Below are my thoughts from developing this memory profiler.

1. I have included the high-level design ideas as one of the examples, in `./example/gpu_memory_profiler/README.md`. Please let me know if you think more details are needed.
2. I am not using the existing Python profiler API because memory usage profiling is very different from the VTune-based CPU profiling or NVTX-based GPU profiling. For instance, it does not really make sense to `pause` or `resume` memory profiling, because you always care about the total memory consumption rather than that of some portion of your application. Furthermore, since the format of the dump files (`.csv` vs. `.json`) and the visualization tools (`matplotlib` vs. Chrome tracing) are completely different, I do not see a clean way of integrating the GPU memory profiler into the current profiler API.
3. Those environment variables are added to give users fine-grained control over the path and name of the output files. Since they both have default values, users do not need to set them explicitly.
4. To the best of my knowledge, we cannot avoid adding those APIs, because otherwise there is no *path* to propagate the name tags from the Python frontend to the C++ backend.
5. Those build flags are needed to switch the GPU memory profiler on and off. They are different from the current `USE_PROFILER` build flag since they target GPU memory consumption instead of performance. I have been running experiments with the memory profiler and so far **do not see any performance degradation** (after all, in most cases the memory allocations are only done once for the entire training process). It is hard to state the exact runtime overhead, since the GPU memory profiling requires changes to both the Python frontend and the C++ backend. However, considering that there might be people who just want to work with a "clean" version of MXNet, I added those compilation flags and defaulted them to off.
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493714021

@larroy That would be great. I took the large `.gif` file out of the pull request, but I would really appreciate it if we could have it in the MXNet channel. The `.gif` file was compressed and converted from the original `.mp4` video, so it is low-resolution and does not let viewers pause to learn more details. Please let us know if there is any way we could forward the original video to you. Also, thank you so much for your valuable feedback. I think I have addressed most of it.
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493720472

@eric-haibin-lin An excellent suggestion. I have opened the pull request at the `dmlc/web-data` repository (https://github.com/dmlc/web-data/pull/175). Please kindly accept it. Thanks.
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-493720629

@szha Wow. That's quick. Thanks a lot.
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-495481289

Hi @anirudh2290, thanks for your valuable feedback. Let me record a list of Discussion and TODO items:

## Discussion

- [ ] Preprocessor Directives (1. 2.).
- [ ]
- [ ] Imperative Support (4.). We should **NOT** drop support for the imperative mode, as this would significantly increase the amount of GPU memory allocated under the unknown tag. For instance, currently most optimizer states are allocated using the pure imperative approach. In fact, almost all the current optimizer implementations (e.g., SGD, Adam) initialize the optimizer states with `mx.nd.zeros`. If we drop the imperative support, all of those allocations will fall into the `unknown` category, which can be `1 GB` for large models.
- [ ] Profiler API Integration (5. 6. 7. 8.). The GPU memory profiler is different from the existing profilers in many ways: (1) It is not using the `chrome://tracing` visualization backend, because it needs to accept the users' input defining the **keyword dictionaries for grouping storage tags** (also, I do not see a very good way of visualizing percentages using `chrome://tracing`). (2) Because it

## TODO

- [ ] Add a minimum working example in the `example` directory to show how `SETME`, the analyzer, and the plotter work.
- [ ]
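To make the imperative-support point concrete, here is a toy pure-Python tally (not MXNet code; the tags, sizes, and Adam state names are made up for illustration): if imperative allocations such as optimizer states carry no name tag, they all land under `unknown`, and for a large model that bucket can dominate the profile.

```python
# Toy illustration, NOT MXNet code: symbolic allocations carry a name tag,
# imperative ones (e.g. optimizer states) would carry none if imperative
# support were dropped.
allocations = [
    ("symbolic",   "rnn_l0_weight",  64 << 20),
    ("symbolic",   "embed_table",   256 << 20),
    ("imperative", None,            256 << 20),  # e.g. Adam `mean` state
    ("imperative", None,            256 << 20),  # e.g. Adam `var` state
]

unknown = sum(nbytes for _, tag, nbytes in allocations if tag is None)
total   = sum(nbytes for _, _, nbytes in allocations)
print(f"unknown: {unknown / total:.0%}")  # prints "unknown: 62%"
```

More than half of the (made-up) footprint here would be unattributable, which is the scenario the discussion item above wants to avoid.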
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496024566

@szha The problem with this proposed approach is that it might work well for the pure imperative programming paradigm, but **NOT** for symbolic graphs or Gluon. The reason is that in those approaches memory allocations do not happen immediately; they happen only when the computation graphs are materialized. Consider the same piece of code, but with the symbolic graph approach:

```Python
def function1(sym1, sym2):
    with mx.profiler.scope('function1'):
        r1 = mx.sym.op1(sym1)
        r2 = mx.sym.op1(sym2)
        r3 = mx.sym.op2(r1, r2)
    return r3
```

Because `mx.sym` only builds up the graph but does not do the actual memory allocation or compute, by the time the actual GPU memory allocations happen (i.e., when the computation graph is materialized using `bind()` or `simple_bind()`), the profiler scope information is already lost, and I do not think there is a good way of recovering it. Besides, with your proposed approach, all the existing MXNet implementations would need to be modified to inject `mx.profiler.scope('...')` if they want to use the memory profiler. Given that frontend programmers always need to provide extra annotations, why not put them all inside one file rather than letting them scatter around in the source code?
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496053934

@szha But this means that we need to change the `nnvm` codebase, since `nnvm::Graph` is the one responsible for managing those symbols. We would also need to somehow propagate this scope information from each forward computation node to its corresponding backward one. Such things require changes that are completely different from what we are doing now. Let me summarize my primary concerns with the proposed solution.

1. It requires **more code changes**, both in terms of **breadth & depth**.
   - In terms of breadth, **NONE** of the existing Python frontend APIs (e.g., [`LSTMCell`](https://github.com/apache/incubator-mxnet/blob/01cf29d0aaf8af7424fa82d8100cbb967860c712/python/mxnet/rnn/rnn_cell.py#L408)) and applications (e.g., [**Sockeye**](https://github.com/awslabs/sockeye), [**MXNet Examples**](https://github.com/apache/incubator-mxnet/tree/master/example)) have this profiler scope information, which means that they ALL need to be modified to add it.
   - In terms of depth, because GPU memory allocations for symbolic graphs are delayed until the graphs are materialized, `nnvm::Graph` now needs to be aware of the profiler scope information, which means that we need to modify the *nnvm* codebase.
2. **Automatic aggregation** can be **undesirable**.
   - I do admit that automatic aggregation seems really cool, and I believe it is the approach currently adopted by the *TensorFlow* GPU memory profiler. However, there are problems with it: "*The GPU memory profiler tells me that the attention layers consume 50% of the memory footprint. Now what?*" As we group symbols together into a profiler scope, we lose information about the memory consumption of each individual symbol. Although this information can be recovered by gradually refining the profiler scope, that requires multiple runs of the application to find the appropriate granularity, which can be really tedious for applications such as [**Sockeye**](https://github.com/awslabs/sockeye) that spend a large amount of time in data preprocessing.
   - This is why I suggest we instead go for the "*measure once, aggregate multiple times*" approach: we start with the big picture of where most of the memory consumption goes, and then gradually focus on the exact symbol(s) that are causing trouble.

To conclude, the table below shows the comparison:

| | Proposed Solution | Current Solution |
| --- | --- | --- |
| Code Changes | All existing application codebases | `SETME.py` (usually 3~5 lines of code) |
| Profiler Scope | Fixed. Need to rerun the application for a coarser or finer-grained profiler scope. | Flexible. The keyword dictionary can be redefined multiple times in `SETME.py`; no need to rerun the application since the profiler log is present. |
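The "measure once, aggregate multiple times" idea can be sketched in a few lines of Python. The log entries and the `aggregate` helper below are made up for illustration (the real profiler writes a `.csv` dump); the point is that refining the grouping only means redefining the keyword dictionary, not rerunning the application:

```python
# Fake profiler log: (storage_tag, bytes) pairs standing in for the .csv dump.
log = [
    ("rnn_l0_weight", 4096),
    ("att_score", 2048),
    ("embed_table", 8192),
    ("softmax_out", 1024),
]

def aggregate(log, kw_dict):
    """Group log entries into layers by keyword substring match;
    anything unmatched falls into 'unknown'."""
    totals = {}
    for tag, nbytes in log:
        layer = next((name for name, kws in kw_dict.items()
                      if any(kw in tag for kw in kws)), "unknown")
        totals[layer] = totals.get(layer, 0) + nbytes
    return totals

# First pass: coarse grouping to get the big picture.
coarse = aggregate(log, {"RNN": ["rnn"], "Other": ["att", "embed", "softmax"]})
# Refine WITHOUT rerunning the application -- just redefine the dictionary.
fine = aggregate(log, {"RNN": ["rnn"], "Attention": ["att"],
                       "Embed": ["embed"], "Output": ["softmax"]})
```

Both passes reuse the same log, which is exactly the flexibility claimed in the "Profiler Scope" row of the table above.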
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-496244249

@szha I am afraid that there is some serious misunderstanding going on. For all the MXNet benchmarks we have examined so far, none of them require O(number of arrays) lines of code changes to work. For the LSTM-based machine translation and BERT, we made no changes to the original source code. Simply adding the following lines to `SETME.py` suffices to complete the GPU memory profiling job.

```Python
# LSTM-based Machine Translation
layer_kw_dict = {
    "RNN"       : ["rnn"],
    "Attention" : ["att"],
    "Embed"     : ["embed"],
    "Output"    : ["softmax", "logits"],
}
```

```Python
# BERT
layer_kw_dict = {
    "Embed"           : ["embed"],
    "AttentionCell"   : ["multiheadattentioncell", "dotproductattentioncell"],
    "PositionwiseFFN" : ["bertpositionwiseffn"],
}
```

I do not see why you keep saying it requires changing "number of arrays" lines of code for the GPU memory profiler to work.
[GitHub] [incubator-mxnet] ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
ArmageddonKnight commented on issue #14973: [MXNET-1404] Added the GPU memory profiler
URL: https://github.com/apache/incubator-mxnet/pull/14973#issuecomment-589827768

Thanks to all for the valuable feedback. I opened [**another PR**](https://github.com/apache/incubator-mxnet/pull/17376) to address it. Closing this PR now.