[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-589927989

Hi @TaoLv, I reverted my last commit. It makes the code too brittle and also involves very complicated logic to check memory allocation at runtime. Here are the main reasons:

(1) MKL dropout support is currently incomplete. It does not work if the input data type is smaller than int32, and it does not support the broadcast option (when the `axes` option is specified). This limitation forces a check at runtime which is not possible in the InferShape function; e.g. in that function I would need to check whether the dtype is greater than int32 in order to use a different shape for MKL Dropout. https://github.com/apache/incubator-mxnet/pull/16735/files#diff-74c4dc433970c5df31a5e2c4b57c8d71R127

(2) Selecting a different Dropout engine at runtime (based on data type) may cause inconsistency in the mixed-precision case. Introducing another difference in mask memory allocation complicates this even further.

I think we should focus on enhancing MKL Dropout so that it (1) supports all the cases that non-MKL dropout does and (2) supports the bit mask. Please let me know what you think. Thanks! Lin

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-588491750

Given your concern about the performance degradation in the MKL dropout case, I have disabled this feature when MKL dropout is used. Please review the PR again and let me know if you think it is good to go. Thanks!
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586737774

> > Why does it increase memory load?
>
> If there are N elements, per the Bernoulli distribution generation in VSL, we still need to allocate memory and write `N*4` bytes to it. To generate the bit mask, we need to load the `N*4` bytes back and write `N/8` bytes with bits.

The memory for the bit mask is not extra memory. `N*sizeof(DType)` was used in the master branch: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/dropout.cc#L124

So for the MKL dropout case, the master branch uses `N*4 + N*sizeof(DType)` bytes of mask memory vs. `N*4 + N/8` bytes in this PR. This memory reduction is verified by the MXNet profiler results reported in the PR description section.
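The memory arithmetic above can be sketched in a few lines of plain Python (the helper names are illustrative, not from the PR; both paths keep the `N*4`-byte VSL Bernoulli output, and only the mask storage differs):

```python
def mask_bytes_master(n, dtype_size):
    # Master branch (MKL path): N*4 bytes for the int32 VSL Bernoulli
    # output, plus an N-element mask stored as one DType element per value.
    return n * 4 + n * dtype_size

def mask_bytes_bitmask(n):
    # This PR: the N*4-byte VSL output is still written, but the mask
    # itself is packed to 1 bit per element, rounded up to whole bytes.
    return n * 4 + (n + 7) // 8

n = 1024 * 1024  # elements in a (1024, 1024) float32 tensor
print(mask_bytes_master(n, 4))   # 8388608 bytes
print(mask_bytes_bitmask(n))     # 4325376 bytes
```

For float32 the bit mask cuts the mask-related footprint roughly in half, which is consistent with the direction of the profiler numbers quoted later in the thread.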
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586650517

> > Does the slowdown come from more computation in the new algorithm or a sub-optimal implementation?
>
> The new implementation increases both the memory load and the number of bit-wise operations, so a performance slowdown is expected.

The memory load is actually reduced even in the MKL case, right?
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586519348

> Does the slowdown come from more computation in the new algorithm or a sub-optimal implementation?

@PatricZhao The slowdown comes from extra computation in the new algorithm when dropout uses MKL. MKL already does the masking but stores each mask element as an integer. The new algorithm simply repackages this int32-based mask into a bit-based mask and therefore introduces extra runtime. Ideally, we would enhance MKL dropout to store the mask using bits, but that requires modifying the VSL APIs.
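The repackaging step described above can be sketched in plain Python (the real kernel is C++; the function names here are illustrative):

```python
def pack_bits(int_mask):
    """Pack a sequence of 0/1 mask values (as VSL emits, one int32
    per element) into a bytearray holding 1 bit per element."""
    out = bytearray((len(int_mask) + 7) // 8)
    for i, keep in enumerate(int_mask):
        if keep:
            out[i // 8] |= 1 << (i % 8)
    return out

def get_bit(packed, i):
    # Recover element i's mask bit, e.g. during the backward pass.
    return (packed[i // 8] >> (i % 8)) & 1

mask = [1, 0, 1, 1, 0, 0, 1, 0, 1]
packed = pack_bits(mask)  # 9 elements -> 2 bytes instead of 36
assert [get_bit(packed, i) for i in range(len(mask))] == mask
```

This makes the trade-off visible: the mask shrinks 32x, but every element costs an extra read of the int32 mask plus shift/or operations to pack it.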
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586116890

> Does the `avg_time_Dropout` include backward time?

@apeforest Yes, it includes backward time, as my `run_backward` is set to True.
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586062120

@roywei Using the test script in https://github.com/apache/incubator-mxnet/pull/13896:

Build | runtime (before) | runtime (after)
---|---|---
CPU w/ MKL | 262 ms ± 1.2 ms | 337 ms ± 12.5 ms

Using the Python timer to measure CPU performance with MKL:

This PR:
```
[{'Dropout': [{'avg_time_Dropout': 1.1714265774935484, 'p50_time_Dropout': 1.1715246364474297, 'p90_time_Dropout': 1.190436165779829, 'p99_time_Dropout': 1.2154309218749404, 'inputs': {'data': (1024, 1024)}}]}]
```

Master:
```
[{'Dropout': [{'avg_time_Dropout': 0.6394564639776945, 'p50_time_Dropout': 0.6996351294219494, 'p90_time_Dropout': 1.045508868992329, 'p99_time_Dropout': 1.59036863129586, 'inputs': {'data': (1024, 1024)}}]}]
```
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584867739

@eric-haibin-lin I posted the latest performance results in the PR description. Please review this PR again and let me know if it's good to ship.
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584766500

@TaoLv Thanks for your review and suggestion. Using blocking does help:

```
[{'Dropout': [{'avg_time_Dropout': 2.9556092573329806, 'p50_time_Dropout': 2.9536145739257336, 'p90_time_Dropout': 2.9735330026596785, 'p99_time_Dropout': 3.0410749791190033, 'inputs': {'data': (1024, 1024)}}]}]
```

Here is my performance script:

```
#!/usr/bin/python
import mxnet as mx
from mxnet import nd
from benchmark.opperf.utils.benchmark_utils import run_performance_test

mx.random.seed(17)
res = run_performance_test(nd.Dropout, run_backward=True, dtype='float32', ctx=mx.cpu(),
                           inputs=[{"data": (1024, 1024)}],
                           warmup=20, runs=100, profiler='python')
print(res)
```

I am not in favor of adding an option. It exposes internal implementation details (e.g. whether a bit mask is used) to users and is also counterintuitive (why would one not use the bit mask?). If sacrificing some performance can help improve usability, I think we need to consider the trade-off.
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584486168

@TaoLv Could you please help review the change to the MKL Dropout part? Thanks
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584304458

@eric-haibin-lin The big slowdown was due to the use of dynamic loop scheduling in omp when assigning a chunk of 8 elements to each thread. After I changed it to static scheduling, the performance improved, but it is still slower than master due to the extra bit operations needed in the kernel and the fact that we need to assign a batch of 8 elements when parallelizing the for loop.

@TaoLv Any other suggestions to speed up the omp section? Thanks.

This PR:
```
[{'Dropout': [{'avg_time_Dropout': 3.4434730419889092, 'p50_time_Dropout': 3.2913365866988897, 'p90_time_Dropout': 4.017965029925109, 'p99_time_Dropout': 4.7498174966312945, 'inputs': {'data': (1024, 1024)}}]}]
```

Master:
```
[{'Dropout': [{'avg_time_Dropout': 2.1684831893071532, 'p50_time_Dropout': 1.962667447514832, 'p90_time_Dropout': 2.8079885290935636, 'p99_time_Dropout': 2.967591730412097, 'inputs': {'data': (1024, 1024)}}]}]
```
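The batch-of-8 constraint comes from the bit packing itself: element `i` lands in mask byte `i // 8`, so if the parallel loop hands out chunks whose boundaries are multiples of 8 elements, no two threads ever touch the same mask byte. A small sketch of the index math (illustrative Python, not the PR's C++ kernel):

```python
def byte_range_for_chunk(start, length):
    # Elements [start, start + length) touch mask bytes
    # [start // 8, (start + length - 1) // 8] inclusive.
    return start // 8, (start + length - 1) // 8

# Chunks aligned to multiples of 8 own disjoint byte ranges, so each
# omp thread can write its mask bytes without atomics or locks.
a = byte_range_for_chunk(0, 8)    # -> (0, 0)
b = byte_range_for_chunk(8, 8)    # -> (1, 1)
assert a[1] < b[0]

# A misaligned chunk (e.g. 4 elements) would share byte 0 with its
# neighbour, forcing synchronization on that byte:
c = byte_range_for_chunk(0, 4)    # -> (0, 0)
d = byte_range_for_chunk(4, 4)    # -> (0, 0)
assert c[1] == d[0]
```

This is also why static scheduling with 8-element granularity beats dynamic scheduling here: the per-chunk work is tiny and uniform, so dynamic scheduling's bookkeeping overhead dominates.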
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-583539243

> so we get a 15x slowdown?

Yes, it seems so. I am not sure whether it is due to the extra bit-operation logic we introduced or the modification to the omp parallel clause. It needs more investigation.

@TaoLv Can MKL VSL support a bit-wise mask, so we don't have to do the extra conversion, which seems redundant in the MKL case?
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-583247325

@TaoLv With MKL Dropout:

```
[✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✔ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP]
```

This PR:
```
[{'Dropout': [{'avg_time_Dropout': 28.989554492291063, 'p50_time_Dropout': 30.037737917155027, 'p90_time_Dropout': 30.27475003618747, 'p99_time_Dropout': 30.33297127345577, 'inputs': {'data': (1024, 1024)}}]}]
[{'Dropout': [{'avg_time_forward_Dropout': 25.1095, 'avg_time_backward_Dropout': 5.6493, 'max_storage_mem_alloc_cpu/0': 4259.8398, 'inputs': {'data': (1024, 1024)}}]}]
```

Master:
```
[{'Dropout': [{'avg_time_Dropout': 2.1497764391824603, 'p50_time_Dropout': 1.9523266237229109, 'p90_time_Dropout': 2.6631804881617427, 'p99_time_Dropout': 2.8045210894197234, 'inputs': {'data': (1024, 1024)}}]}]
[{'Dropout': [{'avg_time_forward_Dropout': 0.7749, 'avg_time_backward_Dropout': 0.3057, 'max_storage_mem_alloc_cpu/0': 6291.4561, 'inputs': {'data': (1024, 1024)}}]}]
```
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-581565582

Update: Merged https://github.com/apache/incubator-mxnet/pull/17403 to enable unit tests of dropout on CPU and on GPU w/o cuDNN. I will build MXNet with MKL and run the perf test again.
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-575835293

@TaoLv I had problems building/running mxnet with USE_BLAS=mkl. I would appreciate it if you could give a hand: https://github.com/apache/incubator-mxnet/issues/17366
apeforest commented on issue #16735: Use single-bit for mask in dropout operator URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-572830100

@TaoLv Thanks for your review. I ran operator profiling using benchmark.opperf.utils.benchmark_utils.run_performance_test. The result shows a speedup in the forward pass but some degradation in the backward pass.

w/ this change:
```
[{'Dropout': [{'avg_time_forward_Dropout': 1.3266, 'max_storage_mem_alloc_cpu/0': 4259.8398, 'avg_time_backward_Dropout': 0.2682, 'inputs': {'data': (1024, 1024), 'p': 0.5}}]}]
```

w/o this change:
```
[{'Dropout': [{'avg_time_forward_Dropout': 1.7864, 'max_storage_mem_alloc_cpu/0': 6291.4561, 'avg_time_backward_Dropout': 0.1836, 'inputs': {'data': (1024, 1024), 'p': 0.5}}]}]
```