[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-21 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-589927989
 
 
   Hi @TaoLv, I reverted my last commit. It made the code too brittle and also 
involved very complicated logic to check memory allocation at runtime. Here are 
the main reasons:
   
   (1) MKL dropout support is currently incomplete. It does not work if the 
input data type is smaller than int32, and it does not support the broadcast 
option (i.e., when the `axes` option is specified). This limitation forces a 
check at runtime, which is not possible in the InferShape function.
   
   e.g. In this function, I would need to check whether the dtype is at least 
as wide as int32 in order to use a different shape for MKL Dropout (see the 
sketch after these two points):
   
https://github.com/apache/incubator-mxnet/pull/16735/files#diff-74c4dc433970c5df31a5e2c4b57c8d71R127
   
   (2) Selecting a different dropout engine at runtime (based on data type, 
among other factors) may cause inconsistency in the mixed-precision case. 
Introducing another difference in mask memory allocation complicates this even 
further.
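   
   To make the first point concrete, here is a minimal Python sketch of the 
size logic involved (a hypothetical helper, not the PR's actual C++ code): the 
mask storage depends on both the engine and the dtype, which InferShape cannot 
decide statically.
   
   ```
   import numpy as np

   def dropout_mask_nbytes(n_elements, dtype, use_mkl_dropout):
       # Hypothetical: bytes needed for the dropout mask of n_elements values.
       if use_mkl_dropout:
           # MKL VSL writes one int32 flag per element and currently does not
           # work with input dtypes narrower than int32.
           if np.dtype(dtype).itemsize < 4:
               raise ValueError("MKL dropout requires dtype at least int32 wide")
           return n_elements * 4
       # Bit mask: one bit per element, rounded up to whole bytes.
       return (n_elements + 7) // 8
   ```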
   
   I think we should focus on enhancing MKL Dropout so that it (1) supports all 
the cases that non-MKL dropout supports and (2) supports the bit mask.
   
   Please let me know what you think. Thanks!
   
   Lin 
   
   




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-19 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-588491750
 
 
   Given your concern about the performance degradation in the case of MKL 
dropout, I have disabled this feature when MKL dropout is used. Please review 
the PR again and let me know if you think this is good to go. Thanks!




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-16 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586737774
 
 
   > > Why does it increase memory load?
   > 
   > If there are N elements, then per the Bernoulli distribution generation in 
VSL we still need to allocate memory and write `N*4` bytes to it. To generate 
the bit mask, we need to load those `N*4` bytes back and write `N/8` bytes of 
bits.
   
   The memory for the bit mask is not extra memory; `N*sizeof(DType)` was 
already used in the master branch: 
https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/dropout.cc#L124
   
   So for the MKL dropout case, the master branch uses `N*4` + `N*sizeof(DType)` 
bytes vs. `N*4` + `N/8` bytes in this PR. This memory reduction is verified by 
the MXNet profiler results reported in the PR description.
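   
   For concreteness, a quick back-of-envelope check of those totals for a 
(1024, 1024) float32 input (a hypothetical helper, not code from the PR):
   
   ```
   def mkl_dropout_mask_bytes(n, dtype_size=4):
       # Master: VSL int32 buffer (N*4) plus an N*sizeof(DType) mask.
       master = n * 4 + n * dtype_size
       # This PR: VSL int32 buffer (N*4) plus a packed bit mask (N/8).
       this_pr = n * 4 + (n + 7) // 8
       return master, this_pr

   print(mkl_dropout_mask_bytes(1024 * 1024))  # (8388608, 4325376)
   ```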




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-15 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586650517
 
 
   > > Does the slow down come from more computation in the new algorithm or 
the sub-optimal implementation?
   > 
   > The new implementation increases both the memory load and the number of 
bit-wise operations, so a performance slowdown is expected.
   
   The memory load is actually reduced, even in the MKL case, right?




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-14 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586519348
 
 
   > Does the slow down come from more computation in the new algorithm or the 
sub-optimal implementation?
   
   @PatricZhao The slowdown comes from extra computation in the new algorithm 
when dropout uses MKL. MKL already does the masking, but it stores each mask 
element as an integer. The new algorithm simply repackages this int32-based 
mask into a bit-based mask and therefore introduces extra runtime. Ideally we 
would enhance MKL dropout to store the mask using bits, but that requires 
modifying the VSL APIs.
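   
   A minimal numpy sketch of that repackaging step (illustration only; the 
actual kernel is C++): VSL emits one int32 flag per element, and the PR packs 
those flags into one bit per element.
   
   ```
   import numpy as np

   rng = np.random.default_rng(0)
   n = 16
   vsl_mask = (rng.random(n) > 0.5).astype(np.int32)  # what VSL emits: one int per element
   bit_mask = np.packbits(vsl_mask.astype(np.uint8))  # one bit per element

   print(vsl_mask.nbytes, bit_mask.nbytes)  # 64 vs 2 bytes for n=16
   ```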
   




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-13 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586116890
 
 
   > Does the `avg_time_Dropout` include backward time? @apeforest
   
   Yes, it includes backward time, as my `run_backward` is set to True.




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-13 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-586062120
 
 
   @roywei Using the test script in 
https://github.com/apache/incubator-mxnet/pull/13896
   Build | runtime (before) | runtime (after) 
   ---|---|---
   CPU w/ MKL | 262 ms ± 1.2 ms | 337 ms ± 12.5 ms
   
   
   Using the Python timer to measure CPU performance with MKL:
   
   This PR:
   
   ```
   [{'Dropout': [{'avg_time_Dropout': 1.1714265774935484, 'p50_time_Dropout': 1.1715246364474297, 'p90_time_Dropout': 1.190436165779829, 'p99_time_Dropout': 1.2154309218749404, 'inputs': {'data': (1024, 1024)}}]}]
   ```
   
   Master:
   ```
   [{'Dropout': [{'avg_time_Dropout': 0.6394564639776945, 'p50_time_Dropout': 0.6996351294219494, 'p90_time_Dropout': 1.045508868992329, 'p99_time_Dropout': 1.59036863129586, 'inputs': {'data': (1024, 1024)}}]}]
   ```




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-11 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584867739
 
 
   @eric-haibin-lin I posted the latest performance results in the PR 
description. Please review this PR again and let me know if it's good to ship.




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-11 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584766500
 
 
   @TaoLv Thanks for your review and suggestion. 
   
   Using blocking does help:
   
   ```
   [{'Dropout': [{'avg_time_Dropout': 2.9556092573329806, 'p50_time_Dropout': 2.9536145739257336, 'p90_time_Dropout': 2.9735330026596785, 'p99_time_Dropout': 3.0410749791190033, 'inputs': {'data': (1024, 1024)}}]}]
   ```
   
   Here is my performance script:
   
   ```
   #!/usr/bin/python
   import mxnet as mx
   from mxnet import nd

   from benchmark.opperf.utils.benchmark_utils import run_performance_test

   # Benchmark Dropout (forward + backward) on a 1024x1024 float32 input.
   mx.random.seed(17)
   res = run_performance_test(nd.Dropout, run_backward=True, dtype='float32',
                              ctx=mx.cpu(),
                              inputs=[{"data": (1024, 1024)}],
                              warmup=20, runs=100, profiler='python')
   print(res)
   ```
   
   I am not in favor of adding an option. It would expose an internal 
implementation detail (whether a bit mask is used) to users and is also 
counterintuitive (why would one ever not want the bit mask?). If sacrificing 
some performance can help improve usability, I think the trade-off is worth 
considering.




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-10 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584486168
 
 
   @TaoLv Could you please help review the change to the MKL Dropout part? Thanks!




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-10 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584304458
 
 
   @eric-haibin-lin The big slowdown was due to the use of dynamic loop 
scheduling in omp when assigning a chunk of 8 elements to each thread. After I 
changed it to static scheduling, performance improved, but it is still slower 
than master because of the extra bit operations in the kernel and because we 
must assign a batch of 8 elements per iteration when parallelizing the for loop 
(see the sketch after the numbers below). @TaoLv Any other suggestions to speed 
up the omp section? Thanks.
   
   This PR:
   ```
   [{'Dropout': [{'avg_time_Dropout': 3.4434730419889092, 'p50_time_Dropout': 3.2913365866988897, 'p90_time_Dropout': 4.017965029925109, 'p99_time_Dropout': 4.7498174966312945, 'inputs': {'data': (1024, 1024)}}]}]
   ```
   
   Master:
   ```
   [{'Dropout': [{'avg_time_Dropout': 2.1684831893071532, 'p50_time_Dropout': 1.962667447514832, 'p90_time_Dropout': 2.8079885290935636, 'p99_time_Dropout': 2.967591730412097, 'inputs': {'data': (1024, 1024)}}]}]
   ```
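   
   As a rough illustration of that batching constraint (a Python stand-in for 
the C++/OpenMP kernel; the helper is made up): eight mask bits share one output 
byte, so each parallel work item must own a whole byte, otherwise the 
read-modify-write of a shared byte would race across threads.
   
   ```
   import numpy as np

   def pack_byte(keep_flags, out, byte_idx):
       # One work item per output byte: pack 8 keep/drop flags into one byte.
       b = 0
       for bit, flag in enumerate(keep_flags):
           b |= int(flag) << bit
       out[byte_idx] = b

   flags = np.random.default_rng(0).random(32) > 0.5
   out = np.zeros(4, dtype=np.uint8)
   for i in range(4):  # in the C++ kernel, this is the omp-parallel loop
       pack_byte(flags[i * 8:(i + 1) * 8], out, i)
   ```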




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-07 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-583539243
 
 
   > so we get 15x slow down?
   
   Yes, it seems so. I am not sure whether it is due to the extra bit-operation 
logic we introduced or the modification to the omp parallel clause; it needs 
more investigation. @TaoLv Can MKL VSL support a bit-wise mask, so that we 
don't have to do the extra conversion, which seems redundant in the MKL case?




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-06 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-583247325
 
 
   @TaoLv With MKL Dropout:
   [✖ CUDA, ✖ CUDNN, ✖ NCCL, ✖ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ 
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✖ BLAS_OPEN, ✖ BLAS_ATLAS, ✔ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✔ DEBUG, ✖ TVM_OP]
   
   This PR:
   ```
   [{'Dropout': [{'avg_time_Dropout': 28.989554492291063, 'p50_time_Dropout': 30.037737917155027, 'p90_time_Dropout': 30.27475003618747, 'p99_time_Dropout': 30.33297127345577, 'inputs': {'data': (1024, 1024)}}]}]
   [{'Dropout': [{'avg_time_forward_Dropout': 25.1095, 'avg_time_backward_Dropout': 5.6493, 'max_storage_mem_alloc_cpu/0': 4259.8398, 'inputs': {'data': (1024, 1024)}}]}]
   ```
   
   Master:
   ```
   [{'Dropout': [{'avg_time_Dropout': 2.1497764391824603, 'p50_time_Dropout': 1.9523266237229109, 'p90_time_Dropout': 2.6631804881617427, 'p99_time_Dropout': 2.8045210894197234, 'inputs': {'data': (1024, 1024)}}]}]
   [{'Dropout': [{'avg_time_forward_Dropout': 0.7749, 'avg_time_backward_Dropout': 0.3057, 'max_storage_mem_alloc_cpu/0': 6291.4561, 'inputs': {'data': (1024, 1024)}}]}]
   ```




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-02-03 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-581565582
 
 
   Update: Merged https://github.com/apache/incubator-mxnet/pull/17403 to 
enable unit tests of dropout on CPU and on GPU without cuDNN. I will build 
MXNet with MKL and run the perf test again.




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-01-17 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-575835293
 
 
   @TaoLv I had problems building/running MXNet with USE_BLAS=mkl. I would 
appreciate it if you could give me a hand: 
https://github.com/apache/incubator-mxnet/issues/17366




[GitHub] [incubator-mxnet] apeforest commented on issue #16735: Use single-bit for mask in dropout operator

2020-01-09 Thread GitBox
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-572830100
 
 
   @TaoLv Thanks for your review. I ran operator profiling using 
`benchmark.opperf.utils.benchmark_utils.run_performance_test`. The results show 
a speedup in the forward pass but some degradation in the backward pass.
   
   w/ this change:
   ```
   [{'Dropout': [{'avg_time_forward_Dropout': 1.3266, 'max_storage_mem_alloc_cpu/0': 4259.8398, 'avg_time_backward_Dropout': 0.2682, 'inputs': {'data': (1024, 1024), 'p': 0.5}}]}]
   ```
   
   w/o this change:
   ```
   [{'Dropout': [{'avg_time_forward_Dropout': 1.7864, 'max_storage_mem_alloc_cpu/0': 6291.4561, 'avg_time_backward_Dropout': 0.1836, 'inputs': {'data': (1024, 1024), 'p': 0.5}}]}]
   ```

