BTW, trying to override a veto with "lazy consensus" is not a valid approach.
On Fri, Dec 6, 2019 at 8:44 PM Lausen, Leonard <lau...@amazon.com.invalid> wrote:

> I think it's reasonable to assume that the Intel MKLDNN team is an
> "authoritative" source on compilation with OpenMP and on issues related to
> the OpenMP runtime library. Thus I suggest we follow the recommendation of
> the Intel MKLDNN team within the MXNet project.
>
> Looking through the Intel MKLDNN documentation, I find [1]:
>
> > DNNL uses OpenMP runtime library provided by the compiler.
>
> as well as
>
> > it's important to ensure that only one OpenMP runtime is used throughout
> > the application. Having more than one OpenMP runtime linked to an
> > executable may lead to undefined behavior including incorrect results or
> > crashes.
>
> To keep our project maintainable and error free, I thus suggest we follow
> DNNL and use the OpenMP runtime library provided by the compiler.
> We have limited resources, and finding the root cause of any bugs that
> result from linking multiple OpenMP libraries, as is currently done, is in
> my opinion not a good use of time. We know the failures are due to
> undefined behavior, and we know it is best practice to use the OpenMP
> runtime library provided by the compiler. So let's just do that.
>
> Given that MKL-DNN has also adopted the "OpenMP runtime library provided
> by the compiler" approach, I think this issue is no longer contentious and
> qualifies for lazy consensus.
>
> Thus, if there is no objection within 72 hours (lazy consensus), let's
> drop the bundled LLVM OpenMP from master [2]. If we find any issues due to
> dropping the bundled LLVM OpenMP, we can always add it back prior to the
> next release.
>
> Best regards
> Leonard
>
> [1]:
> https://github.com/intel/mkl-dnn/blob/433e086bf5d9e5ccfc9ec0b70322f931b6b1921d/doc/build/build_options.md#openmp
> (This is the updated reference from Anton's previous comment, reflecting
> the changes made in MKLDNN in the meantime:
> https://github.com/apache/incubator-mxnet/pull/12160#issuecomment-415078066)
> [2]: Similar to https://github.com/apache/incubator-mxnet/pull/12160
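As background on what "the OpenMP runtime library provided by the compiler"
means in practice, here is a minimal sketch (omp_check.cpp is an illustrative
toy program, not something from this thread): when the compiler itself drives
OpenMP, it links exactly one runtime, its own (libgomp for g++, libomp for
clang++), which is the single-runtime situation the DNNL documentation calls
for.

    # Build a toy OpenMP program and let the compiler pick the runtime.
    cat > omp_check.cpp <<'EOF'
    #include <cstdio>
    #include <omp.h>
    int main() {
    #pragma omp parallel
        std::printf("thread %d of %d\n", omp_get_thread_num(),
                    omp_get_num_threads());
        return 0;
    }
    EOF
    g++ -fopenmp omp_check.cpp -o omp_check
    ldd omp_check | grep -i omp   # expect exactly one runtime, e.g. libgomp.so.1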
> On Fri, 2019-12-06 at 12:16 -0800, Pedro Larroy wrote:
> > I will try to stay on the sidelines for now, since previous
> > conversations about OMP here have not been productive and I have already
> > spent way too much time on this; I'm not the first one to give up on
> > trying to help with this topic.
> >
> > I would be glad if you guys can work together and find a solution. I
> > will just lay out my understanding of the big picture, hoping that it
> > helps move things forward.
> >
> > Recently the Intel OMP library, which seemed to have the best
> > performance of the three, was removed from MKL.
> >
> > - There are three libraries in play: GNU OpenMP, shipped with gcc
> > (gomp); LLVM OpenMP in 3rdparty (llvm-omp); and Intel OMP, used with
> > MKL and recently removed (iomp).
> >
> > - IOMP seems to have the best performance. There are stability issues
> > that sometimes produce crashes, but the impact seems relatively small
> > for users and developers. In general, linking with a different OMP
> > version than the one shipped with the compiler is known to cause
> > stability issues, but it's done anyway.
> >
> > - LLVM-OMP is used when building with CMake; it is not used in the PIP
> > releases or when building with Make. It has stability issues: it hangs
> > during test execution in debug mode and produces tons of assertions in
> > debug mode. It might have some small performance gains, but there is no
> > clear-cut data showcasing significant gains.
> >
> > - GOMP is the version shipped with GCC and used in the PIP wheels
> > without MKL; it has no stability problems.
> >
> > As a ballpark, IOMP might give a 10% performance improvement in some
> > cases.
> >
> > We need to document well how users should tune and configure MXNet when
> > using OMP.
> >
> > As a developer, the safest bet is to use GOMP, so as to be able to
> > debug and develop without issues. As a user doing CPU inference or
> > training, you want to run MKL, so it depends on how the Intel guys want
> > to do things. My preference as an engineer is always stability > speed.
> >
> > Related tickets:
> >
> > https://github.com/apache/incubator-mxnet/issues/16891
> > https://github.com/apache/incubator-mxnet/issues/10856#issuecomment-562637931
> > https://github.com/apache/incubator-mxnet/issues/11417
> > https://github.com/apache/incubator-mxnet/issues/15690
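Pedro's breakdown above describes exactly the failure mode Leonard quotes from
the DNNL docs: a single CMake build can end up loading the bundled llvm-omp,
plus iomp via MKL, plus gomp pulled in by some other dependency. A quick way
to check a given build, as a sketch (the build/libmxnet.so path matches the
ldd commands quoted later in this thread; the library-name pattern covering
libgomp/libiomp5/libomp is my assumption):

    # List the OpenMP runtimes that libmxnet.so actually loads.
    ldd build/libmxnet.so | grep -iE 'lib(g|i)?omp[^ ]*\.so'
    # More than one line of output means multiple OpenMP runtimes are linked:
    # the undefined-behavior scenario the DNNL documentation warns about.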
> > On Fri, Dec 6, 2019 at 12:39 AM Lausen, Leonard <lau...@amazon.com.invalid> wrote:
> >
> > > Is this related to https://github.com/apache/incubator-mxnet/issues/10856?
> > >
> > > I unlocked that Github issue based on the Apache Code of Conduct:
> > > https://www.apache.org/foundation/policies/conduct#specific-guidelines
> > >
> > >
> > > On Sat, 2019-11-30 at 02:47 -0800, Pedro Larroy wrote:
> > > > (py3_venv) piotr@34-215-197-42:1:~/mxnet_1.6 (upstream_master)+$ ldd build/libmxnet.so | grep -i openmp
> > > >         libomp.so => /home/piotr/mxnet_1.6/build/3rdparty/openmp/runtime/src/libomp.so (0x00007fde0991d000)
> > > > (py3_venv) piotr@34-215-197-42:0:~/mxnet_1.6 (upstream_master)+$ python ~/deeplearning-benchmark/image_classification/infer_imagenet.py --use-rec --batch-size 256 --dtype float32 --num-data-workers 40 --mode hybrid --model resnet50_v2 --use-pretrained --kvstore local --log-interval 1 --rec-val ~/data/val-passthrough.rec --rec-val-idx ~/data/val-passthrough.idx
> > > > INFO:root:Namespace(batch_norm=False, batch_size=256, data_dir='~/.mxnet/datasets/imagenet', dataset_size=32, dtype='float32', kvstore='local', last_gamma=False, log_interval=1, logging_dir='logs', lr=0.1, lr_decay=0.1, lr_decay_epoch='40,60', lr_mode='step', lr_poly_power=2, mode='hybrid', model='resnet50_v2', momentum=0.9, num_epochs=3, num_gpus=0, num_workers=40, rec_val='/home/piotr/data/val-passthrough.rec', rec_val_idx='/home/piotr/data/val-passthrough.idx', save_dir='params', save_frequency=0, top_k=0, use_pretrained=True, use_rec=True, use_se=False, warmup_epochs=0, warmup_lr=0.0, wd=0.0001)
> > > > [10:42:02] ../src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2: /home/piotr/data/val-passthrough.rec, use 36 threads for decoding..
> > > > INFO:root:Batch [0]
> > > > INFO:root:Top 1 accuracy: 0
> > > > INFO:root:warmup_throughput: 5 samples/sec warmup_time 43.150922
> > > > INFO:root:Batch [1]
> > > > INFO:root:Top 1 accuracy: 0
> > > > INFO:root:warmup_throughput: 6 samples/sec warmup_time 37.971927
> > > > INFO:root:Batch [2]
> > > > INFO:root:Top 1 accuracy: 0
> > > > INFO:root:warmup_throughput: 7 samples/sec warmup_time 35.755363
> > > >
> > > >
> > > > (py3_venv) piotr@34-215-197-42:0:~/mxnet_1.6_plat_omp (upstream_master)+$ git st
> > > > On branch upstream_master
> > > > Your branch is up to date with 'origin/upstream_master'.
> > > >
> > > > Changes not staged for commit:
> > > >   (use "git add/rm <file>..." to update what will be committed)
> > > >   (use "git checkout -- <file>..." to discard changes in working directory)
> > > >
> > > >         deleted:    3rdparty/openmp
> > > >
> > > > no changes added to commit (use "git add" and/or "git commit -a")
> > > > (py3_venv) piotr@34-215-197-42:1:~/mxnet_1.6_plat_omp (upstream_master)+$ ldd build/libmxnet.so | grep -i omp
> > > >         libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f941241c000)
> > > >
> > > > (py3_venv) piotr@34-215-197-42:130:~/mxnet_1.6_plat_omp (upstream_master)+$ python ~/deeplearning-benchmark/image_classification/infer_imagenet.py --use-rec --batch-size 256 --dtype float32 --num-data-workers 40 --mode hybrid --model resnet50_v2 --use-pretrained --kvstore local --log-interval 1 --rec-val ~/data/val-passthrough.rec --rec-val-idx ~/data/val-passthrough.idx
> > > > INFO:root:warmup_throughput: 147 samples/sec warmup_time 1.735117
> > > > INFO:root:Batch [16]
> > > > INFO:root:Top 1 accuracy: 0
> > > > INFO:root:warmup_throughput: 143 samples/sec warmup_time 1.785760
> > > > INFO:root:Batch [17]
> > > > INFO:root:Top 1 accuracy: 0
> > > > INFO:root:warmup_throughput: 148 samples/sec warmup_time 1.729033
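For the record, the transcript above shows roughly 5-7 samples/sec while the
bundled 3rdparty libomp.so is loaded, versus roughly 143-148 samples/sec once
only the compiler's libgomp.so.1 is linked: a 20-30x difference in this
particular run. A condensed recap of that A/B experiment, as a sketch (the
rebuild step between the two ldd checks is elided in the session and is
assumed to reuse the same CMake configuration):

    # Build A: stock master; the CMake build links the bundled runtime.
    ldd build/libmxnet.so | grep -i omp   # .../3rdparty/openmp/.../libomp.so
    # Build B: same tree with the bundled runtime removed (this is what
    # "deleted: 3rdparty/openmp" in the git status above reflects), then
    # rebuilt with the same configuration.
    rm -rf 3rdparty/openmp
    # ...rebuild libmxnet.so...
    ldd build/libmxnet.so | grep -i omp   # now only the compiler's libgomp.so.1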