I will try to stay on the sidelines for now, since previous conversations
about OMP have not been productive here and I have already spent way too
much time on this; I'm not the first one to give up on trying to help with
this topic.

I would be glad if you guys can work together and find a solution. I will
just lay out my understanding of the big picture, hoping it helps move
things forward.


Recently the Intel OMP library, which seemed to have the best performance
of the three, was removed from MKL.

- There are three libraries in play: GNU OpenMP, which ships with GCC
(gomp); LLVM OpenMP, vendored in 3rdparty (llvm-omp); and Intel OMP, pulled
in when using MKL and recently removed (iomp). A quick way to check which
one a given build links against is shown after this list.

- IOMP seems to have the best performance. There are stability issues that
sometimes produce crashes, but the impact seems relatively small for users
and developers. In general, linking against a different OMP runtime than
the one shipped with the compiler is known to cause stability issues, but
it's done anyway.

- LLVM-OMP is used when building with CMake; it is not used in the PIP
releases or when building with Make. It has stability issues: it hangs
during test execution in debug mode and produces tons of assertion failures
in debug builds. It might bring some small performance gains, but there is
no clear-cut data showing significant gains.

- GOMP is the version shipped with GCC and used in the PIP wheels without
MKL; it has no stability problems.
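
For reference, a quick way to confirm which of the three runtimes a given
build actually links against (the same check used in the session quoted
below; paths assume an in-tree CMake build under build/):

    # Show which OpenMP runtime libmxnet.so is linked against:
    #   libomp.so   -> bundled LLVM OpenMP (3rdparty/openmp)
    #   libgomp.so  -> GNU OpenMP shipped with GCC
    #   libiomp5.so -> Intel OMP pulled in via MKL
    ldd build/libmxnet.so | grep -i omp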

As a ballpark, IOMP might give a 10% performance improvement in some cases.

We need to document clearly how users should tune and configure MXNet when
using OMP.
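
As a starting point for that documentation, the main knobs are the standard
OpenMP environment variables plus the runtime-specific ones. The values
below are only placeholders to illustrate the knobs; the right settings
depend on the machine and workload:

    # Standard OpenMP (honored by all three runtimes)
    export OMP_NUM_THREADS=16        # placeholder: usually the number of physical cores
    export OMP_PROC_BIND=true        # pin threads to cores
    # GOMP-specific pinning
    export GOMP_CPU_AFFINITY="0-15"  # example core range, adjust to the machine
    # IOMP-specific (when running with MKL)
    export KMP_AFFINITY=granularity=fine,compact,1,0
    export KMP_BLOCKTIME=0           # threads sleep right after a parallel region instead of spinning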

As a developer, the safest bet is to use GOMP so you can debug and develop
without issues. As a user doing CPU inference / training you want to run
MKL, so it depends on how the Intel folks want to handle things. My
preference as an engineer is always stability > speed.
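
For developers who want to go the GOMP route today, the approach used in
the session quoted below is simply to drop the bundled LLVM OpenMP before
configuring, so the CMake build falls back to the compiler's own runtime.
A rough sketch (exact CMake flags depend on your usual setup):

    # Build against the platform GOMP instead of the bundled LLVM OpenMP.
    rm -rf 3rdparty/openmp           # same effect as the 'deleted: 3rdparty/openmp' below
    mkdir -p build && cd build
    cmake .. && make -j"$(nproc)"    # use whatever CMake options you normally pass
    ldd libmxnet.so | grep -i omp    # should now show libgomp.so instead of libomp.so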

Related tickets:

https://github.com/apache/incubator-mxnet/issues/16891

https://github.com/apache/incubator-mxnet/issues/10856#issuecomment-562637931


https://github.com/apache/incubator-mxnet/issues/11417

https://github.com/apache/incubator-mxnet/issues/15690



On Fri, Dec 6, 2019 at 12:39 AM Lausen, Leonard <lau...@amazon.com.invalid>
wrote:

> Is this related to https://github.com/apache/incubator-mxnet/issues/10856?
>
> I unlocked that Github issue based on the Apache Code of Conduct
> https://www.apache.org/foundation/policies/conduct#specific-guidelines
>
>
> On Sat, 2019-11-30 at 02:47 -0800, Pedro Larroy wrote:
> > (py3_venv) piotr@34-215-197-42:1:~/mxnet_1.6 (upstream_master)+$ ldd
> > build/libmxnet.so| grep -i openmp
> >         libomp.so =>
> > /home/piotr/mxnet_1.6/build/3rdparty/openmp/runtime/src/libomp.so
> > (0x00007fde0991d000)
> > (py3_venv) piotr@34-215-197-42:0:~/mxnet_1.6 (upstream_master)+$ python
> > ~/deeplearning-benchmark/image_classification/infer_imagenet.py --use-rec
> > --batch-size 256 --dtype float32 --num-data-workers 40 --mode hybrid
> > --model resnet50_v2 --use-pretrained --kvstore local --log-interval 1
> > --rec-val ~/data/val-passthrough.rec --rec-val-idx
> > ~/data/val-passthrough.idx
> > INFO:root:Namespace(batch_norm=False, batch_size=256,
> > data_dir='~/.mxnet/datasets/imagenet', dataset_size=32, dtype='float32',
> > kvstore='local', last_gamma=False, log_interval=1, logging_dir='logs',
> > lr=0.1, lr_decay=0.1, lr_decay_epoch='40,60', lr_mode='step',
> > lr_poly_power=2, mode='hybrid', model='resnet50_v2', momentum=0.9,
> > num_epochs=3, num_gpus=0, num_workers=40,
> > rec_val='/home/piotr/data/val-passthrough.rec',
> > rec_val_idx='/home/piotr/data/val-passthrough.idx', save_dir='params',
> > save_frequency=0, top_k=0, use_pretrained=True, use_rec=True,
> use_se=False,
> > warmup_epochs=0, warmup_lr=0.0, wd=0.0001)
> > [10:42:02] ../src/io/iter_image_recordio_2.cc:178: ImageRecordIOParser2:
> > /home/piotr/data/val-passthrough.rec, use 36 threads for decoding..
> > INFO:root:Batch [0]
> > INFO:root:Top 1 accuracy: 0
> > INFO:root:warmup_throughput: 5 samples/sec warmup_time 43.150922
> > INFO:root:Batch [1]
> > INFO:root:Top 1 accuracy: 0
> > INFO:root:warmup_throughput: 6 samples/sec warmup_time 37.971927
> > INFO:root:Batch [2]
> > INFO:root:Top 1 accuracy: 0
> > INFO:root:warmup_throughput: 7 samples/sec warmup_time 35.755363
> >
> >
> >
> >
> >
> >
> >
> > (py3_venv) piotr@34-215-197-42:0:~/mxnet_1.6_plat_omp
> (upstream_master)+$
> > git st
> > On branch upstream_master
> > Your branch is up to date with 'origin/upstream_master'.
> >
> > Changes not staged for commit:
> >   (use "git add/rm <file>..." to update what will be committed)
> >   (use "git checkout -- <file>..." to discard changes in working
> directory)
> >
> >         deleted:    3rdparty/openmp
> >
> > no changes added to commit (use "git add" and/or "git commit -a")
> > (py3_venv) piotr@34-215-197-42:1:~/mxnet_1.6_plat_omp
> (upstream_master)+$
> > ldd build/libmxnet.so | grep -i omp
> >         libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > (0x00007f941241c000)
> >
> > (py3_venv) piotr@34-215-197-42:130:~/mxnet_1.6_plat_omp
> (upstream_master)+$
> > python ~/deeplearning-benchmark/image_classification/infer_imagenet.py
> > --use-rec --batch-size 256 --dtype float32 --num-data-workers 40 --mode
> > hybrid --model resnet50_v2 --use-pretrained --kvstore local
> --log-interval
> > 1 --rec-val ~/data/val-passthrough.rec --rec-val-idx
> > ~/data/val-passthrough.idx
> > INFO:root:warmup_throughput: 147 samples/sec warmup_time 1.735117
> > INFO:root:Batch [16]
> > INFO:root:Top 1 accuracy: 0
> > INFO:root:warmup_throughput: 143 samples/sec warmup_time 1.785760
> > INFO:root:Batch [17]
> > INFO:root:Top 1 accuracy: 0
> > INFO:root:warmup_throughput: 148 samples/sec warmup_time 1.729033
>
