I am very reluctant to feed the trolls again, and this will be the last time I address Pedro or Anton on the subject, but since I think the numbers being presented are incorrect (either because the builders don't really understand what they are building, or possibly through intentional misdirection), here are my own numbers:
I tried turning Intel OMP on and off (and MKL as well, since it tends to pull in an OMP runtime, depending on which one is linked in -- see the P.S. below for a quick way to check what actually ends up loaded). There is a HUGE difference. This is consistent with my experience back when it was first added.

Default mnist run:

  python ../example/image-classification/train_mnist.py
  INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none', gpus=None, image_shape='1, 28, 28', initializer='default', kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1, lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9, monitor=0, network='mlp', num_classes=10, num_epochs=20, num_examples=60000, num_layers=None, optimizer='sgd', profile_server_suffix='', profile_worker_suffix='', save_period=1, test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)

INTEL OMP:

  ldd libmxnet.so | grep omp
    libomp.so => /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so (0x00007f978fde7000)

  INFO:root:Epoch[0] Batch [0-100] Speed: 31548.09 samples/sec accuracy=0.780012
  INFO:root:Epoch[0] Batch [100-200] Speed: 16073.21 samples/sec accuracy=0.920469
  INFO:root:Epoch[0] Batch [200-300] Speed: 19075.91 samples/sec accuracy=0.928281
  INFO:root:Epoch[0] Batch [300-400] Speed: 23211.36 samples/sec accuracy=0.942813
  INFO:root:Epoch[0] Batch [400-500] Speed: 22139.79 samples/sec accuracy=0.938750
  INFO:root:Epoch[0] Batch [500-600] Speed: 23225.52 samples/sec accuracy=0.946562
  INFO:root:Epoch[0] Batch [600-700] Speed: 19547.41 samples/sec accuracy=0.953281
  INFO:root:Epoch[0] Batch [700-800] Speed: 24111.73 samples/sec accuracy=0.951562
  INFO:root:Epoch[0] Batch [800-900] Speed: 13959.88 samples/sec accuracy=0.957500
  INFO:root:Epoch[0] Train-accuracy=0.925423
  INFO:root:Epoch[0] Time cost=3.806
  INFO:root:Epoch[0] Validation-accuracy=0.962580
  INFO:root:Epoch[1] Batch [0-100] Speed: 24560.21 samples/sec accuracy=0.968131
  INFO:root:Epoch[1] Batch [100-200] Speed: 23457.03 samples/sec accuracy=0.966250

LIBGOMP:

  ldd libmxnet.so | grep omp
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25c25dd000)

  INFO:root:Epoch[0] Batch [0-100] Speed: 1731.01 samples/sec accuracy=0.782488
  INFO:root:Epoch[0] Batch [100-200] Speed: 3551.32 samples/sec accuracy=0.907813
  INFO:root:Epoch[0] Batch [200-300] Speed: 1991.00 samples/sec accuracy=0.927188
  INFO:root:Epoch[0] Batch [300-400] Speed: 2175.45 samples/sec accuracy=0.937969
  INFO:root:Epoch[0] Batch [400-500] Speed: 1644.95 samples/sec accuracy=0.942187
  INFO:root:Epoch[0] Batch [500-600] Speed: 6444.58 samples/sec accuracy=0.950156
  INFO:root:Epoch[0] Batch [600-700] Speed: 7842.16 samples/sec accuracy=0.947969
  INFO:root:Epoch[0] Batch [700-800] Speed: 9412.07 samples/sec accuracy=0.953750
  INFO:root:Epoch[0] Batch [800-900] Speed: 12707.58 samples/sec accuracy=0.953125

That being said, there are other issues beyond speed. The DEFAULT build from the Makefile (not CMake) uses Intel OMP + MKL (I showed this before), and mysteriously it has no issues? That seems highly suspicious. All I see is a lot of hand-waving and conjecture, and pointing to StackOverflow posts made by people who may be of questionable pedigree to begin with. This smells of a Pedro-ego-fight rather than an argument of purely technical merit. Also, anyone who knows how OMP works would be very suspicious of the "intermittent hangs" claim -- that's probably just race conditions elsewhere until proven otherwise.
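To make that concrete, here is a minimal sketch of the one scenario where the runtime itself really is the problem (my own illustration, not MXNet code; it assumes gcc or clang with -fopenmp): warm up the OpenMP thread pool, fork(), then run a parallel region in the child. With libgomp, that first region in the child tends to lock up, because the pre-fork worker threads do not exist in the child process; the Intel/LLVM runtime installs fork handlers and rebuilds its pool, so it keeps going.

/* fork_omp.c -- minimal repro sketch (my own, not from MXNet).
 * Build against libgomp:      gcc   -fopenmp fork_omp.c -o fork_omp
 * Build against LLVM libomp:  clang -fopenmp fork_omp.c -o fork_omp
 */
#include <omp.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void parallel_work(const char *who)
{
    /* A trivial parallel region; its only job is to touch the runtime. */
    #pragma omp parallel
    {
        #pragma omp single
        printf("%s: ran with %d threads\n", who, omp_get_num_threads());
    }
}

int main(void)
{
    parallel_work("parent before fork");   /* creates the runtime's worker threads */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the worker threads created above do not exist here.
         * This is the call that tends to freeze with libgomp; the Intel/LLVM
         * runtime rebuilds its thread pool via fork handlers instead. */
        parallel_work("child after fork");
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    parallel_work("parent after fork");
    return 0;
}

Build it once against each runtime (gcc links libgomp by default, clang links LLVM's libomp) and run both to see the difference for yourself.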
If something were actually wrong with the runtime, it'd tend to freeze on the first use (try using libgomp after a fork, as in the sketch above, and see), since the worker threads wouldn't be assigned/joined properly. Intel OMP is faster, but it also has other advantages, such as allowing OMP after a fork. I actually raised a lot of these issues and asked for clarification in the original PRs way back when, but they were all just ignored.

-Chris
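P.S. Depending on how MKL is linked, it can pick its threading layer at run time, so ldd on libmxnet.so doesn't always tell the whole story about which OpenMP runtime(s) actually end up in the process. Here is a quick check I use (again, my own sketch, not part of MXNet): dlopen the library, then list every shared object with "omp" in its path that is mapped into the process.

/* whatomp.c -- list OpenMP-looking runtimes mapped into the process
 * after dlopen()ing a library (my own quick check, not part of MXNet).
 * Build: gcc whatomp.c -o whatomp -ldl
 * Run:   ./whatomp /path/to/libmxnet.so
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

static int show_omp(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size; (void)data;
    /* Matches libomp, libgomp, libiomp5 -- anything with "omp" in its path. */
    if (info->dlpi_name && strstr(info->dlpi_name, "omp"))
        printf("mapped: %s\n", info->dlpi_name);
    return 0;   /* 0 = keep iterating over loaded objects */
}

int main(int argc, char **argv)
{
    const char *lib = (argc > 1) ? argv[1] : "libmxnet.so";
    if (!dlopen(lib, RTLD_NOW | RTLD_GLOBAL)) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib, dlerror());
        return 1;
    }
    /* Note: anything loaded lazily later (e.g. an MKL threading layer picked
     * on first call) only shows up if you run this check after that point. */
    dl_iterate_phdr(show_omp, NULL);
    return 0;
}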