First of all, thanks for following up on this topic and not sweeping the problem under the rug. You might very well be right and have numbers which corroborate your findings; that might be something to celebrate. Before continuing our technical discussion, I would like to take a step back and remind you of the code of conduct, since I think the way you are handling the communication about this issue is not conducive to a healthy community. It is also not a good leadership example from a respected engineer and Apache PMC member.
We are all trying to do the best we can for the project, and not everyone is an expert on everything. There are technical decisions made long ago, sometimes lacking proper documentation and justification, which even if they are right constitute technical debt: it takes a big effort to reverse-engineer or deep-dive to understand all the ramifications, which are non-obvious.

I called a vote to clarify the issue and to have an opportunity to move forward on a long-standing problem that remains unaddressed and unclear. This is not trolling, and it is nothing personal against anyone or their work. I actually know just the basics about OpenMP, so this is hardly about ego; it's also not my contribution. I tried to help by providing some of the benchmarks that were requested, since I felt the original contributors had given up trying to help. After we provided information and benchmarks one after another, you closed the PR in a way that was not well understood. If there's a flaw in the benchmark, you are right to point it out. But someone who doesn't have the time or willingness to coach contributors, to properly explain why a PR is not doing the right thing, or to document their technical contributions in a way that we can all align behind and understand the tradeoffs, shouldn't be exercising the power to close PRs.

Please take some time to read the code of conduct:

https://www.apache.org/foundation/policies/conduct

There is also other material about building healthy communities:

https://www.jonobacon.com/books/artofcommunity/

Since we don't all share your particular sense of humor, I would suggest being prudent and polite, having patience when explaining your technical decisions, refraining from name-calling and ad hominem attacks, and assuming good intentions. I suggested to you before, in a private channel, that you document your findings and benchmarks in the wiki so we can have constructive conversations and help contributors improve the existing issues with OpenMP. People come and go on projects, so you can't assume that everyone knows the reasons why something was done a certain way two years ago; the reasons might also change with time.

Pedro.

On Tue, Jun 18, 2019 at 9:24 AM Chris Olivier <cjolivie...@gmail.com> wrote:

> I am very reluctant to feed the trolls again, and this will be the last
> time I address Pedro or Anton on the subject, but since I think the numbers
> being presented are incorrect (either by the builders not really
> understanding what they are building, or possibly intentional misdirection):
>
> I tried turning Intel OMP on and off (and MKL as well, since it tends to
> pull in omp, depending on which one is linked in).
> There is a HUGE difference. This is consistent with my experience before
> when it was added.
> default mnist:
>
> python ../example/image-classification/train_mnist.py
> INFO:root:start with arguments Namespace(add_stn=False, batch_size=64,
> disp_batches=100, dtype='float32', gc_threshold=0.5, gc_type='none',
> gpus=None, image_shape='1, 28, 28', initializer='default',
> kv_store='device', load_epoch=None, loss='', lr=0.05, lr_factor=0.1,
> lr_step_epochs='10', macrobatch_size=0, model_prefix=None, mom=0.9,
> monitor=0, network='mlp', num_classes=10, num_epochs=20,
> num_examples=60000, num_layers=None, optimizer='sgd',
> profile_server_suffix='', profile_worker_suffix='', save_period=1,
> test_io=0, top_k=0, warmup_epochs=5, warmup_strategy='linear', wd=0.0001)
>
> INTEL OMP:
>
> ldd libmxnet.so | grep omp
> libomp.so => /home/chris/src/mxnet/cmake_omp/3rdparty/openmp/runtime/src/libomp.so (0x00007f978fde7000)
>
> INFO:root:Epoch[0] Batch [0-100] Speed: 31548.09 samples/sec accuracy=0.780012
> INFO:root:Epoch[0] Batch [100-200] Speed: 16073.21 samples/sec accuracy=0.920469
> INFO:root:Epoch[0] Batch [200-300] Speed: 19075.91 samples/sec accuracy=0.928281
> INFO:root:Epoch[0] Batch [300-400] Speed: 23211.36 samples/sec accuracy=0.942813
> INFO:root:Epoch[0] Batch [400-500] Speed: 22139.79 samples/sec accuracy=0.938750
> INFO:root:Epoch[0] Batch [500-600] Speed: 23225.52 samples/sec accuracy=0.946562
> INFO:root:Epoch[0] Batch [600-700] Speed: 19547.41 samples/sec accuracy=0.953281
> INFO:root:Epoch[0] Batch [700-800] Speed: 24111.73 samples/sec accuracy=0.951562
> INFO:root:Epoch[0] Batch [800-900] Speed: 13959.88 samples/sec accuracy=0.957500
> INFO:root:Epoch[0] Train-accuracy=0.925423
> INFO:root:Epoch[0] Time cost=3.806
> INFO:root:Epoch[0] Validation-accuracy=0.962580
> INFO:root:Epoch[1] Batch [0-100] Speed: 24560.21 samples/sec accuracy=0.968131
> INFO:root:Epoch[1] Batch [100-200] Speed: 23457.03 samples/sec accuracy=0.966250
>
> LIBGOMP:
>
> ldd libmxnet.so | grep omp
> libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f25c25dd000)
>
> INFO:root:Epoch[0] Batch [0-100] Speed: 1731.01 samples/sec accuracy=0.782488
> INFO:root:Epoch[0] Batch [100-200] Speed: 3551.32 samples/sec accuracy=0.907813
> INFO:root:Epoch[0] Batch [200-300] Speed: 1991.00 samples/sec accuracy=0.927188
> INFO:root:Epoch[0] Batch [300-400] Speed: 2175.45 samples/sec accuracy=0.937969
> INFO:root:Epoch[0] Batch [400-500] Speed: 1644.95 samples/sec accuracy=0.942187
> INFO:root:Epoch[0] Batch [500-600] Speed: 6444.58 samples/sec accuracy=0.950156
> INFO:root:Epoch[0] Batch [600-700] Speed: 7842.16 samples/sec accuracy=0.947969
> INFO:root:Epoch[0] Batch [700-800] Speed: 9412.07 samples/sec accuracy=0.953750
> INFO:root:Epoch[0] Batch [800-900] Speed: 12707.58 samples/sec accuracy=0.953125
>
> That being said, there are other issues beyond speed. The DEFAULT build
> from the makefile (not CMake) uses Intel OMP mkl (I showed this before),
> and mysteriously it has no issues? This seems highly suspicious. All I see
> is a lot of hand-waving and conjecture and pointing to StackOverflow posts
> made by people who may be of questionable pedigree to begin with. This
> smells of a Pedro-ego-fight rather than one of purely technical merit.
> Also, if one knows how OMP works, they would be very suspicious of the
> "intermittent hangs" claim -- that's probably just broken race conditions
> elsewhere until proven differently.
> It'd tend to freeze on first use if something is wrong (try using libgomp
> after a fork and see), since worker threads wouldn't be assigned/joined
> properly. Intel OMP is faster, but it also has other advantages, such as
> allowing OMP after a fork.
>
> I actually addressed a lot of issues and asked for clarification in the
> original PRs way back when, but they were all just ignored.
>
> -Chris
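P.S. For anyone on the list who wants to try the "use libgomp after a fork and see" experiment above without rebuilding anything, below is a minimal Python sketch. It is only an illustration under assumptions: a Linux build of libmxnet.so linked against libgomp (verify with the ldd command shown above), the default 'fork' start method, and arbitrary array shapes; the hang is the symptom being debated, not a guaranteed outcome.

    # Sketch: exercise OpenMP-backed operators in a forked child process.
    # Assumes Linux ('fork' start method) and an MXNet build linked to libgomp.
    import multiprocessing as mp
    import mxnet as mx

    def child_work(_):
        # If the child inherited an OpenMP runtime whose worker threads did
        # not survive fork(), this operator call may hang rather than fail.
        a = mx.nd.ones((1000, 1000))
        return float((a * a).sum().asscalar())

    if __name__ == "__main__":
        # Touch an operator in the parent first, so the OpenMP thread pool
        # already exists at fork time.
        mx.nd.ones((1000, 1000)).asnumpy()
        with mp.Pool(processes=2) as pool:  # forks two children on Linux
            print(pool.map(child_work, [0, 1]))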
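And since part of the disagreement is about what was actually built, a runtime cross-check of the ldd output can help settle it. Again just a sketch, Linux-only, and the paths will differ per machine:

    # Sketch: list OpenMP runtimes mapped into the current process.
    # Complements the static `ldd libmxnet.so | grep omp` check above.
    import mxnet as mx  # importing loads libmxnet.so and its OMP runtime

    with open("/proc/self/maps") as maps:
        paths = {line.split()[-1] for line in maps if line.strip()}
    print(sorted(p for p in paths if "omp" in p.rsplit("/", 1)[-1]))
    # e.g. ['/usr/lib/x86_64-linux-gnu/libgomp.so.1'] or a bundled libomp.so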