pengzhao-intel commented on issue #14087: Poor performance of the libmxnet if OMP_PLACES environment variable is present
URL: https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461704672

@stsukrov thanks for the analysis.

I have done several experiments locally and I can reproduce your results in step 2. I agree with you that OMP_PLACES should behave similarly to the two OMP env settings, but the result is totally different. I will look into the details next week (we are on vacation this week).

In step 3, the two env settings work well on my side; could you double check?
In step 4, you can leverage the latest subgraph optimization to get further performance.
In step 5, if you are running the case on 2 sockets with a big batch size, it is a good trick.

Hope this can help to unblock your work while we are debugging the OMP_PLACES issue.

==============================================================

1. without the OMP_PLACES setting, resnet50-v1, 83.5 images/sec

```
(base) [patric@mlt-skx057 image-classification]$ python benchmark_score.py --network resnetv1-50 --batch-size 1
INFO:root:network: resnetv1-50
INFO:root:device: cpu(0)
/home/patric/develop/incubator-mxnet/python/mxnet/module/base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
INFO:root:batch size 1, dtype float32, images/sec: 83.457609
```

2. set OMP_PLACES=56; yes, the performance drops by more than 30x, to 2.5 images/sec

```
(base) [patric@mlt-skx057 image-classification]$ export OMP_PLACES=56
(base) [patric@mlt-skx057 image-classification]$ python benchmark_score.py --network resnetv1-50 --batch-size 1
INFO:root:network: resnetv1-50
INFO:root:device: cpu(0)
/home/patric/develop/incubator-mxnet/python/mxnet/module/base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
INFO:root:batch size 1, dtype float32, images/sec: 2.511049
```

3. set OMP_NUM_THREADS and KMP_AFFINITY and got 104.0 images/sec

```
(base) [patric@mlt-skx057 image-classification]$ export KMP_AFFINITY=granularity=fine,compact,1,0
(base) [patric@mlt-skx057 image-classification]$ export OMP_NUM_THREADS=56
(base) [patric@mlt-skx057 image-classification]$ python benchmark_score.py --network resnetv1-50 --batch-size 1
INFO:root:network: resnetv1-50
INFO:root:device: cpu(0)
/home/patric/develop/incubator-mxnet/python/mxnet/module/base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
INFO:root:batch size 1, dtype float32, images/sec: 104.027203
```

4. using the new MKLDNN subgraph optimization via `MXNET_SUBGRAPH_BACKEND=MKLDNN` and got 176.4 images/sec

```
(base) [patric@mlt-skx057 image-classification]$ export MXNET_SUBGRAPH_BACKEND=MKLDNN
(base) [patric@mlt-skx057 image-classification]$ python benchmark_score.py --network resnetv1-50 --batch-size 1
INFO:root:network: resnetv1-50
INFO:root:device: cpu(0)
/home/patric/develop/incubator-mxnet/python/mxnet/module/base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[13:57:30] src/operator/subgraph/mkldnn/mkldnn_conv_property.cc:138: Start to execute MKLDNN Convolution optimization pass.
INFO:root:batch size 1, dtype float32, images/sec: 176.417314
```

5. sometimes NNVM_EXEC_MATCH_RANGE can help for a big batch size on 2 sockets. For example, mobilenet perf can be improved from 1941.7 images/sec to 2690.5 images/sec

```
(base) [patric@mlt-skx057 image-classification]$ python benchmark_score.py --network mobilenet --batch-size 128
INFO:root:network: mobilenet
INFO:root:device: cpu(0)
/home/patric/develop/incubator-mxnet/python/mxnet/module/base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[14:17:54] src/operator/subgraph/mkldnn/mkldnn_conv_property.cc:138: Start to execute MKLDNN Convolution optimization pass.
INFO:root:batch size 128, dtype float32, images/sec: 1941.671165
(base) [patric@mlt-skx057 image-classification]$ export NNVM_EXEC_MATCH_RANGE=1
(base) [patric@mlt-skx057 image-classification]$ python benchmark_score.py --network mobilenet --batch-size 128
INFO:root:network: mobilenet
INFO:root:device: cpu(0)
/home/patric/develop/incubator-mxnet/python/mxnet/module/base_module.py:67: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[14:18:07] src/operator/subgraph/mkldnn/mkldnn_conv_property.cc:138: Start to execute MKLDNN Convolution optimization pass.
INFO:root:batch size 128, dtype float32, images/sec: 2690.545505
```
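If it helps to compare the settings side by side, here is a rough Python sketch that expresses the resnetv1-50 numbers above as multiples of the baseline and can re-run the benchmark with one setting at a time. The helper names (`parse_images_per_sec`, `speedups`, `run_benchmark`, `CONFIGS`, `REPORTED`) are made up for illustration; only the environment variables, the `benchmark_score.py` flags, the log format, and the numbers are taken from the logs above. It also assumes each setting is measured in isolation, whereas the shell session above may have kept earlier exports active.

```python
import os
import re
import subprocess
import sys

# Summary line format copied from the logs above, e.g.
# "INFO:root:batch size 1, dtype float32, images/sec: 83.457609"
SCORE_RE = re.compile(r"images/sec: ([0-9]+(?:\.[0-9]+)?)")

# Environment settings from steps 1-4 (values copied from the comment).
CONFIGS = {
    "baseline": {},
    "OMP_PLACES=56": {"OMP_PLACES": "56"},
    "KMP_AFFINITY + OMP_NUM_THREADS": {
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        "OMP_NUM_THREADS": "56",
    },
    "MXNET_SUBGRAPH_BACKEND=MKLDNN": {"MXNET_SUBGRAPH_BACKEND": "MKLDNN"},
}

# images/sec reported above for resnetv1-50 with batch size 1.
REPORTED = {
    "baseline": 83.457609,
    "OMP_PLACES=56": 2.511049,
    "KMP_AFFINITY + OMP_NUM_THREADS": 104.027203,
    "MXNET_SUBGRAPH_BACKEND=MKLDNN": 176.417314,
}


def parse_images_per_sec(log_text):
    """Return the last images/sec value in log_text, or None if absent."""
    found = SCORE_RE.findall(log_text)
    return float(found[-1]) if found else None


def speedups(results, baseline_key="baseline"):
    """Express every result as a multiple of the baseline throughput."""
    base = results[baseline_key]
    return {name: value / base for name, value in results.items()}


def run_benchmark(extra_env, network="resnetv1-50", batch_size=1):
    """Run benchmark_score.py in a child process with only extra_env added.

    Assumption: every variable used by any entry in CONFIGS is cleared
    first, so settings do not accumulate between runs.
    """
    env = dict(os.environ)
    for config in CONFIGS.values():
        for name in config:
            env.pop(name, None)
    env.update(extra_env)
    cmd = [sys.executable, "benchmark_score.py",
           "--network", network, "--batch-size", str(batch_size)]
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    # Python's logging writes to stderr by default, so scan both streams.
    return parse_images_per_sec(proc.stdout + proc.stderr)


if __name__ == "__main__":
    # Summarize the numbers already reported in this comment.
    for name, ratio in speedups(REPORTED).items():
        print(f"{name}: {REPORTED[name]:.1f} images/sec ({ratio:.2f}x baseline)")
```

To re-measure on your own machine, something like `{name: run_benchmark(env) for name, env in CONFIGS.items()}` could be run from the image-classification directory and the result passed to `speedups`; a run that prints no score line comes back as `None` and would need to be filtered out first.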
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services