Great news, it was the debug flag:

[success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
[success] 6.64% test_random.test_shuffle: 128.5430s
[success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
[success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
[success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
[success] 3.91% test_operator.test_pick: 75.7241s
[success] 3.34% test_operator.test_psroipooling: 64.7008s
[success] 3.30% test_random.test_poisson_generator: 63.9218s
[success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
[success] 2.95% test_random.test_random: 57.0268s
[success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
[success] 2.03% test_random.test_normal_generator: 39.3641s
[success] 1.86% test_io.test_Cifar10Rec: 36.0722s
[success] 1.76% test_random.test_gamma_generator: 34.0995s
[success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
[success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s
[...]

Ran 703 tests in 1941.053s

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0

We are all good and set then. Thanks Tao and Patric for your support!

Best regards,
Marco

On Fri, Nov 23, 2018 at 3:44 AM Marco de Abreu <marco.g.ab...@googlemail.com> wrote:

> Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379
>
> -Marco
>
> On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric <patric.z...@intel.com> wrote:
>
>> Thanks, that should be the most time-consuming part.
>>
>> @Marco, could you try to disable this env and see the performance again?
>>
>> > -----Original Message-----
>> > From: Lv, Tao A [mailto:tao.a...@intel.com]
>> > Sent: Friday, November 23, 2018 10:26 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: RE: MKLDNN performance in CI
>> >
>> > I think yes, except the cpp tests.
>> >
>> > -----Original Message-----
>> > From: Zhao, Patric [mailto:patric.z...@intel.com]
>> > Sent: Friday, November 23, 2018 10:06 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: RE: MKLDNN performance in CI
>> >
>> > Good point, Tao!
>> > Is this env enabled in all MKL-DNN CI?
>> >
>> > > -----Original Message-----
>> > > From: Lv, Tao A [mailto:tao.a...@intel.com]
>> > > Sent: Friday, November 23, 2018 9:53 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: RE: MKLDNN performance in CI
>> > >
>> > > Thanks for bringing this up, Marco. It's really weird, since most of
>> > > the tests listed as "worth noting" are not related to the mkldnn
>> > > backend.
>> > >
>> > > I can understand that some tests for mkldnn operators may be slower
>> > > because MXNET_MKLDNN_DEBUG is enabled in the CI:
>> > > https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
>> > >
>> > > -----Original Message-----
>> > > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
>> > > Sent: Friday, November 23, 2018 9:22 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: MKLDNN performance in CI
>> > >
>> > > Hello,
>> > >
>> > > I have noticed that our Python tests have been increasing in duration
>> > > recently. In order to analyse this further, I created PR [1], which
>> > > makes it possible to record test durations. Please note that I did not
>> > > dive deep into these numbers and that they have to be taken with a
>> > > grain of salt, since slaves have varying resource utilization.
>> > >
>> > > Please have a look at the following two logs:
>> > >
>> > > Python3 CPU MKLDNN:
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
>> > >
>> > > Python3 CPU Openblas:
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
>> > >
>> > > If you scroll to the end (note that multiple test stages and summaries
>> > > are printed in these logs), you will find the following statements:
>> > >
>> > > Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
>> > > Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>> > >
>> > > This shows that the MKLDNN backend is overall about 40% slower than
>> > > the Openblas backend.
>> > > If we go into the details, we can see that some tests are
>> > > significantly slower:
>> > >
>> > > Python3 CPU MKLDNN:
>> > >
>> > > > [success] 20.78% test_random.test_shuffle: 630.7165s
>> > > > [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
>> > > > [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
>> > > > [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
>> > > > [success] 2.45% test_operator.test_pick: 74.4041s
>> > > > [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
>> > > > [success] 2.38% test_random.test_negative_binomial_generator: 72.1751s
>> > > > [success] 1.84% test_operator.test_psroipooling: 55.9432s
>> > > > [success] 1.78% test_random.test_poisson_generator: 54.0104s
>> > > > [success] 1.72% test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s
>> > > > [success] 1.60% test_contrib_control_flow.test_cond: 48.6977s
>> > > > [success] 1.41% test_random.test_random: 42.8712s
>> > > > [success] 1.03% test_operator.test_layer_norm: 31.1242s
>> > >
>> > > Python3 CPU Openblas:
>> > >
>> > > > [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
>> > > > [success] 4.34% test_random.test_shuffle: 93.3157s
>> > > > [success] 4.31% test_random.test_negative_binomial_generator: 92.6899s
>> > > > [success] 3.78% test_sparse_operator.test_elemwise_binary_ops: 81.2048s
>> > > > [success] 3.30% test_operator.test_psroipooling: 70.9090s
>> > > > [success] 3.20% test_random.test_poisson_generator: 68.7500s
>> > > > [success] 3.10% test_metric_perf.test_metric_performance: 66.6085s
>> > > > [success] 2.79% test_operator.test_layer_norm: 59.9566s
>> > > > [success] 2.66% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s
>> > > > [success] 2.62% test_operator.test_pick: 56.2312s
>> > > > [success] 2.60% test_random.test_random: 55.8920s
>> > > > [success] 2.19% test_operator.test_broadcast_binary_op: 47.1879s
>> > > > [success] 0.96% test_contrib_control_flow.test_cond: 20.6908s
>> > >
>> > > Tests worth noting:
>> > > - test_random.test_shuffle: 700% increase - but I don't know how this
>> > >   may be related to MKLDNN. Are we doing random number generation in
>> > >   either of those backends?
>> > > - test_sparse_operator.test_elemwise_binary_ops: 700% increase
>> > > - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and
>> > >   to be expected :)
>> > > - test_operator.test_broadcast_binary_op: 80% increase
>> > > - test_contrib_control_flow.test_cond: 250% increase
>> > > - test_operator.test_layer_norm: 50% decrease - nice!
>> > >
>> > > As I have stated previously, these numbers might not mean anything,
>> > > since the CI is not a benchmarking environment (sorry if these are
>> > > false positives), but I thought it was worth mentioning so that Intel
>> > > could follow up and dive deeper.
>> > >
>> > > Is anybody here running 1:1 operator comparisons (e.g. layer_norm in
>> > > the different backends, to compare performance) who could provide us
>> > > with those numbers?
>> > >
>> > > Best regards,
>> > > Marco
>> > >
>> > > [1]: https://github.com/apache/incubator-mxnet/pull/13377
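
[Editor's note] On Patric's suggestion to disable the env and re-measure: below is a minimal sketch (not the CI script) of how the overhead of MXNET_MKLDNN_DEBUG could be checked on a single MKL-DNN-backed operator. The operator choice, shapes, run count, and script name are arbitrary assumptions, and since MXNet may cache the flag on first use, the script is launched once per setting rather than toggling the flag in-process.

import os
import time
import mxnet as mx

# Run once per setting from the shell, e.g. (script name is illustrative):
#   MXNET_MKLDNN_DEBUG=0 python bench_conv.py
#   MXNET_MKLDNN_DEBUG=1 python bench_conv.py

def time_convolution(runs=100):
    data = mx.nd.random.uniform(shape=(32, 3, 224, 224))
    weight = mx.nd.random.uniform(shape=(64, 3, 3, 3))
    bias = mx.nd.zeros((64,))
    # Warm up once so one-time initialization is not measured.
    mx.nd.Convolution(data, weight, bias, kernel=(3, 3), num_filter=64)
    mx.nd.waitall()
    start = time.time()
    for _ in range(runs):
        mx.nd.Convolution(data, weight, bias, kernel=(3, 3), num_filter=64)
    mx.nd.waitall()  # the engine is asynchronous; flush it before stopping the clock
    return (time.time() - start) / runs

print("MXNET_MKLDNN_DEBUG=%s: %.6fs per convolution"
      % (os.environ.get("MXNET_MKLDNN_DEBUG", "<unset>"), time_convolution()))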
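[Editor's note] And on Marco's closing question about 1:1 operator comparisons: a sketch of what such a micro-comparison could look like for layer_norm, one of the outliers above. The same script would be run once against the MKL-DNN build and once against the Openblas build; the shapes and run count are again arbitrary assumptions.

import time
import mxnet as mx

def time_layer_norm(shape=(128, 1024), runs=200):
    data = mx.nd.random.uniform(shape=shape)
    gamma = mx.nd.ones((shape[-1],))
    beta = mx.nd.zeros((shape[-1],))
    # Warm-up call so lazy initialization is excluded from the timing.
    mx.nd.LayerNorm(data, gamma, beta)
    mx.nd.waitall()
    start = time.time()
    for _ in range(runs):
        mx.nd.LayerNorm(data, gamma, beta)
    mx.nd.waitall()  # wait for the asynchronous engine before stopping the clock
    return (time.time() - start) / runs

print("layer_norm: %.6fs per call" % time_layer_norm())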