Requesting Slack Access
--
Sam Bean
*StockX*
*Tech Lead, Machine Learning and Personalization*
samb...@stockx.com
stockx.com
Re: Updating MXNet's Cub
I created a PR to address this issue: https://github.com/apache/incubator-mxnet/pull/13322

Simply changing the cub submodule's URL would impact every developer: the recommended command, "git submodule update", won't work, and developers would have to run "git submodule sync" first. To minimize the impact, I deleted the CUB submodule and added a new submodule, "nvidia_cub". The only side effect is a dangling untracked "cub" folder left on local disk, which developers can delete manually.

Thanks,
Frank

On 2018/08/24 17:00:54, Hagay Lupesko wrote:
> Hi all,
>
> One of MXNet’s submodule dependencies is a snapshot of Nvidia Cub
> (https://github.com/dmlc/cub) – the snapshot is of an older version of Cub
> (1.7), while the latest Nvidia Cub release is 1.8. Note that dmlc/cub has
> no customizations of the source Cub repo.
>
> I’d like to suggest updating the existing Cub submodule to point to Nvidia’s Cub
> repo. Instead of the snapshot, MXNet will be using Nvidia’s repo and the
> latest release (both repos have the same BSD-3 license, so licensing should
> not be an issue).
>
> Wanted to get feedback from the community to make sure I'm not missing
> anything.
>
> If there are no objections I'll submit a PR for the change.
>
> Cheers,
> Hagay
Scala Symbol API Question
Hello,

I have some questions about the Scala API for the Symbol library. I'm trying to figure out how to do something like this: https://github.com/ufoym/mxnet/blob/master/example/vae/VAE.py#L83. However, it seems the Scala Symbol API does not allow mixing symbols and constants the way the Python library does. It looks like if I want to use constants in my loss functions, I'm going to end up with a very large argsDict when I go to train, since every constant in the loss function definition will have to be a symbolic variable. Is there a better way to do this?

--
Sam Bean
*StockX*
*Tech Lead, Machine Learning and Personalization*
samb...@stockx.com
stockx.com
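[Editor's note] For context, here is a minimal Python sketch of the kind of expression in question, loosely modeled on the linked VAE example (the variable names below are illustrative, not taken from that file). In the Python Symbol API, plain numeric constants can be mixed directly into a symbolic expression, which is exactly the behavior the question asks how to reproduce in Scala:

    import mxnet as mx

    # Two symbolic inputs, as they would appear in a VAE-style KL term.
    mu = mx.sym.Variable("mu")
    logvar = mx.sym.Variable("logvar")

    # 0.5 and 1 are ordinary Python constants; the Python Symbol API folds
    # them into the graph as scalar operators, so they never appear as
    # extra inputs that have to be bound at training time.
    kl = -0.5 * mx.sym.sum(1 + logvar - mu * mu - mx.sym.exp(logvar), axis=1)

In the Scala API, the workaround described in the question would be to declare each such constant as a symbolic variable and supply its value through argsDict at training time, which is exactly the overhead being asked about.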
Re: Updating MXNet's Cub
I created a PR, https://github.com/apache/incubator-mxnet/pull/13322, to address this issue.

Updating the CUB submodule URL would impact every developer: the recommended command, "git submodule update", won't work, and developers would have to run "git submodule sync" first. To reduce the impact on developers, I deleted CUB and added a new submodule, "nvidia_cub". With this workaround, developers can pick up the update with a plain "git submodule update". The only side effect is that an untracked "cub" folder will be left on local disk; users can delete it manually.

Thanks,
Frank

On 2018/08/24 17:00:54, Hagay Lupesko wrote:
> Hi all,
>
> One of MXNet’s submodule dependencies is a snapshot of Nvidia Cub
> (https://github.com/dmlc/cub) – the snapshot is of an older version of Cub
> (1.7), while the latest Nvidia Cub release is 1.8. Note that dmlc/cub has
> no customizations of the source Cub repo.
>
> I’d like to suggest updating the existing Cub submodule to point to Nvidia’s Cub
> repo. Instead of the snapshot, MXNet will be using Nvidia’s repo and the
> latest release (both repos have the same BSD-3 license, so licensing should
> not be an issue).
>
> Wanted to get feedback from the community to make sure I'm not missing
> anything.
>
> If there are no objections I'll submit a PR for the change.
>
> Cheers,
> Hagay
Re: MKLDNN performance in CI
It’s great to hear that works. Thanks for your effort on this, Marco. 👍

-Tao

Sent from my iPhone

> On Nov 23, 2018, at 7:31 PM, Marco de Abreu wrote:
>
> Great news, it was the debug flag:
>
> [success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
> [success] 6.64% test_random.test_shuffle: 128.5430s
> [success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
> [success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
> [success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
> [success] 3.91% test_operator.test_pick: 75.7241s
> [success] 3.34% test_operator.test_psroipooling: 64.7008s
> [success] 3.30% test_random.test_poisson_generator: 63.9218s
> [success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
> [success] 2.95% test_random.test_random: 57.0268s
> [success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
> [success] 2.03% test_random.test_normal_generator: 39.3641s
> [success] 1.86% test_io.test_Cifar10Rec: 36.0722s
> [success] 1.76% test_random.test_gamma_generator: 34.0995s
> [success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
> [success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s
>
> [...]
>
> Ran 703 tests in 1941.053s
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0
>
> We are all good and set then. Thanks Tao and Patric for your support!
>
> Best regards,
> Marco
>
> On Fri, Nov 23, 2018 at 3:44 AM Marco de Abreu wrote:
>
>> Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379
>>
>> -Marco
>>
>> On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric wrote:
>>
>>> Thanks, it should be the most time-consuming part.
>>>
>>> @Marco, could you try to disable this env and see the performance again?
>>>
>>> -----Original Message-----
>>> From: Lv, Tao A [mailto:tao.a...@intel.com]
>>> Sent: Friday, November 23, 2018 10:26 AM
>>> To: dev@mxnet.incubator.apache.org
>>> Subject: RE: MKLDNN performance in CI
>>>
>>> I think yes, except the cpp test.
>>>
>>> -----Original Message-----
>>> From: Zhao, Patric [mailto:patric.z...@intel.com]
>>> Sent: Friday, November 23, 2018 10:06 AM
>>> To: dev@mxnet.incubator.apache.org
>>> Subject: RE: MKLDNN performance in CI
>>>
>>> Good point, Tao!
>>> Is this env enabled in all MKL-DNN CI?
>>>
>>> > -----Original Message-----
>>> > From: Lv, Tao A [mailto:tao.a...@intel.com]
>>> > Sent: Friday, November 23, 2018 9:53 AM
>>> > To: dev@mxnet.incubator.apache.org
>>> > Subject: RE: MKLDNN performance in CI
>>> >
>>> > Thanks for bringing this up, Marco. It's really weird since most of
>>> > those tests listed in "worth noting" are not related to the mkldnn backend.
>>> >
>>> > I can understand that some tests for mkldnn operators may be slower
>>> > because MXNET_MKLDNN_DEBUG is enabled in the CI:
>>> > https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
>>> >
>>> > -----Original Message-----
>>> > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
>>> > Sent: Friday, November 23, 2018 9:22 AM
>>> > To: dev@mxnet.incubator.apache.org
>>> > Subject: MKLDNN performance in CI
>>> >
>>> > Hello,
>>> >
>>> > I have noticed that our Python tests have been increasing in duration recently.
>>> > In order to analyse this further, I created the PR [1] which allows us to
>>> > record test durations. Please note that I did not dive deep into these
>>> > numbers and that they have to be taken with a grain of salt, since
>>> > slaves have varying resource utilizations.
>>> >
>>> > Please have a look at the two following logs:
>>> > Python3 CPU MKLDNN:
>>> > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
>>> > Python3 CPU Openblas:
>>> > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
>>> >
>>> > If you scroll to the end (note that there are multiple test stages and
>>> > summaries printed in these logs), you will find the following
>>> > statements:
>>> >
>>> > Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
>>> > Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>>> >
>>> > This shows that MKLDNN is generally about 40% slower than
>>> > the Openblas backend. If we go into the details, we can see that some
>>> > tests are significantly slower:
>>> >
>>> > Python3 CPU MKLDNN:
>>> >
>>> > [success] 20.78% test_random.test_shuffle: 630.7165s
>>> > [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
>>> > [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
>>> > [success] 2.62% test_operator.test_broadcast_binary_
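[Editor's note] As a quick sanity check on the "about 40% slower" figure quoted above, the slowdown follows directly from the two totals in the logs; a trivial Python snippet is included only to make the arithmetic explicit:

    mkldnn_total = 3042.102    # "Ran 702 tests in 3042.102s" (Python3 CPU MKLDNN)
    openblas_total = 2158.458  # "Ran 702 tests in 2158.458s" (Python3 CPU Openblas)

    slowdown = (mkldnn_total - openblas_total) / openblas_total
    print(f"MKLDNN run was {slowdown:.1%} slower")  # prints roughly 40.9%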
Re: MKLDNN performance in CI
Great news, it was the debug flag:

[success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
[success] 6.64% test_random.test_shuffle: 128.5430s
[success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
[success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
[success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
[success] 3.91% test_operator.test_pick: 75.7241s
[success] 3.34% test_operator.test_psroipooling: 64.7008s
[success] 3.30% test_random.test_poisson_generator: 63.9218s
[success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
[success] 2.95% test_random.test_random: 57.0268s
[success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
[success] 2.03% test_random.test_normal_generator: 39.3641s
[success] 1.86% test_io.test_Cifar10Rec: 36.0722s
[success] 1.76% test_random.test_gamma_generator: 34.0995s
[success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
[success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s

[...]

Ran 703 tests in 1941.053s

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0

We are all good and set then. Thanks Tao and Patric for your support!

Best regards,
Marco

On Fri, Nov 23, 2018 at 3:44 AM Marco de Abreu wrote:

> Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379
>
> -Marco
>
> On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric wrote:
>
>> Thanks, it should be the most time-consuming part.
>>
>> @Marco, could you try to disable this env and see the performance again?
>>
>> > -----Original Message-----
>> > From: Lv, Tao A [mailto:tao.a...@intel.com]
>> > Sent: Friday, November 23, 2018 10:26 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: RE: MKLDNN performance in CI
>> >
>> > I think yes, except the cpp test.
>> >
>> > -----Original Message-----
>> > From: Zhao, Patric [mailto:patric.z...@intel.com]
>> > Sent: Friday, November 23, 2018 10:06 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: RE: MKLDNN performance in CI
>> >
>> > Good point, Tao!
>> > Is this env enabled in all MKL-DNN CI?
>> >
>> > > -----Original Message-----
>> > > From: Lv, Tao A [mailto:tao.a...@intel.com]
>> > > Sent: Friday, November 23, 2018 9:53 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: RE: MKLDNN performance in CI
>> > >
>> > > Thanks for bringing this up, Marco. It's really weird since most of
>> > > those tests listed in "worth noting" are not related to the mkldnn backend.
>> > >
>> > > I can understand that some tests for mkldnn operators may be slower
>> > > because MXNET_MKLDNN_DEBUG is enabled in the CI:
>> > > https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh#L713
>> > >
>> > > -----Original Message-----
>> > > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
>> > > Sent: Friday, November 23, 2018 9:22 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: MKLDNN performance in CI
>> > >
>> > > Hello,
>> > >
>> > > I have noticed that our Python tests have been increasing in duration recently.
>> > > In order to analyse this further, I created the PR [1] which allows us to
>> > > record test durations. Please note that I did not dive deep into these
>> > > numbers and that they have to be taken with a grain of salt, since
>> > > slaves have varying resource utilizations.
>> > >
>> > > Please have a look at the two following logs:
>> > > Python3 CPU MKLDNN:
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/155/steps/409/log/?start=0
>> > > Python3 CPU Openblas:
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13377/runs/2/nodes/152/steps/398/log/?start=0
>> > >
>> > > If you scroll to the end (note that there are multiple test stages and
>> > > summaries printed in these logs), you will find the following
>> > > statements:
>> > >
>> > > Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
>> > > Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>> > >
>> > > This shows that MKLDNN is generally about 40% slower than
>> > > the Openblas backend. If we go into the details, we can see that some
>> > > tests are significantly slower:
>> > >
>> > > Python3 CPU MKLDNN:
>> > >
>> > > [success] 20.78% test_random.test_shuffle: 630.7165s
>> > > [success] 17.79% test_sparse_operator.test_elemwise_binary_ops: 540.0487s
>> > > [success] 10.91% test_gluon_model_zoo.test_models: 331.1503s
>> > > [success] 2.62% test_operator.test_broadcast_binary_op: 79.4556s
>> > > [success] 2.45% test_operator.test_pick: 74.4041s
>> > > [success] 2.39% test_metric_perf.test_metric_performance: 72.5445s
>> > > [success] 2.38% test_random.test_negative_binom