Requesting Slack Access

2018-11-23 Thread Sam Bean
-- 
Sam Bean
*StockX*
*Tech Lead, Machine Learning and Personalization*
*––*
samb...@stockx.com
stockx.com


Re: Updating MXNet's Cub

2018-11-23 Thread frankfliu2000
I created a PR to address this issue:
https://github.com/apache/incubator-mxnet/pull/13322

Simply changing the cub submodule's URL would impact every developer: the 
recommended command, "git submodule update", won't work on its own. Developers 
would have to run "git submodule sync" first.

To minimize the impact, I deleted the CUB submodule and added a new submodule, 
"nvidia_cub". The only side effect is that a dangling, untracked "cub" folder 
will be left on local disk; developers can delete it manually.

Thanks,
Frank


On 2018/08/24 17:00:54, Hagay Lupesko  wrote: 
> Hi all,
> 
> 
> One of MXNet’s submodule dependencies is a snapshot of Nvidia Cub (
> https://github.com/dmlc/cub) – the snapshot is of an older version of Cub
> (1.7), while the latest Nvidia Cub release is 1.8.  Note that dmlc/cub has
> no customizations of the source Cub repo.
> 
> 
> I’d like to suggest updating the existing Cub submodule to Nvidia’s Cub
> repo. Instead of the snapshot, MXNet will be using Nvidia’s repo and the
> latest release (both repos have the same BSD-3 license, so licensing should
> not be an issue).
> 
> 
> Wanted to get feedback from the community to make sure I'm not missing
> anything.
> 
> If there are no objections, I'll submit a PR for the change.
> 
> 
> Cheers,
> 
> Hagay
> 


Scala Symbol API Question

2018-11-23 Thread Sam Bean
Hello, I have some questions about the Scala API for the Symbol library.

I'm trying to figure out how to do something like this:
https://github.com/ufoym/mxnet/blob/master/example/vae/VAE.py#L83, but it
seems the Scala Symbol API does not allow mixing symbols and constants the
way the Python library does.

It seems that if I want to use constants in my loss functions, I'm going to
end up with a very large argsDict at training time, since every value in the
loss function definition will have to be symbolic. Is there a better way to
do this?
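
For context, here is a minimal Python sketch of the kind of scalar/symbol
mixing I mean (illustrative only, not the exact line from VAE.py):

    import mxnet as mx

    # Scalar constants mix freely with symbols in the Python Symbol API,
    # e.g. a KL-divergence-style term:
    mu = mx.sym.Variable("mu")
    logvar = mx.sym.Variable("logvar")
    kl = -0.5 * mx.sym.sum(1 + logvar - mu * mu - mx.sym.exp(logvar), axis=1)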

-- 
Sam Bean
*StockX*
*Tech Lead, Machine Learning and Personalization*
*––*
samb...@stockx.com
stockx.com


Re: Updating MXNet's Cub

2018-11-23 Thread frankfliu2000
I created a PR to address this issue: 
https://github.com/apache/incubator-mxnet/pull/13322

Updating the CUB submodule URL will impact every developer. The recommended 
command, "git submodule update", won't work on its own; developers have to run 
"git submodule sync" first.

To reduce the impact on developers, I deleted CUB and added a new submodule, 
nvidia_cub. With this workaround, developers can pick up the change with a plain 
"git submodule update". The only side effect is that an untracked "cub" folder 
will be left on local disk; users can delete it manually.

Thanks,
Frank
On 2018/08/24 17:00:54, Hagay Lupesko  wrote: 
> Hi all,
> 
> 
> One of MXNet’s submodule dependencies is a snapshot of Nvidia Cub (
> https://github.com/dmlc/cub) – the snapshot is of an older version of Cub
> (1.7), while the latest Nvidia Cub release is 1.8.  Note that dmlc/cub has
> no customizations of the source Cub repo.
> 
> 
> I’d like to suggest updating the existing Cub submodule to Nvidia’s Cub
> repo. Instead of the snapshot, MXNet will be using Nvidia’s repo and the
> latest release (both repos have the same BSD-3 license, so licensing should
> not be an issue).
> 
> 
> Wanted to get feedback from the community to make sure I'm not missing
> anything.
> 
> If there are no objections, I'll submit a PR for the change.
> 
> 
> Cheers,
> 
> Hagay
> 


Re: MKLDNN performance in CI

2018-11-23 Thread Lv, Tao A
It’s great to hear that works. Thanks for your effort on this, Marco. 👍

-Tao

Sent from my iPhone

> On Nov 23, 2018, at 7:31 PM, Marco de Abreu 
>  wrote:
> 
> Great news, it was the debug flag:
> 
> [success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
> [success] 6.64% test_random.test_shuffle: 128.5430s
> [success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
> [success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
> [success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
> [success] 3.91% test_operator.test_pick: 75.7241s
> [success] 3.34% test_operator.test_psroipooling: 64.7008s
> [success] 3.30% test_random.test_poisson_generator: 63.9218s
> [success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
> [success] 2.95% test_random.test_random: 57.0268s
> [success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
> [success] 2.03% test_random.test_normal_generator: 39.3641s
> [success] 1.86% test_io.test_Cifar10Rec: 36.0722s
> [success] 1.76% test_random.test_gamma_generator: 34.0995s
> [success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
> [success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s
> 
> [...]
> 
> Ran 703 tests in 1941.053s
> 
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0
> 
> We are all good and set then. Thanks Tao and Patric for your support!
> 
> Best regards,
> Marco
> 
> On Fri, Nov 23, 2018 at 3:44 AM Marco de Abreu 
> wrote:
> 
>> Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379
>> 
>> -Marco
>> 
>> On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric 
>> wrote:
>> 
>>> Thanks, that should be the most time-consuming part.
>>> 
>>> @Marco, could you try disabling this env variable and see the performance again?
>>> 
 -Original Message-
 From: Lv, Tao A [mailto:tao.a...@intel.com]
 Sent: Friday, November 23, 2018 10:26 AM
 To: dev@mxnet.incubator.apache.org
 Subject: RE: MKLDNN performance in CI
 
 I think yes, except the cpp test.
 
 -Original Message-
 From: Zhao, Patric [mailto:patric.z...@intel.com]
 Sent: Friday, November 23, 2018 10:06 AM
 To: dev@mxnet.incubator.apache.org
 Subject: RE: MKLDNN performance in CI
 
 Good point, Tao!
 Is this env enabled in all MKL-DNN CI?
 
> -Original Message-
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Friday, November 23, 2018 9:53 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
> 
> Thanks for bringing this up, Marco. It's really weird since most of
> those tests listed in "worth noting" are not related to mkldnn
>>> backend.
> 
> I can understand that some tests for mkldnn operator may be slower
> because MXNET_MKLDNN_DEBUG is enabled in the CI:
> https://github.com/apache/incubator-
> mxnet/blob/master/ci/docker/runtime_functions.sh#L713
> 
> -Original Message-
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
> Sent: Friday, November 23, 2018 9:22 AM
> To: dev@mxnet.incubator.apache.org
> Subject: MKLDNN performance in CI
> 
> Hello,
> 
> I have noticed that our Python tests have been increasing in duration
 recently.
> In order to analyse this further, I created the PR [1], which allows
> recording of test durations. Please note that I did not dive deep on these
> numbers and that they have to be taken with a grain of salt since
> slaves have varying resource utilizations.
> 
> Please have a look at the two following logs:
> Python3 CPU MKLDNN:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> validation/pipelines/unix-cpu/branches/PR-
> 13377/runs/2/nodes/155/steps/409/log/?start=0
> Python3 CPU Openblas:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> validation/pipelines/unix-cpu/branches/PR-
> 13377/runs/2/nodes/152/steps/398/log/?start=0
> 
> If you scroll to the end (note that there are multiple test stages and
> summaries being printed in these logs), you will find the following
> statements:
> 
> Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
> 
> This shows that the MKLDNN backend is generally about 40% slower than
> the Openblas backend. If we go into the details, we can see that some
> tests are significantly slower:
> 
> Python3 CPU MKLDNN:
> 
>> [success] 20.78% test_random.test_shuffle: 630.7165s [success] 17.79%
>> test_sparse_operator.test_elemwise_binary_ops: 540.0487s [success]
>> 10.91% test_gluon_model_zoo.test_models: 331.1503s [success] 2.62%
>> test_operator.test_broadcast_binary_

Re: MKLDNN performance in CI

2018-11-23 Thread Marco de Abreu
Great news, it was the debug flag:

[success] 16.54% test_gluon_model_zoo.test_models: 320.0898s
[success] 6.64% test_random.test_shuffle: 128.5430s
[success] 5.67% test_sparse_operator.test_elemwise_binary_ops: 109.6650s
[success] 4.41% test_metric_perf.test_metric_performance: 85.3215s
[success] 4.28% test_random.test_negative_binomial_generator: 82.8046s
[success] 3.91% test_operator.test_pick: 75.7241s
[success] 3.34% test_operator.test_psroipooling: 64.7008s
[success] 3.30% test_random.test_poisson_generator: 63.9218s
[success] 3.24% test_operator.test_broadcast_binary_op: 62.7417s
[success] 2.95% test_random.test_random: 57.0268s
[success] 2.95% test_gluon.test_slice_pooling2d_slice_pooling2d: 57.0118s
[success] 2.03% test_random.test_normal_generator: 39.3641s
[success] 1.86% test_io.test_Cifar10Rec: 36.0722s
[success] 1.76% test_random.test_gamma_generator: 34.0995s
[success] 1.65% test_gluon.test_slice_batchnorm: 31.9859s
[success] 1.63% test_gluon.test_slice_pooling2d: 31.5945s

[...]

Ran 703 tests in 1941.053s

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-13379/runs/1/nodes/155/steps/409/log/?start=0

We are all good and set then. Thanks Tao and Patric for your support!
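
For anyone who wants to reproduce the comparison locally, a rough sketch is to 
rerun the unit tests with the flag switched off (MXNET_MKLDNN_DEBUG is the 
variable referenced further down in this thread; the test path is assumed from 
the usual repo layout):

    export MXNET_MKLDNN_DEBUG=0   # disable the extra MKLDNN correctness checks
    nosetests --verbose tests/python/unittest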

Best regards,
Marco

On Fri, Nov 23, 2018 at 3:44 AM Marco de Abreu 
wrote:

> Sure, good idea! https://github.com/apache/incubator-mxnet/pull/13379
>
> -Marco
>
> On Fri, Nov 23, 2018 at 3:38 AM Zhao, Patric 
> wrote:
>
>> Thanks, that should be the most time-consuming part.
>>
>> @Marco, could you try disabling this env variable and see the performance again?
>>
>> > -Original Message-
>> > From: Lv, Tao A [mailto:tao.a...@intel.com]
>> > Sent: Friday, November 23, 2018 10:26 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: RE: MKLDNN performance in CI
>> >
>> > I think yes, except the cpp test.
>> >
>> > -Original Message-
>> > From: Zhao, Patric [mailto:patric.z...@intel.com]
>> > Sent: Friday, November 23, 2018 10:06 AM
>> > To: dev@mxnet.incubator.apache.org
>> > Subject: RE: MKLDNN performance in CI
>> >
>> > Good point, Tao!
>> > Is this env enabled in all MKL-DNN CI?
>> >
>> > > -Original Message-
>> > > From: Lv, Tao A [mailto:tao.a...@intel.com]
>> > > Sent: Friday, November 23, 2018 9:53 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: RE: MKLDNN performance in CI
>> > >
>> > > Thanks for bringing this up, Marco. It's really weird since most of
>> > > those tests listed in "worth noting" are not related to mkldnn
>> backend.
>> > >
>> > > I can understand that some tests for mkldnn operator may be slower
>> > > because MXNET_MKLDNN_DEBUG is enabled in the CI:
>> > > https://github.com/apache/incubator-
>> > > mxnet/blob/master/ci/docker/runtime_functions.sh#L713
>> > >
>> > > -Original Message-
>> > > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
>> > > Sent: Friday, November 23, 2018 9:22 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: MKLDNN performance in CI
>> > >
>> > > Hello,
>> > >
>> > > I have noticed that our Python tests have been increasing in duration
>> > recently.
>> > > In order to analyse this further, I created the PR [1], which allows
>> > > recording of test durations. Please note that I did not dive deep on these
>> > > numbers and that they have to be taken with a grain of salt since
>> > > slaves have varying resource utilizations.
>> > >
>> > > Please have a look at the two following logs:
>> > > Python3 CPU MKLDNN:
>> > > http://jenkins.mxnet-ci.amazon-
>> > > ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
>> > > validation/pipelines/unix-cpu/branches/PR-
>> > > 13377/runs/2/nodes/155/steps/409/log/?start=0
>> > > Python3 CPU Openblas:
>> > > http://jenkins.mxnet-ci.amazon-
>> > > ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
>> > > validation/pipelines/unix-cpu/branches/PR-
>> > > 13377/runs/2/nodes/152/steps/398/log/?start=0
>> > >
>> > > If you scroll to the end (note that there are multiple test stages and
>> > > summaries being printed in these logs), you will find the following
>> > > statements:
>> > >
>> > > Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
>> > > Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
>> > >
>> > > This shows that the MKLDNN backend is generally about 40% slower than
>> > > the Openblas backend. If we go into the details, we can see that some
>> > > tests are significantly slower:
>> > >
>> > > Python3 CPU MKLDNN:
>> > >
>> > > >[success] 20.78% test_random.test_shuffle: 630.7165s [success] 17.79%
>> > > >test_sparse_operator.test_elemwise_binary_ops: 540.0487s [success]
>> > > >10.91% test_gluon_model_zoo.test_models: 331.1503s [success] 2.62%
>> > > >test_operator.test_broadcast_binary_op: 79.4556s [success] 2.45%
>> > > >test_operator.test_pick: 74.4041s [success] 2.39%
>> > > >test_metric_perf.test_metric_performance: 72.5445s [success] 2.38%
>> > > >test_random.test_negative_binom