Hey Denis,

I don't think the Apache release process supports something like an
experimental release. I would also be afraid of automated systems consuming
MXNet by simply fetching the latest release version; these users would then
get the experimental version without being aware of it.

For the sake of the best user experience, I'd prefer that we take a few
days to track down the root causes of all these regressions. While I agree
that releasing the new features and optimizations is certainly overdue, I
think the most important point is to keep the trust of our existing users.
If a new release performs worse for the same kind of workload, they might
lose trust in our release process and be less willing to adopt future
releases early on.

-Marco

Davydenko, Denis <dzianis.davydze...@gmail.com> schrieb am Fr., 28. Juni
2019, 18:55:

> According to Sandeep's evaluation of perf regression at the operator level
> [1], we have 77 op/input combinations for the forward pass and 50 for the
> backward pass where the regression is 5%+ (the biggest regressions observed
> are about 86% and 84% respectively) out of 290 tests. If I raise the
> degradation threshold to 10%+, the corresponding numbers are 70 for forward
> and 42 for backward. From my perspective, this constitutes a performance
> impact of significant scale, at least at the individual operator level. In
> the interest of keeping each release as performant as the previous one (at
> least to a feasible extent), I suggest we only move forward with the 1.5.0
> release if we call it experimental. The current landscape of operators with
> potentially negative performance impact on customers could (and I believe
> will) put MXNet one step behind its current market position as a choice for
> performance-optimized DL workloads. Tagging the release as experimental,
> from my point of view, would let us ship the new features so that customers
> can enjoy them while being explicit that performance optimization work is
> ongoing.
>
> [1]
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
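>
> The threshold bookkeeping above can be sketched in a few lines of Python.
> This is a minimal illustration with made-up timings; regression_pct and
> count_regressions are hypothetical helpers, not part of the opperf suite:

```python
def regression_pct(old_time, new_time):
    """Percent slowdown of new_time relative to old_time (negative = speedup)."""
    return (new_time - old_time) / old_time * 100.0

def count_regressions(old_times, new_times, threshold_pct):
    """Count op/input combinations whose slowdown is at least threshold_pct."""
    return sum(
        1
        for op, old in old_times.items()
        if regression_pct(old, new_times[op]) >= threshold_pct
    )

# Illustrative per-op timings in ms (not real benchmark numbers):
old = {"BatchNorm": 1.00, "Pooling": 2.00, "Convolution": 3.00}
new = {"BatchNorm": 1.10, "Pooling": 2.04, "Convolution": 2.70}
print(count_regressions(old, new, 5.0))   # only BatchNorm (+10%) -> 1
```

> Raising threshold_pct can only shrink the count, which is the comparison
> made above (77 -> 70 forward, 50 -> 42 backward).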
>
>
>
> On 6/28/19, 9:38 AM, "Lai Wei" <roywei...@gmail.com> wrote:
>
>     Hi,
>
>     Some more data points:
>
>     I ran the same cifar10.py script with the same setup, BUT added a
>     fixed seed.
>
>     Ran 50 epochs, treating the first 10 epochs as warmup.
>     I have the following average time per epoch:
>     1.4.1: 164.95 s
>     1.5.0: 170.44 s
>     Detailed data at [1].
>     This is about a 3% regression, less than Manu's result but closer to
>     the Gluon result.
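>
>     The warmup-then-average measurement used here can be sketched as
>     follows (a minimal sketch; avg_epoch_time is a hypothetical helper,
>     not part of cifar10.py):

```python
def avg_epoch_time(epoch_times, warmup):
    """Average per-epoch time in seconds, discarding the first `warmup` epochs."""
    measured = epoch_times[warmup:]
    return sum(measured) / len(measured)

# Illustrative run: 5 epochs, the first 2 treated as warmup.
times = [150.0, 148.0, 165.0, 170.0, 169.0]
print(avg_epoch_time(times, warmup=2))  # (165.0 + 170.0 + 169.0) / 3 = 168.0
```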
>
>     As for the operator benchmarks from Sandeep [2], I have calculated the
>     percentage of speed increase/regression here [1]. It looks like not all
>     of the operators mentioned before slowed down. Should that be treated
>     as a separate issue, since the benchmark uses synthetic data with
>     different shapes than the CIFAR10 dataset? For example, batch norm
>     shows no regression in the report, but it is slowed down in the
>     cifar10.py script profiling.
>
>     [1] https://gist.github.com/roywei/41fce930f013ff3b54cda6e86eaaf66b
>     [2]
>
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>
>
>     On Fri, Jun 28, 2019 at 2:47 PM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
>     wrote:
>
>     > Thanks Manu.
>     >
>     > @all: I observed other strange stuff that I don't understand at the
> moment:
>     >
>     > I installed the 1.5 RC from pip to check that I'm not doing something
>     > wrong when building, and found that CPU usage is quite subpar
>     > ( https://imgur.com/fRmbQNc ) compared to a version compiled from
>     > source. The pip package uses 4-5 cores out of 32; when I compile from
>     > source I get good core utilization ( https://imgur.com/e8BB425 ).
>     > I verified this on a c5d.18xlarge and on a 32-core AMD bare-metal
>     > machine as well.
>     >
>     > Seems to me also that the version from pip is using gomp instead of
>     > llvm's omp. I'm not sure why.
>     >
>     > pip install mxnet==1.5.0b20190627
>     > /home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
>     > piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
>     >     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>     > (0x00007f99d1832000)
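>     >
>     > One way to check which OpenMP runtime a libmxnet.so links is to
>     > parse ldd output, as above. A sketch (omp_runtime is a hypothetical
>     > helper):

```python
def omp_runtime(ldd_output):
    """Classify the OpenMP runtime from the output of `ldd libmxnet.so`."""
    for line in ldd_output.splitlines():
        if "libgomp" in line:                      # GNU OpenMP (gomp)
            return "gomp"
        if "libomp" in line or "libiomp" in line:  # LLVM or Intel OpenMP
            return "llvm/intel omp"
    return "unknown"

sample = "    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)"
print(omp_runtime(sample))  # -> gomp
```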
>     >
>     > I tried cifar10 on a bare-metal 32-core AMD Zen machine and it is
>     > extremely slow; it doesn't seem to make much progress compared to a
>     > c5d.18xlarge. I couldn't even finish one epoch, and I tried with and
>     > without MKL without much success. I will continue digging into this
>     > when possible.
>     >
>     >
>     > Pedro.
>     >
>     > On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <manuseth1...@gmail.com>
> wrote:
>     > >
>     > > Hi all,
>     > >
>     > > I ran the same cifar10.py script as Pedro, but for 20 epochs.
>     > > Treating the first 10 epochs as warm-up, I averaged the time per
>     > > epoch over the last 10 epochs.
>     > >
>     > > With MXNet 1.4.1 the average time is 164.23 s
>     > > With MXNet 1.5.0 the average time is 174.59 s (~6.3% regression)
>     > >
>     > >
>     > > For a second data point, I ran Gluon speed test benchmark script -
>     > >
>     >
> https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
>     > > using the following command:
>     > > python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
>     > > --num-batches 200 --type 'training'
>     > >
>     > > I got the following speeds:
>     > > With MXNet 1.4.1, average speed is 25.677534 img/s
>     > > With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3%
> regression)
>     > >
>     > > Note:
>     > > For the 1.4.1 version, I used pip install mxnet-mkl==1.4.1
>     > > For the 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619,
>     > > which corresponds to commit ccbbf6b4b76ea536a6583c99497c83b65a20817b
>     > > and is behind the 1.5.x branch by 4 commits
>     > >
>     > >
>     > > Best,
>     > > Manu
>     > >
>     > >
>     > > On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <
>     > sandeep.krishn...@gmail.com>
>     > > wrote:
>     > >
>     > >     Hello Ciyong/Pedro,
>     > >
>     > >     I ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not
>     > >     complete; it doesn't cover all MXNet operators and isn't
>     > >     presented in the best possible way, still WIP.)
>     > >
>     > >
>     >
> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>     > >
>     > >     The following operators look slower in 1.5 compared to 1.4.1:
>     > >     - BatchNorm
>     > >     - Pooling
>     > >     - FullyConnected
>     > >     - batch_dot
>     > >     - Dot
>     > >     - broadcast_mul
>     > >     - log_softmax
>     > >     and a few other operators
>     > >
>     > >     Also, several operators run a lot faster on 1.5 compared to
>     > >     1.4.1, for example Convolution, flatten, elementwise operators,
>     > >     etc. So it looks like a few operators have regressed noticeably,
>     > >     but other operators' performance improvements mask the overall
>     > >     effect, hiding a lot of the regression. We need a more detailed
>     > >     per-operator performance analysis. We will not be able to do
>     > >     this for the current release; we should have a more concrete
>     > >     way of detecting such performance regressions before the next
>     > >     release.
>     > >
>     > >     Setup:
>     > >     1.5 => built from source (head of the 1.5.rc2 tag), with MKLDNN
>     > >     1.4.1 => PyPI mxnet-mkl==1.4.1
>     > >     Machine: C5.18X
>     > >     No explicit environment variables were set
>     > >     Operator benchmark code -
>     > >
>     >
> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>     > >
>     > >     Best,
>     > >     Sandeep
>     > >
>     > >
>     > >     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
>     > > pedro.larroy.li...@gmail.com>
>     > >     wrote:
>     > >
>     > >     > I will try to run a few benchmarks in a bare metal instance
>     > tonight to
>     > >     > remove virtualization variance for the measurements and
> provide
>     > some
>     > >     > numbers.
>     > >     >
>     > >     > Please propose a set of models/examples that would be
>     > >     > desirable to run before the release, and provide a link to an
>     > >     > easy-to-run script with instructions so we can validate the
>     > >     > release better.
>     > >     >
>     > >     > Thank you.
>     > >     >
>     > >     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <
> roywei...@gmail.com>
>     > wrote:
>     > >     > >
>     > >     > > Dear @dev,
>     > >     > >
>     > >     > > I'm cancelling the vote for the cached op fix:
>     > >     > >
>     > >     > > https://github.com/apache/incubator-mxnet/pull/15298
>     > >     > >
>     > >     > > As for the possible CPU training regression, it does not
>     > >     > > look like a blocker for now.
>     > >     > >
>     > >     > > I will start a new rc2 vote, please help to validate.
>     > >     > >
>     > >     > > Thanks!
>     > >     > >
>     > >     > >
>     > >     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
>     > ciyong.c...@intel.com
>     > > >
>     > >     > wrote:
>     > >     > >
>     > >     > > > Hi Pedro,
>     > >     > > >
>     > >     > > > I was able to reproduce a similar result (v1.5 is ~5.6%
>     > >     > > > slower than v1.4; I was using 18 cores for computing) with
>     > >     > > > your script on C5.18xlarge. But I needed to bind the cores
>     > >     > > > with the commands below when running the script (without
>     > >     > > > setting the env variables, I got close times (<1%) with
>     > >     > > > v1.5 and v1.4):
>     > >     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     > >     > > >         export OMP_NUM_THREADS=18
>     > >     > > >
>     > >     > > > Did you set any env variables during running?
>     > >     > > >
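>     > >     > > > For reference, these variables must be in the environment
>     > >     > > > before the OpenMP runtime is initialized (i.e. before
>     > >     > > > importing mxnet). A minimal sketch; set_omp_env is a
>     > >     > > > hypothetical helper, not MXNet API:

```python
import os

# Affinity string taken from the export commands quoted above.
DEFAULT_AFFINITY = "granularity=fine,noduplicates,compact,1,0"

def set_omp_env(num_threads, affinity=DEFAULT_AFFINITY):
    """Set OpenMP thread-count and Intel affinity env vars.

    Must run before importing a library (such as mxnet) that loads the
    OpenMP runtime, because the runtime reads these variables at startup.
    """
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    os.environ["KMP_AFFINITY"] = affinity

set_omp_env(18)
print(os.environ["OMP_NUM_THREADS"])  # -> 18
```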
>     > >     > > > The performance result I got as below:
>     > >     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > >     > > > real    12m10.856s
>     > >     > > > user    234m49.576s
>     > >     > > > sys     4m38.044s
>     > >     > > >
>     > >     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > >     > > > real    12m52.140s
>     > >     > > > user    246m30.740s
>     > >     > > > sys     5m8.188s
>     > >     > > >
>     > >     > > > As I looked at the profiling data, most of the ops have
>     > >     > > > the same perf between v1.4 and v1.5. But some ops, like
>     > >     > > > "_backward_BatchNorm" and "Pooling", are ~1.37x slower on
>     > >     > > > v1.5 compared with v1.4.
>     > >     > > > I will do further analysis on these ops.
>     > >     > > >
>     > >     > > > Here's the hardware/OS info from my side:
>     > >     > > > ----------Python Info----------
>     > >     > > > Version      : 3.6.8
>     > >     > > > Compiler     : GCC 7.3.0
>     > >     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     > >     > > > Arch         : ('64bit', '')
>     > >     > > > ------------Pip Info-----------
>     > >     > > > Version      : 19.0.3
>     > >     > > > Directory    :
>     > >     > > >
>     > >
> /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     > >     > > > ----------MXNet Info-----------
>     > >     > > > Version      : 1.5.0
>     > >     > > > Directory    :
> /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     > >     > > > Hashtag not found. Not installed from pre-built package.
>     > >     > > > ----------System Info----------
>     > >     > > > Platform     :
>     > Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     > >     > > > system       : Linux
>     > >     > > > node         : ip-172-31-32-129
>     > >     > > > release      : 4.4.0-1085-aws
>     > >     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC
> 2019
>     > >     > > > ----------Hardware Info----------
>     > >     > > > machine      : x86_64
>     > >     > > > processor    : x86_64
>     > >     > > > Architecture:          x86_64
>     > >     > > > CPU op-mode(s):        32-bit, 64-bit
>     > >     > > > Byte Order:            Little Endian
>     > >     > > > CPU(s):                72
>     > >     > > > On-line CPU(s) list:   0-71
>     > >     > > > Thread(s) per core:    2
>     > >     > > > Core(s) per socket:    18
>     > >     > > > Socket(s):             2
>     > >     > > > NUMA node(s):          2
>     > >     > > > Vendor ID:             GenuineIntel
>     > >     > > > CPU family:            6
>     > >     > > > Model:                 85
>     > >     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M
> CPU @
>     > > 3.00GHz
>     > >     > > > Stepping:              3
>     > >     > > > CPU MHz:               3000.000
>     > >     > > > BogoMIPS:              6000.00
>     > >     > > > Hypervisor vendor:     KVM
>     > >     > > > Virtualization type:   full
>     > >     > > > L1d cache:             32K
>     > >     > > > L1i cache:             32K
>     > >     > > > L2 cache:              1024K
>     > >     > > > L3 cache:              25344K
>     > >     > > > NUMA node0 CPU(s):     0-17,36-53
>     > >     > > > NUMA node1 CPU(s):     18-35,54-71
>     > >     > > > Flags:                 fpu vme de pse tsc msr pae mce
> cx8 apic
>     > > sep mtrr
>     > >     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht
> syscall
>     > nx
>     > >     > pdpe1gb
>     > >     > > > rdtscp lm constant_tsc arch_perfmon rep_good nopl
> xtopology
>     > > nonstop_tsc
>     > >     > > > aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3
> fma cx16
>     > > pcid
>     > >     > sse4_1
>     > >     > > > sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> avx
>     > f16c
>     > > rdrand
>     > >     > > > hypervisor lahf_lm abm 3dnowprefetch invpcid_single
> kaiser
>     > > fsgsbase
>     > >     > > > tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx
> avx512f
>     > > rdseed
>     > >     > adx
>     > >     > > > smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1
> ida arat
>     > pku
>     > >     > > > ----------Network Test----------
>     > >     > > >
>     > >     > > >
>     > >     > > > -Ciyong
>     > >     > > >
>     > >     > > >
>     > >     > > > -----Original Message-----
>     > >     > > > From: Zhao, Patric [mailto:patric.z...@intel.com]
>     > >     > > > Sent: Thursday, June 27, 2019 9:55 AM
>     > >     > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > Cc: d...@mxnet.apache.org
>     > >     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating)
> version
>     > > 1.5.0.rc1
>     > >     > > >
>     > >     > > > Could we run more epochs to see the performance
>     > >     > > > difference, or profile the difference between the good
>     > >     > > > and bad runs?
>     > >     > > >
>     > >     > > > > -----Original Message-----
>     > >     > > > > From: Pedro Larroy [mailto:
> pedro.larroy.li...@gmail.com]
>     > >     > > > > Sent: Thursday, June 27, 2019 9:35 AM
>     > >     > > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > > Cc: d...@mxnet.apache.org
>     > >     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> version
>     > >     > > > > 1.5.0.rc1
>     > >     > > > >
>     > >     > > > > I ran again and the gap is bigger again. I guess we
>     > >     > > > > need to average the times across several runs:
>     > >     > > > >
>     > >     > > > > piotr@ip-172-31-63-171
> :0:~/deeplearning-benchmark/dawnbench
>     > >     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python
> cifar10.py
>     > > --epochs 5
>     > >     > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py
> --epochs 5
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     > 4
>     > >     > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > >     > > > > ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > >     > > > > 147456 bytes with malloc directly
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     > Allocate
>     > >     > > > > 589824 bytes with malloc directly
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     > Allocate
>     > >     > > > > 2359296 bytes with malloc directly
>     > >     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     > Allocate
>     > >     > > > > 9437184 bytes with malloc directly
>     > >     > > > > Epoch 0, Batch 199, Speed=384.149839
>     > >     > > > > Epoch 0, Duration=140.919567
>     > >     > > > > Epoch 0, Training accuracy=0.115169
>     > >     > > > > Epoch 0, Validation accuracy=0.141317
>     > >     > > > > Epoch 1, Batch 199, Speed=433.380512
>     > >     > > > > Epoch 1, Duration=119.553233
>     > >     > > > > Epoch 1, Training accuracy=0.170956
>     > >     > > > > Epoch 1, Validation accuracy=0.216146
>     > >     > > > > Epoch 2, Batch 199, Speed=434.864699
>     > >     > > > > Epoch 2, Duration=123.278490
>     > >     > > > > Epoch 2, Training accuracy=0.209455
>     > >     > > > > Epoch 2, Validation accuracy=0.247296
>     > >     > > > > Epoch 3, Batch 199, Speed=433.401854
>     > >     > > > > Epoch 3, Duration=118.327797
>     > >     > > > > Epoch 3, Training accuracy=0.248701
>     > >     > > > > Epoch 3, Validation accuracy=0.302083
>     > >     > > > > Epoch 4, Batch 199, Speed=419.713707
>     > >     > > > > Epoch 4, Duration=126.468409
>     > >     > > > > Epoch 4, Training accuracy=0.260949
>     > >     > > > > Epoch 4, Validation accuracy=0.269030
>     > >     > > > >
>     > >     > > > > real    10m55.796s
>     > >     > > > > user    399m33.567s
>     > >     > > > > sys     13m55.904s
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use
>     > 4
>     > >     > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > ImageRecordIOParser2:
>     > >     > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > threads
>     > >     > > > > for decoding..
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248:
> Load mean
>     > > image
>     > >     > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > completed
>     > >     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > Epoch 0, Batch 199, Speed=419.039188
>     > >     > > > > Epoch 0, Duration=143.934903
>     > >     > > > > Epoch 0, Training accuracy=0.122542
>     > >     > > > > Epoch 0, Validation accuracy=0.164359
>     > >     > > > > Epoch 1, Batch 199, Speed=445.257048
>     > >     > > > > Epoch 1, Duration=135.248399
>     > >     > > > > Epoch 1, Training accuracy=0.178828
>     > >     > > > > Epoch 1, Validation accuracy=0.199419
>     > >     > > > > Epoch 2, Batch 199, Speed=447.115215
>     > >     > > > > Epoch 2, Duration=132.003770
>     > >     > > > > Epoch 2, Training accuracy=0.217808
>     > >     > > > > Epoch 2, Validation accuracy=0.233073
>     > >     > > > > Epoch 3, Batch 199, Speed=441.079477
>     > >     > > > > Epoch 3, Duration=126.543316
>     > >     > > > > Epoch 3, Training accuracy=0.248102
>     > >     > > > > Epoch 3, Validation accuracy=0.293870
>     > >     > > > > Epoch 4, Batch 199, Speed=449.329787
>     > >     > > > > Epoch 4, Duration=138.398325
>     > >     > > > > Epoch 4, Training accuracy=0.270021
>     > >     > > > > Epoch 4, Validation accuracy=0.311498
>     > >     > > > >
>     > >     > > > > real    11m45.329s
>     > >     > > > > user    426m13.908s
>     > >     > > > > sys     16m45.093s
>     > >     > > > >
>     > >     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     > >     > > > > <pedro.larroy.li...@gmail.com> wrote:
>     > >     > > > > >
>     > >     > > > > > The difference looks smaller now, more like your
> numbers. I
>     > > wonder
>     > >     > > > > > if something happened during the previous benchmark
> like a
>     > > system
>     > >     > > > > > update...
>     > >     > > > > >
>     > >     > > > > >
>     > >     > > > > > piotr@ip-172-31-63-171
>     > :0:~/deeplearning-benchmark/dawnbench
>     > >     > > > > (master)+$
>     > >     > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py
> --epochs 5
>     > &&
>     > > time
>     > >     > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > [22:49:41]
>     > >     > > > > > ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > >     > > > > > 147456 bytes with malloc directly
>     > >     > > > > > [22:49:42]
> ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     > > Allocate
>     > >     > > > > > 589824 bytes with malloc directly
>     > >     > > > > > [22:49:42]
> ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     > > Allocate
>     > >     > > > > > 2359296 bytes with malloc directly
>     > >     > > > > > [22:49:42]
> ../src/operator/nn/mkldnn/mkldnn_base.cc:74:
>     > > Allocate
>     > >     > > > > > 9437184 bytes with malloc directly
>     > >     > > > > > Epoch 0, Batch 199, Speed=426.182733
>     > >     > > > > > Epoch 0, Duration=134.868458
>     > >     > > > > > Epoch 0, Training accuracy=0.127238
>     > >     > > > > > Epoch 0, Validation accuracy=0.206388
>     > >     > > > > > Epoch 1, Batch 199, Speed=313.127156
>     > >     > > > > > Epoch 1, Duration=128.041775
>     > >     > > > > > Epoch 1, Training accuracy=0.182065
>     > >     > > > > > Epoch 1, Validation accuracy=0.202524
>     > >     > > > > > Epoch 2, Batch 199, Speed=410.931187
>     > >     > > > > > Epoch 2, Duration=124.920588
>     > >     > > > > > Epoch 2, Training accuracy=0.202584
>     > >     > > > > > Epoch 2, Validation accuracy=0.245693
>     > >     > > > > > Epoch 3, Batch 199, Speed=419.119335
>     > >     > > > > > Epoch 3, Duration=120.948349
>     > >     > > > > > Epoch 3, Training accuracy=0.235854
>     > >     > > > > > Epoch 3, Validation accuracy=0.291066
>     > >     > > > > > Epoch 4, Batch 199, Speed=430.473733
>     > >     > > > > > Epoch 4, Duration=130.181724
>     > >     > > > > > Epoch 4, Training accuracy=0.257773
>     > >     > > > > > Epoch 4, Validation accuracy=0.304988
>     > >     > > > > >
>     > >     > > > > > real    11m7.356s
>     > >     > > > > > user    406m9.910s
>     > >     > > > > > sys     14m18.349s
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/train.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > >     > > > > > ImageRecordIOParser2:
>     > >     > > > > >
> /home/piotr/deeplearning-benchmark/data/cifar/test.rec,
>     > use 4
>     > >     > > > > > threads for decoding..
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248:
> Load
>     > mean
>     > > image
>     > >     > > > > > from
> /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > >     > > > > completed
>     > >     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > >     > > > > > Epoch 0, Changed learning rate to 0.05
>     > >     > > > > > Epoch 0, Batch 199, Speed=348.618154
>     > >     > > > > > Epoch 0, Duration=146.469352
>     > >     > > > > > Epoch 0, Training accuracy=0.124121
>     > >     > > > > > Epoch 0, Validation accuracy=0.167227
>     > >     > > > > > Epoch 1, Batch 199, Speed=452.790825
>     > >     > > > > > Epoch 1, Duration=130.199421
>     > >     > > > > > Epoch 1, Training accuracy=0.183863
>     > >     > > > > > Epoch 1, Validation accuracy=0.237079
>     > >     > > > > > Epoch 2, Batch 199, Speed=451.406559
>     > >     > > > > > Epoch 2, Duration=126.320823
>     > >     > > > > > Epoch 2, Training accuracy=0.214844
>     > >     > > > > > Epoch 2, Validation accuracy=0.244692
>     > >     > > > > > Epoch 3, Batch 199, Speed=403.161873
>     > >     > > > > > Epoch 3, Duration=125.331660
>     > >     > > > > > Epoch 3, Training accuracy=0.243506
>     > >     > > > > > Epoch 3, Validation accuracy=0.301182
>     > >     > > > > > Epoch 4, Batch 199, Speed=450.826598
>     > >     > > > > > Epoch 4, Duration=126.426253
>     > >     > > > > > Epoch 4, Training accuracy=0.266424
>     > >     > > > > > Epoch 4, Validation accuracy=0.311899
>     > >     > > > > >
>     > >     > > > > > real    11m21.930s
>     > >     > > > > > user    415m3.855s
>     > >     > > > > > sys     13m53.975s
>     > >     > > > > >
>     > >     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>     > >     > > > > > <pedro.larroy.li...@gmail.com> wrote:
>     > >     > > > > > >
>     > >     > > > > > > Hi Ciyong, thanks for trying to reproduce:
>     > >     > > > > > >
>     > >     > > > > > > I used this one:
>     > >     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>     > >     > > > > > >
>     > >     > > > > > > Could you provide hardware and OS details?
>     > >     > > > > > >
>     > >     > > > > > > I will rerun and repost numbers in a few minutes.
>     > >     > > > > > >
>     > >     > > > > > > Pedro.
>     > >     > > > > > >
>     > >     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>     > >     > > > > > > <ciyong.c...@intel.com>
>     > >     > > > > wrote:
>     > >     > > > > > > >
>     > >     > > > > > > > Hi Pedro,
>     > >     > > > > > > >
>     > >     > > > > > > > I'm looking at this case, using the script
>     > >     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>     > >     > > > > > > > to get the timing data, but it seems there's not
>     > >     > > > > > > > much difference between mxnet 1.4.1.rc0 and
>     > >     > > > > > > > 1.5.0.rc1 on C5.18xlarge.
>     > >     > > > > > > >
>     > >     > > > > > > > I'm not sure if there's any difference in the
>     > >     > > > > > > > Python script; can you point me to the link for
>     > >     > > > > > > > your script (cifar10.py)? Or you could also try
>     > >     > > > > > > > MXNet's script (train_cifar10.py) and see the
>     > >     > > > > > > > performance.
>     > >     > > > > > > >
>     > >     > > > > > > > Here's the command I used to collect the time:
>     > >     > > > > > > >         python train_cifar10.py --num-epoch=5
>     > >     > > > > > > >
>     > >     > > > > > > > 1) 1.5.0.rc1
> (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > >     > > > > > > >         real    9m4.880s
>     > >     > > > > > > >         user    333m13.340s
>     > >     > > > > > > >         sys     14m36.100s
>     > >     > > > > > > >
>     > >     > > > > > > > 2) 1.4.1.rc0
> (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > >     > > > > > > >         real    9m2.155s
>     > >     > > > > > > >         user    329m37.092s
>     > >     > > > > > > >         sys     16m8.668s
>     > >     > > > > > > >
>     > >     > > > > > > > -Ciyong
>     > >     > > > > > > >
>     > >     > > > > > > >
>     > >     > > > > > > > -----Original Message-----
>     > >     > > > > > > > From: Pedro Larroy [mailto:
>     > pedro.larroy.li...@gmail.com]
>     > >     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>     > >     > > > > > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > > > > > Cc: d...@mxnet.apache.org
>     > >     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet
> (incubating)
>     > > version
>     > >     > > > > > > > 1.5.0.rc1
>     > >     > > > > > > >
>     > >     > > > > > > > Hi these were my build flags and system info:
>     > >     > > > > > > >
>     > >     > > > > > > >
>     > >     > > > > > > > --- # CMake configuration
>     > >     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>     > >     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>     > >     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>     > >     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>     > >     > > > > > > > USE_OPENMP: "ON" # Build with OpenMP support
>     > >     > > > > > > > USE_CUDNN: "ON" # Build with cuDNN support # one could set CUDNN_ROOT for search path
>     > >     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
>     > >     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support # autodetects support if "ON"
>     > >     > > > > > > > USE_LAPACK: "ON" # Build with LAPACK support
>     > >     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>     > >     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > >     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > >     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
>     > >     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
>     > >     > > > > > > > USE_JEMALLOC: "ON" # Build with jemalloc support
>     > >     > > > > > > > USE_PROFILER: "ON" # Build with profiler support
>     > >     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>     > >     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WarpCTC plugins
>     > >     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe plugin
>     > >     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ package
>     > >     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions
>     > >     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>     > >     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
>     > >     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune) # one could set VTUNE_ROOT for search path
>     > >     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
>     > >     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>     > >     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files
>     > >     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults
>     > >     > > > > > > > USE_TENSORRT: "OFF" # Enable inference optimization with TensorRT
>     > >     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASan sanitizers
>     > >     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
>     > >     > > > > > > > CMAKE_BUILD_TYPE: "Release"
>     > >     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>     > >     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>     > >     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
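[Editorial note: the YAML-style flag map above maps directly to CMake cache entries. As a hedged illustration only (this helper is not MXNet tooling; the flag names are taken from the configuration quoted above), one could render such a map into a reproducible `cmake -D` invocation like so:

```python
# Illustrative sketch: turn a flag map like the configuration above into
# "-DKEY=VALUE" cmake arguments, so two build configs can be diffed exactly.
def cmake_args(config: dict) -> list:
    """Render {"USE_CUDA": "OFF", ...} as ["-DUSE_CUDA=OFF", ...]."""
    return [f"-D{key}={value}" for key, value in sorted(config.items())]

# A few flags copied from the configuration quoted in this email.
config = {
    "USE_CUDA": "OFF",
    "USE_MKLDNN": "ON",
    "USE_OPERATOR_TUNING": "ON",
    "CMAKE_BUILD_TYPE": "Release",
}
print("cmake", " ".join(cmake_args(config)), "..")
```

Pinning the full flag set this way makes it easier to confirm both release candidates were built identically before comparing timings.]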
>     > >     > > > > > > >
>     > >     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
>     > >     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
>     > >     > > > > > > >
>     > >     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>     > >     > > > > > > > c5d.18xlarge
>     > >     > > > > > > >
>     > >     > > > > > > >
>     > >     > > > > > > > ----------Python Info----------
>     > >     > > > > > > > Version      : 3.6.7
>     > >     > > > > > > > Compiler     : GCC 8.2.0
>     > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > >     > > > > > > > Arch         : ('64bit', 'ELF')
>     > >     > > > > > > > ------------Pip Info-----------
>     > >     > > > > > > > Version      : 19.1.1
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
>     > >     > > > > > > > ----------MXNet Info-----------
>     > >     > > > > > > > Version      : 1.5.0
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
>     > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > >     > > > > > > > ----------System Info----------
>     > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > >     > > > > > > > system       : Linux
>     > >     > > > > > > > node         : ip-172-31-63-171
>     > >     > > > > > > > release      : 4.15.0-1035-aws
>     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > >     > > > > > > > ----------Hardware Info----------
>     > >     > > > > > > > machine      : x86_64
>     > >     > > > > > > > processor    : x86_64
>     > >     > > > > > > > Architecture:        x86_64
>     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > >     > > > > > > > Byte Order:          Little Endian
>     > >     > > > > > > > CPU(s):              72
>     > >     > > > > > > > On-line CPU(s) list: 0-71
>     > >     > > > > > > > Thread(s) per core:  2
>     > >     > > > > > > > Core(s) per socket:  18
>     > >     > > > > > > > Socket(s):           2
>     > >     > > > > > > > NUMA node(s):        2
>     > >     > > > > > > > Vendor ID:           GenuineIntel
>     > >     > > > > > > > CPU family:          6
>     > >     > > > > > > > Model:               85
>     > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > >     > > > > > > > Stepping:            4
>     > >     > > > > > > > CPU MHz:             1326.446
>     > >     > > > > > > > BogoMIPS:            6000.00
>     > >     > > > > > > > Hypervisor vendor:   KVM
>     > >     > > > > > > > Virtualization type: full
>     > >     > > > > > > > L1d cache:           32K
>     > >     > > > > > > > L1i cache:           32K
>     > >     > > > > > > > L2 cache:            1024K
>     > >     > > > > > > > L3 cache:            25344K
>     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
>     > >     > > > > > > > ----------Network Test----------
>     > >     > > > > > > >
>     > >     > > > > > > > ----------Python Info----------
>     > >     > > > > > > > Version      : 3.6.7
>     > >     > > > > > > > Compiler     : GCC 8.2.0
>     > >     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > >     > > > > > > > Arch         : ('64bit', 'ELF')
>     > >     > > > > > > > ------------Pip Info-----------
>     > >     > > > > > > > Version      : 19.1.1
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
>     > >     > > > > > > > ----------MXNet Info-----------
>     > >     > > > > > > > Version      : 1.4.1
>     > >     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
>     > >     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > >     > > > > > > > ----------System Info----------
>     > >     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > >     > > > > > > > system       : Linux
>     > >     > > > > > > > node         : ip-172-31-63-171
>     > >     > > > > > > > release      : 4.15.0-1035-aws
>     > >     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > >     > > > > > > > ----------Hardware Info----------
>     > >     > > > > > > > machine      : x86_64
>     > >     > > > > > > > processor    : x86_64
>     > >     > > > > > > > Architecture:        x86_64
>     > >     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > >     > > > > > > > Byte Order:          Little Endian
>     > >     > > > > > > > CPU(s):              72
>     > >     > > > > > > > On-line CPU(s) list: 0-71
>     > >     > > > > > > > Thread(s) per core:  2
>     > >     > > > > > > > Core(s) per socket:  18
>     > >     > > > > > > > Socket(s):           2
>     > >     > > > > > > > NUMA node(s):        2
>     > >     > > > > > > > Vendor ID:           GenuineIntel
>     > >     > > > > > > > CPU family:          6
>     > >     > > > > > > > Model:               85
>     > >     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > >     > > > > > > > Stepping:            4
>     > >     > > > > > > > CPU MHz:             1223.344
>     > >     > > > > > > > BogoMIPS:            6000.00
>     > >     > > > > > > > Hypervisor vendor:   KVM
>     > >     > > > > > > > Virtualization type: full
>     > >     > > > > > > > L1d cache:           32K
>     > >     > > > > > > > L1i cache:           32K
>     > >     > > > > > > > L2 cache:            1024K
>     > >     > > > > > > > L3 cache:            25344K
>     > >     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > >     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > >     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
>     > >     > > > > > > > ----------Network Test----------
>     > >     > > > > > > >
>     > >     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
>     > >     > > > > <pedro.larroy.li...@gmail.com> wrote:
>     > >     > > > > > > > >
>     > >     > > > > > > > > I did a CIFAR-10 training run on CPU, and there seems
>     > >     > > > > > > > > to be a regression in the range of a 7% increase in
>     > >     > > > > > > > > training time against 1.4.1:
>     > >     > > > > > > > >
>     > >     > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > >     > > > > > > > > (master)+$ time python cifar10.py --epochs 5
>     > >     > > > > > > > > real    11m30.388s
>     > >     > > > > > > > > user    417m7.766s
>     > >     > > > > > > > > sys     16m57.315s
>     > >     > > > > > > > >
>     > >     > > > > > > > > VS 1.4.1:
>     > >     > > > > > > > > real    10m41.994s
>     > >     > > > > > > > > user    392m40.646s
>     > >     > > > > > > > > sys     12m30.601s
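[Editorial note: the relative slowdowns implied by the wall-clock ("real") times quoted in this thread can be computed directly. A minimal sketch; the timings are copied from the emails above, and the helper names are illustrative rather than part of any MXNet tooling:

```python
# Compute the percent change in wall-clock time between two `time` readings.
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a `time`-style XmY.Zs reading into seconds."""
    return minutes * 60 + seconds

def pct_change(old: float, new: float) -> float:
    """Percent increase of `new` over `old`."""
    return (new - old) / old * 100

# Pedro's dawnbench cifar10.py run: 1.4.1 vs 1.5.0.rc1
cifar_141 = to_seconds(10, 41.994)   # real 10m41.994s
cifar_150 = to_seconds(11, 30.388)   # real 11m30.388s
print(f"cifar10.py regression: {pct_change(cifar_141, cifar_150):.1f}%")   # ~7.5%

# Ciyong's train_cifar10.py run: 1.4.1.rc0 vs 1.5.0.rc1
train_141 = to_seconds(9, 2.155)     # real 9m2.155s
train_150 = to_seconds(9, 4.880)     # real 9m4.880s
print(f"train_cifar10.py delta: {pct_change(train_141, train_150):.1f}%")  # ~0.5%
```

This puts numbers on the disagreement: Pedro's script shows roughly a 7.5% slowdown, while Ciyong's train_cifar10.py run differs by only about 0.5%, within run-to-run noise.]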
>     > >     > > > > > > > >
>     > >     > > > > > > > >
>     > >     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <
>     > >     > roywei...@gmail.com>
>     > >     > > > > wrote:
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > Hi Anirudh,
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > Thanks for jumping on this quickly, I followed up on
>     > >     > > > > > > > > > the issue.
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > It was meant for sockeye developers/maintainers to
>     > >     > > > > > > > > > help set up nightly tests and raise issues early.
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > Thanks!
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
>     > >     > > > > > > > > > <haibin.lin....@gmail.com>
>     > >     > > > > > > > > > wrote:
>     > >     > > > > > > > > >
>     > >     > > > > > > > > > > In GluonNLP we are testing with the MXNet nightly
>     > >     > > > > > > > > > > build for each PR, and the CI did catch some
>     > >     > > > > > > > > > > MXNet-related issues.
>     > >     > > > > > > > > > > I recommend other toolkits also add integration
>     > >     > > > > > > > > > > tests against the MXNet nightly build.
>     > >     > > > > > > > > > > It helps identify issues early.
>     > >     > > > > > > > > > >
>     > >     > > > > > > > > > > Best,
>     > >     > > > > > > > > > > Haibin
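[Editorial note: one practical wrinkle with the nightly-integration suggestion above is making sure the CI job actually resolved a pre-release wheel rather than a stable one. A minimal, hedged sketch of such a guard; the version-string convention (e.g. "1.5.0b20190621") is an assumption based on the pre-release wheels published at the time, and `is_prerelease` is an illustrative helper, not MXNet tooling:

```python
# Fail a nightly-integration CI job loudly if a stable release was installed
# instead of a pre-release/nightly build.
def is_prerelease(version: str) -> bool:
    """True for pre-release version strings such as '1.5.0b20190621' or '1.5.0rc1'."""
    return any(tag in version for tag in ("a", "b", "rc", "dev"))

# In CI one might check (assumption: mxnet.__version__ carries the wheel version):
#   import mxnet
#   assert is_prerelease(mxnet.__version__), "CI resolved a stable wheel!"
print(is_prerelease("1.5.0b20190621"), is_prerelease("1.4.1"))
```

A guard like this keeps a downstream toolkit's "nightly" lane from silently turning into a second stable-release lane.]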
>     > >     > > > > > > > > > >
>     > >     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
>     > >     > > > > > > > > > > <patric.z...@intel.com>
>     > >     > > > > wrote:
>     > >     > > > > > > > > > >
>     > >     > > > > > > > > > > > Thanks for raising the issue; we will take a look ASAP.
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so
>     > >     > > > > > > > > > > > it's hard for MXNet developers to catch potential
>     > >     > > > > > > > > > > > bugs or performance degradations.
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > In the future, I suggest adding the major
>     > >     > > > > > > > > > > > downstream test cases, e.g. from sockeye,
>     > >     > > > > > > > > > > > GluonNLP, GluonCV, DGL, and Gluon-TS, to the
>     > >     > > > > > > > > > > > nightly tests. If that's still too heavy, maybe
>     > >     > > > > > > > > > > > test weekly or monthly :)
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > Thanks,
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > --Patric
>     > >     > > > > > > > > > > >
>     > >     > > > > > > > > > > > > -----Original Message-----
>     > >     > > > > > > > > > > > > From: Anirudh Subramanian
>     > >     > > > > > > > > > > > > [mailto:anirudh2...@gmail.com]
>     > >     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
>     > >     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
>     > >     > > > > > > > > > > > > Cc: d...@mxnet.apache.org
>     > >     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache
> MXNet
>     > > (incubating)
>     > >     > > > > > > > > > > > > version
>     > >     > > > > > > > > > > > > 1.5.0.rc1
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > Hi Lai,
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > I have opened an issue:
>     > >     > > > > > > > > > > > >
>     > >     > https://github.com/apache/incubator-mxnet/issues/15297
>     > >     > > > > > > > > > > > > I came to know about this issue only today, and
>     > >     > > > > > > > > > > > > I have not been monitoring sockeye.
>     > >     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
>     > >     > > > > > > > > > > > > caused by the dlpack changes.
>     > >     > > > > > > > > > > > > Also, I don't think the sockeye CI checks
>     > >     > > > > > > > > > > > > against master; it is using 1.4.1.
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > Anirudh
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
>     > >     > > > > > > > > > > > > <roywei...@gmail.com>
>     > >     > > > > wrote:
>     > >     > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > Hi,
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > Could you share which test failed and what
>     > >     > > > > > > > > > > > > > the crash was? How can it be reproduced?
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > I was able to install sockeye, and all tests
>     > >     > > > > > > > > > > > > > passed using "python setup.py test".
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > I have tested both the nightly pip package
>     > >     > > > > > > > > > > > > > and 1.5.0.rc1.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > It would be great to create an issue with
>     > >     > > > > > > > > > > > > > reproducible steps and move the discussion there.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > Also, I see the sockeye nightly build [1] has
>     > >     > > > > > > > > > > > > > been failing for some time. If it's due to an
>     > >     > > > > > > > > > > > > > MXNet change, please raise this early so we
>     > >     > > > > > > > > > > > > > can track and solve it in time rather than
>     > >     > > > > > > > > > > > > > block the release during vote time.
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > >
>     > >     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM
> Anirudh
>     > > Subramanian
>     > >     > > > > > > > > > > > >
