Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

Pedro Larroy Thu, 27 Jun 2019 23:47:22 -0700

Thanks Manu.

@all: I observed other strange stuff that I don't understand at the moment:


I installed the 1.5 rc from pip to check that I'm not doing something
wrong when building, and found that CPU utilization is quite poor
( https://imgur.com/fRmbQNc ) compared to a version compiled from
source. The pip package uses only 4-5 of the 32 cores. When I compile
from source I get good core utilization ( https://imgur.com/e8BB425 ).
I verified this on both a c5d.18xlarge and a 32-core AMD bare metal
machine.

It also seems that the pip version links against gomp instead of
LLVM's omp. I'm not sure why.

pip install mxnet==1.5.0b20190627
/home/piotr/py3_1.5rc/lib/python3.6/site-packages/mxnet
piotr@panther:0: ~/p/l/p/s/mxnet> ldd libmxnet.so | grep omp
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f99d1832000)
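
To repeat this check on several installs, the `ldd` output can be parsed from Python. This is only a sketch: the helper names `openmp_runtimes` and `ldd` and the list of library names (libgomp, libiomp5, libomp) are my own assumptions, not anything MXNet provides.

```python
import re
import subprocess

# Assumed sonames for the common OpenMP runtimes: GNU, Intel, LLVM.
_OMP_PATTERN = re.compile(r"\b(libgomp|libiomp5|libomp)\.so[^\s]*")


def openmp_runtimes(ldd_output):
    """Return the sorted, de-duplicated OpenMP runtimes named in `ldd` output."""
    return sorted({m.group(1) for m in _OMP_PATTERN.finditer(ldd_output)})


def ldd(path):
    """Run `ldd` on a shared library and return its stdout (Linux only)."""
    result = subprocess.run(["ldd", path], check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout


# The line from the pip install above:
sample = ("    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 "
          "(0x00007f99d1832000)\n")
print(openmp_runtimes(sample))  # ['libgomp']
```

On a real install one would call `openmp_runtimes(ldd("/path/to/libmxnet.so"))`. More than one entry in the result would suggest that two OpenMP runtimes are loaded into the same process.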

I tried cifar10 on a bare metal 32-core AMD Zen machine and it is
extremely slow; it doesn't seem to make much progress. Compared to a
c5d.18xlarge, I couldn't even complete 1 epoch. I tried with and
without MKL, without much success. I will continue digging into this
when possible.


Pedro.

On Thu, Jun 27, 2019 at 9:41 PM Manu Seth <manuseth1...@gmail.com> wrote:
>
> Hi all,
>
> I ran the same cifar10.py script as Pedro, but for 20 epochs. Considering
> the first 10 epochs for warm-up, I averaged time per epoch for the last 10
> epochs.
>
> With MXNet 1.4.1 average time is 164.23 s
> With MXNet 1.5.0 average time is 174.59 s (~6.3% regression)
>
>
> For a second data point, I ran Gluon speed test benchmark script -
> https://github.com/apache/incubator-mxnet/blob/master/benchmark/python/gluon/benchmark_gluon.py
> using the following command:
> python3 benchmark_gluon.py --model 'resnet152_v2' --batch-size 128
> --num-batches 200 --type 'training'
>
> I got the following speeds:
> With MXNet 1.4.1, average speed is 25.677534 img/s
> With MXNet 1.5.0, average speed is 25.082130 img/s (~2.3% regression)
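
For reference, the two regression figures above can be reproduced with a few lines. This is a sketch with function names I made up; it assumes the warm-up epochs are dropped and the rest averaged, as Manu describes.

```python
def mean_after_warmup(epoch_times, warmup):
    """Average the epoch times that remain after dropping `warmup` epochs."""
    steady = epoch_times[warmup:]
    if not steady:
        raise ValueError("no epochs left after warm-up")
    return sum(steady) / len(steady)


def slowdown_pct(baseline_time, candidate_time):
    """Percent increase in time per epoch (positive means slower)."""
    return (candidate_time - baseline_time) / baseline_time * 100.0


def throughput_drop_pct(baseline_speed, candidate_speed):
    """Percent decrease in img/s (positive means slower)."""
    return (baseline_speed - candidate_speed) / baseline_speed * 100.0


# Averages reported above for 1.4.1 and 1.5.0.
print(round(slowdown_pct(164.23, 174.59), 1))               # 6.3
print(round(throughput_drop_pct(25.677534, 25.082130), 1))  # 2.3
```

Note that the two metrics use different conventions (time goes up, throughput goes down), so it helps to say which one a percentage refers to.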
>
> Note:
> For 1.4.1 version, I used pip install mxnet-mkl==1.4.1
> For 1.5.0 version, I used pip install mxnet-mkl==1.5.0b20190619 which
> corresponds to commit# ccbbf6b4b76ea536a6583c99497c83b65a20817b which is
> behind 1.5.x branch by 4 commits
>
>
> Best,
> Manu
>
>
> On 6/27/19, 3:37 PM, "sandeep krishnamurthy" <sandeep.krishn...@gmail.com>
> wrote:
>
>     Hello Ciyong/Pedro,
>
>     Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
>     cover all MXNet operators, not presented in best possible way, still
>     WIP)
>
>     https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
>
>     The following operators look slower in 1.5 compared to 1.4.1:
>     - BatchNorm
>     - Pooling
>     - FullyConnected
>     - batch_dot
>     - Dot
>     - broadcast_mul
>     - log_softmax
>     and a few other operators
>
>     Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
>     example Convolution, flatten, elementwise operators, etc. So it is
>     likely that a few operators have regressed noticeably, but improvements
>     in other operators hide much of that regression, so the end-to-end
>     effect is not that significant. We need a more detailed per-operator
>     performance analysis. We will not be able to do this for the current
>     release; we should have a more concrete way of detecting such
>     performance regressions before the next release.
>
>     Setup:
>     1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
>     1.4.1 => PyPi mxnet-mkl==1.4.1
>     Machine: C5.18X
>     No explicit environment variables were set
>     Operator benchmark code -
>     https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
>
>     Best,
>     Sandeep
>
>
>     On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
>     wrote:
>
>     > I will try to run a few benchmarks in a bare metal instance tonight to
>     > remove virtualization variance for the measurements and provide some
>     > numbers.
>     >
>     > Please propose a set of models / examples that would be desirable to
>     > run before the release and provide a link to an easy to run script
>     > with instructions so we can validate the release better.
>     >
>     > Thank you.
>     >
>     > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei <roywei...@gmail.com> wrote:
>     > >
>     > > Dear @dev,
>     > >
>     > > I'm cancelling the vote for the cached op fix:
>     > >
>     > > https://github.com/apache/incubator-mxnet/pull/15298
>     > >
>     > > As for the possible CPU training regression, it does not look like a
>     > > blocker for now.
>     > >
>     > > I will start a new rc2 vote, please help to validate.
>     > >
>     > > Thanks!
>     > >
>     > >
>     > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <ciyong.c...@intel.com
> >
>     > wrote:
>     > >
>     > > > Hi Pedro,
>     > > >
>     > > > I was able to reproduce a similar result with your script on
>     > > > C5.18xlarge (v1.5 is ~5.6% slower than v1.4; I was using 18 cores
>     > > > for computing), but I needed to bind the cores with the commands
>     > > > below when running the script. Without setting the env variables,
>     > > > I got close times (<1% difference) with v1.5 and v1.4.
>     > > >         export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
>     > > >         export OMP_NUM_THREADS=18
>     > > >
>     > > > Did you set any env variables during running?
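
To make the pinned and unpinned runs easy to compare, the two exports can be applied per run from a small driver script instead of the shell. This is a sketch only: `pinned_env` and `run_pinned` are hypothetical helpers. As far as I know `KMP_AFFINITY` is read by the Intel/LLVM OpenMP runtimes and not by libgomp, so it may have no effect on a build linked against gomp.

```python
import os
import subprocess
import sys


def pinned_env(num_threads, base=None):
    """Copy an environment and add the thread-binding variables from above."""
    env = dict(os.environ if base is None else base)
    env["KMP_AFFINITY"] = "granularity=fine,noduplicates,compact,1,0"
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_pinned(script_args, num_threads=18):
    """Run a Python script in a child process with the pinned environment.

    The variables are set before the child starts, so the OpenMP runtime
    sees them when it initializes.
    """
    cmd = [sys.executable] + list(script_args)
    return subprocess.run(cmd, env=pinned_env(num_threads), check=True)


# Example (not executed here):
# run_pinned(["cifar10.py", "--epochs", "5"], num_threads=18)
```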
>     > > >
>     > > > The performance results I got are below:
>     > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > > > real    12m10.856s
>     > > > user    234m49.576s
>     > > > sys     4m38.044s
>     > > >
>     > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > > > real    12m52.140s
>     > > > user    246m30.740s
>     > > > sys     5m8.188s
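
The ~5.6% figure follows from the two `real` times above. A small sketch for converting bash `time` stamps to seconds and comparing them (the helper names are mine):

```python
import re

# Matches stamps such as "12m10.856s" or "45s" as printed by bash `time`.
_TIME_RE = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def to_seconds(stamp):
    """Convert a bash `time` stamp to seconds."""
    match = _TIME_RE.match(stamp.strip())
    if match is None:
        raise ValueError("unrecognised time stamp: %r" % stamp)
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


def wall_clock_slowdown_pct(baseline, candidate):
    """Percent increase in wall-clock time of candidate over baseline."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100.0


# real times for 1.4.1.rc0 and 1.5.0.rc1 from the runs above
print(round(wall_clock_slowdown_pct("12m10.856s", "12m52.140s"), 1))  # 5.6
```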
>     > > >
>     > > > Looking at the profiling data, most of the ops have the same perf
>     > > > between v1.4 and v1.5, but some ops like "_backward_BatchNorm" and
>     > > > "Pooling" are ~1.37x slower on v1.5 compared with v1.4.
>     > > > I will do further analysis on these ops.
>     > > >
>     > > > Here's the hardware/OS info from my side:
>     > > > ----------Python Info----------
>     > > > Version      : 3.6.8
>     > > > Compiler     : GCC 7.3.0
>     > > > Build        : ('default', 'Dec 30 2018 01:22:34')
>     > > > Arch         : ('64bit', '')
>     > > > ------------Pip Info-----------
>     > > > Version      : 19.0.3
>     > > > Directory    : /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
>     > > > ----------MXNet Info-----------
>     > > > Version      : 1.5.0
>     > > > Directory    : /home/ubuntu/ws/incubator-mxnet/python/mxnet
>     > > > Hashtag not found. Not installed from pre-built package.
>     > > > ----------System Info----------
>     > > > Platform     : Linux-4.4.0-1085-aws-x86_64-with-debian-stretch-sid
>     > > > system       : Linux
>     > > > node         : ip-172-31-32-129
>     > > > release      : 4.4.0-1085-aws
>     > > > version      : #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019
>     > > > ----------Hardware Info----------
>     > > > machine      : x86_64
>     > > > processor    : x86_64
>     > > > Architecture:          x86_64
>     > > > CPU op-mode(s):        32-bit, 64-bit
>     > > > Byte Order:            Little Endian
>     > > > CPU(s):                72
>     > > > On-line CPU(s) list:   0-71
>     > > > Thread(s) per core:    2
>     > > > Core(s) per socket:    18
>     > > > Socket(s):             2
>     > > > NUMA node(s):          2
>     > > > Vendor ID:             GenuineIntel
>     > > > CPU family:            6
>     > > > Model:                 85
>     > > > Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > > > Stepping:              3
>     > > > CPU MHz:               3000.000
>     > > > BogoMIPS:              6000.00
>     > > > Hypervisor vendor:     KVM
>     > > > Virtualization type:   full
>     > > > L1d cache:             32K
>     > > > L1i cache:             32K
>     > > > L2 cache:              1024K
>     > > > L3 cache:              25344K
>     > > > NUMA node0 CPU(s):     0-17,36-53
>     > > > NUMA node1 CPU(s):     18-35,54-71
>     > > > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
>     > > > pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology
>     > > > nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3
>     > > > fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
>     > > > aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>     > > > invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
>     > > > erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb
>     > > > avx512cd xsaveopt xsavec xgetbv1 ida arat pku
>     > > > ----------Network Test----------
>     > > >
>     > > >
>     > > > -Ciyong
>     > > >
>     > > >
>     > > > -----Original Message-----
>     > > > From: Zhao, Patric [mailto:patric.z...@intel.com]
>     > > > Sent: Thursday, June 27, 2019 9:55 AM
>     > > > To: dev@mxnet.incubator.apache.org
>     > > > Cc: d...@mxnet.apache.org
>     > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>     > > >
>     > > > Could we run more epochs to see the performance difference, or
>     > > > profile the difference between a good and a bad run?
>     > > >
>     > > > > -----Original Message-----
>     > > > > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
>     > > > > Sent: Thursday, June 27, 2019 9:35 AM
>     > > > > To: dev@mxnet.incubator.apache.org
>     > > > > Cc: d...@mxnet.apache.org
>     > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
>     > > > > 1.5.0.rc1
>     > > > >
>     > > > > I ran it again and the gap is bigger again. I guess we need to average
>     > > > > out the times across several runs:
>     > > > >
>     > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > > > > (master)+$ time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5
>     > > > > && time ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > threads
>     > > > > for decoding..
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
>     > > > > for decoding..
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:17:09] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 147456 bytes with malloc directly
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 589824 bytes with malloc directly
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 2359296 bytes with malloc directly
>     > > > > [23:17:09] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > 9437184 bytes with malloc directly
>     > > > > Epoch 0, Batch 199, Speed=384.149839
>     > > > > Epoch 0, Duration=140.919567
>     > > > > Epoch 0, Training accuracy=0.115169
>     > > > > Epoch 0, Validation accuracy=0.141317
>     > > > > Epoch 1, Batch 199, Speed=433.380512
>     > > > > Epoch 1, Duration=119.553233
>     > > > > Epoch 1, Training accuracy=0.170956
>     > > > > Epoch 1, Validation accuracy=0.216146
>     > > > > Epoch 2, Batch 199, Speed=434.864699
>     > > > > Epoch 2, Duration=123.278490
>     > > > > Epoch 2, Training accuracy=0.209455
>     > > > > Epoch 2, Validation accuracy=0.247296
>     > > > > Epoch 3, Batch 199, Speed=433.401854
>     > > > > Epoch 3, Duration=118.327797
>     > > > > Epoch 3, Training accuracy=0.248701
>     > > > > Epoch 3, Validation accuracy=0.302083
>     > > > > Epoch 4, Batch 199, Speed=419.713707
>     > > > > Epoch 4, Duration=126.468409
>     > > > > Epoch 4, Training accuracy=0.260949
>     > > > > Epoch 4, Validation accuracy=0.269030
>     > > > >
>     > > > > real    10m55.796s
>     > > > > user    399m33.567s
>     > > > > sys     13m55.904s
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > threads
>     > > > > for decoding..
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > ImageRecordIOParser2:
>     > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
> threads
>     > > > > for decoding..
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > [23:28:04] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
> completed
>     > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > Epoch 0, Batch 199, Speed=419.039188
>     > > > > Epoch 0, Duration=143.934903
>     > > > > Epoch 0, Training accuracy=0.122542
>     > > > > Epoch 0, Validation accuracy=0.164359
>     > > > > Epoch 1, Batch 199, Speed=445.257048
>     > > > > Epoch 1, Duration=135.248399
>     > > > > Epoch 1, Training accuracy=0.178828
>     > > > > Epoch 1, Validation accuracy=0.199419
>     > > > > Epoch 2, Batch 199, Speed=447.115215
>     > > > > Epoch 2, Duration=132.003770
>     > > > > Epoch 2, Training accuracy=0.217808
>     > > > > Epoch 2, Validation accuracy=0.233073
>     > > > > Epoch 3, Batch 199, Speed=441.079477
>     > > > > Epoch 3, Duration=126.543316
>     > > > > Epoch 3, Training accuracy=0.248102
>     > > > > Epoch 3, Validation accuracy=0.293870
>     > > > > Epoch 4, Batch 199, Speed=449.329787
>     > > > > Epoch 4, Duration=138.398325
>     > > > > Epoch 4, Training accuracy=0.270021
>     > > > > Epoch 4, Validation accuracy=0.311498
>     > > > >
>     > > > > real    11m45.329s
>     > > > > user    426m13.908s
>     > > > > sys     16m45.093s
>     > > > >
>     > > > > On Wed, Jun 26, 2019 at 4:18 PM Pedro Larroy
>     > > > > <pedro.larroy.li...@gmail.com> wrote:
>     > > > > >
>     > > > > > The difference looks smaller now, more like your numbers. I wonder
>     > > > > > if something happened during the previous benchmark, like a system
>     > > > > > update...
>     > > > > >
>     > > > > >
>     > > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
>     > > > > > time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
>     > > > > > ~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [22:49:41] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > > 147456 bytes with malloc directly
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > > 589824 bytes with malloc directly
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > > 2359296 bytes with malloc directly
>     > > > > > [22:49:42] ../src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate
>     > > > > > 9437184 bytes with malloc directly
>     > > > > > Epoch 0, Batch 199, Speed=426.182733
>     > > > > > Epoch 0, Duration=134.868458
>     > > > > > Epoch 0, Training accuracy=0.127238
>     > > > > > Epoch 0, Validation accuracy=0.206388
>     > > > > > Epoch 1, Batch 199, Speed=313.127156
>     > > > > > Epoch 1, Duration=128.041775
>     > > > > > Epoch 1, Training accuracy=0.182065
>     > > > > > Epoch 1, Validation accuracy=0.202524
>     > > > > > Epoch 2, Batch 199, Speed=410.931187
>     > > > > > Epoch 2, Duration=124.920588
>     > > > > > Epoch 2, Training accuracy=0.202584
>     > > > > > Epoch 2, Validation accuracy=0.245693
>     > > > > > Epoch 3, Batch 199, Speed=419.119335
>     > > > > > Epoch 3, Duration=120.948349
>     > > > > > Epoch 3, Training accuracy=0.235854
>     > > > > > Epoch 3, Validation accuracy=0.291066
>     > > > > > Epoch 4, Batch 199, Speed=430.473733
>     > > > > > Epoch 4, Duration=130.181724
>     > > > > > Epoch 4, Training accuracy=0.257773
>     > > > > > Epoch 4, Validation accuracy=0.304988
>     > > > > >
>     > > > > > real    11m7.356s
>     > > > > > user    406m9.910s
>     > > > > > sys     14m18.349s
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/train.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:172:
>     > > > > > ImageRecordIOParser2:
>     > > > > > /home/piotr/deeplearning-benchmark/data/cifar/test.rec, use 4
>     > > > > > threads for decoding..
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:230: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > > [23:00:49] ../src/io/iter_image_recordio_2.cc:248: Load mean
> image
>     > > > > > from /home/piotr/deeplearning-benchmark/data/cifar/mean.bin
>     > > > > completed
>     > > > > > lr_schedule: {0: 0.05, 82: 0.005000000000000001, 123: 0.0005, 300: 0.0001}
>     > > > > > Epoch 0, Changed learning rate to 0.05
>     > > > > > Epoch 0, Batch 199, Speed=348.618154
>     > > > > > Epoch 0, Duration=146.469352
>     > > > > > Epoch 0, Training accuracy=0.124121
>     > > > > > Epoch 0, Validation accuracy=0.167227
>     > > > > > Epoch 1, Batch 199, Speed=452.790825
>     > > > > > Epoch 1, Duration=130.199421
>     > > > > > Epoch 1, Training accuracy=0.183863
>     > > > > > Epoch 1, Validation accuracy=0.237079
>     > > > > > Epoch 2, Batch 199, Speed=451.406559
>     > > > > > Epoch 2, Duration=126.320823
>     > > > > > Epoch 2, Training accuracy=0.214844
>     > > > > > Epoch 2, Validation accuracy=0.244692
>     > > > > > Epoch 3, Batch 199, Speed=403.161873
>     > > > > > Epoch 3, Duration=125.331660
>     > > > > > Epoch 3, Training accuracy=0.243506
>     > > > > > Epoch 3, Validation accuracy=0.301182
>     > > > > > Epoch 4, Batch 199, Speed=450.826598
>     > > > > > Epoch 4, Duration=126.426253
>     > > > > > Epoch 4, Training accuracy=0.266424
>     > > > > > Epoch 4, Validation accuracy=0.311899
>     > > > > >
>     > > > > > real    11m21.930s
>     > > > > > user    415m3.855s
>     > > > > > sys     13m53.975s
>     > > > > >
>     > > > > > On Wed, Jun 26, 2019 at 3:50 PM Pedro Larroy
>     > > > > > <pedro.larroy.li...@gmail.com> wrote:
>     > > > > > >
>     > > > > > > Hi Ciyong, thanks for trying to reproduce:
>     > > > > > >
>     > > > > > > I used this one:
>     > > > > > > https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
>     > > > > > >
>     > > > > > > Could you provide hardware and OS details?
>     > > > > > >
>     > > > > > > I will rerun and repost numbers in a few minutes.
>     > > > > > >
>     > > > > > > Pedro.
>     > > > > > >
>     > > > > > > On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong
>     > > > > > > <ciyong.c...@intel.com>
>     > > > > wrote:
>     > > > > > > >
>     > > > > > > > Hi Pedro,
>     > > > > > > >
>     > > > > > > > I'm looking at this case, using the script
>     > > > > > > > "incubator-mxnet/example/image-classification/train_cifar10.py"
>     > > > > > > > to get the timing data, but there doesn't seem to be much
>     > > > > > > > difference between mxnet 1.4.1.rc0 and 1.5.0.rc1 on C5.18xlarge.
>     > > > > > > >
>     > > > > > > > I'm not sure if there's any difference in the python script. Can
>     > > > > > > > you point me to the link for your script (cifar10.py)?
>     > > > > > > > Or you can also try MXNet's script (train_cifar10.py) and see
>     > > > > > > > the performance.
>     > > > > > > >
>     > > > > > > > Here's the command I used to collect the time:
>     > > > > > > >         python train_cifar10.py --num-epoch=5
>     > > > > > > >
>     > > > > > > > 1) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
>     > > > > > > >         real    9m4.880s
>     > > > > > > >         user    333m13.340s
>     > > > > > > >         sys     14m36.100s
>     > > > > > > >
>     > > > > > > > 2) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
>     > > > > > > >         real    9m2.155s
>     > > > > > > >         user    329m37.092s
>     > > > > > > >         sys     16m8.668s
>     > > > > > > >
>     > > > > > > > -Ciyong
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > -----Original Message-----
>     > > > > > > > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
>     > > > > > > > Sent: Wednesday, June 26, 2019 6:28 AM
>     > > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > > Cc: d...@mxnet.apache.org
>     > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating)
> version
>     > > > > > > > 1.5.0.rc1
>     > > > > > > >
>     > > > > > > > Hi, these were my build flags and system info:
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > --- # CMake configuration
>     > > > > > > > USE_CUDA: "OFF" # Build with CUDA support
>     > > > > > > > USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
>     > > > > > > > USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
>     > > > > > > > USE_OPENCV: "ON" # Build with OpenCV support
>     > > > > > > > USE_OPENMP: "ON" # Build with Openmp support
>     > > > > > > > USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
>     > > > > > > > USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
>     > > > > > > > USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
>     > > > > > > > USE_LAPACK: "ON" # Build with lapack support
>     > > > > > > > USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
>     > > > > > > > USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > > > > > > > USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
>     > > > > > > > USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
>     > > > > > > > USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
>     > > > > > > > USE_JEMALLOC: "ON" # Build with Jemalloc support
>     > > > > > > > USE_PROFILER: "ON" # Build with Profiler support
>     > > > > > > > USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
>     > > > > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
>     > > > > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
>     > > > > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
>     > > > > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
>     > > > > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
>     > > > > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
>     > > > > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
>     > > > > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
>     > > > > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
>     > > > > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
>     > > > > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
>     > > > > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
>     > > > > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
>     > > > > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
>     > > > > > > > CMAKE_BUILD_TYPE: "Release"
>     > > > > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
>     > > > > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
>     > > > > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
>     > > > > > > >
>     > > > > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
>     > > > > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)
>     > > > > > > >
>     > > > > > > > curl http://169.254.169.254/latest/meta-data/instance-type
>     > > > > > > > c5d.18xlarge
>     > > > > > > >
>     > > > > > > >
>     > > > > > > > Version      : 3.6.7
>     > > > > > > > Compiler     : GCC 8.2.0
>     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > > > > > > > Arch         : ('64bit', 'ELF')
>     > > > > > > > ------------Pip Info-----------
>     > > > > > > > Version      : 19.1.1
>     > > > > > > > Directory    : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
>     > > > > > > > ----------MXNet Info-----------
>     > > > > > > > Version      : 1.5.0
>     > > > > > > > Directory    : /home/piotr/mxnet_1.5/python/mxnet
>     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > > > > > > > ----------System Info----------
>     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > > > > > > > system       : Linux
>     > > > > > > > node         : ip-172-31-63-171
>     > > > > > > > release      : 4.15.0-1035-aws
>     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > > > > > > > ----------Hardware Info----------
>     > > > > > > > machine      : x86_64
>     > > > > > > > processor    : x86_64
>     > > > > > > > Architecture:        x86_64
>     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > > > > > > > Byte Order:          Little Endian
>     > > > > > > > CPU(s):              72
>     > > > > > > > On-line CPU(s) list: 0-71
>     > > > > > > > Thread(s) per core:  2
>     > > > > > > > Core(s) per socket:  18
>     > > > > > > > Socket(s):           2
>     > > > > > > > NUMA node(s):        2
>     > > > > > > > Vendor ID:           GenuineIntel
>     > > > > > > > CPU family:          6
>     > > > > > > > Model:               85
>     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > > > > > > > Stepping:            4
>     > > > > > > > CPU MHz:             1326.446
>     > > > > > > > BogoMIPS:            6000.00
>     > > > > > > > Hypervisor vendor:   KVM
>     > > > > > > > Virtualization type: full
>     > > > > > > > L1d cache:           32K
>     > > > > > > > L1i cache:           32K
>     > > > > > > > L2 cache:            1024K
>     > > > > > > > L3 cache:            25344K
>     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>     > > > > > > > ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>     > > > > > > > tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm
>     > > > > > > > abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1
>     > > > > > > > hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed
>     > > > > > > > adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt
>     > > > > > > > xsavec xgetbv1 xsaves ida arat pku ospke
>     > > > > > > > ----------Network Test----------
>     > > > > > > >
>     > > > > > > > ----------Python Info----------
>     > > > > > > > Version      : 3.6.7
>     > > > > > > > Compiler     : GCC 8.2.0
>     > > > > > > > Build        : ('default', 'Oct 22 2018 11:32:17')
>     > > > > > > > Arch         : ('64bit', 'ELF')
>     > > > > > > > ------------Pip Info-----------
>     > > > > > > > Version      : 19.1.1
>     > > > > > > > Directory    : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
>     > > > > > > > ----------MXNet Info-----------
>     > > > > > > > Version      : 1.4.1
>     > > > > > > > Directory    : /home/piotr/mxnet_1.4/python/mxnet
>     > > > > > > > Hashtag not found. Not installed from pre-built package.
>     > > > > > > > ----------System Info----------
>     > > > > > > > Platform     : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
>     > > > > > > > system       : Linux
>     > > > > > > > node         : ip-172-31-63-171
>     > > > > > > > release      : 4.15.0-1035-aws
>     > > > > > > > version      : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
>     > > > > > > > ----------Hardware Info----------
>     > > > > > > > machine      : x86_64
>     > > > > > > > processor    : x86_64
>     > > > > > > > Architecture:        x86_64
>     > > > > > > > CPU op-mode(s):      32-bit, 64-bit
>     > > > > > > > Byte Order:          Little Endian
>     > > > > > > > CPU(s):              72
>     > > > > > > > On-line CPU(s) list: 0-71
>     > > > > > > > Thread(s) per core:  2
>     > > > > > > > Core(s) per socket:  18
>     > > > > > > > Socket(s):           2
>     > > > > > > > NUMA node(s):        2
>     > > > > > > > Vendor ID:           GenuineIntel
>     > > > > > > > CPU family:          6
>     > > > > > > > Model:               85
>     > > > > > > > Model name:          Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
>     > > > > > > > Stepping:            4
>     > > > > > > > CPU MHz:             1223.344
>     > > > > > > > BogoMIPS:            6000.00
>     > > > > > > > Hypervisor vendor:   KVM
>     > > > > > > > Virtualization type: full
>     > > > > > > > L1d cache:           32K
>     > > > > > > > L1i cache:           32K
>     > > > > > > > L2 cache:            1024K
>     > > > > > > > L3 cache:            25344K
>     > > > > > > > NUMA node0 CPU(s):   0-17,36-53
>     > > > > > > > NUMA node1 CPU(s):   18-35,54-71
>     > > > > > > > Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>     > > > > > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall
>     > > > > > > > nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
>     > > > > > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor
>     > > > > > > > ssse3 fma cx16 pcid
>     > > > > > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
>     > > > > > > > avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
>     > > > > > > > invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
>     > > > > > > > erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
>     > > > > > > > clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
>     > > > > > > > ida arat pku ospke
>     > > > > > > > ----------Network Test----------
>     > > > > > > >
>     > > > > > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
>     > > > > <pedro.larroy.li...@gmail.com> wrote:
>     > > > > > > > >
>     > > > > > > > > I trained cifar10 on CPU, and there seems to be a
>     > > > > > > > > regression of roughly 7% in training time against
>     > > > > > > > > 1.4.1:
>     > > > > > > > >
>     > > > > > > > > (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
>     > > > > > > > > (master)+$ time python cifar10.py --epochs 5
>     > > > > > > > > real    11m30.388s
>     > > > > > > > > user    417m7.766s
>     > > > > > > > > sys     16m57.315s
>     > > > > > > > >
>     > > > > > > > > VS 1.4.1:
>     > > > > > > > > real    10m41.994s
>     > > > > > > > > user    392m40.646s
>     > > > > > > > > sys     12m30.601s
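The "roughly 7%" figure can be checked directly from the two `time` outputs above. The following is a minimal sketch, not part of the original benchmark; the helper names `to_seconds` and `slowdown_percent` are hypothetical and introduced only for this illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '11m30.388s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(new: str, old: str) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (to_seconds(new) / to_seconds(old) - 1.0) * 100.0


# Wall-clock and CPU time from the 5-epoch cifar10.py runs quoted above
# (first argument: the 1.5 run, second argument: the 1.4.1 run).
real = slowdown_percent("11m30.388s", "10m41.994s")
user = slowdown_percent("417m7.766s", "392m40.646s")
print(f"real: {real:.1f}%  user: {user:.1f}%")
```

Both runs cover only 5 epochs with no warm-up excluded, so the result is a rough indication rather than a precise measurement.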
>     > > > > > > > >
>     > > > > > > > >
>     > > > > > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <roywei...@gmail.com> wrote:
>     > > > > > > > > >
>     > > > > > > > > > Hi Anirudh,
>     > > > > > > > > >
>     > > > > > > > > > Thanks for jumping into this quickly. I followed up on
>     > > > > > > > > > the issue.
>     > > > > > > > > >
>     > > > > > > > > > It was meant for sockeye developers/maintainers to help
>     > > > > > > > > > set up nightly tests and raise issues early.
>     > > > > > > > > >
>     > > > > > > > > > Thanks!
>     > > > > > > > > >
>     > > > > > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
>     > > > > > > > > > <haibin.lin....@gmail.com>
>     > > > > > > > > > wrote:
>     > > > > > > > > >
>     > > > > > > > > > > In GluonNLP we test against the MXNet nightly build
>     > > > > > > > > > > for each PR, and the CI did catch some MXNet-related
>     > > > > > > > > > > issues. I recommend that other toolkits also add
>     > > > > > > > > > > integration tests with MXNet nightly. It helps
>     > > > > > > > > > > identify issues early.
>     > > > > > > > > > >
>     > > > > > > > > > > Best,
>     > > > > > > > > > > Haibin
>     > > > > > > > > > >
>     > > > > > > > > > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric
>     > > > > > > > > > > <patric.z...@intel.com>
>     > > > > wrote:
>     > > > > > > > > > >
>     > > > > > > > > > > > Thanks for raising the issue; we will take a look
>     > > > > > > > > > > > ASAP.
>     > > > > > > > > > > >
>     > > > > > > > > > > > The downstream cases are not in the MXNet CI, so
>     > > > > > > > > > > > it's hard for MXNet developers to catch potential
>     > > > > > > > > > > > bugs or performance degradation.
>     > > > > > > > > > > >
>     > > > > > > > > > > > In the future, I suggest adding the major
>     > > > > > > > > > > > downstream test cases, like those from sockeye,
>     > > > > > > > > > > > GluonNLP, GluonCV, DGL, Gluon-TS, into the nightly
>     > > > > > > > > > > > test. If that's still too heavy, maybe test weekly
>     > > > > > > > > > > > or monthly :)
>     > > > > > > > > > > >
>     > > > > > > > > > > > Thanks,
>     > > > > > > > > > > >
>     > > > > > > > > > > > --Patric
>     > > > > > > > > > > >
>     > > > > > > > > > > > > -----Original Message-----
>     > > > > > > > > > > > > From: Anirudh Subramanian
>     > > > > > > > > > > > > [mailto:anirudh2...@gmail.com]
>     > > > > > > > > > > > > Sent: Friday, June 21, 2019 9:31 AM
>     > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
>     > > > > > > > > > > > > Cc: d...@mxnet.apache.org
>     > > > > > > > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Hi Lai,
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > I have opened an issue:
>     > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/issues/15297
>     > > > > > > > > > > > > I came to know about this issue only today and I
>     > > > > > > > > > > > > have not been monitoring sockeye.
>     > > > > > > > > > > > > I jumped onto this issue to make sure it wasn't
>     > > > > > > > > > > > > caused by the dlpack changes.
>     > > > > > > > > > > > > Also, I don't think sockeye CI checks against
>     > > > > > > > > > > > > master; it is using 1.4.1.
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > Anirudh
>     > > > > > > > > > > > >
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei
>     > > > > > > > > > > > > <roywei...@gmail.com>
>     > > > > wrote:
>     > > > > > > > > > > > >
>     > > > > > > > > > > > > > Hi,
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Could you share which test failed and what the
>     > > > > > > > > > > > > > crash is? How can it be reproduced?
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > I was able to install sockeye, and all tests
>     > > > > > > > > > > > > > passed using python setup.py test.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > I have tested both the nightly pip package and
>     > > > > > > > > > > > > > 1.5.0.rc1.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > It would be great to create an issue with
>     > > > > > > > > > > > > > reproducible steps and move the discussion there.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Also, I see the sockeye nightly build[1] has been
>     > > > > > > > > > > > > > failing for some time. If it's due to an MXNet
>     > > > > > > > > > > > > > change, please raise this early so we can track
>     > > > > > > > > > > > > > and solve it in time rather than block the
>     > > > > > > > > > > > > > release during vote time.
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > [1] https://travis-ci.org/awslabs/sockeye
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
>     > > > > > > > > > > > > > <anirudh2...@gmail.com> wrote:
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > I was able to reproduce a crash with the commit
>     > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not
>     > > > > > > > > > > > > > > with the commit
>     > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > Anirudh
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei
>     > > > > > > > > > > > > > > <roywei...@gmail.com>
>     > > > > > > > > > > wrote:
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > Hi Przemyslaw,
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > Is there an issue with more details to track
>     > > > > > > > > > > > > > > > the problem?
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław
>     > > > > > > > > > > > > > > > Trędak <ptre...@apache.org>
>     > > > > > > > > > > > > > > > wrote:
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > -1
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > There is a crash in the sockeye unit tests
>     > > > > > > > > > > > > > > > > (python setup.py test), observed starting
>     > > > > > > > > > > > > > > > > with the nightly 1.5 build from 6/13 and
>     > > > > > > > > > > > > > > > > still occurring in 1.5rc1. I don't yet have
>     > > > > > > > > > > > > > > > > the exact commit that is responsible for it,
>     > > > > > > > > > > > > > > > > but it is either
>     > > > > > > > > > > > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c
>     > > > > > > > > > > > > > > > > (dlpack related) or
>     > > > > > > > > > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06
>     > > > > > > > > > > > > > > > > (cached op optimization).
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > On 2019/06/20 06:36:22, Lai Wei
>     > > > > > > > > > > > > > > > > <roywei...@gmail.com>
>     > > > > wrote:
>     > > > > > > > > > > > > > > > > > Dear MXNet community,
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > This is the 3-day vote to release Apache
>     > > > > > > > > > > > > > > > > > MXNet (incubating) version 1.5.0.
>     > > > > > > > > > > > > > > > > > Voting on dev@ will start June 19,
>     > > > > > > > > > > > > > > > > > 23:59:59(PST) and close on June 22,
>     > > > > > > > > > > > > > > > > > 23:59:59.
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > 1) Link to release notes:
>     > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > 2) Link to release candidate:
>     > > > > > > > > > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > 3) Link to source and signatures on apache dist server:
>     > > > > > > > > > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > Please remember to TEST first before voting
>     > > > > > > > > > > > > > > > > > accordingly:
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > +1 = approve
>     > > > > > > > > > > > > > > > > > +0 = no opinion
>     > > > > > > > > > > > > > > > > > -1 = disapprove (provide reason)
>     > > > > > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > >
>     > > > > > > > > > > > > > --
>     > > > > > > > > > > > > > Best Regards
>     > > > > > > > > > > > > >
>     > > > > > > > > > > > > > Lai
>     > > > > > > > > > > > > >
>     > > > > > > > > > > >
>     > > > > > > > > > >
>     > > > > > > > > > --
>     > > > > > > > > > Best Regards
>     > > > > > > > > >
>     > > > > > > > > > Lai
>     > > >
>     > > --
>     > > Best Regards
>     > >
>     > > Lai
>     >
>     >
>
>     --
>     Sandeep Krishnamurthy
