Hi, these were my build flags and system info:
---
# CMake configuration
USE_CUDA: "OFF" # Build with CUDA support
USE_OLDCMAKECUDA: "OFF" # Build with old cmake cuda
USE_NCCL: "OFF" # Use NVidia NCCL with CUDA
USE_OPENCV: "ON" # Build with OpenCV support
USE_OPENMP: "ON" # Build with Openmp support
USE_CUDNN: "ON" # Build with cudnn support) # one could set CUDNN_ROOT for search path
USE_SSE: "ON" # Build with x86 SSE instruction support IF NOT ARM
USE_F16C: "ON" # Build with x86 F16C instruction support) # autodetects support if "ON"
USE_LAPACK: "ON" # Build with lapack support
USE_MKL_IF_AVAILABLE: "ON" # Use MKL if found
USE_MKLML_MKL: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_MKLDNN: "ON" # Use MKLDNN variant of MKL (if MKL found) IF USE_MKL_IF_AVAILABLE AND (NOT APPLE)
USE_OPERATOR_TUNING: "ON" # Enable auto-tuning of operators IF NOT MSVC
USE_GPERFTOOLS: "ON" # Build with GPerfTools support (if found)
USE_JEMALLOC: "ON" # Build with Jemalloc support
USE_PROFILER: "ON" # Build with Profiler support
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
USE_CPP_PACKAGE: "OFF" # Build C++ Package
USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming conventions.
USE_GPROF: "OFF" # Compile with gprof (profiling) flag
USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler supports it
USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) # one could set VTUNE_ROOT for search path
ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation support
BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
INSTALL_EXAMPLES: "OFF" # Install the example source files.
USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
USE_TENSORRT: "OFF" # Enable infeference optimization with TensorRT.
USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test coverage metric output
CMAKE_BUILD_TYPE: "Release"
CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
CMAKE_C_COMPILER_LAUNCHER: "ccache"
CMAKE_CXX_COMPILER_LAUNCHER: "ccache"

commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag: 1.5.0.rc1, upstream/v1.5.x)
commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag: 1.4.1.rc0, upstream/v1.4.x)

curl http://169.254.169.254/latest/meta-data/instance-type
c5d.18xlarge

Version : 3.6.7
Compiler : GCC 8.2.0
Build : ('default', 'Oct 22 2018 11:32:17')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 19.1.1
Directory : /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.5.0
Directory : /home/piotr/mxnet_1.5/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system : Linux
node : ip-172-31-63-171
release : 4.15.0-1035-aws
version : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping: 4
CPU MHz: 1326.446
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

----------Python Info----------
Version : 3.6.7
Compiler : GCC 8.2.0
Build : ('default', 'Oct 22 2018 11:32:17')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 19.1.1
Directory : /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.4.1
Directory : /home/piotr/mxnet_1.4/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
system : Linux
node : ip-172-31-63-171
release : 4.15.0-1035-aws
version : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping: 4
CPU MHz: 1223.344
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt
clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
----------Network Test----------

On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>
> I did a training of cifar10 in CPU and seems there's some regressions
> in the range of 7% increase of training time against 1.4.1:
>
> (py3_venv) piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$ time python cifar10.py --epochs 5
> real 11m30.388s
> user 417m7.766s
> sys 16m57.315s
>
> VS 1.4.1:
> real 10m41.994s
> user 392m40.646s
> sys 12m30.601s
>
>
> On Thu, Jun 20, 2019 at 10:15 PM Lai Wei <roywei...@gmail.com> wrote:
> >
> > Hi Anirudh,
> >
> > Thanks for jumping into this quickly, I followed up on the issue.
> >
> > I was meant for sockeye developer/maintainers to help setup nightly tests
> > and raise issues early.
> >
> > Thanks!
> >
> > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin <haibin.lin....@gmail.com> wrote:
> >
> > > In GluonNLP we are testing with MXNET nightly build for each PR, and we did
> > > find some MXNet related issue caught by the CI.
> > > I recommend other toolkits also add integration tests with MXNet nightly.
> > > It helps identify issues early.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Thu, Jun 20, 2019 at 18:52 Zhao, Patric <patric.z...@intel.com> wrote:
> > >
> > > > Thanks to raise the issue and we will take a look ASAP.
> > > >
> > > > The downstream cases is not in the MXNet CI so it's hard to catch the
> > > > potential bugs or performance degradation for MXNet developers.
> > > >
> > > > In the future, I suggest adding the major downstream test cases, like from
> > > > sockeye, GluonNLP, GLuonCV, DGL, Gluon-TS, into the nightly test.
> > > > If it's still too heavy, maybe testing it weekly or monthly :)
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > > > -----Original Message-----
> > > > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > > > > Sent: Friday, June 21, 2019 9:31 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Cc: d...@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > > > >
> > > > > Hi Lai,
> > > > >
> > > > > I have opened an issue:
> > > > > https://github.com/apache/incubator-mxnet/issues/15297
> > > > > I came to know about this issue only today and I have not been monitoring
> > > > > sockeye.
> > > > > I jumped onto this issue to make sure it wasn't caused by the dlpack changes.
> > > > > Also, I don't think sockeye CI checks against master, it is using 1.4.1.
> > > > >
> > > > > Anirudh
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei <roywei...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Could you share which test failed and what’s the crash? How to
> > > > > > reproduce it?
> > > > > >
> > > > > > I was able to install sockeye and run all tests passed. Using python
> > > > > > setup.py test
> > > > > >
> > > > > > I have tested both nightly pip package and 1.5.0.rc1
> > > > > >
> > > > > > It would be great to create an issue with reproducible steps and move
> > > > > > the discussion there.
> > > > > >
> > > > > > Also I see sockeye nightly build[1] has been failing for some time, if
> > > > > > it’s due to MXNet change, please raise this early so we can track and
> > > > > > solve it in time rather than block the release during vote time.
> > > > > >
> > > > > > [1] https://travis-ci.org/awslabs/sockeye
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
> > > > > >
> > > > > > > I was able to reproduce a crash with the commit
> > > > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > > > >
> > > > > > > Anirudh
> > > > > > >
> > > > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei <roywei...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Przemyslaw,
> > > > > > > >
> > > > > > > > Is there an issue with more details to track the problem?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak <ptre...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > -1
> > > > > > > > >
> > > > > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > > > > occuring in 1.5rc1. I
> > > > > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > > > related) or
> > > > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
> > > > > > > > >
> > > > > > > > > On 2019/06/20 06:36:22, Lai Wei <roywei...@gmail.com> wrote:
> > > > > > > > > > Dear MXNet community,
> > > > > > > > > >
> > > > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> > > > > > > > > > Voting on dev@ will start June 19, 23:59:59(PST) and close on June 22,
> > > > > > > > > > 23:59:59.
> > > > > > > > > >
> > > > > > > > > > 1) Link to release notes:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > > > >
> > > > > > > > > > 2) Link to release candidate:
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > > > > > > > >
> > > > > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > > > > >
> > > > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > > > > > > > >
> > > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > >
> > > > > > > > > > +1 = approve
> > > > > > > > > > +0 = no opinion
> > > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > > > --
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > Lai
> > > > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > Lai
> > > > > > > >
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > --
> > Best Regards
> >
> > Lai
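
For anyone who wants to double-check the "range of 7%" figure quoted above, here is a minimal sketch that converts the `time` output from the two cifar10.py runs into seconds and prints the relative slowdown of 1.5.0.rc1 against 1.4.1. The timings are copied from this thread; the helper names `to_seconds` and `slowdown_percent` are just illustrative and are not part of MXNet or the benchmark script.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style duration such as '11m30.388s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the thread: (1.5.0.rc1 run, 1.4.1 run).
TIMINGS = {
    "real": ("11m30.388s", "10m41.994s"),
    "user": ("417m7.766s", "392m40.646s"),
    "sys": ("16m57.315s", "12m30.601s"),
}


def slowdown_percent(new: str, old: str) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (to_seconds(new) / to_seconds(old) - 1.0) * 100.0


if __name__ == "__main__":
    for label, (v150, v141) in TIMINGS.items():
        print(f"{label}: {slowdown_percent(v150, v141):+.1f}%")
```

This is a single run of each version, so it only confirms the arithmetic behind the number, not whether the difference is outside run-to-run noise.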