Correction: I was able to reproduce the issue with MKLDNN enabled on master, but not on the 1.2 branch.
On Thu, May 3, 2018 at 11:33 AM, Anirudh <anirudh2...@gmail.com> wrote:
> Hi Pedro and Naveen,
>
> I am unable to reproduce this issue with MKLDNN on the master but not on
> the 1.2.RC2 branch.
>
> Did the following on 1.2.RC2 branch:
>
> make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> export MXNET_TEST_SEED=11
> export MXNET_MODULE_SEED=812478194
> export MXNET_TEST_COUNT=10000
> nosetests-2.7 -v tests/python/unittest/test_module.py:test_forward_reshape
>
> Was able to do the 10k runs successfully.
>
> Anirudh
>
> On Thu, May 3, 2018 at 8:46 AM, Anirudh <anirudh2...@gmail.com> wrote:
>
>> Hi Pedro and Naveen,
>>
>> Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
>> Also, there are a bunch of MKLDNN fixes that didn't go into the release
>> branch. Is this issue reproducible on the release branch?
>> In my opinion, since we have marked MKLDNN as experimental feature for
>> the release, if it is confirmed to be a MKLDNN issue
>> we don't need to block the release on it.
>>
>> Anirudh
>>
>> On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy <mnnav...@gmail.com> wrote:
>>
>>> Thanks for raising this issue Pedro.
>>>
>>> -1(binding)
>>>
>>> We were in a similar state for a while a year ago, a lot of effort went
>>> to stabilize the tests and the CI. I have seen the PR builds are
>>> non-deterministic and you have to retry over and over (wasting resources
>>> and time) and hope you get lucky.
>>>
>>> Look at the dashboard for master build
>>> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/master/
>>>
>>> -Naveen
>>>
>>> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>>>
>>> > -1 nondeterministic failures on CI master:
>>> > https://issues.apache.org/jira/browse/MXNET-396
>>> >
>>> > Was able to reproduce once in a fresh p3 instance with DLAMI; can't
>>> > reproduce consistently.
>>> >
>>> > On Wed, May 2, 2018 at 9:51 PM, Anirudh <anirudh2...@gmail.com> wrote:
>>> >
>>> > > Hi all,
>>> > >
>>> > > As part of RC2 release, we have addressed bugs and some concerns that
>>> > > were raised.
>>> > >
>>> > > I would like to propose a vote to release Apache MXNet (incubating)
>>> > > version 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end
>>> > > at 12:50 PM PDT, Sunday, May 6th.
>>> > >
>>> > > Link to release notes:
>>> > > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.2.0+Release+Notes
>>> > >
>>> > > Link to release candidate 1.2.0.rc2:
>>> > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
>>> > >
>>> > > Voting results for 1.2.0.rc2:
>>> > > https://lists.apache.org/thread.html/ebe561c609a8e32351dfe4aafc8876199560336472726b58c3455e85@%3Cdev.mxnet.apache.org%3E
>>> > >
>>> > > View this page, click on "Build from Source", and use the source code
>>> > > obtained from 1.2.0.rc2 tag:
>>> > > https://mxnet.incubator.apache.org/install/index.html
>>> > >
>>> > > (Note: The README.md points to the 1.2.0 tag and does not work at the
>>> > > moment.)
>>> > >
>>> > > Please remember to test first before voting accordingly:
>>> > >
>>> > > +1 = approve
>>> > > +0 = no opinion
>>> > > -1 = disapprove (provide reason)
>>> > >
>>> > > Anirudh