Valgrind doesn't work with Python. Also, Valgrind doesn't support some CPU instructions used by MXNet (I think some instructions related to the random number generator).
On Wed, May 2, 2018 at 8:59 PM, Bhavin Thaker <bhavintha...@gmail.com> wrote:
> Have you tried running with valgrind to get some clues on the root-cause?
>
> Bhavin Thaker.
>
> On Wed, May 2, 2018 at 8:55 PM Da Zheng <zhengda1...@gmail.com> wrote:
>
>> It might also be possible that this isn't an MKLDNN bug.
>> I just saw a similar memory error without MKLDNN build.
>>
>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline
>>
>> Best,
>> Da
>>
>> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da <dzz...@amazon.com> wrote:
>> > There might be a race condition that causes the memory error.
>> > It might be caused by this PR:
>> > https://github.com/apache/incubator-mxnet/pull/10706/files
>> > This PR removes MKLDNN memory from NDArray.
>> > However, I don't know why this causes memory error. If someone is using the memory, it should still hold the memory with shared pointer.
>> > But I do see the memory error increase after this PR is merged.
>> >
>> > Best,
>> > Da
>> >
>> > On 5/2/18, 12:26 PM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote:
>> >
>> > I couldn't reproduce locally with:
>> >
>> > ci/build.py -p ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_mkldnn && ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python2_cpu
>> >
>> > On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>> >
>> > > Hi
>> > >
>> > > Seems master is not running anymore, there's a segmentation fault using MKDLNN-CPU
>> > >
>> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
>> > >
>> > > I see my PRs failing with a similar error.
>> > >
>> > > Pedro
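Da's reasoning in the thread is that anyone still using the memory should keep it alive through a shared pointer, yet a race condition is suspected. The following is a minimal C++ sketch of why both can be true, assuming hypothetical `Buffer` and `Owner` types (these are not MXNet's NDArray or MKLDNN classes). A `std::shared_ptr` copy taken before the owner drops its reference does keep the buffer alive, because the reference count itself is updated atomically. However, copying and resetting the *same* `shared_ptr` object from two threads without synchronization is still a data race; the mutex below is one way to prevent that.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for memory attached to an array (not MXNet's API).
struct Buffer {
  std::vector<float> data;
  explicit Buffer(std::size_t n) : data(n, 1.0f) {}
};

// Hypothetical owner that can drop its reference to the buffer, loosely
// modelling "removing the memory from the array".
class Owner {
 public:
  explicit Owner(std::size_t n) : buf_(std::make_shared<Buffer>(n)) {}

  // Copying under the lock hands the caller its own reference, so the
  // buffer stays alive even if Release() runs afterwards.
  std::shared_ptr<Buffer> Get() const {
    std::lock_guard<std::mutex> lock(mu_);
    return buf_;
  }

  // Drops the owner's reference. Without the mutex, running this while
  // another thread copies buf_ would be a data race on the shared_ptr
  // object itself, even though its reference count is atomic.
  void Release() {
    std::lock_guard<std::mutex> lock(mu_);
    buf_.reset();
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<Buffer> buf_;
};

// A reader takes its own reference, then another thread releases the
// owner's reference; the reader's copy keeps the data valid.
float ReadAfterRelease(std::size_t n) {
  Owner owner(n);
  std::shared_ptr<Buffer> held = owner.Get();  // reader's own reference
  std::thread releaser([&owner] { owner.Release(); });
  releaser.join();
  assert(held != nullptr);  // the Buffer survived the owner's reset
  float sum = 0.0f;
  for (float v : held->data) sum += v;
  return sum;
}
```

For example, `ReadAfterRelease(1000)` sums 1000 elements that are still alive after the owner released them. Note that only a live `shared_ptr` copy extends the lifetime; a raw pointer obtained with `get()` or `data.data()` and kept past the last `shared_ptr` would dangle, which is another way such a memory error could appear.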