Have you tried running with valgrind to get some clues about the root cause?

Bhavin Thaker.

On Wed, May 2, 2018 at 8:55 PM Da Zheng <zhengda1...@gmail.com> wrote:

> It might also be possible that this isn't an MKLDNN bug.
> I just saw a similar memory error in a build without MKLDNN.
>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10783/1/pipeline
>
> Best,
> Da
>
> On Wed, May 2, 2018 at 2:14 PM, Zheng, Da <dzz...@amazon.com> wrote:
> > There might be a race condition that causes the memory error.
> > It might be caused by this PR:
> > https://github.com/apache/incubator-mxnet/pull/10706/files
> > This PR removes MKLDNN memory from NDArray.
> > However, I don't know why this causes a memory error. If someone is using
> > the memory, it should still hold the memory through a shared pointer.
> > But I do see the memory errors increase after this PR was merged.
> >
> > Best,
> > Da
> >
> > On 5/2/18, 12:26 PM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote:
> >
> >     I couldn't reproduce locally with:
> >
> >     ci/build.py -p ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_mkldnn && \
> >     ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python2_cpu
> >
> >
> >     On Wed, May 2, 2018 at 8:50 PM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> >
> >     > Hi
> >     >
> >     > It seems master is not running anymore; there's a segmentation fault
> >     > using MKLDNN-CPU.
> >     >
> >     > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/801/pipeline/662
> >     >
> >     >
> >     > I see my PRs failing with a similar error.
> >     >
> >     > Pedro
> >     >
> >
> >
>
