I see. There's also an OpenMP primitive to change this. I see a way to
fix this issue with a bit of refactoring.
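
A minimal sketch of that direction, for illustration only. It assumes the
setenv in question sets OMP_NUM_THREADS, which this thread does not spell
out. The helper names ParseThreadCount and ApplyOmpThreadCount are
hypothetical and are not existing MXNet functions. The idea is to read the
environment once on the main thread and then apply the value through the
OpenMP runtime call omp_set_num_threads instead of mutating the environment
from another thread:

```cpp
#include <cstdlib>
#include <limits>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: turn an environment value (read once on the main
// thread, before worker threads exist) into a usable thread count.
// Returns `fallback` for missing, malformed, or non-positive input.
int ParseThreadCount(const char* value, int fallback) {
  if (value == nullptr || *value == '\0') return fallback;
  char* end = nullptr;
  const long parsed = std::strtol(value, &end, 10);
  if (end == value || *end != '\0') return fallback;  // not a clean integer
  if (parsed < 1 || parsed > std::numeric_limits<int>::max()) return fallback;
  return static_cast<int>(parsed);
}

// Hypothetical helper: request the thread count through the OpenMP runtime
// API rather than through setenv. Without OpenMP support compiled in, this
// is a no-op. Returns the count that was requested.
int ApplyOmpThreadCount(int count) {
#ifdef _OPENMP
  omp_set_num_threads(count);  // affects subsequent parallel regions
#endif
  return count;
}
```

A caller would do something like
ApplyOmpThreadCount(ParseThreadCount(std::getenv("OMP_NUM_THREADS"), 1))
during startup, so no thread ever needs to call setenv afterwards.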

Thanks.

Pedro.
On Thu, Nov 29, 2018 at 6:24 PM Chris Olivier <cjolivie...@gmail.com> wrote:
>
> I don’t think that does anything at all, as stated in my other email.
> Someone can look into the OMP code to be sure, but my suspicion is that the
> environment variable is only read on startup. At any rate, it is better to
> set it through the API at runtime.
>
> On Thu, Nov 29, 2018 at 8:11 AM Pedro Larroy <pedro.larroy.li...@gmail.com>
> wrote:
>
> > To be precise, what would be the consequences of not having these env
> > variables set in the engine threads related to OMP?
> > Given your experience with OpenMP I hope you can help us answer these
> > questions.
> >
> > Hopefully we can get the same effect (if any) as these setenvs using
> > some OpenMP call or a pragma. From what I understand, we definitely
> > shouldn't be mutating the environment from a different thread, which is
> > the likely cause of the random crashes some users are experiencing.
> >
> > Pedro
> > On Thu, Nov 29, 2018 at 5:00 PM Pedro Larroy
> > <pedro.larroy.li...@gmail.com> wrote:
> > >
> > > Chris.  The problem is with setenv, not with getenv. We don't want to
> > > remove any getenv call, just these misplaced setenvs:
> > >
> > >
> > >
> > https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L61
> > >
> > > Please check the code above carefully and give us your feedback. Based
> > > on your email I think we don't yet have a common understanding of the
> > > root cause of this issue.
> > >
> > > Pedro.
> > > > On Thu, Nov 29, 2018 at 4:02 PM Chris Olivier <cjolivie...@gmail.com>
> > > > wrote:
> > > >
> > > > > - getenv should be thread safe as long as nothing is calling
> > > > > putenv/setenv in another thread (the environment doesn’t change), as
> > > > > stated here:
> > > > >
> > > > > http://www.cplusplus.com/reference/cstdlib/getenv/
> > > > >
> > > > > It’s a simple library call, so to be sure either way, one can check the
> > > > > actual source and see (in case some particular implementation is acting
> > > > > in a particularly thread-unsafe manner). This should be vetted before
> > > > > making any high-impact decisions such as trying to remove every getenv
> > > > > call in the whole system.
> > > >
> > > > > - Locking after fork is possibly due to libgomp not supporting forking:
> > > > > after a fork, a call is made to release the blocked omp threads and the
> > > > > main thread waits for the omp threads to finish, but the omp threads
> > > > > belong to the pre-forked process and thus never execute, causing the
> > > > > forked process to freeze.  This behavior has been witnessed before.
> > > >
> > > >
> > > >
> > > >
> > > > > On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy
> > > > > <pedro.larroy.li...@gmail.com> wrote:
> > > >
> > > > > Hi all.
> > > > >
> > > > > > There are two important issues / fixes on my radar that should go
> > > > > > into the next release:
> > > > > >
> > > > > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > > > > There is a bug in shape inference on CPU when not using MKL. Also, we
> > > > > > are running activation on CPU via MKL when we compile with CUDNN+MKLDNN.
> > > > > > I'm finishing a fix for these issues in the above PR.
> > > > >
> > > > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > > > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > > > > Calling setenv / getenv from multiple threads is not safe and is
> > > > > > causing segfaults. This piece of code (the handlers in pthread_atfork)
> > > > > > already caused a very difficult-to-diagnose hang in a previous release,
> > > > > > where a fork inside cudnn would deadlock the engine.
> > > > >
> > > > > I would remove setenv from 2) as a mitigation, but we would need to
> > > > > check for regressions as we could be creating additional threads
> > > > > inside the engine.
> > > > >
> > > > > > I would suggest that we address these two major issues before the
> > > > > > next release.
> > > > >
> > > > > Pedro
> > > > >
> > > > >
> > > > >
> > > > > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel
> > > > > > <steffenroc...@gmail.com> wrote:
> > > > > >
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > > > > > release. Sergey Kolychev will be co-managing the release and providing
> > > > > > > help from the committers' side.
> > > > > > > A release candidate will be cut on November 29, 2018, and voting will
> > > > > > > start on December 7, 2018. Release notes have been drafted here [1]. If
> > > > > > > you have any additional features in progress and would like to include
> > > > > > > them in this release, please ensure they have been merged by November
> > > > > > > 27, 2018. The release schedule is available here [2].
> > > > > >
> > > > > > > Feel free to add any other comments/suggestions. Please help to review
> > > > > > > and merge outstanding PRs and resolve issues impacting the quality of
> > > > > > > the 1.4.0 release.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Steffen
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > > > > >
> > > > > > [2]
> > > > >
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > >
> > > > > > > > Spoke too soon [1]; it looks like others have been adding Turing
> > > > > > > > support as well (thanks to those helping with this).  I believe there
> > > > > > > > are still a few changes we'd have to make to claim support though
> > > > > > > > (mshadow CMake changes, PyPI package creation tweaks).
> > > > > > >
> > > > > > > 1:
> > > > > > >
> > > > > > >
> > > > >
> > https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > > > > > >
> > > > > > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > Hey Steffen, I'd like to be able to merge this PR for version 1.4:
> > > > > > > > > https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
> > > > > > > > > regression in master which causes incorrect feature vectors to be
> > > > > > > > > output when using the TensorRT feature.  (Thanks to Nathalie for
> > > > > > > > > helping me track down the root cause of the issue.)  I'm currently
> > > > > > > > > blocked on a CI issue I haven't seen before, but hope to have it
> > > > > > > > > resolved by EOW.
> > > > > > > >
> > > > > > > > > One call-out I would make is that we currently don't support the
> > > > > > > > > Turing architecture (sm_75).  I've been slowly trying to add support,
> > > > > > > > > but I don't think I'd have the capacity to get this done by EOW.  Does
> > > > > > > > > anyone feel strongly that we need this in the 1.4 release?  From my
> > > > > > > > > perspective this will already be a strong release without it.
> > > > > > > >
> > > > > > > > > On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel
> > > > > > > > > <steffenroc...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > >> Thanks Patrick, let's aim to get the PRs merged this week.
> > > > > > > > >>
> > > > > > > > >> Call for contributions from the community: right now we have 10 PRs
> > > > > > > > >> awaiting merge
> > > > > > > > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
> > > > > > > > >> and 61 open PRs awaiting review
> > > > > > > > >> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
> > > > > > > > >> I would appreciate it if you all could help review the open PRs and
> > > > > > > > >> the committers could drive the merges before code freeze for 1.4.0.
> > > > > > > > >>
> > > > > > > > >> The contributors on the Java API are making progress, but not all
> > > > > > > > >> performance issues are resolved. With some luck it should be possible
> > > > > > > > >> to code freeze towards the end of this week.
> > > > > > > > >>
> > > > > > > > >> Are there other critical features/bugs/PRs you think need to be
> > > > > > > > >> included in 1.4.0? If so, please communicate as soon as possible.
> > > > > > > >>
> > > > > > > >> Regards,
> > > > > > > >> Steffen
> > > > > > > >>
> > > > > > > > >> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric
> > > > > > > > >> <patric.z...@intel.com> wrote:
> > > > > > > >>
> > > > > > > > >> > Thanks, Steffen. I think there is NO open issue blocking MKLDNN
> > > > > > > > >> > from going GA now.
> > > > > > > > >> >
> > > > > > > > >> > BTW, several quantization-related PRs (#13297, #13260) are under
> > > > > > > > >> > review and I think they can be merged this week.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> >
> > > > > > > >> > --Patric
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > > -----Original Message-----
> > > > > > > >> > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > > > > >> > > Sent: Tuesday, November 20, 2018 2:57 AM
> > > > > > > >> > > To: dev@mxnet.incubator.apache.org
> > > > > > > > >> > > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> > > > > > > >> > >
> > > > > > > > >> > > On Friday the contributors working on the Java API discovered a
> > > > > > > > >> > > potential performance problem with inference using the Java API
> > > > > > > > >> > > vs. Python. Investigation is ongoing.
> > > > > > > > >> > > As the Java API is one of the main features for the upcoming
> > > > > > > > >> > > release, I suggest postponing the code freeze towards the end of
> > > > > > > > >> > > this week.
> > > > > > > > >> > >
> > > > > > > > >> > > Please provide feedback and concerns about the change in dates
> > > > > > > > >> > > for code freeze and the 1.4.0 release. I will provide updates on
> > > > > > > > >> > > progress resolving the potential performance problem.
> > > > > > > > >> > >
> > > > > > > > >> > > Patrick - do you think it is possible to resolve the remaining
> > > > > > > > >> > > issues on MKL-DNN this week, so we can consider GA for MKL-DNN
> > > > > > > > >> > > with 1.4.0?
> > > > > > > >> > >
> > > > > > > >> > > Regards,
> > > > > > > >> > > Steffen
> > > > > > > >> > >
> > > > > > > > >> > > On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov
> > > > > > > > >> > > <mecher...@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > > >> > > > I'd like to remind everyone that 'code freeze' would mean
> > > > > > > > >> > > > cutting a v1.4.x release branch and all following fixes would
> > > > > > > > >> > > > need to be backported. Development on master can be continued
> > > > > > > > >> > > > as usual.
> > > > > > > >> > > >
> > > > > > > >> > > > Best
> > > > > > > >> > > > Anton
> > > > > > > >> > > >
> > > > > > > > >> > > > On Wed, Nov 14, 2018 at 6:04, Steffen Rochel
> > > > > > > > >> > > > <steffenroc...@gmail.com> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Dear MXNet community,
> > > > > > > > >> > > > > the agreed plan was to establish code freeze for the 1.4.0
> > > > > > > > >> > > > > release today. As the 1.3.1 patch release is still ongoing,
> > > > > > > > >> > > > > I suggest postponing the code freeze to Friday 16th
> > > > > > > > >> > > > > November 2018.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Sergey Kolychev has agreed to act as co-release manager for
> > > > > > > > >> > > > > all tasks which require committer privileges. If anybody is
> > > > > > > > >> > > > > interested in volunteering as release manager - now is the
> > > > > > > > >> > > > > time to speak up. Otherwise I will manage the release.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Regards,
> > > > > > > >> > > > > Steffen
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > >
> >
