Hi everyone,

Would you mind prepending [1.4.x] to the titles of your PRs so we can see
cherry-picks at a glance? That would let me better classify the load we
have on our CI (release branches have a higher load than master due to
cache mismatches).

Best regards,
Marco
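
A minimal sketch of how such a title prefix could be used to sort PRs by
target branch. The function name and the regular expression are
illustrative assumptions, not part of the MXNet CI code:

```python
import re

# Hypothetical pattern: a leading "[<major>.<minor>.x]" tag such as "[1.4.x]".
BRANCH_PREFIX = re.compile(r"^\s*\[(\d+\.\d+\.x)\]")


def release_branch_of(title):
    """Return the release branch named in a PR title prefix, or None."""
    match = BRANCH_PREFIX.match(title)
    return match.group(1) if match else None


if __name__ == "__main__":
    for title in ("[1.4.x] Fix dataloader", "Fix dataloader"):
        print(title, "->", release_branch_of(title))
```

PRs without the prefix come back as None and would be counted as master.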

On Fri, Nov 30, 2018 at 2:17 AM Marco de Abreu <marco.g.ab...@googlemail.com>
wrote:

> Hi Naveen,
>
> Yeah, sorry, that's DockerHub acting up again (unfortunately this happens
> every now and then). Basically, docker pull starts multiple download
> threads, and it seems that sometimes a single web server request sits in
> the queue forever, which then slows down the docker pull (for the cache
> retrieval).
>
> Chance will be assisting with CI issues this week, and I explained my
> proposed solution to him: wrap the 'docker pull' in a timeout combined
> with a retry with backoff. Anton proposed that, if the retry still fails
> after a few attempts, we fall back to the local cache and cache
> regeneration to avoid failing the job. That would solve the problem you're
> encountering. We would basically wrap [1] in the timeout-retry mechanism.
>
> Best regards,
> Marco
>
> [1]:
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker_cache.py#L107
>
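
The timeout-plus-retry idea described above could look roughly like the
sketch below. It is not the code in docker_cache.py; the function name,
parameter names and default values are assumptions chosen for illustration.
The `run` and `sleep` arguments exist only so the logic can be exercised
without Docker.

```python
import subprocess
import time


def pull_with_retry(cmd, timeout_s=600, max_attempts=3, base_delay_s=5.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run cmd with a per-attempt timeout and exponential backoff.

    Returns True on success and False once all attempts are used up, so the
    caller can fall back to the local cache and regenerate it.
    """
    for attempt in range(max_attempts):
        try:
            # check=True turns a non-zero exit code into CalledProcessError;
            # timeout raises TimeoutExpired after timeout_s seconds.
            run(cmd, timeout=timeout_s, check=True)
            return True
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            if attempt + 1 < max_attempts:
                # Wait 1x, 2x, 4x, ... the base delay before the next try.
                sleep(base_delay_s * 2 ** attempt)
    return False
```

A caller would use it along the lines of
`if not pull_with_retry(["docker", "pull", tag]): ...` and rebuild the image
from the local cache in the failure branch instead of failing the job.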
> On Fri, Nov 30, 2018 at 2:01 AM Joshua Z. Zhang <cheungc...@gmail.com>
> wrote:
>
>> Hi, I would like to bring a critical performance and stability patch for
>> the existing Gluon DataLoader into 1.4.0:
>> https://github.com/apache/incubator-mxnet/pull/13447
>>
>> This PR is finished, waiting for CI to pass.
>>
>> Steffen, could you help me add that to the tracked list?
>>
>> Best,
>> Zhi
>>
>> > On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mnnav...@gmail.com> wrote:
>> >
>> > The tests are randomly failing in different stages:
>> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
>> > This PR has failed 8 times so far.
>> >
>> > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <steffenroc...@gmail.com>
>> > wrote:
>> >
>> >> Pedro - ok. Please add the PR to the v1.4.x branch after it is merged to
>> >> master, and please update the tracking page
>> >> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
>> >> Steffen
>> >>
>> >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.li...@gmail.com>
>> >> wrote:
>> >>
>> >>> The PR is ready from my side and passes the tests; unless somebody
>> >>> raises any concerns, it's good to go.
>> >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenroc...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Pedro - added to the 1.4.0 tracking list
>> >>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
>> >>>>
>> >>>> Do you already have an ETA?
>> >>>> Steffen
>> >>>>
>> >>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.li...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi all.
>> >>>>>
>> >>>>> There are two important issues / fixes on my radar that should go
>> >>>>> into the next release:
>> >>>>>
>> >>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
>> >>>>> There is a bug in shape inference on CPU when not using MKL. Also, we
>> >>>>> are running activation on CPU via MKL when we compile with CUDNN+MKLDNN.
>> >>>>> I'm finishing a fix for these issues in the above PR.
>> >>>>>
>> >>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
>> >>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
>> >>>>> Calling setenv / getenv from multiple threads is not safe and is
>> >>>>> causing segfaults. This piece of code (the handlers in pthread_atfork)
>> >>>>> already caused a very difficult-to-diagnose hang in a previous release,
>> >>>>> where a fork inside cudnn would deadlock the engine.
>> >>>>>
>> >>>>> I would remove setenv from 2) as a mitigation, but we would need to
>> >>>>> check for regressions as we could be creating additional threads
>> >>>>> inside the engine.
>> >>>>>
>> >>>>> I would suggest that we address these two major issues before the
>> >>>>> next release.
>> >>>>>
>> >>>>> Pedro
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenroc...@gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Dear MXNet community,
>> >>>>>>
>> >>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0
>> >>>>>> release. Sergey Kolychev will be co-managing the release and providing
>> >>>>>> help from the committers' side.
>> >>>>>> A release candidate will be cut on November 29, 2018 and voting will
>> >>>>>> start December 7, 2018. Release notes have been drafted here [1]. If
>> >>>>>> you have any additional features in progress and would like to include
>> >>>>>> them in this release, please ensure they have been merged by
>> >>>>>> November 27, 2018. The release schedule is available here [2].
>> >>>>>>
>> >>>>>> Feel free to add any other comments/suggestions. Please help to review
>> >>>>>> and merge outstanding PRs and resolve issues impacting the quality of
>> >>>>>> the 1.4.0 release.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>>
>> >>>>>> Steffen
>> >>>>>>
>> >>>>>> [1]
>> >>>>>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>> >>>>>>
>> >>>>>> [2]
>> >>>>>> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>> >>>>>>
>> >>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
>> >>>>>> kellen.sunderl...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Spoke too soon [1], looks like others have been adding Turing support
>> >>>>>>> as well (thanks to those helping with this). I believe there are still
>> >>>>>>> a few changes we'd have to make to claim support though (mshadow CMake
>> >>>>>>> changes, PyPI package creation tweaks).
>> >>>>>>>
>> >>>>>>> [1]:
>> >>>>>>> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
>> >>>>>>>
>> >>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
>> >>>>>>> kellen.sunderl...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version 1.4:
>> >>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
>> >>>>>>>> regression in master which causes incorrect feature vectors to be
>> >>>>>>>> output when using the TensorRT feature. (Thanks to Nathalie for
>> >>>>>>>> helping me track down the root cause of the issue.) I'm currently
>> >>>>>>>> blocked on a CI issue I haven't seen before, but hope to have it
>> >>>>>>>> resolved by EOW.
>> >>>>>>>>
>> >>>>>>>> One call-out I would make is that we currently don't support the
>> >>>>>>>> Turing architecture (sm_75). I've been slowly trying to add support,
>> >>>>>>>> but I don't think I'd have capacity to get this done by EOW. Does
>> >>>>>>>> anyone feel strongly that we need this in the 1.4 release? From my
>> >>>>>>>> perspective this will already be a strong release without it.
>> >>>>>>>>
>> >>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenroc...@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Thanks Patric, let's aim to get the PRs merged this week.
>> >>>>>>>>>
>> >>>>>>>>> Call for contributions from the community: Right now we have 10 PRs
>> >>>>>>>>> awaiting merge
>> >>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
>> >>>>>>>>> and we have 61 open PRs awaiting review
>> >>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
>> >>>>>>>>> I would appreciate it if you all could help to review the open PRs,
>> >>>>>>>>> and if the committers could drive the merges before code freeze for
>> >>>>>>>>> 1.4.0.
>> >>>>>>>>>
>> >>>>>>>>> The contributors on the Java API are making progress, but not all
>> >>>>>>>>> performance issues are resolved. With some luck it should be possible
>> >>>>>>>>> to code freeze towards the end of this week.
>> >>>>>>>>>
>> >>>>>>>>> Are there other critical features/bugs/PRs you think need to be
>> >>>>>>>>> included in 1.4.0? If so, please communicate as soon as possible.
>> >>>>>>>>>
>> >>>>>>>>> Regards,
>> >>>>>>>>> Steffen
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.z...@intel.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Thanks, Steffen. I think there is no open issue blocking MKLDNN from
>> >>>>>>>>>> going GA now.
>> >>>>>>>>>>
>> >>>>>>>>>> BTW, several quantization-related PRs (#13297, #13260) are under
>> >>>>>>>>>> review, and I think they can be merged this week.
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks,
>> >>>>>>>>>>
>> >>>>>>>>>> --Patric
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> -----Original Message-----
>> >>>>>>>>>>> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
>> >>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
>> >>>>>>>>>>> To: dev@mxnet.incubator.apache.org
>> >>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Friday the contributors working on the Java API discovered a
>> >>>>>>>>>>> potential performance problem with inference using the Java API vs.
>> >>>>>>>>>>> Python. Investigation is ongoing.
>> >>>>>>>>>>> As the Java API is one of the main features for the upcoming
>> >>>>>>>>>>> release, I suggest postponing the code freeze towards the end of
>> >>>>>>>>>>> this week.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Please provide feedback and concerns about the change in dates for
>> >>>>>>>>>>> code freeze and the 1.4.0 release. I will provide updates on
>> >>>>>>>>>>> progress resolving the potential performance problem.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Patric - do you think it is possible to resolve the remaining
>> >>>>>>>>>>> issues on MKL-DNN this week, so we can consider GA for MKL-DNN with
>> >>>>>>>>>>> 1.4.0?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regards,
>> >>>>>>>>>>> Steffen
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mecher...@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would mean cutting a
>> >>>>>>>>>>>> v1.4.x release branch, and all following fixes would need to be
>> >>>>>>>>>>>> backported. Development on master can continue as usual.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Best
>> >>>>>>>>>>>> Anton
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Wed, Nov 14, 2018 at 6:04, Steffen Rochel <steffenroc...@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Dear MXNet community,
>> >>>>>>>>>>>>> the agreed plan was to establish code freeze for the 1.4.0 release
>> >>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing, I suggest
>> >>>>>>>>>>>>> postponing the code freeze to Friday, November 16, 2018.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release manager for all
>> >>>>>>>>>>>>> tasks which require committer privileges. If anybody is interested
>> >>>>>>>>>>>>> in volunteering as release manager - now is the time to speak up.
>> >>>>>>>>>>>>> Otherwise I will manage the release.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>> Steffen
>> >>>>>>>>>>>>>
