Hi everyone,

would you mind prepending [1.4.x] to the titles of your PRs so we can see cherry-picks at a glance? That would let me better classify the load on our CI (release branches put a higher load on it than master because of cache mismatches).

Best regards,
Marco

On Fri, Nov 30, 2018 at 2:17 AM Marco de Abreu <marco.g.ab...@googlemail.com> wrote:

> Hi Naveen,
>
> yeah sorry, that's DockerHub acting up again (this happens every now and
> then, unfortunately). Basically, docker pull starts multiple download threads,
> and it seems that sometimes a single web server request sits in the queue
> forever, which then slows down the docker pull (for the cache retrieval).
>
> Chance will be assisting with CI issues this week, and I explained my
> proposed solution to him: basically, wrap the 'docker pull' in a timeout in
> combination with a retry with backoff. Anton proposed that, in case the retry
> fails after a few attempts, we fall back to the local cache and cache
> regeneration to avoid the job failing. That would solve the problem you're
> encountering. We would basically wrap [1] in the timeout-retry mechanism.
>
> Best regards,
> Marco
>
> [1]: https://github.com/apache/incubator-mxnet/blob/master/ci/docker_cache.py#L107
>
> On Fri, Nov 30, 2018 at 2:01 AM Joshua Z. Zhang <cheungc...@gmail.com> wrote:
>
>> Hi, I would like to bring a critical performance and stability patch for
>> the existing gluon dataloader to 1.4.0:
>> https://github.com/apache/incubator-mxnet/pull/13447
>>
>> This PR is finished, waiting for CI to pass.
>>
>> Steffen, could you help me add that to the tracked list?
>>
>> Best,
>> Zhi
>>
>>> On Nov 29, 2018, at 4:25 PM, Naveen Swamy <mnnav...@gmail.com> wrote:
>>>
>>> The tests are randomly failing in different stages:
>>> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
>>> This PR has failed 8 times so far.
>>>
>>> On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>
>>>> Pedro - ok.
>>>> Please add the PR to the v1.4.x branch after it is merged to master, and
>>>> please update the tracking page
>>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>.
>>>> Steffen
>>>>
>>>> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>>>>
>>>>> The PR is ready from my side and passes the tests; unless somebody raises
>>>>> any concerns, it's good to go.
>>>>>
>>>>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>
>>>>>> Pedro - added to the 1.4.0 tracking list
>>>>>> <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack>
>>>>>>
>>>>>> Do you already have an ETA?
>>>>>> Steffen
>>>>>>
>>>>>> On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all.
>>>>>>>
>>>>>>> There are two important issues / fixes on my radar that should go into
>>>>>>> the next release:
>>>>>>>
>>>>>>> 1) https://github.com/apache/incubator-mxnet/pull/13409/files
>>>>>>> There is a bug in shape inference on CPU when not using MKL; also, we
>>>>>>> are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
>>>>>>> I'm finishing a fix for these issues in the above PR.
>>>>>>>
>>>>>>> 2) https://github.com/apache/incubator-mxnet/issues/13438
>>>>>>> We are seeing crashes due to unsafe setenv in multithreaded code.
>>>>>>> Calling setenv / getenv from multiple threads is not safe and is causing
>>>>>>> segfaults. This piece of code (the handlers in pthread_atfork) already
>>>>>>> caused a very difficult-to-diagnose hang in a previous release, where
>>>>>>> a fork inside cudnn would deadlock the engine.
>>>>>>>
>>>>>>> I would remove setenv from 2) as a mitigation, but we would need to
>>>>>>> check for regressions, as we could be creating additional threads
>>>>>>> inside the engine.
>>>>>>>
>>>>>>> I would suggest that we address these two major issues before the next
>>>>>>> release.
>>>>>>>
>>>>>>> Pedro
>>>>>>>
>>>>>>> On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear MXNet community,
>>>>>>>>
>>>>>>>> I will be the release manager for the upcoming Apache MXNet 1.4.0 release.
>>>>>>>> Sergey Kolychev will be co-managing the release and providing help from the
>>>>>>>> committers' side.
>>>>>>>> A release candidate will be cut on November 29, 2018, and voting will start
>>>>>>>> on December 7, 2018. Release notes have been drafted here [1]. If you have any
>>>>>>>> additional features in progress and would like to include them in this
>>>>>>>> release, please ensure they have been merged by November 27, 2018. The release
>>>>>>>> schedule is available here [2].
>>>>>>>>
>>>>>>>> Feel free to add any other comments/suggestions. Please help to review and
>>>>>>>> merge outstanding PRs and resolve issues impacting the quality of the
>>>>>>>> 1.4.0 release.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Steffen
>>>>>>>>
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
>>>>>>>>
>>>>>>>> [2] https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
>>>>>>>>
>>>>>>>> On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Spoke too soon [1]; it looks like others have been adding Turing support as
>>>>>>>>> well (thanks to those helping with this). I believe there are still a few
>>>>>>>>> changes we'd have to make to claim support, though (mshadow CMake changes,
>>>>>>>>> PyPi package creation tweaks).
>>>>>>>>>
>>>>>>>>> 1: https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
>>>>>>>>>
>>>>>>>>> On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Steffen, I'd like to be able to merge this PR for version 1.4:
>>>>>>>>>> https://github.com/apache/incubator-mxnet/pull/13310 . It fixes a
>>>>>>>>>> regression in master which causes incorrect feature vectors to be output
>>>>>>>>>> when using the TensorRT feature. (Thanks to Nathalie for helping me track
>>>>>>>>>> down the root cause of the issue.) I'm currently blocked on a CI issue I
>>>>>>>>>> haven't seen before, but hope to have it resolved by EOW.
>>>>>>>>>>
>>>>>>>>>> One call-out I would make is that we currently don't support the Turing
>>>>>>>>>> architecture (sm_75). I've been slowly trying to add support, but I don't
>>>>>>>>>> think I'd have the capacity to get this done by EOW.
>>>>>>>>>> Does anyone feel strongly that we need this in the 1.4 release?
>>>>>>>>>> From my perspective this will already be a strong release without it.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 20, 2018 at 6:42 PM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Patric, let's target getting the PRs merged this week.
>>>>>>>>>>>
>>>>>>>>>>> Call for contributions from the community: right now we have 10 PRs
>>>>>>>>>>> awaiting merge
>>>>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-merge+>
>>>>>>>>>>> and we have 61 open PRs awaiting review
>>>>>>>>>>> <https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+label%3Apr-awaiting-review>.
>>>>>>>>>>> I would appreciate it if you could all help review the open PRs and if the
>>>>>>>>>>> committers could drive the merges before code freeze for 1.4.0.
>>>>>>>>>>>
>>>>>>>>>>> The contributors on the Java API are making progress, but not all
>>>>>>>>>>> performance issues are resolved. With some luck it should be possible to
>>>>>>>>>>> code freeze towards the end of this week.
>>>>>>>>>>>
>>>>>>>>>>> Are there other critical features/bugs/PRs you think need to be included in
>>>>>>>>>>> 1.4.0? If so, please communicate as soon as possible.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Steffen
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 19, 2018 at 8:26 PM Zhao, Patric <patric.z...@intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Steffen. I think there is NO open issue blocking MKLDNN from
>>>>>>>>>>>> going GA now.
>>>>>>>>>>>>
>>>>>>>>>>>> BTW, several quantization-related PRs (#13297, #13260) are under
>>>>>>>>>>>> review, and I think they can be merged this week.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> --Patric
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
>>>>>>>>>>>>> Sent: Tuesday, November 20, 2018 2:57 AM
>>>>>>>>>>>>> To: dev@mxnet.incubator.apache.org
>>>>>>>>>>>>> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Friday the contributors working on the Java API discovered a potential
>>>>>>>>>>>>> performance problem with inference using the Java API vs. Python.
>>>>>>>>>>>>> Investigation is ongoing.
>>>>>>>>>>>>> As the Java API is one of the main features for the upcoming release, I
>>>>>>>>>>>>> suggest postponing the code freeze towards the end of this week.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please provide feedback and concerns about the change in dates for code
>>>>>>>>>>>>> freeze and the 1.4.0 release. I will provide updates on progress resolving
>>>>>>>>>>>>> the potential performance problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Patric - do you think it is possible to resolve the remaining issues on
>>>>>>>>>>>>> MKL-DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Steffen
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov <mecher...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to remind everyone that 'code freeze' would mean cutting a
>>>>>>>>>>>>>> v1.4.x release branch, and all following fixes would need to be backported.
>>>>>>>>>>>>>> Development on master can continue as usual.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>> Anton
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 14, 2018 at 6:04 AM Steffen Rochel <steffenroc...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear MXNet community,
>>>>>>>>>>>>>>> the agreed plan was to establish code freeze for the 1.4.0 release
>>>>>>>>>>>>>>> today. As the 1.3.1 patch release is still ongoing, I suggest
>>>>>>>>>>>>>>> postponing the code freeze to Friday, 16th November 2018.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sergey Kolychev has agreed to act as co-release manager for all
>>>>>>>>>>>>>>> tasks which require committer privileges. If anybody is interested in
>>>>>>>>>>>>>>> volunteering as release manager - now is the time to speak up.
>>>>>>>>>>>>>>> Otherwise I will manage the release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Steffen
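
For reference, the timeout-and-retry wrapper around 'docker pull' that Marco describes in the quoted message above could look roughly like the sketch below. It is not the code in ci/docker_cache.py: the function names, the 600-second timeout, the three attempts and the doubling backoff are all assumptions made for illustration, and the fallback to the local cache is only indicated by the return value.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(cmd: Sequence[str],
                   timeout_s: float,
                   attempts: int = 3,
                   initial_backoff_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run cmd with a per-attempt timeout, retrying with exponential backoff.

    Returns True as soon as one attempt exits with status 0, and False if
    every attempt either timed out or exited with a non-zero status.
    """
    backoff = initial_backoff_s
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # once timeout_s seconds have elapsed.
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # a hung command is treated like a failed one
        if attempt < attempts:
            sleep(backoff)   # wait before the next attempt
            backoff *= 2     # exponential backoff


    return False


def pull_cache_image(image: str) -> bool:
    """Hypothetical caller: True if the remote cache image was pulled.

    On False the caller would continue with the local cache and let the
    cache be regenerated, instead of failing the job.
    """
    return run_with_retry(["docker", "pull", image], timeout_s=600)
```

The sleep function is passed in as a parameter only so that the backoff schedule can be checked without actually waiting.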