Re: MXNet 1.6 as last release with Python 2 support?

2020-01-17 Thread Lai Wei
+1


Best Regards

Lai


On Fri, Jan 17, 2020 at 10:39 AM Lin Yuan  wrote:

> +1
>
> On Fri, Jan 17, 2020 at 10:04 AM Xingjian SHI 
> wrote:
>
> > +1. We should move to support Python>=3.5 only.
> >
> > From: Lausen, Leonard 
> > Sent: Friday, January 17, 2020 10:02:30 AM
> > To: d...@mxnet.apache.org 
> > Subject: Re: MXNet 1.6 as last release with Python 2 support?
> >
> > If the lazy consensus passes, I believe the minimum Python version
> > supported would be Python 3.5.
> >
> > Python 3.5 because it seems to be the minimum Python 3 version tested by
> > our CI, specifically in the jobs running on Ubuntu 16.04.
> >
> > Best regards
> > Leonard
> >
> > On Fri, 2020-01-17 at 17:36 +, Lausen, Leonard wrote:
> > > Dear MXNet community,
> > >
> > > As of January 1, 2020, no new bug reports, fixes, or changes will be
> > > made to Python 2, and MXNet 1.6 will be released after that date. I
> > > therefore suggest announcing in the MXNet 1.6 release notes that
> > > MXNet 1.6 is the last release supporting Python 2.
> > >
> > > We have previously reached consensus on announcing that Python 2 is
> > > dropped in the next major release (i.e. MXNet 2). However, given the
> > > delay in the 1.6 release, the plan to release 1.7 in the future, and
> > > the fact that Python 2 is already dead, I think we can revisit this
> > > assumption.
> > >
> > > Advantages are
> > > - Time savings for developers, as Python 3 standard library contains
> > >   more features than Python 2, and it is more efficient to target only
> > >   1 language (Python 3) instead of 2 languages (Python 2 & 3)
> > > - Simplification and cost savings for CI
> > >
> > > I thus suggest 72h lazy consensus for announcing dropping of Python 2
> > > as described above. If you disagree, please veto (send "-1") and we can
> > > continue supporting Python 2 in all 1.x releases as per previous
> > > consensus. Note that at the time of previous consensus, no 1.7 release
> > > was planned.
> > >
> > > Best regards
> > > Leonard
> >
>
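
A minimal sketch of how the proposed floor (Python 3.5, per the messages above) could be enforced when a package is imported. The names `MIN_PYTHON` and `check_python_version` are hypothetical and are not taken from the MXNet code base.

```python
import sys

# Assumed minimum, taken from the proposal in this thread (Python >= 3.5).
MIN_PYTHON = (3, 5)


def check_python_version(version_info=None, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level is irrelevant for the floor.
    return tuple(version_info[:2]) >= tuple(minimum)


def require_supported_python():
    """Raise ImportError with a readable message on unsupported interpreters."""
    if not check_python_version():
        raise ImportError(
            "Python %d.%d or newer is required" % MIN_PYTHON
        )
```

A packaging-level counterpart would be the `python_requires=">=3.5"` argument to setuptools' `setup()`. Whether MXNet's packaging adopted either approach is not stated in this thread.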


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-07 Thread Lai Wei
+1
Build from source on Ubuntu with CUDA/CUDNN/MKLDNN and tested with
keras-mxnet.
Unit tests passed and example works on CPU/GPU.


Best Regards

Lai


On Tue, Jan 7, 2020 at 11:49 AM Lin Yuan  wrote:

> Correction: it was built from source on Ubuntu 16.04
>
> On Tue, Jan 7, 2020 at 11:42 AM Lin Yuan  wrote:
>
> > +1
> >
> > Build from source on Ubuntu 18 with CUDA/CUDNN/NCCL on and verified it
> > works with Horovod 0.18.2
> >
> > On Tue, Jan 7, 2020 at 9:55 AM Przemysław Trędak 
> > wrote:
> >
> >> Dear MXNet community,
> >>
> >> This is the vote to release Apache MXNet (incubating) version 1.6.0.
> >> Voting starts today and will close on Friday 1/10/2020 23:59 PST.
> >>
> >> Link to release notes:
> >> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> >>
> >> Link to release candidate:
> >> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc1
> >>
> >> Link to source and signatures on apache dist server:
> >> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc1/
> >>
> >> The differences comparing to previous release candidate 1.6.0.rc0:
> >> * Fix for RNN gradient calculation for MKLDNN ([v1.6.x] Cherry-pick
> >> MKL-DNN Rnn operator enhancements to v1.6.x (#17225))
> >> * Fix for Windows CMake build (Backport #16980 #17031 #17018 #17019 to
> >> 1.6 branch (#17213))
> >> * CPU counterpart to contrib multihead attention operators (Interleaved
> >> MHA for CPU path (#17138) (#17211))
> >> * Fix for #16060 (fix norm sparse fallback (#17149))
> >> * Fix for inconsistent names in estimator API (fix parameter names in
> >> the estimator api (#17051) (#17162))
> >> * Fixes for OpenMP (Backport 3rdparty/openmp fixes (#17193))
> >> * Fix for pointwise fusion speed for large networks (which was the
> >> reason of -1 in the vote for rc0) as well as fixes for nondeterminism in
> >> sum of squares operator and trainer parameter order (Backport #17002,
> >> #17068 and #17114 to 1.6 branch (#17137))
> >>
> >>
> >> Please remember to TEST first before voting accordingly:
> >> +1 = approve
> >> +0 = no opinion
> >> -1 = disapprove (provide reason)
> >>
> >>
> >> Best regards,
> >> Przemyslaw Tredak
> >>
> >
>


Re: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0

2019-09-19 Thread Lai Wei
+1

build from source on GPU and tested with gluon estimator and latest
keras-mxnet.


Best Regards

Lai


On Thu, Sep 19, 2019 at 1:02 PM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Thank you Tao for leading this and all the community members for helping in
> this release.
>
>
> +1
>
>
> -[Y] Are release files in correct location?
>
> -[Y] Do release files have the word incubating in their name?
>
> -[Y] Are the digital signature and hashes correct?
>
> -[Y] Does DISCLAIMER file exist?
>
> -[Y] Do LICENSE and NOTICE files exist?
>
> -[Y] Is the LICENSE and NOTICE text correct?
>
> -[Y] Is the NOTICE year correct?
>
> -[Y] Un-included software dependencies are not mentioned in LICENSE or
> NOTICE?
>
> -[Y] License information is not mentioned in NOTICE?
>
> Is there any 3rd party code contained inside the release? If so:
>
> -[N] Does the software have a compatible license?
>
> -[Y] Are all software licenses mentioned in LICENSE?
>
> -[Y] Is the full text of the licenses (or pointers to it) in LICENSE?
>
> Is any of this code Apache licensed? Do they have NOTICE files? If so:
>
> -[Y] Have relevant parts of those NOTICE files been added to this NOTICE
> file?
>
> -[Y] Do all source files have ASF headers?
>
> -[Y] Do the contents of the release match with what's tagged in version
> control?
>
> -[N] Are there any unexpected binary files in the release?
>
> -[Y] Can you compile from source? Are the instructions clear?
>
>
> Except the license issue mentioned in this Github issue -
> https://github.com/apache/incubator-mxnet/issues/15542
>
>
> I was able to build from source on GPU (p3.2x EC2 instance) and run
> opperf-operator benchmark utility successfully with no regression
> compared to v1.5.0.
>
>
>
>
> On Thu, Sep 19, 2019 at 11:51 AM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > +1
> >
> > Build from source with cmake and ran unittest for gluon and amp.
> >
> > Noticed that test_sync_batchnorm fails on p3.8xlarge (hidden by the CI
> > because it passes on machines with 1 or 2 GPUs).
> > I have opened an issue for the same
> > https://github.com/apache/incubator-mxnet/issues/16214 though I think
> > it's not a blocker for this release.
> >
> > Anirudh
> >
> > On Thu, Sep 19, 2019 at 11:28 AM Chaitanya Bapat 
> > wrote:
> >
> > > +1
> > >
> > > Correctly built for GPU, CPU on Ubuntu 14.01 (10.1 Cuda for GPU)
> > > Ran image classification (resnet50+cifar10)
> > > Ran Operator Performance (opperf)
> > >
> > > On Thu, 19 Sep 2019 at 02:12, Tao Lv  wrote:
> > >
> > > > Hi community,
> > > >
> > > > Friendly reminder: less than 1.5 days remain, so please take the
> > > > time to verify and vote.
> > > >
> > > > Thanks,
> > > > -tao
> > > >
> > > > On Thu, Sep 19, 2019 at 3:06 PM Lin Yuan 
> wrote:
> > > >
> > > > > +1
> > > > > Tested Horovod on GPU
> > > > >
> > > > > On Wed, Sep 18, 2019 at 6:16 AM Zhao, Patric <
> patric.z...@intel.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Tested MKLDNN backend and everything looks great.
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Qing Lan 
> > > > > > > Sent: Wednesday, September 18, 2019 2:20 AM
> > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0
> > > > > > >
> > > > > > > +1 for Scala/Java test. Passed all tests for CPU/GPU build.
> > > > > > > Also tested build from source with static build.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Qing
> > > > > > > 
> > > > > > > From: Tao Lv 
> > > > > > > Sent: Tuesday, September 17, 2019 14:14
> > > > > > > To: dev@mxnet.incubator.apache.org <
> > dev@mxnet.incubator.apache.org
> > > >
> > > > > > > Subject: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0
> > > > > > >
> > > > > > > Dear MXNet community,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.1.
> > > > > > > >
> > > > > > > > Voting on dev@ will start September 17, 12:00pm (PST) and close on
> > > > > > > > September 20, 12:00pm (PST).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > 1) Link to release notes:
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Notes
> > > > > > > >
> > > > > > > > 2) Link to release candidate:
> > > > > > > >
> > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.1.rc0
> > > > > > > >
> > > > > > > > 3) Link to source and signatures on Apache dist server:
> > > > > > > >
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.1.rc0/
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > >
> > > > > > > +1 = approve
> > > > > > >

Re: [Discussion] MXNet 1.5.1 release

2019-08-29 Thread Lai Wei
Hi Tao,

Just checked: the 1.5.x nightly build is passing, so [10] is not needed. I
moved it to the 1.6.0 scope.

Thanks


Best Regards

Lai


On Thu, Aug 29, 2019 at 8:12 AM Tao Lv  wrote:

> @Aaron,
> Thank you for looking into these two issues. I have removed the #15609 from
> the scope of 1.5.1. Please let me know if you have any update about #15608.
>
> @Lai,
> I'm fine with the decision. License issue about MKL-DNN, cub and pybind is
> moved to next release.
>
> @Sam,
> I also removed the sidebar issue [3] from the scope of 1.5.1. Besides, I
> notice one of your cherry picks is stopped by the CI. Please take a look at
> it. Thanks.
>
> *Nice progress since the last update:*
> 1. Per the discussion, we decided to remove #15609, the license issue about
> MKL-DNN, cub and pybind, and the sidebar issue [3] from the scope of 1.5.1
> patch release;
> 2. 3 fixes [4] [5] [6] were merged into the v1.5.x branch.
>
> *Opens (suggested owners are highlighted):*
> 1. @Aaron is working on #15608 to see if we can have it in v1.5.x;
> 2. Two cherry pick PRs [7] [8] cannot pass the CI. I have pinged the
> authors to take a look at the CI failures.
> 3. @Kellen proposed 5 fixes [9] for TensorRT but till now only 3 are picked
> to v1.5.x. Please help to confirm if the other 2 are still needed.
> 4. Sorry that I missed the proposal for fixing the nightly build [10] in
> previous update. @Lai, can you help to confirm if it's still valid?
> 5. @Lin please help to make a conclusion for the GPU OOM issue caused by
> topk regression [11]. If it cannot be addressed on v1.5.x branch, I will
> remove it from the scope of this release and mark it as a known issue in
> the release note.
>
> Please find the details in
>
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
> .
>
> Thanks,
> -tao
>
> [1] https://github.com/apache/incubator-mxnet/pull/15609
> [2] https://github.com/apache/incubator-mxnet/pull/15608
> [3] https://github.com/apache/incubator-mxnet/issues/15200
> [4] https://github.com/apache/incubator-mxnet/pull/16029
> [5] https://github.com/apache/incubator-mxnet/pull/16026
> [6] https://github.com/apache/incubator-mxnet/pull/16028
> [7] https://github.com/apache/incubator-mxnet/pull/15803
> [8] https://github.com/apache/incubator-mxnet/pull/16027
> [9]
>
> https://github.com/apache/incubator-mxnet/issues/15613#issuecomment-520688668
> [10]
>
> https://github.com/apache/incubator-mxnet/issues/15613#issuecomment-516937546
> [11] https://github.com/apache/incubator-mxnet/issues/15703
>
>
>
> On Thu, Aug 29, 2019 at 1:06 AM Skalicky, Sam 
> wrote:
>
> > Hi Tao,
> >
> > I just talked with Aaron, let's leave the sidebar issue for later.
> >
> > I created PRs in the v1.5.x branch to cherry pick the fixes into the
> 1.5.1
> > release:
> > https://github.com/apache/incubator-mxnet/pull/16027
> > https://github.com/apache/incubator-mxnet/pull/16028
> >
> > Thanks for your work on this release!
> > Sam
> >
> > On Aug 28, 2019, at 9:35 AM, Lai Wei  > roywei...@gmail.com>> wrote:
> >
> > Hi,
> >
> > Regarding the license issue [1], we still have items 3, 4, 5 left.
> > I think it's better to remove them from the 1.5.1 release scope and
> > target 1.6.0, as this needs more time and requires changes that should
> > not go into a patch release.
> >
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/15542
> >
> > Best Regards
> >
> > Lai
> >
> >
> > On Wed, Aug 28, 2019 at 9:20 AM Aaron Markham  > <mailto:aaron.s.mark...@gmail.com>>
> > wrote:
> >
> > 5 no. Install page defaults to master so you don't need to pick it.
> > 6 probably, but there might be other PRs needed. I'd check out the branch
> > and attempt the install across platforms to be sure.
> >
> > On Wed, Aug 28, 2019, 08:55 Tao Lv  > ta...@apache.org>> wrote:
> >
> > Hi Aaron,
> >
> > They were proposed to be ported to v1.5.x at the beginning of the
> > discussion but I didn't see any action for that. So I'm wondering if
> > they're still needed. I asked for that in the last update on 8/20 but
> > didn't get a response.
> >
> > If they're still needed, I hope someone who is more familiar with Julia
> > frontend can help to cherry pick the commits to the v1.5.x branch.
> >
> > thanks,
> > -tao
> >
> > On Wed, Aug 28, 2019 at 11:43 PM Aaron Markham <
> > aaron.s.mark...@gmail.com<mailto:aaron.s.mark...@gmail.com>>
> > wrote:
> >
> > I don'

Re: [Discussion] MXNet 1.5.1 release

2019-08-28 Thread Lai Wei
> mxnet/blob/master/CODEOWNERS
> > > > > > > > > > > >
> > > > > > > > > > > > > Do we have regular build, run, functionality and performance
> > > > > > > > > > > > > testing for this release?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > --Patric
> > > > > > > > > > > >
> > > > > > > > > > > > > -Original Message-
> > > > > > > > > > > > > From: Tao Lv 
> > > > > > > > > > > > > Sent: Monday, August 12, 2019 8:59 PM
> > > > > > > > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > > > > > > > Subject: Re: [Discussion] MXNet 1.5.1 release
> > > > > > > > > > > > >
> > > > > > > > > > > > > Update:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > We're cherry picking fixes from the master to the v1.5.x branch.
> > > > > > > > > > > > > > Some of them are already merged. Please find details on the cwiki page:
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > There are still 3 opens:
> > > > > > > > > > > > > > 1. Nightly test failure on CI
> > > > > > > > > > > > > > (https://github.com/apache/incubator-mxnet/issues/15374): The issue
> > > > > > > > > > > > > > is still open. I'm wondering if it has been fixed or not. If not, is
> > > > > > > > > > > > > > there anyone working on it?
> > > > > > > > > > > > > > 2. Broken Sidebar on website API for master and 1.5.0
> > > > > > > > > > > > > > (https://github.com/apache/incubator-mxnet/issues/15200): I don't
> > > > > > > > > > > > > > see any progress on this issue? Do we still want to include it into
> > > > > > > > > > > > > > 1.5.1 patch release?
> > > > > > > > > > > > > > 3. License issues need to be fixed before 1.6 release
> > > > > > > > > > > > > > (https://github.com/apache/incubator-mxnet/issues/15542): Currently
> > > > > > > > > > > > > > the license issue for code and images is partially fixed on the
> > > > > > > > > > > > > > master branch and will be picked to v1.5.x soon. MKLML license issue
> > > > > > > > > > > > > > is pushed out

Re: [apache/incubator-mxnet] [RFC][WIP] RFC Issue Mirroring to d...@mxnet.apache.org (#15749)

2019-08-05 Thread Lai Wei
+1. Cyclic references between the GitHub discussion and the dev list
discussion are confusing.

-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/15749#issuecomment-518441849

Re: [Discussion] MXNet 1.5.1 release

2019-07-31 Thread Lai Wei
Hi Tao,

Thank you so much for driving it. Currently the nightly tests on tutorials
are failing and need to be fixed. [3]
I have updated the issue [1] and the cwiki [2].

[1] https://github.com/apache/incubator-mxnet/issues/15613
[2]
https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
[3] https://github.com/apache/incubator-mxnet/issues/15374

Best Regards

Lai


On Wed, Jul 31, 2019 at 8:04 AM Tao Lv  wrote:

>  Hi community,
>
>
>
> Thanks for the initiative from Sam (samskalicky@github), we already have a
> discussion thread [1] on github about the defects and bugs exposed in the
> 1.5.0 release.
>
> Shufan (juliusshufan@github) and I (TaoLv@github) would like to manage the
> release of 1.5.1. This will be our first debut on the release process, your
> comments are always valuable.
>
>
>
> Per the SemVer 2.0 [2], MXNet 1.5.1 will be a patch release which contains
> backwards-compatible fixes only.
>
> I have created a page on cwiki [3] to track the release process and moved
> the issues and PRs mentioned in the github discussion thread to the page.
>
>
>
> Here I would like to ask the community to:
>
> (1) Raise any other defect or regression you identified in the 1.5.0
> release. Please file a github issue for it and note the issue number in
> this thread;
>
> (2) Please comment with one sentence for why you think the issue is
> critical and must have in the 1.5.1 release;
>
> (3) If the issue is already fixed on master branch or already have a PR
> WIP, please also note the fix commit id or PR number;
>
> (4) If the issue is still open and there is no PR WIP, please indicate
> whether you'd be willing to help it out;
>
> (5) Feel free to comment if any other suggestion for the release.
>
>
>
> I suggest to keep this thread open for one week to collect enough
> information and proposals before we decide the timeline for the release. So
> your timely response will be highly appreciated!
>
>
>
> PS: Sorry to say that even as a committer, this is the first time for me to
> manage a release. So it would be great if an experienced committer can help
> to guide the process.
>
>
>
> -tao
>
>
>
> [1] https://github.com/apache/incubator-mxnet/issues/15613
>
> [2] https://semver.org/
>
> [3]
>
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
>


[ANNOUNCE] Release Apache MXNet (incubating) version 1.5.0

2019-07-29 Thread Lai Wei
Dear all,

The Apache MXNet (incubating) community is happy to announce Apache MXNet
(incubating) version 1.5.0!

Release blog post:

https://blogs.apache.org/mxnet/entry/apache-mxnet-incubating-1-5

https://medium.com/apache-mxnet/apache-mxnet-1-5-0-release-is-now-available-4138f5233401

Apache MXNet (incubating) is a deep learning framework designed for both
efficiency and flexibility. It allows you to mix symbolic and imperative
programming to maximize efficiency and productivity.

This release includes several new features, usability improvements, bug
fixes and performance improvements.

A full list of the changes in this release can be found in the release
notes:

https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes

A link to the download can be found here:

http://mxnet.incubator.apache.org/versions/master/install/download.html

If you prefer to build from source and experiment with various compile-time
configuration options, use this link to get the instructions:

http://mxnet.incubator.apache.org/versions/master/install/index.html?version=v1.5.0&platform=Linux&language=Python&processor=CPU

Or you can download and play with MXNet easily using one of the options
below:

1. The Pip packages can be found here:
https://pypi.python.org/pypi/mxnet

2. The Docker Images can be found here:
https://hub.docker.com/r/mxnet/python/

Links in Maven to the published Scala packages:

https://repository.apache.org/#nexus-search;gav~org.apache.mxnet~~1.5.0~~

and to the experimental Clojure packages:

https://repository.apache.org/#nexus-search;gav~org.apache.mxnet.contrib.clojure~~1.5.0~~

3. The Release Tag:

https://github.com/apache/incubator-mxnet/releases/tag/1.5.0

MXNet Resources
- Our discussion forum (https://discuss.mxnet.io)
- MXNet user mailing list (
https://lists.apache.org/list.html?u...@mxnet.apache.org)
- MXNet dev mailing list (
https://lists.apache.org/list.html?d...@mxnet.apache.org)
- StackOverflow mxnet tag (https://stackoverflow.com/questions/tagged/mxnet)
- MXNet website (https://mxnet.incubator.apache.org/faq/)
- Github issues (https://github.com/apache/incubator-mxnet/issues)
- Wiki (https://cwiki.apache.org/confluence/display/MXNET)

Attend one of the regular user groups meetings:
https://cwiki.apache.org/confluence/x/7BY0BQ

For more information on Apache MXNet (incubating), please see:

https://mxnet.incubator.apache.org/

Best regards,
Apache MXNet (incubating) Team

___

DISCLAIMER:

Apache MXNet (incubating) is an effort undergoing incubation at The Apache
Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other successful
ASF projects. While incubation status is not necessarily a reflection of
the completeness or stability of the code, it does indicate that the
project has yet to be fully endorsed by the ASF.


Re: [RESULTS] [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-07-15 Thread Lai Wei
Hi Denis and all,

You can find updates on the vote on the general@ list [1].
We still need one more vote and have a potential license issue. Whether it’s
a release blocker has not been determined yet.



[1]
https://lists.apache.org/thread.html/5365cdab7dee08d220e32decc76fd54aa05e29bc891c416828cb64d2@%3Cgeneral.incubator.apache.org%3E



On Mon, Jul 15, 2019 at 8:13 PM Davydenko, Denis <
dzianis.davydze...@gmail.com> wrote:

> Hi, Michael,
>
> Could you please update whether you had a chance to verify MXNet v1.5.0rc2?
>
>
> On 7/10/19, 10:57 AM, "Michael Wall"  wrote:
>
> Will make time to review before Sun.  Thanks for the note.
>
> Mike
>
> On Wed, Jul 10, 2019 at 1:52 PM Lai Wei  wrote:
>
> > Dear MXNet mentors,
> >
> > Since the vote has passed on dev, I started voting on general@
> >
> > Justin is asking whether any mentors voted on the release [1]. Could you
> > please help with the vote on general@?
> > Really appreciate it, thanks a lot!
> >
> > [1]
> > https://mail-archives.apache.org/mod_mbox/incubator-general/201907.mbox/%3c380abca2-01e5-4200-bed1-b943a223e...@classsoftware.com%3e
> >
> >
> > Best Regards
> >
> > Lai
> >
> >
> > On Wed, Jul 10, 2019 at 2:19 AM Lai Wei  wrote:
> >
> > > Dear MXNet community,
> > >
> > > I'm happy to announce the results of the vote.
> > >
> > > This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes.
> > >
> > > +1 votes
> > > * Sheng Zha / binding
> > > * Qing Lan / binding
> > > * Sandeep Krishnamurthy / binding
> > > * Przemysław Trędak
> > > * Patric Zhao
> > >
> > > 0 votes
> > > * No votes
> > >
> > > -1 votes
> > > * No votes
> > >
> > > Vote thread can be found here [1]. The list of members can be found
> > > here [2].
> > >
> > > I'll continue with the release process and the release announcement
> > > will follow in the next few days.
> > >
> > >
> > > Best regards,
> > > Lai
> > >
> > > [1]
> > > https://lists.apache.org/thread.html/50fe473a3e03c891caccb8cae8e5195bb740a4758f7688790dff70df@%3Cdev.mxnet.apache.org%3E
> > > [2] http://incubator.apache.org/projects/mxnet.html
> > >
> > >
> > >
> > >
> > > Best Regards
> > >
> > > Lai
> > >
> >
>
>
>
> --
Best Regards

Lai


Re: [RESULTS] [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-07-10 Thread Lai Wei
Dear MXNet mentors,

Since the vote has passed on dev, I started voting on general@

Justin is asking whether any mentors voted on the release [1]. Could you
please help with the vote on general@?
Really appreciate it, thanks a lot!

[1]
https://mail-archives.apache.org/mod_mbox/incubator-general/201907.mbox/%3c380abca2-01e5-4200-bed1-b943a223e...@classsoftware.com%3e


Best Regards

Lai


On Wed, Jul 10, 2019 at 2:19 AM Lai Wei  wrote:

> Dear MXNet community,
>
> I'm happy to announce the results of the vote.
>
> This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes.
>
> +1 votes
> * Sheng Zha / binding
> * Qing Lan / binding
> * Sandeep Krishnamurthy / binding
> * Przemysław Trędak
> * Patric Zhao
>
> 0 votes
> * No votes
>
> -1 votes
> * No votes
>
> Vote thread can be found here [1]. The list of members can be found here
> [2].
>
> I'll continue with the release process and the release announcement will
> follow in the next few days.
>
>
> Best regards,
> Lai
>
> [1]
> https://lists.apache.org/thread.html/50fe473a3e03c891caccb8cae8e5195bb740a4758f7688790dff70df@%3Cdev.mxnet.apache.org%3E
> [2] http://incubator.apache.org/projects/mxnet.html
>
>
>
>
> Best Regards
>
> Lai
>


[RESULTS] [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-07-10 Thread Lai Wei
Dear MXNet community,

I'm happy to announce the results of the vote.

This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes.

+1 votes
* Sheng Zha / binding
* Qing Lan / binding
* Sandeep Krishnamurthy / binding
* Przemysław Trędak
* Patric Zhao

0 votes
* No votes

-1 votes
* No votes

Vote thread can be found here [1]. The list of members can be found here
[2].

I'll continue with the release process and the release announcement will
follow in the next few days.


Best regards,
Lai

[1]
https://lists.apache.org/thread.html/50fe473a3e03c891caccb8cae8e5195bb740a4758f7688790dff70df@%3Cdev.mxnet.apache.org%3E
[2] http://incubator.apache.org/projects/mxnet.html




Best Regards

Lai
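
The tally in the results message above can be reproduced with a short sketch. It assumes the usual reading of ASF release voting (at least three binding +1 votes and more +1 than -1 votes); that rule is an assumption here and is not spelled out in this thread. The function name `tally_votes` and the tuple layout are hypothetical.

```python
from collections import Counter


def tally_votes(votes, min_binding=3):
    """Summarize votes given as (name, value, is_binding) tuples.

    `value` is +1, 0, or -1. The pass rule (>= min_binding binding +1
    votes and more +1 than -1 votes) is an assumed reading of ASF policy.
    """
    counts = Counter(value for _name, value, _binding in votes)
    binding_plus = sum(
        1 for _name, value, binding in votes if value == 1 and binding
    )
    passed = binding_plus >= min_binding and counts[1] > counts[-1]
    return {
        "plus": counts[1],
        "zero": counts[0],
        "minus": counts[-1],
        "binding_plus": binding_plus,
        "passed": passed,
    }


# Votes as listed in the results message.
VOTES = [
    ("Sheng Zha", 1, True),
    ("Qing Lan", 1, True),
    ("Sandeep Krishnamurthy", 1, True),
    ("Przemysław Trędak", 1, False),
    ("Patric Zhao", 1, False),
]
RESULT = tally_votes(VOTES)
```

With the five votes listed, `RESULT` reports five +1 votes, three of them binding, and no 0 or -1 votes, in line with the announcement.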


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-07-09 Thread Lai Wei
+1
Tested the following works fine:

1. Built from source on OSX, Ubuntu CPU, GPU
2. Ran example/gluon image classification on CPU, GPU
3. Built latest Keras-MXNet from source and all tests passed.



Best Regards

Lai


On Tue, Jul 9, 2019 at 2:11 PM Qing Lan  wrote:

> Have successfully fixed the issue on OSX.
>
> Scala/Java build is fine:
>
> osx-cpu    passed (Qing)
> linux-cpu  passed (Zach)
> linux-gpu  passed (Zach)
>
> +1 for the release.
>
> Thanks,
> Qing
>
>
> 
> From: Qing Lan 
> Sent: Monday, July 8, 2019 12:47
> To: d...@mxnet.apache.org; dev@mxnet.incubator.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2
>
> Hi All,
>
> I found the problem when I tried to build from source with my Mac:
>
> clang: error: unsupported option '-fopenmp'
> clang: error: unsupported option '-fopenmp'
> make: *** [build/src/operator/nn/mkldnn/mkldnn_act.o] Error 1
> make: *** [build/src/operator/nn/cudnn/cudnn_batch_norm.o] Error 1
>
> I use "make -j4" with tar.gz package
>
> Thanks,
> Qing
>
>
>
> 
> From: Sheng Zha 
> Sent: Friday, July 5, 2019 17:42
> To: d...@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2
>
> +1
>
> On 2019/06/27 17:05:40, Lai Wei  wrote:
> > Dear MXNet community,
> >
> > This is the 3-day vote to release Apache MXNet (incubating) version
> > 1.5.0.
> > Voting on dev@ will start June 26, 23:59:59 (PST) and close on June 29,
> > 23:59:59.
> >
> > 1) Link to release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >
> >
> > 2) Link to release candidate:
> >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc2
> >
> >
> >
> > 3) Link to source and signatures on apache dist server:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc2/
> >
> >
> >
> > Please remember to TEST first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> > --
> > Best Regards
> >
> > Lai
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-07-05 Thread Lai Wei
Dear MXNet community,

I'm extending this vote to July 9, 23:59:59 as there is no code change
required from the performance blocker discussed in rc1 thread.[1]

Your help validating and voting on this release is much appreciated.

[1]
https://lists.apache.org/thread.html/154ef1e4010671e7375c7a7cbedb413d5a4a3677321488440fb32a3a@%3Cdev.mxnet.apache.org%3E


Best Regards

Lai


On Thu, Jun 27, 2019 at 10:05 AM Lai Wei  wrote:

> Dear MXNet community,
>
> This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
> Voting on dev@ will start June 26, 23:59:59(PST)  and close on June 29,
> 23:59:59.
>
> 1) Link to release notes:
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
>
>
> 2) Link to release candidate:
>
> https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc2
>
>
>
> 3) Link to source and signatures on apache dist server:
>
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc2/
>
>
>
> Please remember to TEST first before voting accordingly:
>
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
> --
> Best Regards
>
> Lai
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-07-05 Thread Lai Wei
Hi all,

An update on the regression issue:

There is no significant regression on operators between 1.4.1 and 1.5.0
according to the latest finding here [1].
The possible regression observed previously is due to a profiler change
between 1.4.1 and 1.5.0, so it was not an apples-to-apples comparison. Please
refer to the performance results using the time and timeit modules in this
comment. [2]

With that, let's restart voting on 1.5.0.rc2, as there is no code change
required.

[1]
https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508865398
[2]
https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508831150
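
A minimal sketch of the kind of profiler-independent measurement mentioned above, using only the standard `timeit` module. The `benchmark` helper and the `workload` function are hypothetical stand-ins; the actual scripts behind the linked results are not reproduced here.

```python
import timeit


def benchmark(fn, warmup=3, repeat=5, number=10):
    """Return the best average wall-clock time (seconds) per call of fn()."""
    # Warm-up runs keep one-time costs (caches, lazy init) out of the timing.
    for _ in range(warmup):
        fn()
    # timeit.repeat returns one total time per repetition of `number` calls.
    timings = timeit.repeat(fn, repeat=repeat, number=number)
    return min(timings) / number


def workload():
    """Stand-in for an operator call; purely illustrative."""
    return sum(i * i for i in range(1000))
```

For an asynchronous engine such as MXNet's, the timed callable would also need to block until the computation has finished (for example by calling `mx.nd.waitall()` inside it); otherwise only the enqueue time is measured. Running the same callable against both installed versions then gives a like-for-like comparison that does not depend on the profiler.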





Best Regards

Lai


On Sat, Jun 29, 2019 at 12:35 PM Chris Olivier 
wrote:

> for batch norm, I mean. max*
>
> On Sat, Jun 29, 2019 at 12:34 PM Chris Olivier 
> wrote:
>
> > what’s with the mac memory usage being 2x in 1.4? As I am not sure where
> > the number is coming from (if it’s my profiler code, I wouldn’t consider
> > it terribly meaningful), but it is the same everywhere else, so it kind
> > of sticks out.
> >
> > On Thu, Jun 27, 2019 at 3:36 PM sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> >> Hello Ciyong/Pedro,
> >>
> >> Ran operator benchmarks on 1.4.1 and 1.5.0.rc2. (Not complete, doesn’t
> >> cover all MXNet operators, not presented in best possible way, still
> >> WIP)
> >>
> >> https://gist.github.com/sandeep-krishnamurthy/e0a2be893c8c4d484390c9c8813bdf50
> >>
> >> The following operators look slower in 1.5 compared to 1.4.1:
> >> - BatchNorm
> >> - Pooling
> >> - FullyConnected
> >> - batch_dot
> >> - Dot
> >> - broadcast_mul
> >> - log_softmax
> >> and a few other operators
> >>
> >> Also, several operators run a lot faster on 1.5 compared to 1.4.1, for
> >> example Convolution, flatten, elementwise operators etc. So I see that
> >> likely a few operators have regressed noticeably; however, due to other
> >> operator performance improvements, the end effect is not that
> >> significant, hiding a lot of regression. We need a more detailed
> >> analysis of per-operator performance. We will not be able to do this
> >> for the current release; we should have a more concrete way of
> >> determining such performance regressions before the next release.
> >>
> >> Setup:
> >> 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
> >> 1.4.1 => PyPi mxnet-mkl==1.4.1
> >> Machine: C5.18X
> >> No explicit environment variables were set
> >> Operator benchmark code -
> >> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
> >>
> >> Best,
> >> Sandeep
> >>
> >>
> >> On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
> >> pedro.larroy.li...@gmail.com>
> >> wrote:
> >>
> >> > I will try to run a few benchmarks in a bare metal instance tonight to
> >> > remove virtualization variance for the measurements and provide some
> >> > numbers.
> >> >
> >> > Please propose a set of models / examples that would be desirable to
> >> > run before the release and provide a link to an easy to run script
> >> > with instructions so we can validate the release better.
> >> >
> >> > Thank you.
> >> >
> >> > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei  wrote:
> >> > >
> >> > > Dear @dev,
> >> > >
> >> > > I'm cancelling the vote for the cached op fix:
> >> > >
> >> > > https://github.com/apache/incubator-mxnet/pull/15298
> >> > >
> >> > > As for the possible cpu training regression, it looks like not a
> >> blocker
> >> > > for now.
> >> > >
> >> > > I will start a new rc2 vote, please help to validate.
> >> > >
> >> > > Thanks!
> >> > >
> >> > >
> >> > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
> ciyong.c...@intel.com>
> >> > wrote:
> >> > >
> >> > > > Hi Pedro,
> >> > > >
> >> > > > I was able to reproduce a similar result (v1.5 is ~5.6% slower
> >> than
> >> > > > v1.4, I was using 18 cores for computing) with your script on
> >> > C5.18xlarge.
> >> > > > But need to bind the cores with below command when running the
> >> script,
> >> >

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-28 Thread Lai Wei
ay to determining such performance regression
> > before
> > next release.
> >
> > Setup:
> > 1.5 => Build from source (head of 1.5.rc2 tag), built with MKLDNN
> > 1.4.1 => PyPi mxnet-mkl==1.4.1
> > Machine: C5.18X
> > No explicit environment variables were set
> > Operator benchmark code -
> >
> https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf
> >
> > Best,
> > Sandeep
> >
> >
> > On Thu, Jun 27, 2019 at 10:42 AM Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > I will try to run a few benchmarks in a bare metal instance
> tonight to
> > > remove virtualization variance for the measurements and provide
> some
> > > numbers.
> > >
> > > Please propose a set of models / examples that would be desirable
> to
> > > run before the release and provide a link to an easy to run script
> > > with instructions so we can validate the release better.
> > >
> > > Thank you.
> > >
> > > On Thu, Jun 27, 2019 at 10:01 AM Lai Wei 
> wrote:
> > > >
> > > > Dear @dev,
> > > >
> > > > > I'm cancelling the vote for the cached op fix:
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/15298
> > > >
> > > > As for the possible cpu training regression, it looks like not a
> > blocker
> > > > for now.
> > > >
> > > > I will start a new rc2 vote, please help to validate.
> > > >
> > > > Thanks!
> > > >
> > > >
> > > > On Thu, Jun 27, 2019 at 10:06 PM Chen, Ciyong <
> ciyong.c...@intel.com
> > >
> > > wrote:
> > > >
> > > > > Hi Pedro,
> > > > >
> > > > > > I was able to reproduce a similar result (v1.5 is ~5.6%
> slower
> > than
> > > > > v1.4, I was using 18 cores for computing) with your script on
> > > C5.18xlarge.
> > > > > But need to bind the cores with below command when running the
> > script,
> > > > > (without setting the env variables, I got a close time (<1%)
> with
> > v1.5
> > > and
> > > > > v1.4)
> > > > > export
> > KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
> > > > > export OMP_NUM_THREADS=18
> > > > >
> > > > > Did you set any env variables during running?
> > > > >
> > > > > The performance result I got as below:
> > > > > 1) 1.4.1.rc0 (1a7199691f5cbc6012bb53eecbf884bed5ae6590)
> > > > > > real 12m10.856s
> > > > > > user 234m49.576s
> > > > > > sys 4m38.044s
> > > > > >
> > > > > > 2) 1.5.0.rc1 (4d9667121ae6fb643f2a02ab15e25231ed756cde)
> > > > > > real 12m52.140s
> > > > > > user 246m30.740s
> > > > > > sys 5m8.188s
> > > > >
> > > > > As I looked at the profiling data, most of the ops have same
> perf
> > > between
> > > > > v1.4 and v1.5. But some ops like " _backward_BatchNorm" and
> > "Pooling"
> > > is
> > > > > ~1.37x slower on v1.5 compared with v1.4.
> > > > > Will do further analysis on these ops.
> > > > >
> > > > > Here's the hardware/OS info from my side:
> > > > > --Python Info--
> > > > > Version  : 3.6.8
> > > > > Compiler : GCC 7.3.0
> > > > > Build: ('default', 'Dec 30 2018 01:22:34')
> > > > > Arch : ('64bit', '')
> > > > > Pip Info---
> > > > > Version  : 19.0.3
> > > > > Directory:
> > > > >
> > /home/ubuntu/anaconda3/envs/perf-mxnet/lib/python3.6/site-packages/pip
> > > > > --MXNet Info---
> > > > > Version  : 1.5.0
> > > > > Directory: /home/ubuntu/ws/incubator-mxnet/python/mxnet
> > > > > Hashtag not found. Not installed from pre-built packa

[VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-06-27 Thread Lai Wei
Dear MXNet community,

This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
Voting on dev@ will start June 26, 23:59:59(PST)  and close on June 29,
23:59:59.

1) Link to release notes:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes


2) Link to release candidate:

https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc2



3) Link to source and signatures on apache dist server:

https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc2/



Please remember to TEST first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)
-- 
Best Regards

Lai


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-27 Thread Lai Wei
USE_DIST_KVSTORE: "OFF" # Build with DIST_KVSTORE support
> > > > > USE_PLUGINS_WARPCTC: "OFF" # Use WARPCTC Plugins
> > > > > USE_PLUGIN_CAFFE: "OFF" # Use Caffe Plugin
> > > > > USE_CPP_PACKAGE: "OFF" # Build C++ Package
> > > > > USE_MXNET_LIB_NAMING: "ON" # Use MXNet library naming
> > conventions.
> > > > > USE_GPROF: "OFF" # Compile with gprof (profiling) flag
> > > > > USE_CXX14_IF_AVAILABLE: "OFF" # Build with C++14 if the compiler
> > > > > supports it
> > > > > USE_VTUNE: "OFF" # Enable use of Intel Amplifier XE (VTune)) #
> > > > > one could set VTUNE_ROOT for search path
> > > > > ENABLE_CUDA_RTC: "ON" # Build with CUDA runtime compilation
> > > > > support
> > > > > BUILD_CPP_EXAMPLES: "ON" # Build cpp examples
> > > > > INSTALL_EXAMPLES: "OFF" # Install the example source files.
> > > > > USE_SIGNAL_HANDLER: "ON" # Print stack traces on segfaults.
> > > > > USE_TENSORRT: "OFF" # Enable infeference optimization with
> TensorRT.
> > > > > USE_ASAN: "OFF" # Enable Clang/GCC ASAN sanitizers.
> > > > > ENABLE_TESTCOVERAGE: "OFF" # Enable compilation with test
> > > > > coverage metric output
> > > > > CMAKE_BUILD_TYPE: "Release"
> > > > > CMAKE_CUDA_COMPILER_LAUNCHER: "ccache"
> > > > > CMAKE_C_COMPILER_LAUNCHER: "ccache"
> > > > > CMAKE_CXX_COMPILER_LAUNCHER: "ccache"
> > > > >
> > > > > commit 4d9667121ae6fb643f2a02ab15e25231ed756cde (HEAD, tag:
> > > > > 1.5.0.rc1,
> > > > > upstream/v1.5.x)
> > > > > commit 1a7199691f5cbc6012bb53eecbf884bed5ae6590 (HEAD, tag:
> > > > > 1.4.1.rc0,
> > > > > upstream/v1.4.x)
> > > > >
> > > > > curl http://169.254.169.254/latest/meta-data/instance-type
> > > > > c5d.18xlarge
> > > > >
> > > > >
> > > > > Version  : 3.6.7
> > > > > Compiler : GCC 8.2.0
> > > > > Build: ('default', 'Oct 22 2018 11:32:17')
> > > > > Arch : ('64bit', 'ELF')
> > > > > Pip Info---
> > > > > Version  : 19.1.1
> > > > > Directory: /home/piotr/mxnet_1.5/py3_venv/lib/python3.6/site-
> > packages/pip
> > > > > --MXNet Info---
> > > > > Version  : 1.5.0
> > > > > Directory: /home/piotr/mxnet_1.5/python/mxnet
> > > > > Hashtag not found. Not installed from pre-built package.
> > > > > --System Info--
> > > > > Platform :
> Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > > system   : Linux
> > > > > node : ip-172-31-63-171
> > > > > release  : 4.15.0-1035-aws
> > > > > version  : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > > --Hardware Info--
> > > > > machine  : x86_64
> > > > > processor: x86_64
> > > > > Architecture:x86_64
> > > > > CPU op-mode(s):  32-bit, 64-bit
> > > > > Byte Order:  Little Endian
> > > > > CPU(s):  72
> > > > > On-line CPU(s) list: 0-71
> > > > > Thread(s) per core:  2
> > > > > Core(s) per socket:  18
> > > > > Socket(s):   2
> > > > > NUMA node(s):2
> > > > > Vendor ID:   GenuineIntel
> > > > > CPU family:  6
> > > > > Model:   85
> > > > > Model name:  Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > > Stepping:4
> > > > > CPU MHz: 1326.446
> > > > > BogoMIPS:6000.00
> > > > > Hypervisor vendor:   KVM
> > > > > Virtualization type: full
> > > > > L1d cache:   32K
> > > > > L1i cache:   32K
> > > > > L2 cache:1024K
> > > > > L3 cache:25344K
> > > > > NUMA node0 CPU(s):   0-17,36-53
> > > > > NUMA node1 CPU(s):   18-35,54-71
> > > > > Flags:

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Lai Wei
Hi Anirudh,

Thanks for jumping into this quickly; I followed up on the issue.

My request was meant for the sockeye developers/maintainers, to help set up
nightly tests and raise issues early.

Thanks!

On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin 
wrote:

> In GluonNLP we are testing with MXNET nightly build for each PR, and we did
> find some MXNet related issue caught by the CI.
> I recommend other toolkits also add integration tests with MXNet nightly.
> It helps identify issues early.
>
> Best,
> Haibin
>
> On Thu, Jun 20, 2019 at 18:52 Zhao, Patric  wrote:
>
> > Thanks for raising the issue; we will take a look ASAP.
> >
> > The downstream cases are not in the MXNet CI, so it's hard for MXNet
> > developers to catch potential bugs or performance degradation.
> >
> > In the future, I suggest adding the major downstream test cases, like
> from
> > sockeye, GluonNLP, GluonCV, DGL, Gluon-TS, into the nightly test.
> > If it's still too heavy,  maybe testing it weekly or monthly :)
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > > Sent: Friday, June 21, 2019 9:31 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: d...@mxnet.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> > >
> > > Hi Lai,
> > >
> > > I have opened an issue:
> > > https://github.com/apache/incubator-mxnet/issues/15297
> > > I came to know about this issue only today and I have not been
> monitoring
> > > sockeye.
> > > I jumped onto this issue to make sure it wasn't caused by the dlpack
> > changes.
> > > Also, I don't  think sockeye CI checks against master, it is using
> 1.4.1.
> > >
> > > Anirudh
> > >
> > >
> > > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei  wrote:
> > >
> > > > Hi,
> > > >
> > > > Could you share which test failed and what’s the crash? How to
> > > > reproduce it?
> > > >
> > > > I was able to install sockeye and run all tests passed. Using python
> > > > setup.py test
> > > >
> > > > I have tested both nightly pip package and 1.5.0.rc1
> > > >
> > > > It would be great to create an issue with reproducible steps and move
> > > > the discussion there.
> > > >
> > > > Also I see sockeye nightly build[1] has been failing for some time,
> if
> > > > it’s due to MXNet change, please raise this early so we can track and
> > > > solve it in time rather than block the release during vote time.
> > > >
> > > > [1] https://travis-ci.org/awslabs/sockeye
> > > >
> > > >
> > > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > > >  > > > >
> > > > wrote:
> > > >
> > > > > I was able to reproduce a crash with the commit
> > > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > > >
> > > > > Anirudh
> > > > >
> > > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei 
> wrote:
> > > > >
> > > > > > Hi Przemyslaw,
> > > > > >
> > > > > > Is there an issue with more details to track the problem?
> > > > > >
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > > 
> > > > > > wrote:
> > > > > >
> > > > > > > -1
> > > > > > >
> > > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > > > occurring in
> > > > > 1.5rc1. I
> > > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > > related) or
> > > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > > optimization).
> > > > > > >
> > > > > > > On 2019/06/20 06:36:22, Lai Wei  wrote:
> > > > > > > > Dear MXNet community,
> > > > > > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Lai Wei
Hi,

Could you share which test failed, what the crash is, and how to reproduce it?

I was able to install sockeye, and all tests passed when run with
python setup.py test

I have tested both the nightly pip package and 1.5.0.rc1.

It would be great to create an issue with reproducible steps and move the
discussion there.

Also, I see the sockeye nightly build [1] has been failing for some time. If
it’s due to an MXNet change, please raise this early so we can track and solve
it in time rather than blocking the release during the vote.

[1] https://travis-ci.org/awslabs/sockeye


On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian 
wrote:

> I was able to reproduce a crash with the commit
> 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> a862270beb2d796c1ba311183f7f4a766a18ad6c.
>
> Anirudh
>
> On Thu, Jun 20, 2019 at 3:53 PM Lai Wei  wrote:
>
> > Hi Przemyslaw,
> >
> > Is there an issue with more details to track the problem?
> >
> >
> > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak 
> > wrote:
> >
> > > -1
> > >
> > > There is a crash in sockeye unit test (python setup.py test) observed
> > > > starting with nightly 1.5 build from 6/13 and still occurring in
> 1.5rc1. I
> > > don't yet have the exact commit that is responsible for it, but it is
> > > either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> > > 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
> > >
> > > On 2019/06/20 06:36:22, Lai Wei  wrote:
> > > > Dear MXNet community,
> > > >
> > > > This is the 3-day vote to release Apache MXNet (incubating) version
> > > 1.5.0.
> > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on June
> > 22,
> > > > 23:59:59.
> > > >
> > > > 1) Link to release notes:
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > >
> > > >
> > > > 2) Link to release candidate:
> > > >
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> > > >
> > > >
> > > > 3) Link to source and signatures on apache dist server:
> > > >
> > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> > > >
> > > >
> > > > Please remember to TEST first before voting accordingly:
> > > >
> > > > +1 = approve
> > > > +0 = no opinion
> > > > -1 = disapprove (provide reason)
> > > > --
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai
> >
>
-- 
Best Regards

Lai


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Lai Wei
Hi Przemyslaw,

Is there an issue with more details to track the problem?


On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak 
wrote:

> -1
>
> There is a crash in sockeye unit test (python setup.py test) observed
> starting with nightly 1.5 build from 6/13 and still occurring in 1.5rc1. I
> don't yet have the exact commit that is responsible for it, but it is
> either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack related) or
> 09202f7f261954383aa387144524d38f83f18d06 (cached op optimization).
>
> On 2019/06/20 06:36:22, Lai Wei  wrote:
> > Dear MXNet community,
> >
> > This is the 3-day vote to release Apache MXNet (incubating) version
> 1.5.0.
> > Voting on dev@ will start June 19, 23:59:59(PST)  and close on June 22,
> > 23:59:59.
> >
> > 1) Link to release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >
> >
> > 2) Link to release candidate:
> >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1
> >
> >
> > 3) Link to source and signatures on apache dist server:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/
> >
> >
> > Please remember to TEST first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> > --
> > Best Regards
> >
> > Lai
> >
>
-- 
Best Regards

Lai


[VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-19 Thread Lai Wei
Dear MXNet community,

This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
Voting on dev@ will start June 19, 23:59:59(PST)  and close on June 22,
23:59:59.

1) Link to release notes:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes


2) Link to release candidate:

https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc1


3) Link to source and signatures on apache dist server:

https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc1/


Please remember to TEST first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)
-- 
Best Regards

Lai


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-12 Thread Lai Wei
Hi @dev,

I am canceling the vote, as the issue Lin discovered requires a fix [1] and
the solution is not ready yet.
It's a general problem when building from source with MXNet, not one that
only impacts Horovod use cases. Any help is appreciated.

Other issues we are tracking:
1. Regression on hybridize with static_alloc. (not a blocker for now)
2. Scala doc issue [2]: the fix is already merged in master and needs to be backported to 1.5.x

Thanks for everyone's help! Please let us know if there are any other issues
with 1.5.0.

[1] https://github.com/apache/incubator-mxnet/pull/15213
[2] https://github.com/apache/incubator-mxnet/pull/15216



Best Regards

Lai


On Tue, Jun 11, 2019 at 5:04 PM Pedro Larroy 
wrote:

> Tested with CPU, 2.6x slower. comparing master vs 1.4.1.
>
> Looks like a general regression.
>
>
> On Tue, Jun 11, 2019 at 2:31 PM Lai Wei  wrote:
> >
> > Hi guys,
> >
> > Thanks for the updates. Currently, we are able to confirm Lin's issue
> with
> > Horovod, and there is a fix pending. [1]
> > Will update later today to see if we need to cancel this vote for the
> fix.
> >
> > As for the hybridize with static alloc performance regression. IMO it
> does
> > not need to be a blocker if we have the following speed order.
> > 1.5.0 w/o static > 1.5.0 w/ static  > 1.4.1 w/ static > 1.4.1 w/o static
> > and it will be great to know the following to better make a decision on
> > whether this should block the release.
> > 1) if this is a model specific or a general regression.
> > 2) if this is platform specific or general (w/ or w/o CUDA, w/ or w/o
> > MKLDNN)
> >
> >
> > [1]https://github.com/apache/incubator-mxnet/pull/15213
> >
> >
> > Thanks
> >
> > Best Regards
> >
> > Lai
> >
> >
> > On Tue, Jun 11, 2019 at 1:46 PM Zhi Zhang  wrote:
> >
> > >
> > >
> > > On 2019/06/11 18:53:56, Pedro Larroy 
> > > wrote:
> > > > The stack trace doesn't seem to come from MXNet, do you have more
> info?
> > > >
> > > > On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang 
> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 2019/06/11 17:36:09, Pedro Larroy  >
> > > wrote:
> > > > > > A bit more background into this:
> > > > > >
> > > > > > While tuning a model using LSTM and convolutions we find that
> using
> > > > > > hybridize with static_alloc and static_shape is 15% slower in the
> > > > > > latest revision vs in version 1.4.1 in which using hybridize with
> > > > > > static_alloc and static_shape is 10% faster than without.
> > > > > >
> > > > > > > Overall we are still 33% faster when comparing master to 1.5.
> > > > > >
> > > > > > Let me know if you think this is a release blocker or not.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > On Mon, Jun 10, 2019 at 4:51 PM Pedro Larroy
> > > > > >  wrote:
> > > > > > >
> > > > > > > -1
> > > > > > >
> > > > > > > We found a performance regression vs 1.4 related to CachedOp
> which
> > > > > > > affects Hybrid forward, which we are looking into.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > > > > > On Mon, Jun 10, 2019 at 4:33 PM Lin Yuan 
> > > wrote:
> > > > > > > >
> > > > > > > > -1 (Tentatively until resolved)
> > > > > > > >
> > > > > > > > I tried to build MXNet 1.5.0 from source and pip install
> horovod
> > > but got
> > > > > > > > the following error:
> > > > > > > >
> > > > > > > > Reproduce:
> > > > > > > > 1) cp make/config.mk .
> > > > > > > > 2) turn on USE_CUDA, USE_CUDNN, USE_NCCL
> > > > > > > > 3) make -j
> > > > > > > >
> > > > > > > > MXNet can build successfully.
> > > > > > > >
> > > > > > > > 4) pip install horovod
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > >
> /home/ubuntu/src/incubator-mxnet/python/mxnet/../../include/mkldnn/mkldnn.h:55:28:
> > > > > > > >

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Lai Wei
Hi guys,

Thanks for the updates. Currently, we are able to confirm Lin's issue with
Horovod, and there is a fix pending. [1]
Will update later today to see if we need to cancel this vote for the fix.

As for the performance regression when hybridizing with static_alloc, IMO it
does not need to be a blocker if we have the following speed order:
1.5.0 w/o static > 1.5.0 w/ static  > 1.4.1 w/ static > 1.4.1 w/o static
It would also be great to know the following, to better decide whether this
should block the release:
1) whether this is a model-specific or a general regression.
2) whether this is platform-specific or general (w/ or w/o CUDA, w/ or w/o
MKLDNN)
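
The ordering check described above can be sketched in a few lines of Python.
This is only an illustration: the function name and the dictionary layout are
made up, and the sample numbers are placeholders, not measurements from this
thread. Throughput values are assumed to be "higher is faster".

```python
def matches_expected_order(throughput):
    """Check 1.5.0 w/o static > 1.5.0 w/ static > 1.4.1 w/ static > 1.4.1 w/o static.

    `throughput` maps (version, uses_static_alloc) to a speed where higher is faster.
    """
    order = [
        ("1.5.0", False),
        ("1.5.0", True),
        ("1.4.1", True),
        ("1.4.1", False),
    ]
    speeds = [throughput[key] for key in order]
    # Every entry must be strictly faster than the one after it.
    return all(a > b for a, b in zip(speeds, speeds[1:]))


# Placeholder numbers for illustration only (not real benchmark results).
sample = {
    ("1.5.0", False): 130.0,
    ("1.5.0", True): 120.0,
    ("1.4.1", True): 110.0,
    ("1.4.1", False): 100.0,
}

if __name__ == "__main__":
    print("non-blocking order holds:", matches_expected_order(sample))
```

Running the check per model and per platform (w/ or w/o CUDA, w/ or w/o
MKLDNN) would also answer questions 1) and 2).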


[1] https://github.com/apache/incubator-mxnet/pull/15213


Thanks

Best Regards

Lai


On Tue, Jun 11, 2019 at 1:46 PM Zhi Zhang  wrote:

>
>
> On 2019/06/11 18:53:56, Pedro Larroy 
> wrote:
> > The stack trace doesn't seem to come from MXNet, do you have more info?
> >
> > On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang  wrote:
> > >
> > >
> > >
> > > On 2019/06/11 17:36:09, Pedro Larroy 
> wrote:
> > > > A bit more background into this:
> > > >
> > > > While tuning a model using LSTM and convolutions we find that using
> > > > hybridize with static_alloc and static_shape is 15% slower in the
> > > > latest revision vs in version 1.4.1 in which using hybridize with
> > > > static_alloc and static_shape is 10% faster than without.
> > > >
> > > > > Overall we are still 33% faster when comparing master to 1.5.
> > > >
> > > > Let me know if you think this is a release blocker or not.
> > > >
> > > > Pedro.
> > > >
> > > > On Mon, Jun 10, 2019 at 4:51 PM Pedro Larroy
> > > >  wrote:
> > > > >
> > > > > -1
> > > > >
> > > > > We found a performance regression vs 1.4 related to CachedOp which
> > > > > affects Hybrid forward, which we are looking into.
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Mon, Jun 10, 2019 at 4:33 PM Lin Yuan 
> wrote:
> > > > > >
> > > > > > -1 (Tentatively until resolved)
> > > > > >
> > > > > > I tried to build MXNet 1.5.0 from source and pip install horovod
> but got
> > > > > > the following error:
> > > > > >
> > > > > > Reproduce:
> > > > > > 1) cp make/config.mk .
> > > > > > 2) turn on USE_CUDA, USE_CUDNN, USE_NCCL
> > > > > > 3) make -j
> > > > > >
> > > > > > MXNet can build successfully.
> > > > > >
> > > > > > 4) pip install horovod
> > > > > >
> > > > > >
> > > > > >
> /home/ubuntu/src/incubator-mxnet/python/mxnet/../../include/mkldnn/mkldnn.h:55:28:
> > > > > > fatal error: mkldnn_version.h: No such file or directory
> > > > > > compilation terminated.
> > > > > > INFO: Unable to build MXNet plugin, will skip it.
> > > > > >
> > > > > > I did not change any setting of MKLDNN in my config.mk. I am
> building on
> > > > > > DLAMI base 18.0 which is Ubuntu 16.04 and CUDA 10.0
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Lin
> > > > > >
> > > > > >
> > > > > > On Sat, Jun 8, 2019 at 5:39 PM shiwen hu 
> wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Lai Wei  于2019年6月9日周日 上午4:12写道:
> > > > > > >
> > > > > > > > Dear MXNet community,
> > > > > > > >
> > > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> version
> > > > > > > 1.5.0.
> > > > > > > > Voting on dev@ will start June 8, 23:59:59(PST)  and close
> on June 11,
> > > > > > > > 23:59:59.
> > > > > > > >
> > > > > > > > 1) Link to release notes:
> > > > > > > >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> > > > > > > >
> > > > > > > > 2) Link to release candidate:
> > > > > > > >
> > > > > > > >
> https://github.com/apache/incubato

[VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-08 Thread Lai Wei
Dear MXNet community,

This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0.
Voting on dev@ will start June 8, 23:59:59(PST)  and close on June 11,
23:59:59.

1) Link to release notes:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes

2) Link to release candidate:

https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc0

3) Link to source and signatures on apache dist server:

https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc0/


Please remember to TEST first before voting accordingly:
+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)
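
For reference, a small sketch of how replies using the keys above could be
tallied. It assumes the usual Apache rule for release votes (at least three
binding +1 votes and more binding +1 than binding -1), which this email does
not spell out; the function name and the (vote, is_binding) input layout are
made up for illustration.

```python
from collections import Counter


def tally(votes):
    """Tally (vote, is_binding) pairs, where vote is "+1", "+0" or "-1".

    Returns the binding vote counts and whether the vote passes under the
    assumed rule: at least three binding +1 and more binding +1 than -1.
    """
    binding = Counter(vote for vote, is_binding in votes if is_binding)
    passed = binding["+1"] >= 3 and binding["+1"] > binding["-1"]
    return binding, passed


if __name__ == "__main__":
    example = [("+1", True), ("+1", True), ("+1", True), ("-1", False)]
    counts, passed = tally(example)
    print(dict(counts), "passed" if passed else "not passed")
```

In practice a -1 with a reproducible problem usually leads to cancelling the
vote and cutting a new release candidate, whatever the count.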


Best Regards

Lai


Re: [DISCUSS] 1.5.0 Release Plan

2019-06-04 Thread Lai Wei
Hi dev@,

Here are the updated release tracker and the timeline for 1.5.0 release:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
We are working on the last 2 blockers, both nightly test failures [1][2], and
would like to fix them by 06/04 so we can start to tag 1.5.0.

For [1], I am in favor of disabling the test if we can't fix it in time,
similar to what we did for the CPU failure in [4].
Large tensor tests require a lot of memory, and we don't yet have a good
solution for how best to test them. [5]

For [2], it should be fixed by [3].

[1] large tensor nightly GPU failure:
https://github.com/apache/incubator-mxnet/issues/14981
[2] AMP tutorial nightly GPU failure:
https://github.com/apache/incubator-mxnet/issues/15028
[3] nightly fix: https://github.com/apache/incubator-mxnet/pull/15141
[4] large tensor nightly CPU:
https://github.com/apache/incubator-mxnet/issues/14980
[5] testability of large tensor discussion:
https://lists.apache.org/thread.html/d7f397c3f32196cb66ad9deae55dcf9a06dda56b37cbb0399ea1799f@%3Cdev.mxnet.apache.org%3E

Any help will be appreciated, thanks a lot!


Best Regards

Lai


On Fri, May 31, 2019 at 2:31 PM Haibin Lin  wrote:

> Hi dev@,
>
> Quick update on the gluonnlp issue. Lai and I worked together to test
> gluonnlp and MXNet with different configurations, and found that the use of
> GELU operator in fp16 is causing the divergence. It was a very recent
> change in gluonnlp, and it can be avoided by reverting the change in
> GluonNLP. This doesn't block 1.5 release anymore.
>
> Best,
> Haibin
>
> On Thu, May 30, 2019 at 11:33 AM Lai Wei  wrote:
>
> > Hi dev@,
> >
> > Quick update on the 1.5.0 release, all previous tracked PRs have been
> > merged and CI is back to normal again, please rebase your PR.
> > Again, I would like to encourage downstream projects to test against
> latest
> > MXNet now to discover bugs and regressions early, really appreciate your
> > help.
> >
> > We still have 3 new open issues/PRs to track:
> > 1. Gluon NLP BERT training Haibin mentioned
> > 2. https://github.com/apache/incubator-mxnet/pull/15039
> > 3. https://github.com/apache/incubator-mxnet/pull/15097
> >
> > Thanks!
> >
> > Best Regards
> >
> > Lai
> >
> >
> > On Tue, May 28, 2019 at 9:32 AM Haibin Lin 
> > wrote:
> >
> > > Hi dev@,
> > >
> > > I was testing GluonNLP with MXNet master, and found that BERT training
> > > crashes a few hours after I launch the job. I can confirm that MXNet
> pip
> > > package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP
> to
> > > check what causes the problem. I'll send an update as soon as I find
> the
> > > root cause, or if I find any workaround.
> > >
> > > Thanks,
> > > Haibin
> > >
> > > On Thu, May 23, 2019 at 2:12 AM Lin Yuan  wrote:
> > >
> > > > Hi Lai,
> > > >
> > > > One important PR that is currently blocked by a Flaky TensorRT test:
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/15041
> > > >
> > > > I have retriggered it several times. If it fails again, I may need CI
> > > team
> > > > to help disable this test. It has been reported by multiple people:
> > > > https://github.com/apache/incubator-mxnet/issues/14978
> > > >
> > > > Thanks,
> > > >
> > > > Lin
> > > >
> > > > On Wed, May 22, 2019 at 11:38 PM Zhao, Patric  >
> > > > wrote:
> > > >
> > > > > Thanks, Lai.
> > > > >
> > > > > With the great help from the community, all PRs listed in the
> > roadmap
> > > > are
> > > > > done :)
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> > > > >
> > > > > Update the status of the below list
> > > > >
> > > > >  - [1] PR#14713 is almost done and wait for internal validation
> > results
> > > > >  - [2] PR#14893 is merged
> > > > >  - [3] PR#15031 is merged
> > > > >  - [7] PR#15038 new PR to fix the bug in C++ interface, will be
> > merged
> > > > > soon after the review.
> > > > >
> > > > > Feel free to let me know if there is anything our team can help with :)
> > > > >
> > > > > BR,
> > > > >
> > > > > --Patric
> > > > >
> 

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-30 Thread Lai Wei
Hi dev@,

Quick update on the 1.5.0 release: all previously tracked PRs have been
merged and CI is back to normal again, so please rebase your PRs.
Again, I would like to encourage downstream projects to test against the latest
MXNet now to discover bugs and regressions early. We really appreciate your
help.

We still have 3 new open issues/PRs to track:
1. The GluonNLP BERT training issue Haibin mentioned
2. https://github.com/apache/incubator-mxnet/pull/15039
3. https://github.com/apache/incubator-mxnet/pull/15097

Thanks!

Best Regards

Lai


On Tue, May 28, 2019 at 9:32 AM Haibin Lin  wrote:

> Hi dev@,
>
> I was testing GluonNLP with MXNet master, and found that BERT training
> crashes a few hours after I launch the job. I can confirm that MXNet pip
> package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP to
> check what causes the problem. I'll send an update as soon as I find the
> root cause, or if I find any workaround.
>
> Thanks,
> Haibin
>
> On Thu, May 23, 2019 at 2:12 AM Lin Yuan  wrote:
>
> > Hi Lai,
> >
> > One important PR that is currently blocked by a Flaky TensorRT test:
> >
> > https://github.com/apache/incubator-mxnet/pull/15041
> >
> > I have retriggered it several times. If it fails again, I may need CI
> team
> > to help disable this test. It has been reported by multiple people:
> > https://github.com/apache/incubator-mxnet/issues/14978
> >
> > Thanks,
> >
> > Lin
> >
> > On Wed, May 22, 2019 at 11:38 PM Zhao, Patric 
> > wrote:
> >
> > > Thanks, Lai.
> > >
> > > With the great helps from the community, all PRs listed in the roadmap
> > are
> > > done :)
> > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> > >
> > > Update the status of the below list
> > >
> > >  - [1] PR#14713 is almost done and wait for internal validation results
> > >  - [2] PR#14893 is merged
> > >  - [3] PR#15031 is merged
> > >  - [7] PR#15038 new PR to fix the bug in C++ interface, will be merged
> > > soon after the review.
> > >
> > > Feel free to let me know if anything our team can help :)
> > >
> > > BR,
> > >
> > > --Patric
> > >
> > > > -Original Message-
> > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > Sent: Thursday, May 23, 2019 6:05 AM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > >
> > > > Hi @dev,
> > > >
> > > > Thanks for working hard for the 1.5 release, since there has been
> > several
> > > > release blockers (mostly fixed). We are extending the code freeze to
> > > Friday
> > > > 05/22/2019. Right now we are tracking the following 5 open
> > > PRs[1][2][3][4][5]
> > > > and 1 issue[6]. Please let us know if you need more time.
> > > >
> > > > I would like to encourage all downstream projects to test with latest
> > > MXNet
> > > > to avoid any incompatibility in the coming 1.5.0 release. If you have
> > any
> > > > issues that may block the release, please let us know.
> > > > Thank you very much.
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > > > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > > > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > > > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > > > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > > > [6] https://github.com/apache/incubator-mxnet/issues/15034
> > > >
> > > >
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > > >
> > > > On Wed, May 15, 2019 at 9:05 PM Junru Shao 
> > > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > Here I may have a release blocker for 1.5.0 about implementation of
> > > > > dynamic shape mechanism, which somehow conflicts with Gluon's
> > > > deferred
> > > > > initialization [1].
> > > > >
> > > > > [1] https://github.com/dmlc/gluon-nlp/issues/706
> > > > >
> > > > > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian <
> > > > > anirudh2...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Lai,

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-22 Thread Lai Wei
Hi @dev,

Thanks for working hard on the 1.5 release. Since there have been several
release blockers (mostly fixed), we are extending the code freeze to Friday
05/22/2019. Right now we are tracking the following 5 open
PRs[1][2][3][4][5] and 1 issue[6]. Please let us know if you need more
time.

I would like to encourage all downstream projects to test with latest MXNet
to avoid any incompatibility in the coming 1.5.0 release. If you have any
issues that may block the release, please let us know.
Thank you very much.

[1] https://github.com/apache/incubator-mxnet/pull/14713
[2] https://github.com/apache/incubator-mxnet/pull/14893
[3] https://github.com/apache/incubator-mxnet/pull/15031
[4] https://github.com/apache/incubator-mxnet/pull/15039
[5] https://github.com/apache/incubator-mxnet/pull/15041
[6] https://github.com/apache/incubator-mxnet/issues/15034


Best Regards

Lai


On Wed, May 15, 2019 at 9:05 PM Junru Shao  wrote:

> Hi folks,
>
> Here I may have a release blocker for 1.5.0 about implementation of dynamic
> shape mechanism, which somehow conflicts with Gluon's deferred
> initialization [1].
>
> [1] https://github.com/dmlc/gluon-nlp/issues/706
>
> On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > Hi Lai,
> >
> > From the discussion I had with Nvidia offline they are targeting on
> pushing
> > the required changes today.
> > Since this is important feature for the release, if this gets delayed and
> > cannot  be merged by 05/17/2019,
> > the code freeze date may need to be changed.
> >
> > Anirudh
> >
> > On Wed, May 15, 2019 at 1:23 AM Lv, Tao A  wrote:
> >
> > > Hi dev,
> > >
> > > We see there are several github issues [1][2][3][4] about mxnet windows
> > > build experience. The team is working intensively [5][6][7] on that to
> > fix
> > > some problems of MKL-DNN build on windows. We hope these fixes can
> catch
> > > the code freeze and finally enter the 1.5.0 release.
> > >
> > > The PR against mshadow (#374) was already merged and MXNet PR #14877 is
> > > under review - great thanks to CI team for helping on the MKL
> > installation
> > > request. PR #14952 is document change according to build logic changes
> in
> > > PR #14877. So I think these two PRs should be merged simultaneously.
> > > Currently #14877 is experiencing a CI response problem.
> > >
> > > Please take your time to have a look at these two PRs. Your comments
> and
> > > suggestions are highly appreciated.
> > >
> > > Thanks,
> > > -tao
> > >
> > > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > > [5] https://github.com/apache/incubator-mxnet/pull/14877
> > > [6] https://github.com/dmlc/mshadow/pull/374
> > > [7] https://github.com/apache/incubator-mxnet/pull/14952
> > >
> > > -Original Message-
> > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > Sent: Wednesday, May 15, 2019 2:57 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > >
> > > Hi Anirudh,
> > >
> > > I see there was an offline discussion
> > > <
> > >
> >
> https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341
> > > >
> > > and I have updated the AMP feature and your project on the release
> > tracker
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > > >
> > > ,
> > > Please let me know if you have any updates.
> > >
> > > Hi @dev,
> > > This is a gentle reminder that  the code freeze for 1.5.0 release is on
> > > 05/17/2019, please let us know if you have any WIP pull requests aiming
> > for
> > > 1.5.0 that needs attention.
> > > Please understand we already have around 650 commits in master that
> need
> > > to be released in time. We understand TensorRT test in CI is failing
> and
> > > are trying to fix it. Meanwhile please update the tracker if there is
> any
> > > change:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > >
> > > Thanks!
> > >
> > > Lai
> > >
> > >

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-15 Thread Lai Wei
Hi Anirudh,

I see there was an offline discussion
<https://github.com/apache/incubator-mxnet/pull/14173#pullrequestreview-235846341>
and I have updated the AMP feature and your project on the release tracker
<https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status>.
Please let me know if you have any updates.

Hi @dev,
This is a gentle reminder that the code freeze for the 1.5.0 release is on
05/17/2019. Please let us know if you have any WIP pull requests aiming for
1.5.0 that need attention.
Please understand that we already have around 650 commits in master that need
to be released in time. We understand that the TensorRT test in CI is failing
and are trying to fix it. Meanwhile, please update the tracker if there is any
change:
https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status

Thanks!

Lai


On Wed, May 8, 2019 at 11:58 AM Anirudh Subramanian 
wrote:

> Hi Sheng,
>
> I had a discussion with nvidia folks offline today (@ptrendx et. al.). I
> strongly feel that the AMP feature should be included as part of the
> release: https://github.com/apache/incubator-mxnet/pull/14173 .
> The PR is aimed for completion for next week but reviews and RFC
> discussions may take some time. I would request to extend the release code
> freeze by 2 weeks.
> Also, I would like to include
>
> https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
> which
> depends on the AMP PR.
> I am also aiming for adding a PR by this week end or early next week, but
> reviews will take longer than May 17th.
>
> Anirudh
>
>
> On Mon, May 6, 2019 at 11:49 PM Sheng Zha  wrote:
>
> > Hi,
> >
> > While 1.4.1 vote on general@incubator is still on going, I’d like to
> > propose that we start preparing 1.5.0 release.
> >
> > 1.5.0 will include changes that date back to last year, and there have been
> > a lot of new features and improvements in it, so it will likely take us
> > more time to prepare than 1.4.1. I propose the following timeline:
> > - Cut release branch: release branch already cut. Will sync with master
> > branch on 5/15/2019 EOD.
> > - Code freeze: 5/17/2019. No more changes unless the release branch is in
> > a broken state.
> > - Tag and vote: 5/20/2019 onward.
> >
> > Lai Wei (roywei@) expressed to me offline that he’s willing to help
> drive
> > this release as release manager, and I’m happy to help again as
> committer.
> >
> > If you have features in progress that you’d like to include in 1.5.0:
> > - Add your feature to the scope:
> >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+and+Status
> > - Indicate in this thread:
> >   - how confident you are about making it happen before the code freeze.
> > If not confident, provide estimate for a more manageable code freeze date
> > so that people can discuss whether to extend the deadline or to skip one
> > release for it.
> > - whether your PR requires more attention to make it happen.
> >
> > Thanks for your attention. Comments and suggestions are also welcome.
> >
> > -sz
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.1.rc0

2019-05-02 Thread Lai Wei
+1

Built from source and tested keras-mxnet working fine.

Best Regards

Lai


On Wed, May 1, 2019 at 4:22 PM Carin Meier  wrote:

> + 1 (binding)
>
> Built Scala/ Clojure and ran tests
>
> On Wed, May 1, 2019 at 7:06 PM Aaron Markham 
> wrote:
>
> > Make that +1 (non-binding)
> >
> > On Wed, May 1, 2019 at 3:42 PM Aaron Markham 
> > wrote:
> > >
> > > +1 (binding)
> > >
> > > * Built with GPU and tested the first part of the ssd example.
> > > * Built with GPU / cross-compiled to arm8 for Jetson.
> > > * Built Scala/Java on top of the cross-compiled arm8 (ran into trouble
> > > here, but I think this is not popular enough yet to derail things,
> > > plus there are workarounds)
> > > * Built on CPU instance and tested docs.
> > > http://34.201.8.176/versions/1.4.1/api/python/io/io.html
> > > I don't see anything specific being different in this patch for docs,
> > > so hard to tell if there's an issue. I'll assume not given the
> > > successful generation of the API docs.
> > >
> > >
> > > On Wed, May 1, 2019 at 1:28 PM Pedro Larroy
> > >  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Tried CPU build + C++ tests + 714 Python unit tests in 605s.
> > > > ARMv7 build + small unit test in QEMU + ARMv8 builds.
> > > >
> > > > Thanks. Regards
> > > >
> > > > Pedro.
> > > >
> > > > On Wed, May 1, 2019 at 10:41 AM Qing Lan 
> wrote:
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > build from source works for OSX and Ubuntu CPU
> > > > > Scala build/test successfully with Dynamic link and static link.
> > > > >
> > > > > Thanks,
> > > > > Qing
> > > > >
> > > > > 
> > > > > From: Sheng Zha 
> > > > > Sent: Wednesday, May 1, 2019 13:14
> > > > > To: d...@mxnet.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > 1.4.1.rc0
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Reminder that the vote for 1.4.1 release is still ongoing. If you
> > can, please help out. Thank you.
> > > > >
> > > > > -sz
> > > > >
> > > > > On 2019/04/30 06:51:45, Junru Shao 
> wrote:
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > version v1.4.1.
> > > > > > The voting on dev@ list will start Apr 29 23:59:59 (PST) and
> > close on May
> > > > > > 02 23:59:59.
> > > > > >
> > > > > > Below are links to
> > > > > > 1) Release notes:
> > > > > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.1+Release+Notes
> > > > > > .
> > > > > > 2) Release Candidate:
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.4.1.rc0
> .
> > > > > > 3) Source and signatures on Apache dist server:
> > > > > >
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.4.1.rc0/.
> > > > > >
> > > > > > Please remember to TEST first before voting accordingly:
> > > > > > +1 = approve
> > > > > > +0 = no opinion
> > > > > > -1 = disapprove (provide reason)
> > > > > >
> > > > > > Best regards,
> > > > > > Junru Shao
> > > > > >
> >
>
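
The +1 / +0 / -1 convention quoted in this thread, together with the binding and non-binding labels and the mid-thread correction of one vote, can be tallied roughly as follows. This is only a sketch: the voter names are placeholders, and the pass rule (at least three binding +1 votes and more binding +1 than binding -1) is an assumption based on the usual Apache majority-approval rule for releases, not something stated in the thread.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool


def tally(votes):
    """Count binding +1 and -1 votes, keeping only each voter's latest vote."""
    latest = {}
    for vote in votes:           # later entries overwrite earlier ones
        latest[vote.voter] = vote
    plus = sum(1 for v in latest.values() if v.binding and v.value > 0)
    minus = sum(1 for v in latest.values() if v.binding and v.value < 0)
    return plus, minus


def passes(votes, min_binding_plus=3):
    """Assumed rule: enough binding +1 votes, and more binding +1 than -1."""
    plus, minus = tally(votes)
    return plus >= min_binding_plus and plus > minus


# Placeholder voters, not the people in the thread.
votes = [
    Vote("voter_a", +1, True),
    Vote("voter_b", +1, True),
    Vote("voter_b", +1, False),  # correction: "make that non-binding"
    Vote("voter_c", +1, False),
    Vote("voter_d", +1, True),
]
print(tally(votes), passes(votes))
```

Keeping only the latest vote per voter is what makes a correction like "Make that +1 (non-binding)" count once rather than twice.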


Re: [Announcement] New Committer -- Lin Yuan

2019-02-03 Thread Lai Wei
Congrats Lin!

On Sun, Feb 3, 2019 at 9:13 AM Yuxi Hu  wrote:

> Congrats Lin!
>
> On Sun, Feb 3, 2019 at 7:41 AM Lv, Tao A  wrote:
>
> > Congratulations Lin!
> >
> > -Original Message-
> > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > Sent: Sunday, February 3, 2019 3:10 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [Announcement] New Committer -- Lin Yuan
> >
> > Congrats Lin!  Well deserved.
> >
> > On Sat, Feb 2, 2019 at 11:05 PM Marco de Abreu 
> > wrote:
> >
> > > Congratulations, welcome!
> > >
> > > Am So., 3. Feb. 2019, 04:04 hat Chaitanya Bapat 
> > > geschrieben:
> > >
> > > > Congratulations Lin! Way to go!
> > > >
> > > > On Sat, 2 Feb 2019 at 19:39, sandeep krishnamurthy <
> > > > sandeep.krishn...@gmail.com> wrote:
> > > >
> > > > > Welcome Lin :-)
> > > > >
> > > > > On Sat, Feb 2, 2019, 3:28 PM Yuan Tang  > wrote:
> > > > >
> > > > > > Welcome Lin!
> > > > > >
> > > > > > On Sat, Feb 2, 2019 at 6:27 PM Tianqi Chen
> > > > > >  > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Community:
> > > > > > >
> > > > > > > Please join me to welcome Lin Yuan(@apeforest) as a new
> > > > > > > committer
> > > of
> > > > > > > Apache(incubating) MXNet!
> > > > > > >
> > > > > > > He has contributed to various improvements, including better
> > > > > > compatibility
> > > > > > > of larger arrays across the codebase.
> > > > > > >
> > > > > > > Commits:
> > > > > > > https://github.com/apache/incubator-mxnet/commits?author=apefo
> > > > > > > rest
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3A
> > > pr+author%3Aapeforest
> > > > > > >
> > > > > > >
> > > > > > > Reviews:
> > > > > > > https://github.com/apache/incubator-mxnet/pulls?utf8=%
> > > > > > > E2%9C%93&q=reviewed-by%3Aapeforest
> > > > > > >
> > > > > > > > dev@ activity
> > > > > > >
> > > > >
> > > https://lists.apache.org/list.html?*@mxnet.apache.org:lte=6M:Lin%20Yua
> > > n
> > > > > > >
> > > > > > > Tianqi
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > *Chaitanya Prakash Bapat*
> > > > *+1 (973) 953-6299*
> > > >
> > > >
> > >
> >
>
>
> --
> Yuxi(Darren) Hu, Ph.D.
> Software Development Engineer
> Amazon Web Services
>
-- 
Best Regards

Lai


Re: MXNet - Gluon - Audio

2018-11-13 Thread Lai Wei
Hi Gaurav,

Thanks for starting this. I see the PR is out and have left some initial
reviews. Good work!

In addition to Sandeep's queries, I have the following:
1. Can we include a simple classic audio dataset for users to import directly
and try out, like MNIST in vision? (e.g.:
http://pytorch.org/audio/datasets.html#yesno)
2. Librosa provides some good audio feature extraction, so we can use it for
now. But it's slow, as you have to convert between ndarray and numpy.
In the long term, can we make the transforms use mxnet operators and change
your transforms to hybrid blocks? For example, the mxnet FFT operator
can be used in a hybrid block transformer, which will be a lot faster.

Some additional references from users already using mxnet for audio. We should
aim to make it easier and automate the file load/preprocess/transform
process.
1. https://github.com/chen0040/mxnet-audio
2. https://github.com/shuokay/mxnet-wavenet

Looking forward to seeing this feature out.
Thanks!

Best Regards

Lai


On Tue, Nov 13, 2018 at 9:09 AM sandeep krishnamurthy <
sandeep.krishn...@gmail.com> wrote:

> Thanks, Gaurav for starting this initiative. The design document is
> detailed and gives all the information.
> Starting to add this in "Contrib" is a good idea while we expect a few
> rough edges and cleanups to follow.
>
> I had the following queries:
> 1. Is there any analysis comparing LibROSA with other libraries? w.r.t
> features, performance, community usage in audio data domain.
> 2. What is the recommendation of LibROSA dependency? Part of MXNet PyPi or
> ask the user to install if required? I prefer the latter, similar to
> protobuf in ONNX-MXNet.
> 3. I see LibROSA is a fully Python-based library. Are we getting blocked on
> the dependency for future use cases when we want to make transformations as
> operators and allow for cross-language support?
> 4. In performance design considerations, with lazy=True / False the
> performance difference is too scary (8 minutes to 4 hours!!). This requires
> some more analysis. If we know turning a flag off/on causes a 24X performance
> degradation, do we need to give that control to the user? What is the
> impact of this on memory usage?
> 5. I see LibROSA has ISC license (
> https://github.com/librosa/librosa/blob/master/LICENSE.md) which says free
> to use with same license notification. I am not sure if this is ok. I
> request other committers/mentors to suggest.
>
> Best,
> Sandeep
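
On Sandeep's second query (asking the user to install LibROSA only if required), one common pattern is to import the optional package lazily and fail with an actionable message. The sketch below is illustrative only: `require_optional` and `load_audio_backend` are hypothetical helpers, not MXNet APIs, and the install hint assumes the pip package name matches the module name.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency on demand, with a helpful error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "{0} is needed for {1}; install it separately, "
            "e.g. 'pip install {0}'".format(module_name, feature)
        ) from exc


def load_audio_backend():
    # Hypothetical helper: 'librosa' is only imported when audio features
    # are actually requested, so it need not be a hard dependency.
    return require_optional("librosa", "audio feature extraction")


# Demonstration with a standard-library module, so it runs anywhere.
json_module = require_optional("json", "the demo")
print(json_module.dumps({"ok": True}))
```

With this approach users who never touch the audio API pay no install cost, while those who do get a clear error instead of a bare import failure.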
>
> On Fri, Nov 9, 2018 at 5:45 PM Gaurav Gireesh 
> wrote:
>
> > Dear MXNet Community,
> >
> > I recently started looking into performing some simple sound multi-class
> > classification tasks with Audio Data and realized that as a user, I would
> > like MXNet to have an out of the box feature which allows us to load
> audio
> > data(at least 1 file format), extract features( or apply some common
> > transforms/feature extraction) and train a model using the Audio Dataset.
> > This could be a first step towards building and supporting APIs similar
> to
> > what we have for "vision" related use cases in MXNet.
> >
> > Below is the design proposal :
> >
> > Gluon - Audio Design Proposal
> > 
> >
> > I would highly appreciate your taking time to review and provide
> feedback,
> > comments/suggestions on this.
> > Looking forward to your support.
> >
> >
> > Best Regards,
> >
> > Gaurav Gireesh
> >
>
>
> --
> Sandeep Krishnamurthy
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Lai Wei
Hi Anton,

Thanks for driving this, I would like to include the following fix in
1.3.1:
Allow infer shape partial on foreach operator:
https://github.com/apache/incubator-mxnet/pull/12471

Keras-MXNet needs this functionality to infer shape partially
on foreach operator. (Used in RNN operators)

Thanks a lot!


Best Regards
Lai Wei



On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin  wrote:

> Hi Naveen and Anton,
>
> Thanks for pointing that out. You are right that these are not critical
> fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
>
> Best,
> Haibin
>
> On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy  wrote:
>
> > Please note that this is a patch release (1.3.1) to address critical bugs!
> > For everything else please wait for 1.4.0, which is planned very shortly
> > after 1.3.1
> >
> > > On Nov 6, 2018, at 7:17 AM, Anton Chernov  wrote:
> > >
> > > The following PR's have been created so far:
> > >
> > > Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13117
> > >
> > > [MXNET-953] Fix oob memory read (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13118
> > >
> > > [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13119
> > >
> > > [MXNET-922] Fix memleak in profiler (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13120
> > >
> > > Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13121
> > >
> > > update mshadow (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13122
> > >
> > > CudnnFind() usage improvements (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13123
> > >
> > > Fix lazy record io when used with dataloader and multi_worker > 0
> > (v1.3.x)
> > > https://github.com/apache/incubator-mxnet/pull/13124
> > >
> > >
> > > As stated previously, I would be rather opposed to having the following
> > > PRs in the patch release:
> > >
> > > Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> > > https://github.com/apache/incubator-mxnet/pull/13129
> > >
> > > sample_like operators (#13034) v1.3.x
> > > https://github.com/apache/incubator-mxnet/pull/13130
> > >
> > >
> > > Best
> > > Anton
> > >
> > > вт, 6 нояб. 2018 г. в 16:06, Anton Chernov :
> > >
> > >> Hi Haibin,
> > >>
> > >> I have a few comments regarding the proposed performance improvement
> > >> changes.
> > >>
> > >> CUDNN support for LSTM with projection & clipping
> > >> https://github.com/apache/incubator-mxnet/pull/13056
> > >>
> > >> There is no doubt that this change brings value, but I don't see it
> as a
> > >> critical bug fix. I would rather leave it for the next major release.
> > >>
> > >> sample_like operators
> > >> https://github.com/apache/incubator-mxnet/pull/13034
> > >>
> > >> Even if it's related to performance, this is an addition of
> > functionality
> > >> and I would also push this to be in the next major release only.
> > >>
> > >>
> > >> Best
> > >> Anton
> > >>
> > >>
> > >> вт, 6 нояб. 2018 г. в 15:55, Anton Chernov :
> > >>
> > >>> Hi Patric,
> > >>>
> > >>> This change was listed in the 'PR candidates suggested for
> > consideration
> > >>> for v1.3.1 patch release' section [1].
> > >>>
> > >>> You are right, I also think that this is not a critical hotfix change
> > >>> that should be included into the 1.3.1 patch release.
> > >>>
> > >>> Thus I'm not making any further efforts to bring it in.
> > >>>
> > >>> Best
> > >>> Anton
> > >>>
> > >>> [1]
> > >>>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> > >>>
> > >>>
> > >>> вт, 6 нояб. 2018 г. в 1:14, Zhao, Patric :
> > >>>
> > >>>> Hi Anton,
> > >>>>
> > >>>> Thanks for looking into t

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi Anirudh,

Update: I did an install on a fresh instance with USE_MKLDNN=1 and it works
fine now. Pip install with --pre also works fine.
The problem was the mkl-dnn I had installed on the old instance.
Closing the issue <https://github.com/awslabs/keras-apache-mxnet/issues/75>.

Thanks!

Best Regards

Lai Wei

https://www.linkedin.com/pub/lai-wei/2b/731/52b

On Mon, May 7, 2018 at 2:48 PM, Lai Wei  wrote:

> Hi Anirudh,
>
> yes, also tried that,  didn't resolve. Looking into root cause and will
> update.
>
> Best Regards
>
> Lai Wei
>
> https://www.linkedin.com/pub/lai-wei/2b/731/52b
>
> On Mon, May 7, 2018 at 2:15 PM, Anirudh  wrote:
>
>> Hi Lai,
>>
>> I see that you used USE_MKL2017_EXPERIMENTAL=1, I am not sure if this is
>> the right flag. Did you try USE_MKLDNN=1 ?
>>
>> Anirudh
>>
>> On Mon, May 7, 2018 at 1:22 PM, Lai Wei  wrote:
>>
>> > Hi,
>> >
>> > I would like to raise an issue with mxnet-mkl. The keras-mxnet package
>> was
>> > working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights
>> are
>> > not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip
>> install
>> > mxnet-mkl --pre' and built from source from release branch (v1.2.0) with
>> > mkl flag.
>> >
>> > Please refer to this issue for more details:
>> > https://github.com/awslabs/keras-apache-mxnet/issues/75
>> >
>> > There is no code change on keras-mxnet side, so I guess some API broke
>> when
>> > using latest mxnet-mkl. Still working on finding the root cause.
>> >
>> > Thanks
>> >
>> >
>> > Best Regards
>> >
>> > Lai Wei
>> >
>> > https://www.linkedin.com/pub/lai-wei/2b/731/52b
>> >
>> > On Mon, May 7, 2018 at 10:38 AM, Haibin Lin 
>> > wrote:
>> >
>> > > +1 binding. Build from source with CUDA, ran linear classification
>> > example
>> > > and works fine.
>> > >
>> > > Best.
>> > > Haibin
>> > >
>> > >
>> > > On Sun, May 6, 2018 at 10:08 PM, Steffen Rochel <
>> steffenroc...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > +1 (non-binding). Tested with selected notebooks from The Straight
>> > Dope.
>> > > > So many important enhancements everybody contributed and our users
>> are
>> > > > waiting for. Hope we will see more votes.
>> > > > Steffen
>> > > > On Mon, May 7, 2018 at 1:07 AM Anirudh 
>> wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > Since we don't have enough binding votes yet, I am extending the
>> vote
>> > > > till
>> > > > > tomorrow (Monday May 7th), 12:50 PM PDT.
>> > > > >
>> > > > > Anirudh
>> > > > >
>> > > > > On Sun, May 6, 2018 at 4:05 PM, Anirudh 
>> > wrote:
>> > > > >
>> > > > > > Hi Pedro,
>> > > > > >
>> > > > > > Thanks for the clarification. I was able to reproduce the issue
>> > with
>> > > > > > USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make.
>> > Since
>> > > > the
>> > > > > > issue is not reproducible with make and the customers using
>> > > > > USE_OPENMP=OFF
>> > > > > > with cmake should be small, I agree with you that this should
>> not
>> > be
>> > > a
>> > > > > > blocker. I have added the issue to known issues in release
>> notes:
>> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.
>> 0.rc2
>> > > > > >
>> > > > > > Anirudh
>> > > > > >
>> > > > > > On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy <
>> > > > > pedro.larroy.li...@gmail.com
>> > > > > > > wrote:
>> > > > > >
>> > > > > >> Agreed, I was not aware that the problems were not present in
>> the
>> > > > > release
>> > > > > >> branch.
>> > > > > >>
>> > > > > >> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <
>> > > haibin.lin@gmail.com>
>> > > > > >> wrote:
>> > > 

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi Anirudh,

Yes, I also tried that, but it didn't resolve the issue. I'm looking into the
root cause and will update.

Best Regards

Lai Wei

https://www.linkedin.com/pub/lai-wei/2b/731/52b

On Mon, May 7, 2018 at 2:15 PM, Anirudh  wrote:

> Hi Lai,
>
> I see that you used USE_MKL2017_EXPERIMENTAL=1, I am not sure if this is
> the right flag. Did you try USE_MKLDNN=1 ?
>
> Anirudh
>
> On Mon, May 7, 2018 at 1:22 PM, Lai Wei  wrote:
>
> > Hi,
> >
> > I would like to raise an issue with mxnet-mkl. The keras-mxnet package
> was
> > working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights
> are
> > not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip
> install
> > mxnet-mkl --pre' and built from source from release branch (v1.2.0) with
> > mkl flag.
> >
> > Please refer to this issue for more details:
> > https://github.com/awslabs/keras-apache-mxnet/issues/75
> >
> > There is no code change on keras-mxnet side, so I guess some API broke
> when
> > using latest mxnet-mkl. Still working on finding the root cause.
> >
> > Thanks
> >
> >
> > Best Regards
> >
> > Lai Wei
> >
> > https://www.linkedin.com/pub/lai-wei/2b/731/52b
> >
> > On Mon, May 7, 2018 at 10:38 AM, Haibin Lin 
> > wrote:
> >
> > > +1 binding. Build from source with CUDA, ran linear classification
> > example
> > > and works fine.
> > >
> > > Best.
> > > Haibin
> > >
> > >
> > > On Sun, May 6, 2018 at 10:08 PM, Steffen Rochel <
> steffenroc...@gmail.com
> > >
> > > wrote:
> > >
> > > > +1 (non-binding). Tested with selected notebooks from The Straight
> > Dope.
> > > > So many important enhancements everybody contributed and our users
> are
> > > > waiting for. Hope we will see more votes.
> > > > Steffen
> > > > On Mon, May 7, 2018 at 1:07 AM Anirudh 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Since we don't have enough binding votes yet, I am extending the
> vote
> > > > till
> > > > > tomorrow (Monday May 7th), 12:50 PM PDT.
> > > > >
> > > > > Anirudh
> > > > >
> > > > > On Sun, May 6, 2018 at 4:05 PM, Anirudh 
> > wrote:
> > > > >
> > > > > > Hi Pedro,
> > > > > >
> > > > > > Thanks for the clarification. I was able to reproduce the issue
> > with
> > > > > > USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make.
> > Since
> > > > the
> > > > > > issue is not reproducible with make and the customers using
> > > > > USE_OPENMP=OFF
> > > > > > with cmake should be small, I agree with you that this should not
> > be
> > > a
> > > > > > blocker. I have added the issue to known issues in release notes:
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> > > > > >
> > > > > > Anirudh
> > > > > >
> > > > > > On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > >> Agreed, I was not aware that the problems were not present in
> the
> > > > > release
> > > > > >> branch.
> > > > > >>
> > > > > >> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <
> > > haibin.lin@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > I agree with Anirudh that the focus of the discussion should
> be
> > > > > limited
> > > > > >> to
> > > > > >> > the release branch, not the master branch. Anything that
> breaks
> > on
> > > > > >> master
> > > > > >> > but works on release branch should not block the release
> itself.
> > > > > >> >
> > > > > >> >
> > > > > >> > Best,
> > > > > >> >
> > > > > >> > Haibin
> > > > > >> >
> > > > > >> > On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <
> > > > > >> > pedro.larroy.li...@gmail.com>
> > > > > >> > wrot

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi,

I would like to raise an issue with mxnet-mkl. The keras-mxnet package was
working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights are
not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip install
mxnet-mkl --pre' and building from source from the release branch (v1.2.0)
with the mkl flag.

Please refer to this issue for more details:
https://github.com/awslabs/keras-apache-mxnet/issues/75

There is no code change on the keras-mxnet side, so I guess some API broke when
using the latest mxnet-mkl. I am still working on finding the root cause.

Thanks


Best Regards

Lai Wei

https://www.linkedin.com/pub/lai-wei/2b/731/52b
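
For a symptom like "weights are not updated", a quick framework-independent check is to snapshot the parameters, run one training step, and compare. The sketch below uses plain Python lists as stand-ins for parameter arrays; the parameter names, gradients, and the SGD step are made up for illustration and are not taken from keras-mxnet or MXNet.

```python
import copy


def changed_params(before, after, tol=0.0):
    """Return names of parameters whose values moved by more than tol."""
    changed = []
    for name, old in before.items():
        new = after[name]
        if any(abs(a - b) > tol for a, b in zip(old, new)):
            changed.append(name)
    return changed


# Made-up parameters standing in for a model's weights.
params = {"dense_w": [0.5, -0.2], "dense_b": [0.0]}
snapshot = copy.deepcopy(params)

# Stand-in for one optimizer step: plain SGD with made-up gradients.
grads = {"dense_w": [0.1, 0.1], "dense_b": [0.0]}
learning_rate = 0.5
for name in params:
    params[name] = [w - learning_rate * g
                    for w, g in zip(params[name], grads[name])]

print(changed_params(snapshot, params))
```

An empty result after a step with non-zero gradients would point at the update path (optimizer or backend) rather than at the model definition, which is the kind of narrowing the report above is after.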

On Mon, May 7, 2018 at 10:38 AM, Haibin Lin 
wrote:

> +1 binding. Build from source with CUDA, ran linear classification example
> and works fine.
>
> Best.
> Haibin
>
>
> On Sun, May 6, 2018 at 10:08 PM, Steffen Rochel 
> wrote:
>
> > +1 (non-binding). Tested with selected notebooks from The Straight Dope.
> > So many important enhancements everybody contributed and our users are
> > waiting for. Hope we will see more votes.
> > Steffen
> > On Mon, May 7, 2018 at 1:07 AM Anirudh  wrote:
> >
> > > Hi all,
> > >
> > > Since we don't have enough binding votes yet, I am extending the vote
> > till
> > > tomorrow (Monday May 7th), 12:50 PM PDT.
> > >
> > > Anirudh
> > >
> > > On Sun, May 6, 2018 at 4:05 PM, Anirudh  wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > > Thanks for the clarification. I was able to reproduce the issue with
> > > > > USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make. Since the
> > > > > issue is not reproducible with Make and the number of customers using
> > > > > USE_OPENMP=OFF with cmake should be small, I agree with you that this
> > > > > should not be a blocker. I have added the issue to known issues in release notes:
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2
> > > >
> > > > Anirudh
> > > >
> > > > > On Sun, May 6, 2018 at 9:03 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > >
> > > > >> Agreed, I was not aware that the problems were not present in the release
> > > > >> branch.
> > > >>
> > > > >> On Fri, May 4, 2018 at 8:32 PM, Haibin Lin <haibin.lin@gmail.com> wrote:
> > > >>
> > > > >> > I agree with Anirudh that the focus of the discussion should be limited to
> > > > >> > the release branch, not the master branch. Anything that breaks on master
> > > > >> > but works on release branch should not block the release itself.
> > > >> >
> > > >> >
> > > >> > Best,
> > > >> >
> > > >> > Haibin
> > > >> >
> > > > >> > On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> > > >> >
> > > >> > > I see your point.
> > > >> > >
> > > > >> > > I checked the failures on the v1.2.0 branch and I don't see segfaults,
> > > > >> > > just minor failures due to flaky tests.
> > > >> > >
> > > > >> > > I will trigger it repeatedly a few times until Sunday to have a and
> > > > >> > > change my vote accordingly.
> > > >> > >
> > > >> > >
> > > > >> > > http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
> > > > >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/17/pipeline
> > > > >> > > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/15/pipeline/
> > > >> > >
> > > >> > >
> > > >> > > Pedro.
> > > >> > >
> > > > >> > > On Fri, May 4, 2018 at 7:16 PM, Anirudh wrote:
> > > >> > >
> > > >> > > > Hi Pedro,
> > > >> > > >
> > > >> >