Re: Global Search Now Available on MXNet Website

2020-05-20 Thread Lin Yuan
Awesome work! Thanks a lot for making this much-requested feature happen.

Lin

On Wed, May 20, 2020 at 8:45 AM Yang Shi  wrote:

> Hi MXNet Community,
>
> A global search feature has been added to the main information pages of the
> MXNet website. It can search content across the entire site in any version.
> Currently it is available on the master website and will be supported on the
> v1.6 website shortly.
>
> Best regards,
> Yang
>


Re: Versioned Dropdown for Python API Docs

2020-04-09 Thread Lin Yuan
Connor,

Good job! Thanks for your contribution. The versioned website looks neat
and should reduce a lot of confusion for MXNet users.

Best,

Lin


On Thu, Apr 9, 2020 at 11:03 AM Goggins, Connor 
wrote:

> Hi all, the new production MXNet website (with general API version
> dropdown) is now live: https://mxnet.apache.org/. Users can now leverage
> the general version dropdown to switch between different versions of the
> website corresponding to various MXNet API releases.
>
> If you happen to find any bugs associated with the dropdown or the website
> content, please create a GitHub issue in the repository<
> https://github.com/apache/incubator-mxnet> with the "Website" tag so the
> bug can be fixed.
>
> All the best,
> Connor
>
> From: "Goggins, Connor" 
> Date: Monday, March 30, 2020 at 12:41 PM
> To: "d...@mxnet.apache.org" 
> Cc: "Krishnamurthy, Sandeep" , "Markham, Aaron" <
> markh...@amazon.com>
> Subject: Re: Versioned Dropdown for Python API Docs
>
> Hi all, quick status update on the Website 2.0 project: currently, v1.6 of
> the website is not available (only the master is available). This presents
> a serious problem for our users, as v1.6 is the current stable release. To
> solve this issue, we tried generating a static artifact for v1.6 (in
> addition to static artifacts for the other versions), but ran into issues
> with building Julia and R docs for v1.6. We traced these errors to the
> version of Python being used for the v1.6 Jenkins pipeline (Python 2.7),
> and agreed that fixing the issue would require converting the entire
> pipeline to use Python 3. Since this will take a significant amount of time
> and we want to make the v1.6 website available to our users as soon as
> possible, we agreed to document the issue here (
> https://github.com/apache/incubator-mxnet/issues/17910) for future work,
> build the static artifact for v1.6 without the Julia & R docs as a stopgap
> solution, and pull in the Julia & R docs from master (since the user impact
> of this will be minimal).
>
> Are there any concerns from the community in taking this approach? Would
> also love to get thoughts from Julia and R contributors on this proposed
> path forward.
>
> From: "Goggins, Connor" 
> Date: Tuesday, March 24, 2020 at 10:43 AM
> To: "d...@mxnet.apache.org" 
> Cc: "Krishnamurthy, Sandeep" , "Markham, Aaron" <
> markh...@amazon.com>
> Subject: Re: Versioned Dropdown for Python API Docs
>
> By the way, I am currently developing a build pipeline for v1.6, so that
> specific version is not currently available.
>
> From: "Goggins, Connor" 
> Date: Monday, March 23, 2020 at 8:08 PM
> To: "d...@mxnet.apache.org" 
> Cc: "Krishnamurthy, Sandeep" , "Markham, Aaron" <
> markh...@amazon.com>
> Subject: Re: Versioned Dropdown for Python API Docs
>
> Update on progress so far:
>
>   *   Fixed broken components of static artifacts for old versions
> (internal/external links, menus, etc.)
>   *   Added missing supplemental content (missing tutorials, docs, etc.)
> to static artifacts for old versions
>   *   Implemented working general version dropdown menu capable of
> switching between old artifacts
>   *   Finished general version dropdown for master website (styling and
> functionality) – successfully tested with Jenkins full website build
> I have deployed the artifact generated by the Jenkins full website build
> here as a
> preview. Please let me know if you have any feedback or if there are any
> changes you would like made in the project’s Github issue<
> https://github.com/apache/incubator-mxnet/issues/17798>.
>
> From: "Goggins, Connor" 
> Date: Monday, March 9, 2020 at 6:22 PM
> To: "d...@mxnet.apache.org" 
> Cc: "Krishnamurthy, Sandeep" , "Markham, Aaron" <
> markh...@amazon.com>
> Subject: Versioned Dropdown for Python API Docs
>
> With the development of MXNet Website 2.0, I propose a version dropdown
> for the Python API docs to support documentation for past releases and
> reduce confusion regarding the incompatibility of past releases with the
> current docs (which only cover master).
>
> The issue is being tracked here<
> https://github.com/apache/incubator-mxnet/issues/17798>.
>


Re: MXNet Bot Demo

2020-03-12 Thread Lin Yuan
Chai,

Awesome work. When do we expect this bot to be deployed?

Best,

Lin

On Thu, Mar 12, 2020 at 2:00 PM Chaitanya Bapat 
wrote:

> Hello MXNet community,
>
> I have built an MXNet Bot  that allows PR
> Authors, Committers and Jenkins Admins to trigger CI manually.
> It addresses two problems:
> 1. Provides a manual CI trigger instead of only the existing automated
> trigger
> 2. Grants trigger permissions to PR Authors (in addition to MXNet Committers
> and Jenkins Admins)
>
> Design Doc :
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+CI+Bot
>
> I urge you all to attend the demonstration meeting and share your views.
>
> Thank you,
> Chai
>
> *Meeting Details*:
> ==Conference Bridge Information==
> You have been invited to an online meeting, powered by Amazon Chime.
> *Chime meeting ID*: *9272158344*
> Join via Chime clients (manually): Select 'Meetings > Join a Meeting', and
> enter 9272158344
> Join via Chime clients (auto-call): If you invite auto-call as attendee,
> Chime will call you when the meeting starts, select 'Answer'
> *Join via browser screen share*: https://chime.aws/9272158344
> *Join via phone* (US): +1-929-432-4463,,,9272158344#
> *Join via phone (US toll-free)*: +1-855-552-4463,,,9272158344#
> International dial-in: https://chime.aws/dialinnumbers/
> In-room video system: Ext: 62000, Meeting PIN: 9272158344#
>
> --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
>
>
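
The permission model described in the announcement (PR authors, MXNet committers, and Jenkins admins may trigger CI) can be sketched in a few lines. This is an illustrative sketch only, not the actual bot's implementation; all names and the roster sources are hypothetical.

```python
# Hypothetical sketch of the bot's permission check. The rosters below are
# placeholders; a real bot would load committers from the Apache roster and
# admins from the Jenkins configuration.
COMMITTERS = {"alice", "bob"}
JENKINS_ADMINS = {"carol"}

def may_trigger_ci(commenter: str, pr_author: str) -> bool:
    """Return True if `commenter` is allowed to trigger CI on this PR."""
    return (
        commenter == pr_author
        or commenter in COMMITTERS
        or commenter in JENKINS_ADMINS
    )

print(may_trigger_ci("alice", "dave"))  # committer -> True
print(may_trigger_ci("dave", "dave"))   # PR author -> True
print(may_trigger_ci("eve", "dave"))    # outsider -> False
```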


Re: [apache/incubator-mxnet] [RFC] Apache MXNet 2.0 Roadmap (#16167)

2020-02-19 Thread Lin Yuan
Is there a plan to remove the cudnn_off argument from neural network
operators such as Dropout, Convolution, Pooling, etc.? It creates a few
usability issues:
(1) Once a model is exported, users must change this flag in every layer
manually if they want to enable/disable cuDNN. When cudnn_off is set to true
in some layers, the global env variable `MXNET_CUDNN_AUTOTUNE_DEFAULT` is
simply ignored. It is very confusing for users to see an error message like
"Please turn off MXNET_CUDNN_AUTOTUNE_DEFAULT" when doing so in fact has no
effect.
(2) Why did we expose such an implementation detail to users in the first
place? At worst, we should provide a single global variable to turn cuDNN
on/off in all layers instead of a flag at the operator level.
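
The per-layer nature of the flag is what forces the manual editing described in (1). Below is a minimal sketch of patching an exported symbol programmatically, assuming the usual symbol-JSON layout (a top-level "nodes" list whose entries may carry a string-valued "attrs" dict); the function name and sample graph are illustrative, not part of any MXNet API.

```python
import json

def set_cudnn_off(symbol_json: str, off: bool) -> str:
    """Flip the per-operator `cudnn_off` attribute in an exported symbol.

    Sketch only: assumes the common symbol-JSON layout where each node may
    carry an "attrs" dict with string values.
    """
    graph = json.loads(symbol_json)
    for node in graph.get("nodes", []):
        attrs = node.get("attrs")
        if attrs is not None and "cudnn_off" in attrs:
            attrs["cudnn_off"] = str(off)
    return json.dumps(graph)

# Hand-crafted stand-in for an exported symbol file.
exported = json.dumps({"nodes": [
    {"op": "Convolution", "name": "conv0",
     "attrs": {"kernel": "(3, 3)", "cudnn_off": "False"}},
    {"op": "Dropout", "name": "drop0",
     "attrs": {"p": "0.5", "cudnn_off": "False"}},
]})
patched = json.loads(set_cudnn_off(exported, True))
print([n["attrs"]["cudnn_off"] for n in patched["nodes"]])  # ['True', 'True']
```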

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16167#issuecomment-588530481

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lin Yuan
Pedro,

While I agree with you that we need to fix this usability issue, I don't
think it is a release blocker, as Przemek mentioned above. Could we fix it in
the next minor release?

Thanks,

Lin

On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy 
wrote:

> Right. Would it be possible to have the CMake build also use libgomp for
> consistency with the releases until these issues are resolved?
> This can affect anyone compiling the distribution with CMake and also
> happens randomly in CI, worsening the contributor experience due to CI
> failures.
>
> On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> wrote:
>
> > Hi Pedro,
> >
> > From the issue that you linked it seems that you are using the LLVM
> > OpenMP, whereas I believe the actual release uses libgomp (at least
> that's
> > what seems to be the conclusion from this issue:
> > https://github.com/apache/incubator-mxnet/issues/16891)?
> >
> > Przemek
> >
> > On 2020/02/04 03:42:30, Pedro Larroy 
> > wrote:
> > > -1
> > >
> > > Unit tests passed in CPU build.
> > >
> > > I observe crashes related to openmp using cpp unit tests:
> > >
> > > https://github.com/apache/incubator-mxnet/issues/17043
> > >
> > > Pedro.
> > >
> > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat 
> > wrote:
> > >
> > > > +1
> > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > Tested for OpPerf utility
> > > > For CPU -
> > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > >
> > > > Works well!
> > > >
> > > >
> > > >
> > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > >
> > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> > CPU_SSE2,
> > > > ✔
> > > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > > CPU_AVX2, ✔
> > > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > > BLAS_MKL, ✖
> > > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> > DEBUG, ✖
> > > > > TVM_OP]
> > > > >
> > > > > Lin
> > > > >
> > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > I tested below items:
> > > > > > 1. download artifacts from Apache dist repo;
> > > > > > 2. the signature looks good;
> > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > /example/quantization/.
> > > > > >
> > > > > > thanks,
> > > > > > -tao
> > > > > >
> > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv  wrote:
> > > > > >
> > > > > > > I see. I was looking at this page:
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > >
> > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> > ptre...@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Tao,
> > > > > > >>
> > > > > > >> Could you tell me where did you look for it and did not find
> > it? I
> > > > > just
> > > > > > >> checked and both
> > > > > > >>
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > and
> > > > > > >> draft of the release on GitHub have them.
> > > > > > >>
> > > > > > >> Thank you
> > > > > > >> Przemek
> > > > > > >>
> > > > > > >> On 2020/02/01 14:23:11, Tao Lv  wrote:
> > > > > > >> > It seems the src tar and signature are missing from the tag.
> > > > > > >> >
> > > > > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-03 Thread Lin Yuan
+1

Tested Horovod with mnist example. My compiler flags are below:

[✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔
CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔
OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖
BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG, ✖
TVM_OP]

Lin

On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:

> +1
>
> I tested below items:
> 1. download artifacts from Apache dist repo;
> 2. the signature looks good;
> 3. build from source code with MKL-DNN and MKL on centos;
> 4. run fp32 and int8 inference of ResNet50 under /example/quantization/.
>
> thanks,
> -tao
>
> On Sun, Feb 2, 2020 at 11:00 AM Tao Lv  wrote:
>
> > I see. I was looking at this page:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> >
> > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak 
> > wrote:
> >
> >> Hi Tao,
> >>
> >> Could you tell me where did you look for it and did not find it? I just
> >> checked and both
> >> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/ and
> >> draft of the release on GitHub have them.
> >>
> >> Thank you
> >> Przemek
> >>
> >> On 2020/02/01 14:23:11, Tao Lv  wrote:
> >> > It seems the src tar and signature are missing from the tag.
> >> >
> >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <
> ptre...@apache.org>
> >> > wrote:
> >> >
> >> > > Dear MXNet community,
> >> > >
> >> > > This is the vote to release Apache MXNet (incubating) version 1.6.0.
> >> > > Voting starts today and will close on Monday 2/3/2020 23:59 PST.
> >> > >
> >> > > Link to release notes:
> >> > >
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> >> > >
> >> > > Link to release candidate:
> >> > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> >> > >
> >> > > Link to source and signatures on apache dist server:
> >> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> >> > >
> >> > > The differences comparing to previous release candidate 1.6.0.rc1:
> >> > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
> >> > >  * Bugfix for saving LSTM layer parameter (#17288)
> >> > >  * Bugfix for downloading the model from model zoo from multiple
> >> processes
> >> > > (#17372)
> >> > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> >> > >
> >> > >
> >> > > Please remember to TEST first before voting accordingly:
> >> > > +1 = approve
> >> > > +0 = no opinion
> >> > > -1 = disapprove (provide reason)
> >> > >
> >> > >
> >> > > Best regards,
> >> > > Przemyslaw Tredak
> >> > >
> >> >
> >>
> >
>


Re: Requesting slack access

2020-01-27 Thread Lin Yuan
Done. Welcome to MXNet community!

Lin

On Sat, Jan 25, 2020 at 5:48 PM Tajinder Singh  wrote:

> Hi,
>
> Please add me to slack work space. My email: tsingh2...@gmail.com
>
> Thanks,
> Tajinder
>


Re: [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16376)

2020-01-27 Thread Lin Yuan
This seems to be a big change to the existing operator modes (imperative and
symbolic). Could you please provide more information?

AFAIK, the symbolic API already does deferred initialization, and the
imperative API is provided to improve user experience. Based on this RFC,
what's the advantage of the new deferred_compute mode? As a user, when should
I use it and when not?

Another question: we all know deferred initialization causes a bad user
experience when it comes to debugging. Would this RFC address the
debuggability issue?

If it's about performance optimization, could we have some initial data
comparing this new deferred mode vs. the existing imperative mode?

Thanks,

Lin 


-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16376#issuecomment-579077455

Re: Request to join slack channel

2020-01-22 Thread Lin Yuan
Done

On Wed, Jan 22, 2020 at 12:13 PM Salim Chemlal  wrote:

> Hi,
>
> I am an AI adjunct faculty at Old Dominion University and a DL engineer. I
> am requesting to be added to mxnet slack channel, email is
> drchem...@gmail.com
>
> Thank you
>
> Salim
>


Re: Slack Access

2020-01-22 Thread Lin Yuan
Invitation sent. Thanks for your interest.

Lin

On Wed, Jan 22, 2020 at 1:15 AM João Costa  wrote:

> Hi
>
> Can you give me access to slack?
>
> Thanks
> João Costa
>


[NOTIFICATION] CI Restart

2020-01-21 Thread Lin Yuan
Dear Community,

Since Jan 14, 2020, our developers have seen frequent test time-outs in our
CI system. Nick and Pedro have helped investigate these random timeouts;
however, due to the design of the CI system, the failed instances are
reclaimed before enough logging can be captured to identify the root cause.

To restore the development cadence promptly and spare developers the burden
of repeated PR submissions, we decided to take the following steps:

1) Restart the CI Jenkins master
2) Modify the CI configuration so that a snapshot is taken before a randomly
failing CI instance is reclaimed
3) Keep investigating the root cause of the random test timeouts

In the meantime, if you already have PRs running in the CI pipeline, please
resubmit them to make sure they run through the pipeline after the restart.

We are sorry for any inconvenience caused.

Best Regards,

Lin


Re: MXNet 1.6 as last release with Python 2 support?

2020-01-17 Thread Lin Yuan
+1

On Fri, Jan 17, 2020 at 10:04 AM Xingjian SHI  wrote:

> +1. We should move to support Python>=3.5 only.
>
> Get Outlook for iOS
> 
> From: Lausen, Leonard 
> Sent: Friday, January 17, 2020 10:02:30 AM
> To: d...@mxnet.apache.org 
> Subject: Re: MXNet 1.6 as last release with Python 2 support?
>
> If the lazy consensus passes, I believe the minimum Python version
> supported
> would be Python 3.5.
>
> Python 3.5 because it seems to be the minimum Python 3 version tested by
> our CI,
> specifically in the jobs running on Ubuntu 16.04.
>
> Best regards
> Leonard
>
> On Fri, 2020-01-17 at 17:36 +, Lausen, Leonard wrote:
> > Dear MXNet community,
> >
> > as effective January 1, 2020, no new bug reports, fixes, or changes will
> be
> > made
> > to Python 2, and as MXNet 1.6 will be released after January 1, 2020, I
> > suggest
> > to announce in the MXNet 1.6 release notes that MXNet 1.6 is the last
> release
> > supporting Python 2.
> >
> > We have previously reached consensus on announcing that Python 2 is
> dropped in
> > the next major release (ie. MXNet 2), however, given the delay in 1.6
> release,
> > the plan to release 1.7 in the future and that Python 2 is dead already I
> > think
> > we can revisit this assumption.
> >
> > Advantages are
> > - Time savings for developers, as Python 3 standard library contains more
> >   features than Python 2, and it is more efficient to target only 1
> language
> >   (Python 3) instead of 2 languages (Python 2 & 3)
> > - Simplification and cost savings for CI
> >
> > I thus suggest 72h lazy consensus for announcing dropping of Python 2 as
> > described above. If you disagree, please veto (send "-1") and we can
> continue
> > supporting Python 2 in all 1.x releases as per previous consensus. Note
> that
> > at
> > the time of previous consensus, no 1.7 release was planned.
> >
> > Best regards
> > Leonard
>


Re: [DISCUSS] Enforce tighter control on API related changes

2020-01-14 Thread Lin Yuan
Sheng,

I will provide more detail on the GitHub issue.

The "API Change" labeling for PRs sounds like a good solution to keep
consistent API design across MXNet. I guess we can close the discussion on
this topic now.

Best,

Lin

On Tue, Jan 14, 2020 at 6:02 PM Sheng Zha  wrote:

> > 2) Regarding issue #17292, it was not broken by 4ed14e2 but by a C API
> > change in https://github.com/apache/incubator-mxnet/pull/17128. The
> > later commit 4ed14e2 was trying to fix this API change but it did not
> > seem to work yet.
>
> None of the existing C API was changed in #17128. #17128 had an
> unnecessary addition of a C API which was removed in 4ed14e2. Neither
> change should have broken third party integration if it's not making
> assumptions on where to find the implementation.
>
> As I don't see any further discussion on this in the issue #17292, let's
> make sure the related details are added there please.
>
> -sz
>
> On 2020/01/14 18:24:13, Lin Yuan  wrote:
> > Hi Sheng,
> >
> > Thanks for your reply.
> >
> > 1) Adding an "API Change" label is a good way to flag PRs with API
> > changes. It would be great if we could make this labeling automatic with
> > some hook on API-related modules, so we don't miss them in PRs.
> >
> > 2) Regarding issue #17292, it was not broken by 4ed14e2 but by a C API
> > change in https://github.com/apache/incubator-mxnet/pull/17128. The
> > later commit 4ed14e2 was trying to fix this API change but it did not
> > seem to work yet.
> >
> > Horovod integration does not call any inline function from MXNet; it
> > includes an exported header c_api_error.h from MXNet to throw and catch
> > MXNet exceptions. The same header is included in other projects, such as
> > BytePS:
> > https://github.com/bytedance/byteps/blob/master/byteps/mxnet/ops.h#L22.
> >
> > I agree with you that we need a better design to allow third-party
> > libraries to build on top of MXNet. E.g., TensorFlow provides custom
> > operators so Horovod can push its allreduce actions to TensorFlow as a
> > custom operator instead of low-level C API calls. It seems our Custom
> > Dynamic Operator
> > <
> https://cwiki.apache.org/confluence/display/MXNET/Dynamic+CustomOp+Support
> >
> > project may enable this feature in MXNet 2.0 and I am looking forward to
> it
> > :)
> >
> > Cheers,
> >
> > Lin
> >
> >
> >
> >
> > On Mon, Jan 13, 2020 at 7:24 PM Sheng Zha  wrote:
> >
> > > Hi Lin,
> > >
> > > Thanks for the suggestions.
> > >
> > > With respect to your proposal:
> > >
> > > > (2) Any PR that contains API change should clearly state this in PR
> > > title.
> > > > Otherwise, committer can reject the PR
> > >
> > > I agree that PRs with API changes should be made more prominent.
> Another
> > > mechanism that has already been used is to tag the PRs with the "API
> > > change" label [1].
> > >
> > > On the other hand, relying on the community to call out the PRs with
> API
> > > changes may not be reliable. Oftentimes, people didn't realize that a
> > > change constitutes an API change. If a committer identifies such a
> change,
> > > a more friendly response would be to just label the PR and call out
> where
> > > the API change happens in a comment.
> > >
> > > > (1) Any PR involving change of APIs should be approved by at least
> one of
> > > > the committers from a "API Committee" before it can be merged. I will
> > > > explain how to form this committee at the end of email
> > >
> > > I'm not convinced that more hierarchy should be created among
> committers.
> > > All committers are entrusted by the PPMC to use their judgement to the
> best
> > > interest of this project, and additional qualification seems
> > > counter-productive.
> > >
> > > With respect to your linked issue #17292, as @stephenrawls pointed
> out, it
> > > comes from 4ed14e2 where the inline definition of MXAPIHandleException
> is
> > > moved to a .cc file, and I'm the one that actually made this change to
> > > unblock the PR. I want to call out that:
> > > - This is not an API change in that there's no change in the function
> > > signature or visibility in the symbol table of libmxnet.so.
> > > - It should not be the responsibility of MXNet to maintain the
> assumption
> > > that downstream projects like horovod make in their building 

Re: [DISCUSS] Enforce tighter control on API related changes

2020-01-14 Thread Lin Yuan
Hi Sheng,

Thanks for your reply.

1) Adding an "API Change" label is a good way to flag PRs with API changes.
It would be great if we could make this labeling automatic with some hook
on API-related modules, so we don't miss them in PRs.
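
Such a hook could be as simple as matching a PR's changed file paths against a list of API-related modules. The sketch below is hypothetical: the path patterns are illustrative stand-ins, not the repository's actual layout, and a real hook would run in CI or a GitHub webhook and apply the label via the GitHub API.

```python
from fnmatch import fnmatch

# Illustrative patterns for API-related modules (placeholders only).
API_PATTERNS = [
    "include/mxnet/c_api*.h",
    "src/c_api/*",
    "python/mxnet/*.py",
]

def suggest_labels(changed_files):
    """Return {'API change'} if any changed file touches an API module."""
    for path in changed_files:
        if any(fnmatch(path, pat) for pat in API_PATTERNS):
            return {"API change"}
    return set()

print(suggest_labels(["src/c_api/c_api_error.cc"]))  # {'API change'}
print(suggest_labels(["docs/index.md"]))             # set()
```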

2) Regarding issue #17292, it was not broken by 4ed14e2 but by a C API
change in https://github.com/apache/incubator-mxnet/pull/17128. The
later commit 4ed14e2 was trying to fix this API change but it did not seem
to work yet.

Horovod integration does not call any inline function from MXNet; it
includes an exported header c_api_error.h from MXNet to throw and catch
MXNet exceptions. The same header is included in other projects, such as BytePS:
https://github.com/bytedance/byteps/blob/master/byteps/mxnet/ops.h#L22.

I agree with you that we need a better design to allow third-party
libraries to build on top of MXNet. E.g., TensorFlow provides custom
operators so Horovod can push its allreduce actions to TensorFlow as a
custom operator instead of low-level C API calls. It seems our Custom
Dynamic Operator
<https://cwiki.apache.org/confluence/display/MXNET/Dynamic+CustomOp+Support>
project may enable this feature in MXNet 2.0 and I am looking forward to it
:)

Cheers,

Lin




On Mon, Jan 13, 2020 at 7:24 PM Sheng Zha  wrote:

> Hi Lin,
>
> Thanks for the suggestions.
>
> With respect to your proposal:
>
> > (2) Any PR that contains API change should clearly state this in PR
> title.
> > Otherwise, committer can reject the PR
>
> I agree that PRs with API changes should be made more prominent. Another
> mechanism that has already been used is to tag the PRs with the "API
> change" label [1].
>
> On the other hand, relying on the community to call out the PRs with API
> changes may not be reliable. Oftentimes, people didn't realize that a
> change constitutes an API change. If a committer identifies such a change,
> a more friendly response would be to just label the PR and call out where
> the API change happens in a comment.
>
> > (1) Any PR involving change of APIs should be approved by at least one of
> > the committers from a "API Committee" before it can be merged. I will
> > explain how to form this committee at the end of email
>
> I'm not convinced that more hierarchy should be created among committers.
> All committers are entrusted by the PPMC to use their judgement to the best
> interest of this project, and additional qualification seems
> counter-productive.
>
> With respect to your linked issue #17292, as @stephenrawls pointed out, it
> comes from 4ed14e2 where the inline definition of MXAPIHandleException is
> moved to a .cc file, and I'm the one that actually made this change to
> unblock the PR. I want to call out that:
> - This is not an API change in that there's no change in the function
> signature or visibility in the symbol table of libmxnet.so.
> - It should not be the responsibility of MXNet to maintain the assumption
> that downstream projects like horovod make in their building logic.
>
> A more pressing issue should have been the way that a third-party
> communication library like horovod integrates with MXNet. So far the
> horovod integration seemed brittle and there have been many issues [2]. For
> this specific issue, to me, it doesn't seem like a good decision on the
> horovod side to assume any function would be defined inline on the MXNet
> side.
>
> With the development of MXNet 2.0, it's a good time to rethink how horovod
> integration should work with MXNet. I'm hoping that MXNet 2.0 item 4.11
> AbstractKVStore interface (See #17115) could help simplify and alleviate
> the coupling in the current way of integration.
>
> -sz
>
> [1]
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+label%3A%22API+change%22+
> [2]
> https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+horovod
>
> On 2020/01/14 00:37:55, Lin Yuan  wrote:
> > Dear Community,
> >
> > Recently, there were some changes to C APIs that broke another downstream
> > project Horovod: https://github.com/apache/incubator-mxnet/issues/17292.
> > Since we do not have integration tests for downstream project, it becomes
> > critical for us to update APIs with extreme caution.
> >
> > I would like to propose the following mechanism for us to maintain a
> clean
> > and robust APIs: including both C API and Python API at the moment.
> >
> > (1) Any PR involving change of APIs should be approved by at least one of
> > the committers from a "API Committee" before it can be merged. I will
> > explain how to form this committee at the end of email
> >
> > (2) Any PR that contains API change should clearly state this i

[DISCUSS] Enforce tighter control on API related changes

2020-01-13 Thread Lin Yuan
Dear Community,

Recently, some changes to the C APIs broke a downstream project, Horovod:
https://github.com/apache/incubator-mxnet/issues/17292.
Since we do not have integration tests for downstream projects, it is
critical for us to update APIs with extreme caution.

I would like to propose the following mechanism for maintaining clean and
robust APIs, covering both the C API and the Python API for now.

(1) Any PR involving an API change should be approved by at least one
committer from an "API Committee" before it can be merged. I will explain
how to form this committee at the end of this email.

(2) Any PR that contains an API change should clearly state this in the PR
title. Otherwise, a committer can reject the PR.

API Committee:
- This committee should consist of both seasoned MXNet developers and users.
- Committee members should have a comprehensive view of MXNet APIs to make
sure their usage is consistent across the stack.
- Committee members review PRs that involve API changes with extra caution.
- Committee members are required to attend the roadmap discussion for each
new release.
- For breaking API changes, committee members should reach consensus before
the change is made.

Any other suggestion is welcome here.

Best,

Lin


Re: Stopping nightly releases to Pypi

2020-01-13 Thread Lin Yuan
Awesome work! It's really convenient to have this page.

Two cents:
(1) create a link from the MXNet page to this one
(2) reorder the nightly builds as Tao suggested, newest first
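
A newest-first, pip-compatible link page (usable via `pip install -f <page> mxnet`, as suggested later in this thread) can be generated directly from the bucket listing. This is a minimal sketch with hypothetical wheel names, not the actual page source Sam published.

```python
from html import escape

def build_index(wheel_names):
    """Render a simple link page pip can consume via `-f <page>`.

    Assumes dated filenames like
    mxnet-1.6.0b20200112-py2.py3-none-manylinux1_x86_64.whl, so plain
    reverse-lexicographic sorting puts the newest build first.
    """
    rows = sorted(wheel_names, reverse=True)
    links = "\n".join(
        '<a href="%s">%s</a><br/>' % (escape(n), escape(n)) for n in rows
    )
    return "<!DOCTYPE html><html><body>\n%s\n</body></html>" % links

page = build_index([
    "mxnet-1.6.0b20191230-py2.py3-none-manylinux1_x86_64.whl",
    "mxnet-1.6.0b20200112-py2.py3-none-manylinux1_x86_64.whl",
])
print(page.splitlines()[1])  # the newest (20200112) build is listed first
```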

On Mon, Jan 13, 2020 at 10:25 AM Skalicky, Sam 
wrote:

> Hi All,
>
> The html page source is available at the link (view source, its all in a
> single html file), if someone wants to make modifications I’ll be happy to
> help integrate those changes and get the latest version published in the S3
> bucket. Whenever the final location of the nightly builds is identified we
> can move/modify the script appropriately.
>
> Sam
>
> > On Jan 12, 2020, at 5:41 PM, Tao Lv  wrote:
> >
> > Thank you for the effort, Sam. One minor suggestion: can we sort and put
> > the latest build at the top of the table?
> >
> > -tao
> >
> > On Mon, Jan 13, 2020 at 7:03 AM Marco de Abreu 
> > wrote:
> >
> >> Hi Sam,
> >>
> >> that's a great idea, thanks! Can you please adjust the script so it uses
> >> the artifacts that will be published once Shengs PR gets merged?
> >>
> >> Best regards,
> >> Marco
> >>
> >> Skalicky, Sam  schrieb am So., 12. Jan.
> 2020,
> >> 23:23:
> >>
> >>> Hi dev,
> >>>
> >>> I made an html page that generates the links to the nightly builds
> >>> available in the public S3 bucket so you don’t have to log into AWS to
> >> see
> >>> them.
> >>>
> >>> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/index.html
> >>>
> >>> Keep in mind we only have builds from January 2020 and December 2019 so
> >>> far.
> >>>
> >>> Sam
> >>>
> >>> On Jan 10, 2020, at 3:05 AM, Sheng Zha  >>> zhash...@apache.org>> wrote:
> >>>
> >>> Size of a change doesn't necessarily reflect the time one spends
> >>> navigating the code base and finding the solution. Also, I tend to
> >> believe
> >>> that everyone genuinely wants what's best for the project, just from
> >>> different perspectives.
> >>>
> >>> Let's focus on improving the CD solution so that security concerns can
> be
> >>> addressed too.
> >>>
> >>> -sz
> >>>
> >>> On 2020/01/09 21:57:30, Chris Olivier  >>> cjolivie...@apache.org>> wrote:
> >>> If this tiny fix is representative of the bulk of the reasoning behind
> >> all
> >>> the the CD churn recently, then this seems to be of some concern.
> >>>
> >>> -Chris
> >>>
> >>> On Thu, Jan 9, 2020 at 6:32 AM Marco de Abreu  >>> <mailto:marco.g.ab...@gmail.com>>
> >>> wrote:
> >>>
> >>> Great, thanks a lot sheng!
> >>>
> >>> -Marco
> >>>
> >>> Sheng Zha mailto:zhash...@apache.org>> schrieb am
> >>> Do., 9. Jan. 2020, 14:28:
> >>>
> >>> I'm fixing the CD pipeline in
> >>> https://github.com/apache/incubator-mxnet/pull/17259/files and will
> >>> update the s3 publish path so that it's friendly for automatically
> >>> generating such page.
> >>>
> >>> -sz
> >>>
> >>> On 2020/01/06 18:19:52, "Lausen, Leonard"  >>> <mailto:lau...@amazon.com.INVALID>>
> >>> wrote:
> >>> Consider a user finds a bug in a nightly version but we can't narrow
> >>> down the version of mxnet used, as the name is constant over time. Or
> >>> users want to revert to the previously installed nightly version but
> >>> don't know which date it was from, due to the constant name.
> >>>
> >>> Instead I suggest we introduce an autogenerated page like
> >>> https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html
> >>>
> >>> Then "pip install -f URLTOPAGE mxnet" will install the latest available
> >>> version.
> >>> Maybe the team maintaining the S3 bucket can reconsider creating such a
> >>> page for the interim until the CD-based nightly build is operating.
> >>>
> >>> On Mon, 2020-01-06 at 10:01 -0800, Lin Yuan wrote:
> >>> +1 for a nightly pip with fixed name.
> >>>
> >>> We need this to track mxnet integration with other packages such as
> >>> Horovod.
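The autogenerated index page suggested earlier in this thread (so that `pip install -f <index-url> mxnet` can resolve the latest nightly) could be sketched roughly as below. This is illustrative only, not the actual tooling: `build_index` and the sample URLs are assumptions, with the URL layout copied from links quoted in this thread.

```python
from html import escape

def build_index(wheel_urls):
    """Generate a minimal flat HTML index (one <a> per wheel), in the style
    of the torch_nightly.html page mentioned above, so that
    `pip install -f <index-url> mxnet` can pick the newest version."""
    links = "\n".join(
        '<a href="{0}">{1}</a><br/>'.format(escape(url, quote=True),
                                            escape(url.rsplit("/", 1)[-1]))
        for url in sorted(wheel_urls))
    return "<!DOCTYPE html>\n<html><body>\n{0}\n</body></html>\n".format(links)

# Hypothetical wheel URLs following the dated S3 layout shown in this thread.
urls = [
    "https://repo.mxnet.io/dist/2019-12-07/dist/"
    "mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl",
    "https://repo.mxnet.io/dist/2020-01-01/dist/"
    "mxnet-1.6.0b20200101-py2.py3-none-manylinux1_x86_64.whl",
]
page = build_index(urls)
print(page)
```

pip treats each anchor on such a page as a candidate distribution, so regenerating the page after each nightly upload is enough to make the newest build installable without knowing its date.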
> >>>
> >>> 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-10 Thread Lin Yuan
We can release one cpu-mkl and one CUDA wheel for testing various
applications. Other people can build from source if they want other flavors

Lin

On Fri, Jan 10, 2020 at 4:00 PM Karan Jariwala 
wrote:

> Yes, agree with your point. But we will be requiring many flavors of pip
> wheel.
>
> MKL/ without MKL
> CUDA/ no CUDA
> Linux/windows/Mac
>
> Thanks,
> Karan
>
> On Fri, Jan 10, 2020 at 3:54 PM Haibin Lin 
> wrote:
>
> > Shall we provide pip wheels for later release votes?
> >
> > Not everyone knows how to build MXNet from source (and building from
> source
> > also takes very long). Providing a pip wheel would lower the bar for
> users
> > who want to test MXNet and participate in voting.
> >
> > Best,
> > Haibin
> >
> > On Fri, Jan 10, 2020 at 3:50 PM Haibin Lin 
> > wrote:
> >
> > > +1
> > >
> > > Built from source with USE_CUDA=1 on Ubuntu. Ran gluon-nlp unit tests
> > > and they passed.
> > >
> > > On Fri, Jan 10, 2020 at 3:18 PM Karan Jariwala <
> karankjariw...@gmail.com
> > >
> > > wrote:
> > >
> > >> +1
> > >>
> > >> Tested MXNet with and without MKL-DNN on Ubuntu 16.04 with Horovod
> > 0.18.2.
> > >> No regression seen between 1.5.1 and 1.6.0.rc1 when running
> > horovod_MXNet
> > >> integration test.
> > >>
> > >>
> > >> Thanks,
> > >>
> > >> Karan
> > >>
> > >> On Fri, Jan 10, 2020 at 2:47 PM Markus Weimer 
> > wrote:
> > >>
> > >> > +1 (binding)
> > >> >
> > >> > I tested on Ubuntu 18.04 on the Windows Subsystem for Linux.
> > >> >
> > >> > Tested:
> > >> >   * Built from source using the instructions here [0]
> > >> >   * Ran the tests in `./build/tests/mxnet_unit_tests`
> > >> >   * SHA512 of the archive
> > >> >
> > >> > Not tested:
> > >> >   * Language bindings
> > >> >   * CUDA or other GPU acceleration
> > >> >   * LICENSE and compliance status
> > >> >   * Signature of the archive
> > >> >
> > >>
> > >
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-07 Thread Lin Yuan
Correction: it was built from source on Ubuntu 16.04

On Tue, Jan 7, 2020 at 11:42 AM Lin Yuan  wrote:

> +1
>
> Built from source on Ubuntu 18 with CUDA/CUDNN/NCCL enabled and verified it
> works with Horovod 0.18.2
>
> On Tue, Jan 7, 2020 at 9:55 AM Przemysław Trędak 
> wrote:
>
>> Dear MXNet community,
>>
>> This is the vote to release Apache MXNet (incubating) version 1.6.0.
>> Voting starts today and will close on Friday 1/10/2020 23:59 PST.
>>
>> Link to release notes:
>> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
>>
>> Link to release candidate:
>> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc1
>>
>> Link to source and signatures on apache dist server:
>> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc1/
>>
> >> The differences compared to the previous release candidate 1.6.0.rc0:
>> * Fix for RNN gradient calculation for MKLDNN ([v1.6.x] Cherry-pick
>> MKL-DNN Rnn operator enhancements to v1.6.x (#17225))
>> * Fix for Windows CMake build (Backport #16980 #17031 #17018 #17019 to
>> 1.6 branch (#17213))
>> * CPU counterpart to contrib multihead attention operators (Interleaved
>> MHA for CPU path (#17138) (#17211))
>> * Fix for #16060 (fix norm sparse fallback (#17149))
>> * Fix for inconsistent names in estimator API (fix parameter names in the
>> estimator api (#17051) (#17162))
>> * Fixes for OpenMP (Backport 3rdparty/openmp fixes (#17193))
>> * Fix for pointwise fusion speed for large networks (which was the reason
>> of -1 in the vote for rc0) as well as fixes for nondeterminism in sum of
>> squares operator and trainer parameter order (Backport #17002, #17068 and
>> #17114 to 1.6 branch (#17137))
>>
>>
>> Please remember to TEST first before voting accordingly:
>> +1 = approve
>> +0 = no opinion
>> -1 = disapprove (provide reason)
>>
>>
>> Best regards,
>> Przemyslaw Tredak
>>
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-07 Thread Lin Yuan
+1

Built from source on Ubuntu 18 with CUDA/CUDNN/NCCL enabled and verified it
works with Horovod 0.18.2

On Tue, Jan 7, 2020 at 9:55 AM Przemysław Trędak  wrote:

> Dear MXNet community,
>
> This is the vote to release Apache MXNet (incubating) version 1.6.0.
> Voting starts today and will close on Friday 1/10/2020 23:59 PST.
>
> Link to release notes:
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
>
> Link to release candidate:
> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc1
>
> Link to source and signatures on apache dist server:
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc1/
>
> The differences compared to the previous release candidate 1.6.0.rc0:
> * Fix for RNN gradient calculation for MKLDNN ([v1.6.x] Cherry-pick
> MKL-DNN Rnn operator enhancements to v1.6.x (#17225))
> * Fix for Windows CMake build (Backport #16980 #17031 #17018 #17019 to 1.6
> branch (#17213))
> * CPU counterpart to contrib multihead attention operators (Interleaved
> MHA for CPU path (#17138) (#17211))
> * Fix for #16060 (fix norm sparse fallback (#17149))
> * Fix for inconsistent names in estimator API (fix parameter names in the
> estimator api (#17051) (#17162))
> * Fixes for OpenMP (Backport 3rdparty/openmp fixes (#17193))
> * Fix for pointwise fusion speed for large networks (which was the reason
> of -1 in the vote for rc0) as well as fixes for nondeterminism in sum of
> squares operator and trainer parameter order (Backport #17002, #17068 and
> #17114 to 1.6 branch (#17137))
>
>
> Please remember to TEST first before voting accordingly:
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
>
>
> Best regards,
> Przemyslaw Tredak
>


Re: Stopping nightly releases to Pypi

2020-01-06 Thread Lin Yuan
+1 for a nightly pip with fixed name.

We need this to track mxnet integration with other packages such as Horovod.

Sam, when do you think we can have this nightly build with a fixed name?

Thanks,

Lin

On Sun, Jan 5, 2020 at 7:48 PM Skalicky, Sam 
wrote:

> Hi Tao,
>
> We don't have this yet, but we did think about putting the latest wheels in
> a specific place in the S3 bucket so they are always updated. Initially we
> decided not to do this since the main MXNet CD should have been fixed. But
> since it's still not fixed yet, we might try and go ahead and do this.
>
> Sam
>
> On Jan 5, 2020, at 6:02 PM, Lv, Tao A  wrote:
>
> Hi,
>
> How to install the latest available build of a flavor without specifying
> the build date? Something like `pip install mxnet --pre`.
>
> Thanks,
> -tao
>
> -Original Message-
> From: Skalicky, Sam 
> Sent: Monday, January 6, 2020 2:09 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Stopping nightly releases to Pypi
>
> Hi Haibin,
>
> You typed the correct URLs, the cu100 build has been failing since
> December 30th but other builds have succeeded. The wheels are being
> delivered into a public bucket that anyone with an AWS account can access
> and go poke around, here’s the URL for web access:
>
>
> https://s3.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/2020-01-01/dist/?region=us-west-2&tab=overview
>
> You will have to log into your AWS account to access it however (which
> means you’ll need an AWS account).
>
> It looks like only the following flavors are available for 2020-01-01:
> mxnet
> mxnet-cu92
> mxnet-cu92mkl
> mxnet-mkl
>
> Sam
>
> On Jan 4, 2020, at 9:06 PM, Haibin Lin  wrote:
>
> I was trying the nightly builds, but none of them is available:
>
> pip3 install
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-01/dist/mxnet_cu100-1.6.0b20200101-py2.py3-none-manylinux1_x86_64.whl
> --user
> 
> pip3 install
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-02/dist/mxnet_cu100-1.6.0b20200102-py2.py3-none-manylinux1_x86_64.whl
> --user
> 
> pip3 install
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
> --user
> 
> pip3 install
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-04/dist/mxnet_cu100-1.6.0b20200104-py2.py3-none-manylinux1_x86_64.whl
> --user
> 
>
> ERROR: Could not install requirement mxnet-cu100==1.6.0b20200103 from
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
> because of HTTP error 404 Client Error: Not Found for url:
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
> for URL
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
>
> Please let me know if I typed wrong URLs.
>
> 1. The discoverability of available nightly builds needs improvement. If
> someone can help write a script to list all links that exist, that would be
> very helpful.
> 2. If any nightly build is not built successfully, how does the community
> know the reason for the failure, and potentially offer help? Currently I
> don't have much visibility of the nightly build status.
>
> Best,
> Haibin
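A rough sketch of the link-listing script asked for above. The flavor list and URL scheme are copied from the links quoted in this thread; `wheel_url` and `exists` are hypothetical helper names, not existing tooling, and the actual set of available flavors may differ.

```python
import urllib.request

# Flavors observed in this thread; not an exhaustive or official list.
FLAVORS = ["mxnet", "mxnet_mkl", "mxnet_cu92", "mxnet_cu92mkl",
           "mxnet_cu100", "mxnet_cu100mkl", "mxnet_cu101", "mxnet_cu101mkl"]

def wheel_url(date, flavor):
    """Build a candidate wheel URL for a date ('YYYY-MM-DD') and flavor,
    following the dated S3 layout quoted above."""
    tag = date.replace("-", "")
    return ("https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/"
            "{d}/dist/{f}-1.6.0b{t}-py2.py3-none-manylinux1_x86_64.whl"
            .format(d=date, f=flavor, t=tag))

def exists(url):
    """HEAD-check a URL; True only if the wheel is actually in the bucket."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False

url = wheel_url("2020-01-03", "mxnet_cu100")
print(url)
# Network check (not run here): exists(url)
```

Looping `exists(wheel_url(date, flavor))` over a date range and `FLAVORS` would answer both questions above: which links exist, and which builds silently failed on a given night.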
>
>
> On Fri, Jan 3, 2020 at 5:47 PM Pedro Larroy 
> wrote:
>
> Just to clarify, the current CI is quite an overhead to maintain for
> several reasons, and this complexity is overkill for CD. Jenkins also has
> constant plugin upgrades and security vulnerabilities, and has to be
> restarted from time to time as it stops working... and making binary
> builds in an environment which runs unsafe code is not, I think, good
> practice. So a separate Jenkins, CodeBuild, Drone, or a separate Jenkins
> node is the right solution. Agreed that it is just a scheduler, but
> somebody is making efforts to keep it running. If you have the appetite
> and resources to duplicate it for CD, please go ahead.
>
> On Fri, Jan 3, 2020 at 3:25 PM Marco de Abreu 
> wrote:
>
> Regarding your point of finding somebody to maintain the 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-27 Thread Lin Yuan
No, I just wanted to call it out because the title of the issue says "Failed
OpenMP assertion when loading MXNet compiled with DEBUG=1
<https://github.com/apache/incubator-mxnet/issues/10856#>".
If this is considered a release blocker, I think we should backport it to
1.6.

Thanks,
Lin

On Fri, Dec 27, 2019 at 10:47 AM Sheng Zha  wrote:

> Reading these issues it’s pretty clear to me that these are fixes for
> broken builds. I think we do consider broken builds to be release blockers.
>
> Lin, am I missing something on which you base your suggestion for delaying
> these changes?
>
> -sz
>
> > On Dec 27, 2019, at 10:30 AM, Lin Yuan  wrote:
> >
> > Are these release blockers? It's very risky to make such a big
> > last-minute change after code freeze.
> >
> > Can we do this in the next release?
> >
> > Lin
> >
> >> On Fri, Dec 27, 2019 at 7:37 AM Lausen, Leonard
> 
> >> wrote:
> >>
> >> In case of backporting #17012, also
> >> https://github.com/apache/incubator-mxnet/pull/17098 must be
> backported.
> >> The
> >> updated OpenMP added a new target which is not used by MXNet but breaks
> the
> >> build on some systems with nvptx. #17098 disables building this unused
> and
> >> broken feature.
> >>
> >>> On Thu, 2019-12-26 at 12:55 -0800, Pedro Larroy wrote:
> >>> https://github.com/apache/incubator-mxnet/pull/17012  should be also
> >> ported
> >>> to the release branch.
> >>>
> >>> On Fri, Dec 20, 2019 at 1:39 PM Przemysław Trędak 
> >>> wrote:
> >>>
> >>>> That issue is now fixed in master, I am in the process of
> >> cherry-picking
> >>>> the fix to v1.6.x branch. I will prepare the RC1 once that is ready.
> >>>>
> >>>> Thanks
> >>>> Przemek
> >>>>
> >>>> On 2019/12/20 20:07:36, Lin Yuan  wrote:
> >>>>> What's the next step for the release? Should we continue testing
> >> this and
> >>>>> vote or wait until the
> >>>>> https://github.com/apache/incubator-mxnet/issues/17105 is fixed?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Lin
> >>>>>
> >>>>> On Wed, Dec 18, 2019 at 12:55 AM Lausen, Leonard
> >>>> 
> >>>>> wrote:
> >>>>>
> >>>>>> Thanks Przemysław for managing this release and everyone who
> >>>> contributed
> >>>>>> to it.
> >>>>>>
> >>>>>> Unfortunately Zechen Wang just discovered another issue with GPU
> >>>> Pointwise
> >>>>>> Fusion: https://github.com/apache/incubator-mxnet/issues/17105
> >>>>>>
> >>>>>> Thus, -1.
> >>>>>>
> >>>>>> Unfortunately, as the nightly release pipeline was broken until
> >>>> recently
> >>>>>> (and
> >>>>>> still isn't re-set up completely yet), the issue hasn't been
> >> discovered
> >>>>>> earlier.
> >>>>>>
> >>>>>> Przemysław may have a quick fix for the issue. Another option
> >> would be
> >>>> to
> >>>>>> release 1.6 with MXNET_USE_FUSION default to 0.
> >>>>>>
> >>>>>> Best regards
> >>>>>> Leonard
> >>>>>>
> >>>>>> On Wed, 2019-12-18 at 05:30 +, Chen, Ciyong wrote:
> >>>>>>> Appreciate Tredak to push out voting for 1.6 release.
> >>>>>>>
> >>>>>>> +1 as we've done lots of tests with expected performance in many
> >>>>>> different
> >>>>>>> scenarios including both single-node and multi-node (horovod
> >> based),
> >>>>>> both FP32
> >>>>>>> and INT8 precision on many topologies.
> >>>>>>>
> >>>>>>> -Ciyong
> >>>>>>>
> >>>>>>> -Original Message-
> >>>>>>> From: Zhao, Patric 
> >>>>>>> Sent: Tuesday, December 17, 2019 8:51 AM
> >>>>>>> To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> >>>>>>> Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> >>>> 1.6.0.rc0

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-27 Thread Lin Yuan
Are these release blockers? It's very risky to make such a big last-minute
change after code freeze.

Can we do this in the next release?

Lin

On Fri, Dec 27, 2019 at 7:37 AM Lausen, Leonard 
wrote:

> In case of backporting #17012, also
> https://github.com/apache/incubator-mxnet/pull/17098 must be backported.
> The
> updated OpenMP added a new target which is not used by MXNet but breaks the
> build on some systems with nvptx. #17098 disables building this unused and
> broken feature.
>
> On Thu, 2019-12-26 at 12:55 -0800, Pedro Larroy wrote:
> > https://github.com/apache/incubator-mxnet/pull/17012  should be also
> ported
> > to the release branch.
> >
> > On Fri, Dec 20, 2019 at 1:39 PM Przemysław Trędak 
> > wrote:
> >
> > > That issue is now fixed in master, I am in the process of
> cherry-picking
> > > the fix to v1.6.x branch. I will prepare the RC1 once that is ready.
> > >
> > > Thanks
> > > Przemek
> > >
> > > On 2019/12/20 20:07:36, Lin Yuan  wrote:
> > > > What's the next step for the release? Should we continue testing
> this and
> > > > vote or wait until the
> > > > https://github.com/apache/incubator-mxnet/issues/17105 is fixed?
> > > >
> > > > Thanks!
> > > >
> > > > Lin
> > > >
> > > > On Wed, Dec 18, 2019 at 12:55 AM Lausen, Leonard
> > > 
> > > > wrote:
> > > >
> > > > > Thanks Przemysław for managing this release and everyone who
> > > contributed
> > > > > to it.
> > > > >
> > > > > Unfortunately Zechen Wang just discovered another issue with GPU
> > > Pointwise
> > > > > Fusion: https://github.com/apache/incubator-mxnet/issues/17105
> > > > >
> > > > > Thus, -1.
> > > > >
> > > > > Unfortunately, as the nightly release pipeline was broken until
> > > recently
> > > > > (and
> > > > > still isn't re-set up completely yet), the issue hasn't been
> discovered
> > > > > earlier.
> > > > >
> > > > > Przemysław may have a quick fix for the issue. Another option
> would be
> > > to
> > > > > release 1.6 with MXNET_USE_FUSION default to 0.
> > > > >
> > > > > Best regards
> > > > > Leonard
> > > > >
> > > > > On Wed, 2019-12-18 at 05:30 +, Chen, Ciyong wrote:
> > > > > > Appreciate Tredak to push out voting for 1.6 release.
> > > > > >
> > > > > > +1 as we've done lots of tests with expected performance in many
> > > > > different
> > > > > > scenarios including both single-node and multi-node (horovod
> based),
> > > > > both FP32
> > > > > > and INT8 precision on many topologies.
> > > > > >
> > > > > > -Ciyong
> > > > > >
> > > > > > -Original Message-
> > > > > > From: Zhao, Patric 
> > > > > > Sent: Tuesday, December 17, 2019 8:51 AM
> > > > > > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> > > > > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version
> > > 1.6.0.rc0
> > > > > > Thanks, Tredak, I will add some words for the new feature in the
> > > release
> > > > > note.
> > > > > > +1 for voting because we have run multiple rounds of tests
> > > > > > locally and got the expected performance boost.
> > > > > >
> > > > > > --Patric
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Przemysław Trędak 
> > > > > > > Sent: Tuesday, December 17, 2019 4:49 AM
> > > > > > > To: d...@mxnet.apache.org
> > > > > > > Subject: [VOTE] Release Apache MXNet (incubating) version
> 1.6.0.rc0
> > > > > > >
> > > > > > > Dear MXNet community,
> > > > > > >
> > > > > > > This is the vote to release Apache MXNet (incubating) version
> > > 1.6.0.
> > > > > > > Voting starts now and will close on Friday, 20th December 2019
> > > > > 23:59:59 PST.
> > > > > > > Link to release notes:
> > > > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > Link to release candidate:
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc0
> > > > > > >
> > > > > > > Link to source and signatures on apache dist server:
> > > > > > >
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc0/
> > > > > > >
> > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > >
> > > > > > > Additional notes:
> > > > > > >  - There was an issue[1] raised that 1.6.0.rc0 does not build
> with
> > > > > > > clang on FreeBSD - I decided to not block the voting for this
> and
> > > > > > > instead let the Community decide whether this is a blocker for
> the
> > > > > release.
> > > > > > >  - Patric Zhao and Tao Lv - could you help preparing a
> paragraph on
> > > > > > > MKLDNN
> > > > > > > 1.0 update in the New features section in the release notes?
> > > > > > >
> > > > > > > [1] https://github.com/apache/incubator-mxnet/issues/17076
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Przemyslaw Tredak
>


Re: Proposal for MXNet website improving

2019-12-23 Thread Lin Yuan
Agree with Patric. We should make the performance of MXNet more visible on
the website.

Lin

On Sun, Dec 22, 2019 at 9:43 PM Zhao, Patric  wrote:

> From my view, performance is a big plus for MXNet and the reason why lots
> of people adopted MXNet.
>
> I still think we need to have a top-level class for "performance".
>
> Thanks,
>
> --Patric
>
> > -Original Message-
> > From: Chen, Ciyong 
> > Sent: Monday, December 23, 2019 12:08 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: RE: Proposal for MXNet website improving
> >
> > Hi Aaron,
> >
> > Thanks for your valuable feedback.
> > I'll prepare to contribute this change and a PR soon, and update the
> > contents as suggested.
> >
> > Regarding making "Performance" a Key Feature replacing "Tools &
> > Libraries", is there anything I need to take care of when removing the
> > "Tools & Libraries" part?
> >
> > Thanks!
> > -Ciyong
> >
> > -Original Message-
> > From: Aaron Markham 
> > Sent: Saturday, December 21, 2019 4:14 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Proposal for MXNet website improving
> >
> > Hi Ciyong, thanks for the proposal.
> > I like your suggestions. Will you be submitting a PR?
> >
> > Some feedback:
> >
> > * Regarding changing the URLs, let's avoid that. We just had a lot of
> >   work trying to fix broken links.
> > * As far as changing the headings, sure, Tutorials and FAQs makes sense.
> > * Adding performance as a nav item - my preference, going by UX
> >   guidelines, is to keep the number of nav items to fewer than five or six.
> >   - What about making performance a Key Feature and highlighting that on
> >     the main page? I'd switch it with Tools & Libraries since Ecosystem
> >     is the next thing below.
> >
> > Cheers,
> > Aaron
> >
> > On Thu, Dec 19, 2019 at 2:03 AM Chen, Ciyong 
> > wrote:
> > >
> > > Hi MXNet community,
> > >
> > > While searching for MXNet from the official
> > website [https://mxnet.incubator.apache.org/], it's not that convenient
> > to get the recent/latest performance data; besides, there's some mismatch
> > between the links and descriptions on the current website.
> > > We can also add descriptions of some new content (like distributed
> > > training via Horovod and AMP with the bfloat16 data type) in the FAQ
> > > section.
> > >
> > > So I propose to improve the current website structure from below 3
> > > areas
> > >
> > > 1. Add a new tab "Performance" in the header, and change
> > > "Doc" to "Tutorials" according to the current contents.
> > >
> > > 2. Align the description of the FAQ section to the inner page.
> > >
> > > 3. FAQ list adjustment
> > >
> > > Please check the details via below link
> > >
> > https://drive.google.com/open?id=1gQrC1V1LeJH5NT6zRqBl8Ub2qSr1dc8O
> > >
> > > Suggestions and comments are highly appreciated.
> > >
> > > Thanks!
> > > -Ciyong
> > >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-20 Thread Lin Yuan
What's the next step for the release? Should we continue testing this and
vote or wait until the
https://github.com/apache/incubator-mxnet/issues/17105 is fixed?

Thanks!

Lin

On Wed, Dec 18, 2019 at 12:55 AM Lausen, Leonard 
wrote:

> Thanks Przemysław for managing this release and everyone who contributed
> to it.
>
> Unfortunately Zechen Wang just discovered another issue with GPU Pointwise
> Fusion: https://github.com/apache/incubator-mxnet/issues/17105
>
> Thus, -1.
>
> Unfortunately, as the nightly release pipeline was broken until recently
> (and
> still isn't re-set up completely yet), the issue hasn't been discovered
> earlier.
>
> Przemysław may have a quick fix for the issue. Another option would be to
> release 1.6 with MXNET_USE_FUSION default to 0.
>
> Best regards
> Leonard
>
> On Wed, 2019-12-18 at 05:30 +, Chen, Ciyong wrote:
> > Appreciate Tredak to push out voting for 1.6 release.
> >
> > +1 as we've done lots of tests with expected performance in many
> different
> > scenarios including both single-node and multi-node (horovod based),
> both FP32
> > and INT8 precision on many topologies.
> >
> > -Ciyong
> >
> > -Original Message-
> > From: Zhao, Patric 
> > Sent: Tuesday, December 17, 2019 8:51 AM
> > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0
> >
> > Thanks, Tredak, I will add some words for the new feature in the release
> note.
> >
> > +1 for voting because we have run multiple rounds of tests locally and
> > got the expected performance boost.
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Przemysław Trędak 
> > > Sent: Tuesday, December 17, 2019 4:49 AM
> > > To: d...@mxnet.apache.org
> > > Subject: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0
> > >
> > > Dear MXNet community,
> > >
> > > This is the vote to release Apache MXNet (incubating) version 1.6.0.
> > > Voting starts now and will close on Friday, 20th December 2019
> 23:59:59 PST.
> > >
> > > Link to release notes:
> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > >
> > > Link to release candidate:
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc0
> > >
> > > Link to source and signatures on apache dist server:
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc0/
> > >
> > > Please remember to TEST first before voting accordingly:
> > > +1 = approve
> > > +0 = no opinion
> > > -1 = disapprove (provide reason)
> > >
> > > Additional notes:
> > >  - There was an issue[1] raised that 1.6.0.rc0 does not build with
> > > clang on FreeBSD - I decided to not block the voting for this and
> > > instead let the Community decide whether this is a blocker for the
> release.
> > >  - Patric Zhao and Tao Lv - could you help preparing a paragraph on
> > > MKLDNN
> > > 1.0 update in the New features section in the release notes?
> > >
> > > [1] https://github.com/apache/incubator-mxnet/issues/17076
> > >
> > > Best regards,
> > > Przemyslaw Tredak
>


Re: Stopping nightly releases to Pypi

2019-12-10 Thread Lin Yuan
Is there a way to install the latest nightly package without having to
specify the exact date?

Thanks,

Lin

On Sun, Dec 8, 2019 at 6:13 PM Lausen, Leonard 
wrote:

> From Shanghai, the closest endpoint (automatically chosen endpoint) is in
> Tokyo
> and download speed for mxnet-mkl was on average 1.7 MB/s with a maximum of
> 5
> MB/s during my test.
>
> On Sun, 2019-12-08 at 01:30 +, Sheng Zha wrote:
> > > Here’s a set of links for today’s builds
> > >
> > > (Plain mxnet, no mkl no cuda)
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > (mxnet-mkl)
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > (mxnet-cuXXX)
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > (mxnet-cuXXXmkl)
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > These links are not utilizing the S3 accelerate feature (i.e. not backed
> > by CloudFront edges). Please use repo.mxnet.io instead. The updated
> > links are:
> > (Plain mxnet, no mkl no cuda)
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > (mxnet-mkl)
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > (mxnet-cuXXX)
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > (mxnet-cuXXXmkl)
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> >
> > When updating the installation doc we should use repo.mxnet.io domain
> name
> > too.
> >
> > Best,
> > -sz
> >
> > On 2019/12/07 17:39:40, "Skalicky, Sam" 
> wrote:
> > > Hi MXNet Community,
> > >
> > > We have been working on getting nightly builds fixed and made available
> > > again. We’ve made another system using AWS CodeBuild & S3 to work
> around the
> > > problems with Jenkins CI, PyPI, etc. It is currently building all the
> > > flavors and publishing to an S3 bucket here:
> > >
> https://us-west-2.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/?region=us-west-2
> > >
> > > There are folders for each set of nightly builds, try out the wheels
> > > starting today 2019-12-07. Builds start at 1:30am PT (9:30am GMT) and
> arrive
> > > in the bucket 30min-2hours later. Inside each folder are the wheels
> for each
> > > flavor of MXNet. Currently we’re only building for linux, builds for
> > > windows/Mac will come later.
> > >
> > > If you want to download the wheels easily you can use a URL in the
> form of:
> > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/<date>/dist/<flavor>-1.6.0b<yyyymmdd>-py2.py3-none-manylinux1_x86_64.whl
> > >
> > > Here’s a set of links for today’s builds
> > >
> > > (Plain mxnet, no mkl no cuda)
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > (mxnet-mkl)
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > (mxnet-cuXXX)
> > >
> 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Lin Yuan
Also per Sam's suggestion, we could still release a build without MKLDNN
(name it mxnet-nomkldnn?) and track the usage/download for one or two
releases. If there is no usage, we could drop that build in the future.

Best,

Lin

On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:

> Just to summarize based on the concerns Marco raised and discussed above:
>
> - AMD CPU (it should work with MKLDNN:
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> )
> - ARM CPU (we don't have it today w/o MKLDNN either)
> - Windows (Windows support is there regardless of MKLDNN or not)
> - GPU and MKLDNN enabled (already supported)
> - Fully reproducible results (medical and financial sector requested that
> and we have some flags for cuda) (The nondeterminism exists even today w/o
> MKLDNN. We should address it regardless of MKLDNN.)
>
> Marco, please let us know if your concerns are properly addressed?
>
> Given that MKLDNN gives a significant performance speedup on CPU, I am
> inclined to make it the default in the pip build.
>
> Best,
>
> Lin
>
> On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> wrote:
>
>> Thanks, Patric. I was just trying to point out that there was currently no
>> guarantee of deterministic results without MKL, so there’s not necessarily
> >> an expectation of determinism with MKL (i.e., the requirement isn't relaxed).
>>
>> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
>> wrote:
>>
>> > It may be a concern but little noise can't affect the final results if
>> the
>> > algorithm is stable in numerical.
>> > The MKLDNN backend with mxnet-mkl has been used for 2 years and we
>> didn't
>> > see the coverage issue caused by multiple threading.
>> > In other words, GPU programming mode works well on training where the
>> > non-deterministic also exists from multiple threads.
>> >
>> > Parts of training accuracy was pasted in the first PR when MKLDNN is
>> > integrated.
>> >
>> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
>> >
>> > In conclusion, it may happen with very little probability. I believe we
>> > can get a solution in case it happens someday.
>> >
>> > Thanks,
>> >
>> > --Patric
>> >
>> >
>> > > -Original Message-
>> > > From: Chris Olivier 
>> > > Sent: Tuesday, November 19, 2019 11:51 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Cc: Tao Lv 
>> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
>> > >
>> > > (for non mkl dropout, for instance)
>> > >
>> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
>> > > wrote:
>> > >
>> > > > To address the deterministic item, I know for a fact that training
>> > > > will not be deterministic in some cases where the “parallel random”
>> > > > class is utilized in parallel threads, such as OMP, if the number of
>> > > > cores is different, even with the same seed, because threads are
>> > > > seeded independently and different number of threads will end up
>> > > > generating different random number sequences. Dropout operator being
>> > > an example.
>> > > >
>> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
>> > > >  wrote:
>> > > >
>> > > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
>> > > >> would be enabled by default. Historically, other intel libraries
>> > > >> (along with the ICC
>> > > >> compiler) have had performance issues on AMD CPUs. It’s just worth
>> > > >> double checking to make sure that’s not the case here. Perhaps some
>> > > >> MKL-DNN authors can chime in though. It’s not sufficient to double
>> > > >> check that an
>> > > >> AVX2 package passes tests.
>> > > >>
>> > > >> Agreed in the case we’re not releasing ARM binaries.
>> > > >>
>> > > >> The reproducibility argument is around the results being
>> numerically
>> > > >> reproducible. That is, eg; if I train a model with some fixed set
>> of
>> > > >> data, some random seed, etc. and then run inference on it do I get
>> > > >> the exact same floating point values for the weights and results?
>> > > >> Does MxNet already offer this without MKL-DNN?
>> > > 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Lin Yuan
Just to summarize based on the concerns Marco raised and discussed above:

- AMD CPU (it should work with MKLDNN:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
)
- ARM CPU (we don't have it today w/o MKLDNN either)
- Windows (Windows support is there regardless of MKLDNN or not)
- GPU and MKLDNN enabled (already supported)
- Fully reproducible results (medical and financial sector requested that
and we have some flags for cuda) (The nondeterminism exists even today w/o
MKLDNN. We should address it regardless of MKLDNN)

Marco, please let us know if your concerns are properly addressed?

Given that MKLDNN gives significant performance speed up in CPU, I am
inclined to make it default in pip build.

Best,

Lin

On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier  wrote:

> Thanks, Patric. I was just trying to point out that there was currently no
> guarantee of deterministic results without MKL, so there’s not necessarily
> an expectation of determinism with MKL (ie requirement isn’t relaxed).
>
> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> wrote:
>
> > It may be a concern but little noise can't affect the final results if
> the
> > algorithm is stable in numerical.
> > The MKLDNN backend with mxnet-mkl has been used for 2 years and we didn't
> > see the coverage issue caused by multiple threading.
> > In other words, GPU programming mode works well on training where the
> > non-deterministic also exists from multiple threads.
> >
> > Parts of training accuracy was pasted in the first PR when MKLDNN is
> > integrated.
> >
> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
> >
> > In conclusion, it may happen with very little probability. I believe we
> > can get a solution in case it happens someday.
> >
> > Thanks,
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: Chris Olivier 
> > > Sent: Tuesday, November 19, 2019 11:51 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: Tao Lv 
> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> > >
> > > (for non mkl dropout, for instance)
> > >
> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
> > > wrote:
> > >
> > > > To address the deterministic item, I know for a fact that training
> > > > will not be deterministic in some cases where the “parallel random”
> > > > class is utilized in parallel threads, such as OMP, if the number of
> > > > cores is different, even with the same seed, because threads are
> > > > seeded independently and different number of threads will end up
> > > > generating different random number sequences. Dropout operator being
> > > an example.
> > > >
> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> > > >  wrote:
> > > >
> > > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
> > > >> would be enabled by default. Historically, other intel libraries
> > > >> (along with the ICC
> > > >> compiler) have had performance issues on AMD CPUs. It’s just worth
> > > >> double checking to make sure that’s not the case here. Perhaps some
> > > >> MKL-DNN authors can chime in though. It’s not sufficient to double
> > > >> check that an
> > > >> AVX2 package passes tests.
> > > >>
> > > >> Agreed in the case we’re not releasing ARM binaries.
> > > >>
> > > >> The reproducibility argument is around the results being numerically
> > > >> reproducible. That is, eg; if I train a model with some fixed set of
> > > >> data, some random seed, etc. and then run inference on it do I get
> > > >> the exact same floating point values for the weights and results?
> > > >> Does MxNet already offer this without MKL-DNN?
> > > >>
> > > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com)
> > > wrote:
> > > >>
> > > >> Regarding the cases listed by Marco:
> > > >> - AMD CPU
> > > >> From my architecture knowledge, what works on C4 instances (with
> AVX2
> > > >> support) should also work well on m5a, right? I think mxnet-mkl and
> > > >> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> > > >> Also, we didn't perform any validation on AMD CPU before, why we
> need
> > > >> do that for this time?
> > > >>
> > > >> - ARM CPU
> > > >> I don't know we're releasing any convenience binaries for ARM CPU.
> > > >> This proposal mainly targets those pypi packages.
> > > >>
> > > >> - Windows
> > > >> Already validated by CI. We're also releasing mxnet-mkl packages for
> > Win.
> > > >>
> > > >> - GPU and MKLDNN enabled
> > > >> Already validated by CI and mxnet-cuxxmkl packages have been
> released
> > > >> for several versions.
> > > >>
> > > >> - Fully reproducible results (medical and financial sector requested
> > > >> that and we have some flags for cuda) Not sure I understand this
> > > >> case. We already have MKL-DNN backend for a while. Functionality and
> > > >> correctness of it have been verified by MXNet users.
> > > >>
> > > >> -tao
> > > >>
> > > >> On Tue, Nov 19, 2019 at 4:41 AM 
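[Editor's note] Chris's point above, that per-thread seeding makes results depend on the number of OMP threads even with a fixed base seed, can be shown with a toy sketch. The seeding scheme below (base seed plus worker id) is illustrative only and is not MXNet's actual implementation:

```python
import random

def parallel_random_draws(seed, num_threads, draws_per_thread):
    """Simulate a 'parallel random' scheme: each worker thread gets its
    own RNG seeded from the base seed plus its thread id (illustrative)."""
    out = []
    for tid in range(num_threads):
        rng = random.Random(seed + tid)  # per-thread seeding
        out.extend(rng.random() for _ in range(draws_per_thread))
    return out

# Same base seed and same total number of draws, but a different thread
# count changes which RNG produces each draw, so the sequences differ.
a = parallel_random_draws(seed=42, num_threads=2, draws_per_thread=4)
b = parallel_random_draws(seed=42, num_threads=4, draws_per_thread=2)
print(a == b)  # False
```

Re-running with the same thread count does reproduce the sequence, which is why determinism here is conditional on the execution configuration, not just the seed.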

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Lin Yuan
In the Limitation, I suppose you meant 'use case 1,3,4', right?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16795#issuecomment-553085374

Re: BytePS-MXNet Integration

2019-11-09 Thread Lin Yuan
Very interesting proposal. I have tried BytePS on some examples and did see
better performance than Horovod. I look forward to this integration and
feel free to let the community know if any help is needed.

Lin
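[Editor's note] For context on what Horovod and BytePS are both accelerating: the core of data-parallel training is aggregating gradients across workers (allreduce in Horovod, push-pull in BytePS). A toy sketch of that averaging step, with illustrative names only:

```python
# Toy data-parallel gradient averaging -- the operation that Horovod's
# allreduce and BytePS's parameter-server push-pull both implement,
# just with very different communication strategies underneath.
def allreduce_mean(worker_grads):
    """Average per-parameter gradients computed by each worker."""
    n = len(worker_grads)
    return [sum(g) / n for g in zip(*worker_grads)]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 workers, 2 parameters
print(allreduce_mean(grads))  # [3.0, 4.0]
```

The performance difference the thread discusses comes entirely from how this aggregation is scheduled and transported, not from the arithmetic itself.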


Re: ONNX Support

2019-10-07 Thread Lin Yuan
Hi Anirudh,

Could you provide more exact data points regarding the ONNX usage and MXNet
version? If no one is actively maintaining ONNX any more, I don't see a
compelling reason for an engineer to spend quality time to fix an ONNX test
in order for his/her PRs to move forward.

Lin

On Mon, Oct 7, 2019 at 1:19 PM Skalicky, Sam 
wrote:

> Hi Chai,
>
> If there is no one maintaining MXNet-ONNX support (or no one currently
> available to help debug issues), then we shouldn’t block forward progress
> because of failing ONNX tests.
>
> It would be great if someone wanted to work with Chai to debug the failing
> tests. But I do not see any forward plans/proposals to continue to develop
> or even just maintain the current ONNX support.
>
> Anirudh, if you can point those who are willing to maintain the ONNX
> support to the issue Chai mentioned that would be a good place to start.
> But if not, we should help Chai continue the great work he’s doing by
> disabling the failing tests (like we normally do for any failing/flaky
> tests already)
>
> Sam
>
> > On Oct 7, 2019, at 12:45 PM, Anirudh Acharya 
> wrote:
> >
> > Hi Chaitanya,
> >
> > The last I checked( a couple of months back) there are a few
> > customers/users of MXNet in Amazon who use ONNX in production.
> >
> > The last commit for ONNX module was on Aug 29th
> > - b7cca015553d707cd1c4ce292826d7311309419c
> >
> > So IMO disabling any of the tests is not a good idea.
> >
> >
> > Thanks
> > Anirudh
> >
> >
> > On Mon, Oct 7, 2019 at 12:27 PM Chaitanya Bapat 
> > wrote:
> >
> >> Hello MXNet community,
> >>
> >> I wanted to know if MXNet should continue support for ONNX. Is there
> anyone
> >> actively working on MXNet ONNX or maintaining it?
> >>
> >> If not, can we skip/disable the ONNX tests from the CI.
> >> Reason - Whilst working on a Transpose operator PR [1], I encountered
> >> failure for ONNX [2]. Given operator passes rest of the CI pipeline
> tests.
> >> I am able to reproduce the error. However, the root cause for ONNX model
> >> failure couldn't be found. Moreover, there seems to be near zero
> activity
> >> as far as PR check-ins are concerned.
> >>
> >> How does ONNX fit in for MXNet going forward?
> >> Thank you
> >> Chai
> >>
> >>
> >> [1] https://github.com/apache/incubator-mxnet/pull/16104
> >> [2]
> >>
> >>
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16104/14/pipeline
> >>
> >> --
> >> *Chaitanya Prakash Bapat*
> >> *+1 (973) 953-6299*
> >>
> >>
>
>
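[Editor's note] The skip/disable proposed above can be sketched with the standard-library unittest decorator; the class and test names below are illustrative, not MXNet's actual ONNX tests (which live in a nose/pytest-based CI suite):

```python
import unittest

# Pattern for disabling a known-failing test while keeping it visible
# in the suite, rather than deleting it outright.
class TestOnnxExport(unittest.TestCase):
    @unittest.skip("ONNX model test failing with no active maintainer")
    def test_transpose_export(self):
        raise AssertionError("would fail until the ONNX issue is fixed")

suite = unittest.TestLoader().loadTestsFromTestCase(TestOnnxExport)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, len(result.skipped))  # 1 1
```

Keeping the skipped test in the suite (with the reason string) leaves a visible marker for whoever eventually picks ONNX maintenance back up.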


Re: Update for 1.5.1 patch release

2019-09-28 Thread Lin Yuan
Ping @Sheng and @Lai who released 1.5.0 for help.

Could you please update the Release Process doc after you find the right
answers to these questions?

Thanks,

Lin

On Sat, Sep 28, 2019 at 7:49 AM Tao Lv  wrote:

> Hi dev,
>
> I'm glad to say that the rc0 of 1.5.1 patch release has passed the vote on
> general@. Please find the voting thread at:
>
> https://lists.apache.org/thread.html/282f7911768dab61ddf8f70adcce34ef0afb285046093b3ff0bafb7e@%3Cgeneral.incubator.apache.org%3E
>
>
> Now I'm proceeding with the release process and have several questions.
> Hope someone can help to answer:
>
> 1. Change the 1.5.1.rc0 tag to the formal 1.5.1 tag. It seems step 3.1.1
> on the cwiki page [1] doesn't work. It says:
>
> "Go to the GitHub repo’s “releases” tab
>
> Click “Draft a new release”
>
> Provide the release tag in the form of “..”
> Select the commit by clicking Target: master > the passing release
> candidate tag"
>
> But I cannot find "the passing release candidate tag" in the drop-down
> list; there are only branches and recent commits. Once I create a new tag,
> what happens to the old 1.5.1.rc0 tag?
>
> 2. Step 3.1.2 is also confusing to me. Not sure what should be done at step
> 3 and what needs to be uploaded at step 4.
>
> 3. As we have a new website now, I guess there are some changes for the
> step 3.2. Can anyone help to clarify this?
>
> Thanks,
> -tao
>
> [1] https://cwiki.apache.org/confluence/display/MXNET/Release+Process
>
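[Editor's note] One way to answer question 1 outside the GitHub UI is to create the final tag directly with git, pointed at the commit the rc tag resolves to. An illustrative sketch, demoed in a throwaway repo (the push is left as a comment; the real ASF process may add tag signing):

```shell
# Promote an rc tag to the final release tag, demoed locally.
set -e
demo=$(mktemp -d) && cd "$demo" && git init -q .
git -c user.email=rm@example.com -c user.name=rm \
    commit -q --allow-empty -m "1.5.1 release commit"
git -c user.email=rm@example.com -c user.name=rm \
    tag -a 1.5.1.rc0 -m "1.5.1.rc0"

# Promote: the final tag points at the commit the rc tag peels to.
git -c user.email=rm@example.com -c user.name=rm \
    tag -a 1.5.1 "1.5.1.rc0^{}" -m "Apache MXNet (incubating) 1.5.1"
git rev-parse "1.5.1^{}" "1.5.1.rc0^{}"   # both print the same commit hash
# On the real repo, finish with: git push origin 1.5.1
# The old 1.5.1.rc0 tag can simply be left in place for provenance.
```

The `^{}` suffix peels an annotated tag down to the commit it annotates, so the new tag is guaranteed to sit on the exact commit that passed the vote.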


Re: new website, docs code freeze

2019-09-20 Thread Lin Yuan
Looks very neat. Thank you Aaron and many others for launching this!

On Fri, Sep 20, 2019 at 7:31 AM Carin Meier  wrote:

> Nice!!! Congrats everyone!
>
> On Fri, Sep 20, 2019 at 10:28 AM Aaron Markham 
> wrote:
>
> > Alrighty! The new site is launched. You might need to clear your cache.
> >
> > Cheers,
> > Aaron
> >
> > On Thu, Sep 19, 2019 at 3:33 PM Aaron Markham  >
> > wrote:
> > >
> > > Thanks everyone. The PRs passed CI, but please continue holding off on
> > > docs and CI edits. Unless there are any objections, I'd like to launch
> > > the new website today.
> > >
> > > On Wed, Sep 18, 2019 at 7:46 AM Aaron Markham <
> aaron.s.mark...@gmail.com>
> > wrote:
> > > >
> > > > Hi everyone,
> > > > The last two PRs [1][2] for the new website and docs have passed CI
> > > > (finally). Please do not make changes to /docs or /ci until we get
> > > > these approved and merged. Every time there's a merge conflict it has
> > > > set us back a day or two while shepherding the PRs through CI again.
> > > > Unless there are catastrophic issues discovered in a review, I
> > > > recommend that we hold any patches or updates to the PRs to follow-up
> > > > PRs.
> > > >
> > > > There are four steps to launch:
> > > > 1. Once the PRs are approved, the plan is to merge 15885 to delete
> the
> > > > old content first.
> > > > 2. Then immediately merge 15883 to add in the new CI flows and
> updates
> > > > to the content Thomas and I have already had merged in 15884 [3].
> > > > 3. I will change the website validation Jenkins pipeline to point to
> > > > the new pipeline.
> > > > 4. I will change the website publishing Jenkins pipeline to point to
> > > > its new pipeline as well. Once triggered, the old site will be
> > > > replaced with the new one.
> > > >
> > > > Post launch we'll need to update the DNS for beta.mxnet.io to point
> to
> > > > production, and there will likely be some redirect/.htaccess updates
> > > > needed next week to assist with any deep linking and 404 issues that
> > > > pop up.
> > > >
> > > > Cheers,
> > > > Aaron
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/pull/15885
> > > > [2] https://github.com/apache/incubator-mxnet/pull/15883
> > > > [3] https://github.com/apache/incubator-mxnet/pull/15884
> >
>


Re: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0

2019-09-19 Thread Lin Yuan
+1
Tested Horovod on GPU

On Wed, Sep 18, 2019 at 6:16 AM Zhao, Patric  wrote:

> +1
>
> Tested MKLDNN backend and everything looks great.
>
> > -Original Message-
> > From: Qing Lan 
> > Sent: Wednesday, September 18, 2019 2:20 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0
> >
> > +1 for Scala/Java test. Passed all tests for CPU/GPU build.
> > Also tested build from source with static build.
> >
> > Thanks,
> > Qing
> > 
> > From: Tao Lv 
> > Sent: Tuesday, September 17, 2019 14:14
> > To: dev@mxnet.incubator.apache.org 
> > Subject: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0
> >
> > Dear MXNet community,
> >
> >
> >
> > This is the 3-day vote to release Apache MXNet (incubating) version
> 1.5.1.
> >
> > Voting on dev@ will start September 17, 12:00pm (PST)  and close on
> > September 20, 12:00pm (PST).
> >
> >
> >
> > 1) Link to release notes:
> >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Notes
> >
> >
> >
> > 2) Link to release candidate:
> >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.5.1.rc0
> >
> >
> >
> > 3) Link to source and signatures on Apache dist server:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.1.rc0/
> >
> >
> >
> > Please remember to TEST first before voting accordingly:
> >
> > +1 = approve
> >
> > +0 = no opinion
> >
> > -1 = disapprove (provide reason)
> >
> >
> >
> > Thanks,
> >
> > -tao
>
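[Editor's note] For voters, the usual minimal check on the artifacts from the Apache dist server is checksum and signature verification. An illustrative sketch using a stand-in file (real candidate file names will differ):

```shell
# Demonstrate the checksum step of release validation with a stand-in
# artifact; for the real candidate, run the same commands against the
# tarball and its .sha512/.asc files from dist.apache.org.
set -e
work=$(mktemp -d) && cd "$work"
printf 'source tarball contents\n' > apache-mxnet-src.tar.gz
sha512sum apache-mxnet-src.tar.gz > apache-mxnet-src.tar.gz.sha512
sha512sum -c apache-mxnet-src.tar.gz.sha512        # prints "... OK"
# For the real release candidate, also verify the GPG signature, e.g.:
# gpg --verify apache-mxnet-src-1.5.1.rc0-incubating.tar.gz.asc
```

Building from the verified source and running a smoke test (as the voters in this thread did for MKLDNN, Scala/Java, and Horovod) completes the "TEST first before voting" step.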


Re: Code freeze for 1.5.1 patch release

2019-09-17 Thread Lin Yuan
Hi Tao,

If the voting is only on the source and does not require the language
bindings, can we run the vote in parallel with building those packages?

Carin/Qing, please help to clarify if the R/Clojure language bindings are
required for voting.

Thanks,

Lin

On Tue, Sep 17, 2019 at 7:32 AM Tao Lv  wrote:

> Thanks for your help, Carin. Per step 1.14 of  the release process [1],
> stage repo links should be included in the voting email.
>
> Scala packages are done. Thanks to the help from @Lanking.
>
> We still have problem with R package on Windows. @yajiedesign is helping on
> that.
>
> Again thanks for all of your support and patience.
>
> -tao
>
> [1] https://cwiki.apache.org/confluence/display/MXNET/Release+Process
>
> On Tue, Sep 17, 2019 at 6:10 PM Carin Meier  wrote:
>
> > I will be able to build the Clojure packages on Friday. I don’t believe
> > this needs to hold up the voting. I believe the voting is only on the
> > source.
> >
> > -Carin
> >
> > On Mon, Sep 16, 2019 at 7:05 PM Lin Yuan  wrote:
> >
> > > Hi Tao,
> > >
> > > Thanks for uploading the artifacts. May I know the current status of
> > > the Scala, Clojure and R packages, and whether you need any help from
> > > the community to complete them?
> > >
> > > Thanks,
> > >
> > > Lin
> > >
> > > On Fri, Sep 6, 2019 at 7:35 AM Tao Lv  wrote:
> > >
> > > > Update:
> > > >
> > > > Artifacts of 1.5.1.rc0 have been uploaded to github and Apache dist.
> > > Before
> > > > voting, we still need some time to build packages for Scala, Clojure
> > and
> > > R.
> > > >
> > > > Thank you for your patience.
> > > >
> > > > -tao
> > > >
> > > > On Thu, Sep 5, 2019 at 10:15 PM Tao Lv  wrote:
> > > >
> > > > >
> > > > > Following the release process [1], I just created the tag for
> > 1.5.1.rc0
> > > > > [2]. Artifacts uploading and validation are still WIP. Will keep
> you
> > > > > posted. Hopefully we can start the vote soon for a new release. :)
> > > > >
> > > > > Let me know if you have any questions or suggestions for the release.
> > > > >
> > > > > Thanks,
> > > > > -tao
> > > > >
> > > > > [1]
> > https://cwiki.apache.org/confluence/display/MXNET/Release+Process
> > > > > [2]
> https://github.com/apache/incubator-mxnet/releases/tag/1.5.1.rc0
> > > > >
> > > > >
> > > > > On Wed, Sep 4, 2019 at 9:23 AM Tao Lv  wrote:
> > > > >
> > > > >>
> > > > >> Code freezing!
> > > > >>
> > > > >> If you happen to be around github, please help to review the PR
> [1]
> > > for
> > > > >> bumping version strings on the release branch. Thanks.
> > > > >>
> > > > >> I will continue working on the rest steps for the release.
> > > > >>
> > > > >> Thanks,
> > > > >> -tao
> > > > >>
> > > > >> [1] https://github.com/apache/incubator-mxnet/pull/16072
> > > > >>
> > > > >> On Mon, Sep 2, 2019 at 9:51 PM Tao Lv  wrote:
> > > > >>
> > > > >>>
> > > > >>> I drafted the release notes for 1.5.1 patch release:
> > > > >>>
> > > https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Notes
> > > > >>>
> > > > >>> Any comments or suggestions are highly appreciated!
> > > > >>>
> > > > >>> -tao
> > > > >>>
> > > > >>> On Mon, Sep 2, 2019 at 2:00 PM kellen sunderland <
> > > > >>> kellen.sunderl...@gmail.com> wrote:
> > > > >>>
> > > > >>>> Thanks for organizing the release Tao.
> > > > >>>>
> > > > >>>> On Sun, Sep 1, 2019, 5:53 PM Tao Lv  wrote:
> > > > >>>>
> > > > >>>> > Hi Community,
> > > > >>>> >
> > > > >>>> > Code freeze for 1.5.1 patch release will be 9/3 6pm PST (9/4
> 9am
> > > > >>>> CST). If
> > > > >>>> > you have any additional fix in progress and would like to
> > include
> > > it
> > > > >>>> in
> > > > >>>> > this release, please assure they have been merged before code
> > > > freeze.
> > > > >>>> >
> > > > >>>> > Thanks for all your support and contribution.
> > > > >>>> >
> > > > >>>> > -tao
> > > > >>>> >
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>
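[Editor's note] A small sanity check that can accompany a version-string bump like the one in PR #16072, confirming a candidate string really is a patch release (or an rc of one) over the previous version. The helper below is illustrative, not part of MXNet's actual tooling:

```python
import re

# Matches e.g. "1.5.1" or "1.5.1.rc0" (illustrative pattern).
RELEASE_RE = re.compile(r'^(\d+)\.(\d+)\.(\d+)(?:\.rc(\d+))?$')

def is_patch_of(candidate, base):
    """True if `candidate` is the next patch release (or an rc of it)
    on top of `base`: same major.minor, patch incremented by one."""
    c, b = RELEASE_RE.match(candidate), RELEASE_RE.match(base)
    if not (c and b):
        return False
    return (c.group(1) == b.group(1) and c.group(2) == b.group(2)
            and int(c.group(3)) == int(b.group(3)) + 1)

print(is_patch_of('1.5.1.rc0', '1.5.0'))  # True
print(is_patch_of('1.6.0', '1.5.0'))      # False
```

Wiring a check like this into the version-bump PR would catch a mistyped version string before it reaches the release branch.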


Re: Code freeze for 1.5.1 patch release

2019-09-16 Thread Lin Yuan
Hi Tao,

Thanks for uploading the artifacts. May I know the current status of the
Scala, Clojure and R packages, and whether you need any help from the
community to complete them?

Thanks,

Lin

On Fri, Sep 6, 2019 at 7:35 AM Tao Lv  wrote:

> Update:
>
> Artifacts of 1.5.1.rc0 have been uploaded to github and Apache dist. Before
> voting, we still need some time to build packages for Scala, Clojure and R.
>
> Thank you for your patience.
>
> -tao
>
> On Thu, Sep 5, 2019 at 10:15 PM Tao Lv  wrote:
>
> >
> > Following the release process [1], I just created the tag for 1.5.1.rc0
> > [2]. Artifacts uploading and validation are still WIP. Will keep you
> > posted. Hopefully we can start the vote soon for a new release. :)
> >
> > Let me know if you have any questions or suggestions for the release.
> >
> > Thanks,
> > -tao
> >
> > [1] https://cwiki.apache.org/confluence/display/MXNET/Release+Process
> > [2] https://github.com/apache/incubator-mxnet/releases/tag/1.5.1.rc0
> >
> >
> > On Wed, Sep 4, 2019 at 9:23 AM Tao Lv  wrote:
> >
> >>
> >> Code freezing!
> >>
> >> If you happen to be around github, please help to review the PR [1] for
> >> bumping version strings on the release branch. Thanks.
> >>
> >> I will continue working on the rest steps for the release.
> >>
> >> Thanks,
> >> -tao
> >>
> >> [1] https://github.com/apache/incubator-mxnet/pull/16072
> >>
> >> On Mon, Sep 2, 2019 at 9:51 PM Tao Lv  wrote:
> >>
> >>>
> >>> I drafted the release notes for 1.5.1 patch release:
> >>> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Notes
> >>>
> >>> Any comments or suggestions are highly appreciated!
> >>>
> >>> -tao
> >>>
> >>> On Mon, Sep 2, 2019 at 2:00 PM kellen sunderland <
> >>> kellen.sunderl...@gmail.com> wrote:
> >>>
>  Thanks for organizing the release Tao.
> 
>  On Sun, Sep 1, 2019, 5:53 PM Tao Lv  wrote:
> 
>  > Hi Community,
>  >
>  > Code freeze for 1.5.1 patch release will be 9/3 6pm PST (9/4 9am
>  CST). If
>  > you have any additional fix in progress and would like to include it
>  in
>  > this release, please assure they have been merged before code
> freeze.
>  >
>  > Thanks for all your support and contribution.
>  >
>  > -tao
>  >
> 
> >>>
>


Re: [Announcement] New Committer - Junru Shao

2019-09-08 Thread Lin Yuan
Congratulations!

On Sat, Sep 7, 2019 at 8:14 PM Sheng Zha  wrote:

> Hi all,
>
> Please join me in welcoming Junru Shao as a new committer of Apache MXNet
> (incubating)!
>
> Junru made a number of contributions to this project such as cond and
> while_loop control-flow
> operators, enabling dynamic shape in control-flow ops, and zero-copy array
> creation from numpy
> array. He's also been actively working on numpy-compatible programming
> experience and
> has helped the community recently by driving the 1.4.1 patch release.
>
> Welcome, Junru!
>
> -sz
>


Re: [Discussion] MXNet 1.5.1 release

2019-08-30 Thread Lin Yuan
 >> > > frontend can help to cherry pick the commits to the v1.5.x branch.
> >> > >
> >> > > thanks,
> >> > > -tao
> >> > >
> >> > > On Wed, Aug 28, 2019 at 11:43 PM Aaron Markham <
> >> > > aaron.s.mark...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > I don't see any request for action on the Julia PRs: 5 or 6.
> >> > > We didn't put the change in right away because we wanted it to not
> >> > > break
> >> > > anything. But the changes are needed to make Julia setup more
> >> seamless.
> >> > >
> >> > > What "update" is needed?
> >> > >
> >> > >
> >> > > On Wed, Aug 28, 2019, 08:36 Tao Lv  >> > > ta...@apache.org>> wrote:
> >> > >
> >> > > @Pedro, seems the issue is still open on the master branch. Do you
> >> > > still
> >> > > think we can have your fix on the 1.5.x branch?
> >> > >
> >> > > Progress since last update:
> >> > > 1. We received several more proposals in the github thread [1]. I
> >> > > humbly
> >> > > ask the reporters to pick the fixes to the v1.5.x. I will keep
> >> > > tracking
> >> > > the
> >> > > progress and the healthy status of the release branch.
> >> > > 2. Thanks to @Lai, the licence issue of julia cat image was fixed on
> >> > > the
> >> > > master branch and I opened a PR to pick it to v1.5.x [2].
> >> > > 3. The GPU OOM issue was fixed on the master branch by @Lin [3] .
> But
> >> > > there
> >> > > is a problem with porting the fix to v1.5.x branch [4].
> >> > >
> >> > > Opens:
> >> > > 1. https://github.com/apache/incubator-mxnet/pull/15803 still can
> >> > > not
> >> > > pass
> >> > > the CI;
> >> > > 2. Call for a update from julia folks about the back porting for [5]
> >> > > and
> >> > > [6]
> >> > > 3. License issue of cub and pybind is still open. @Lai opened a PR
> >> > > [7]
> >> > > to
> >> > > update cub submodule but seems it need more effort than just commit
> >> > > id
> >> > > update. I suspect that we cannot finish this work in 1.5.1 patch
> >> > > release.
> >> > > 4. Still no progress for the sidebar issue on web page [8].
> >> > > 5. Call for a conclusion about fixing the GPU OOM issue in 1.5.1
> >> > >
> >> > > Besides, I would like to ask if there is any preference for the
> >> > > release
> >> > > timeline of 1.5.1 patch release? Please share so I can propose the
> >> > > time
> >> > > for
> >> > > code freeze.
> >> > >
> >> > > Thanks,
> >> > > -tao
> >> > >
> >> > > [1]  https://github.com/apache/incubator-mxnet/issues/15613.
> >> > > [2] https://github.com/apache/incubator-mxnet/pull/16026
> >> > > [3] https://github.com/apache/incubator-mxnet/pull/15948
> >> > > [4] https://github.com/apache/incubator-mxnet/pull/15999
> >> > > [5] https://github.com/apache/incubator-mxnet/pull/15609
> >> > > [6]  https://github.com/apache/incubator-mxnet/pull/15608
> >> > > [7] https://github.com/apache/incubator-mxnet/pull/15963
> >> > > [8] https://github.com/apache/incubator-mxnet/issues/15200
> >> > >
> >> > > On Wed, Aug 28, 2019 at 5:50 AM Pedro Larroy <
> >> > > pedro.larroy.li...@gmail.com>
> >> > >
> >> > > wrote:
> >> > >
> >> > > Ok. I was just asking if we want this fix in 1.5.1 since it
> >> > > addresses
> >> > > crashes using multiprocessing. The problem with cherry picking is
> >> > > that
> >> > > the
> >> > > patch contains the dynamic load change which shouldn't impact
> >> > > anything
> >> > > else
> >> > > but is not supposed to go in a release branch.
> >> > >
> >> > > On Tue, Aug 27, 2019 at 1:19 PM Lin Yuan   >> > > apefor...@gmail.com>>
> >> > > wrote:
> >> 

Re: [Discussion] MXNet 1.5.1 release

2019-08-29 Thread Lin Yuan
Hi Tao,

What is the current timeline for the 1.5.1 release? Since it is a patch
release that includes only critical bug fixes, would it make sense to have
a short release cycle? I propose to have code freeze as early as next week.
Please let me know if there are any other comments.

Best,

Lin

On Thu, Aug 29, 2019 at 3:23 PM Lin Yuan  wrote:

> Hi Tao,
>
> 5) is not a bug. It's just a large tensor support requirement. The PR was
> to fix a memory alignment issue introduced in master but not in 1.5.1
> (since you did not cherry pick that PR). So, I have crossed out 5) in the
> doc and I don't think we need to mention it in release note.
>
> Lin
>
> On Thu, Aug 29, 2019 at 8:12 AM Tao Lv  wrote:
>
>> @Aaron,
>> Thank you for looking into these two issues. I have removed the #15609
>> from
>> the scope of 1.5.1. Please let me know if you have any update about
>> #15608.
>>
>> @Lai,
>> I'm fine with the decision. License issue about MKL-DNN, cub and pybind is
>> moved to next release.
>>
>> @Sam,
>> I also removed the sidebar issue [3] from the scope of 1.5.1. Besides, I
>> notice one of your cherry picks is stopped by the CI. Please take a look
>> at
>> it. Thanks.
>>
>> *Nice progress since the last update:*
>> 1. Per the discussion, we decided to remove #15609, the license issue
>> about
>> MKL-DNN, cub and pybind, and the sidebar issue [3] from the scope of 1.5.1
>> patch release;
>> 2. 3 fixes [4] [5] [6] were merged into the v1.5.x branch.
>>
>> *Opens (suggested owners are highlighted):*
>> 1. @Aaron is working on #15608 to see if we can have it in v1.5.x;
>> 2. Two cherry pick PRs [7] [8] cannot pass the CI. I have pinged the
>> authors to take a look at the CI failures.
>> 3. @Kellen proposed 5 fixes [9] for TensorRT but till now only 3 are
>> picked
>> to v1.5.x. Please help to confirm if the other 2 are still needed.
>> 4. Sorry that I missed the proposal for fixing the nightly build [10] in
> >> the previous update. @Lai, can you help to confirm if it's still valid?
>> 5. @Lin please help to make a conclusion for the GPU OOM issue caused by
>> topk regression [11]. If it cannot be addressed on v1.5.x branch, I will
>> remove it from the scope of this release and mark it as a known issue in
>> the release note.
>>
>> Please find the details in
>>
>> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
>> .
>>
>> Thanks,
>> -tao
>>
>> [1] https://github.com/apache/incubator-mxnet/pull/15609
>> [2] https://github.com/apache/incubator-mxnet/pull/15608
>> [3] https://github.com/apache/incubator-mxnet/issues/15200
>> [4] https://github.com/apache/incubator-mxnet/pull/16029
>> [5] https://github.com/apache/incubator-mxnet/pull/16026
>> [6] https://github.com/apache/incubator-mxnet/pull/16028
>> [7] https://github.com/apache/incubator-mxnet/pull/15803
>> [8] https://github.com/apache/incubator-mxnet/pull/16027
>> [9]
>>
>> https://github.com/apache/incubator-mxnet/issues/15613#issuecomment-520688668
>> [10]
>>
>> https://github.com/apache/incubator-mxnet/issues/15613#issuecomment-516937546
>> [11] https://github.com/apache/incubator-mxnet/issues/15703
>>
>>
>>
>> On Thu, Aug 29, 2019 at 1:06 AM Skalicky, Sam > >
>> wrote:
>>
>> > Hi Tao,
>> >
>> > I just talked with Aaron, lets leave the sidebar issue for later.
>> >
>> > I created PRs in the v1.5.x branch to cherry pick the fixes into the
>> 1.5.1
>> > release:
>> > https://github.com/apache/incubator-mxnet/pull/16027
>> > https://github.com/apache/incubator-mxnet/pull/16028
>> >
>> > Thanks for your work on this release!
>> > Sam
>> >
>> > On Aug 28, 2019, at 9:35 AM, Lai Wei > > roywei...@gmail.com>> wrote:
>> >
>> > Hi,
>> >
>> > Regrading the license issue[1],  we still have item 3, 4, 5 left.
>> > I think it's better to remove them from 1.5.1 release scope and target
>> for
>> > 1.6.0 as it need more time and requires changes that should not go into
>> > patch release.
>> >
>> >
>> > [1] https://github.com/apache/incubator-mxnet/issues/15542
>> >
>> > Best Regards
>> >
>> > Lai
>> >
>> >
>> > On Wed, Aug 28, 2019 at 9:20 AM Aaron Markham <
>> aaron.s.mark...@gmail.com
>> > <mailto:aaron.s.mark...@gmail.com>>
>> > wrote:
>> >
>> > 5 no. Install page defaults to 

Re: [Discussion] MXNet 1.5.1 release

2019-08-29 Thread Lin Yuan
ron.s.mark...@gmail.com>
> > wrote:
> >
> > I don't see any request for action on the Julia PRs: 5 or 6.
> > We didn't put the change in right away because we wanted it to not
> > break
> > anything. But the changes are needed to make Julia setup more seamless.
> >
> > What "update" is needed?
> >
> >
> > On Wed, Aug 28, 2019, 08:36 Tao Lv  > ta...@apache.org>> wrote:
> >
> > @Pedro, seems the issue is still open on the master branch. Do you
> > still
> > think we can have your fix on the 1.5.x branch?
> >
> > Progress since last update:
> > 1. We received several more proposals in the github thread [1]. I
> > humbly
> > ask the reporters to pick the fixes to the v1.5.x. I will keep
> > tracking
> > the
> > progress and the healthy status of the release branch.
> > 2. Thanks to @Lai, the licence issue of julia cat image was fixed on
> > the
> > master branch and I opened a PR to pick it to v1.5.x [2].
> > 3. The GPU OOM issue was fixed on the master branch by @Lin [3] . But
> > there
> > is a problem with porting the fix to v1.5.x branch [4].
> >
> > Opens:
> > 1. https://github.com/apache/incubator-mxnet/pull/15803 still can
> > not
> > pass
> > the CI;
> > 2. Call for a update from julia folks about the back porting for [5]
> > and
> > [6]
> > 3. License issue of cub and pybind is still open. @Lai opened a PR
> > [7]
> > to
> > update cub submodule but seems it need more effort than just commit
> > id
> > update. I suspect that we cannot finish this work in 1.5.1 patch
> > release.
> > 4. Still no progress for the sidebar issue on web page [8].
> > 5. Call for a conclusion about fixing the GPU OOM issue in 1.5.1
> >
> > Besides, I would like to ask if there is any preference for the
> > release
> > timeline of 1.5.1 patch release? Please share so I can propose the
> > time
> > for
> > code freeze.
> >
> > Thanks,
> > -tao
> >
> > [1]  https://github.com/apache/incubator-mxnet/issues/15613.
> > [2] https://github.com/apache/incubator-mxnet/pull/16026
> > [3] https://github.com/apache/incubator-mxnet/pull/15948
> > [4] https://github.com/apache/incubator-mxnet/pull/15999
> > [5] https://github.com/apache/incubator-mxnet/pull/15609
> > [6]  https://github.com/apache/incubator-mxnet/pull/15608
> > [7] https://github.com/apache/incubator-mxnet/pull/15963
> > [8] https://github.com/apache/incubator-mxnet/issues/15200
> >
> > On Wed, Aug 28, 2019 at 5:50 AM Pedro Larroy <
> > pedro.larroy.li...@gmail.com<mailto:pedro.larroy.li...@gmail.com>
> >
> > wrote:
> >
> > Ok. I was just asking if we want this fix in 1.5.1 since it
> > addresses
> > crashes using multiprocessing. The problem with cherry picking is
> > that
> > the
> > patch contains the dynamic load change which shouldn't impact
> > anything
> > else
> > but is not supposed to go in a release branch.
> >
> > On Tue, Aug 27, 2019 at 1:19 PM Lin Yuan  > apefor...@gmail.com>>
> > wrote:
> >
> > https://github.com/apache/incubator-mxnet/pull/15762  contains
> > some
> > unrelated changes which is being reverted. Please do not cherry
> > pick
> > it
> > yet.
> >
> > On Mon, Aug 26, 2019 at 4:25 PM Pedro Larroy <
> > pedro.larroy.li...@gmail.com<mailto:pedro.larroy.li...@gmail.com>
> >
> > wrote:
> >
> > There's a fix that I did which seems to still produce crashes
> > in
> > 1.5
> > for
> > some users, which I got notice today and is fixed in master.
> >
> > Might be useful to put in 1.5.1:
> > https://github.com/apache/incubator-mxnet/pull/15762   ?
> >
> > Pedro.
> >
> > On Tue, Aug 20, 2019 at 7:49 AM Tao Lv  > ta...@apache.org>>
> > wrote:
> >
> > Hi dev,
> >
> > Here is an update for the 1.5.1 patch release.
> >
> > 1. Thanks for the effort from whole community, we have cherry
> > picked
> > a
> > bunch of fixes to v1.5.x branch. So far, the branch looks
> > healthy:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/activity/
> > 2. https://github.com/apache/incubator-mxnet/pull/15803
> > cannot
> > pass
> > the
> > CI;
> > 3. I hope julia folks can take a look at the back p

Re: [Discussion] MXNet 1.5.1 release

2019-08-27 Thread Lin Yuan
https://github.com/apache/incubator-mxnet/pull/15762  contains some
unrelated changes which are being reverted. Please do not cherry pick it yet.

On Mon, Aug 26, 2019 at 4:25 PM Pedro Larroy 
wrote:

> There's a fix that I did which seems to still produce crashes in 1.5 for
> some users, which I got notice today and is fixed in master.
>
> Might be useful to put in 1.5.1:
> https://github.com/apache/incubator-mxnet/pull/15762   ?
>
> Pedro.
>
> On Tue, Aug 20, 2019 at 7:49 AM Tao Lv  wrote:
>
> > Hi dev,
> >
> > Here is an update for the 1.5.1 patch release.
> >
> > 1. Thanks for the effort from whole community, we have cherry picked a
> > bunch of fixes to v1.5.x branch. So far, the branch looks healthy:
> >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/activity/
> > 2. https://github.com/apache/incubator-mxnet/pull/15803 cannot pass the
> > CI;
> > 3. I hope julia folks can take a look at the back porting for
> > https://github.com/apache/incubator-mxnet/pull/15609 and
> > https://github.com/apache/incubator-mxnet/pull/15608 - do we still need
> > them?
> > 4. License issue of cub and pybind is still not fixed. We also has a
> > license issue of a cat image in julia examples.
> > https://github.com/apache/incubator-mxnet/issues/15542
> > 5. Still no progress for the sidebar issue:
> > https://github.com/apache/incubator-mxnet/issues/15200
> > 6. There is a GPU OOM issue in 1.5.0 release and already root caused by
> > Lin:
> >
> >
> https://github.com/apache/incubator-mxnet/issues/15703#issuecomment-522780492
> > .
> > We need decide whether we want to get it fixed in the 1.5.1 patch
> release.
> >
> > Please find details in
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+and+Status
> > .
> >
> > Thanks,
> > -tao
> >
> > On Mon, Aug 12, 2019 at 9:57 PM Zhao, Patric 
> > wrote:
> >
> > > Thanks for the explanation, Marco & Tao. Sounds great!
> > >
> > > > -Original Message-
> > > > From: Tao Lv 
> > > > Sent: Monday, August 12, 2019 9:54 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: [Discussion] MXNet 1.5.1 release
> > > >
> > > > > Regarding the open issue, is there default code owner/maintainer?
> If
> > > > > so, he/she will be the right people to look into the issue.
> > > > > https://github.com/apache/incubator-mxnet/blob/master/CODEOWNERS
> > > > >
> > > >
> > > > I have no idea. But the CODEOWNERS is used to receive change
> > > notificaitons,
> > > > not actually indicates the maintainer of a piece of code.
> > > >
> > > > Do we have regularly build, run, functionality and performance
> testing
> > > for
> > > > > this release?
> > > >
> > > >
> > > > As Marco mentioned, build, run and functionality of v1.5.x branch are
> > > tracked
> > > > automatically by the CI for each cherry pick pull request and the
> > > nightly tests
> > > > here:
> > > > http://jenkins.mxnet-ci.amazon-
> > > > ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/activity.
> > > > I see it's healthy so far.
> > > >
> > > > For performance, Shufan will track CPU performance with his test
> suite
> > > and
> > > > send out the report once the branch is frozen. I'm not sure if there
> > are
> > > any
> > > > other performance tests.
> > > >
> > > > On Mon, Aug 12, 2019 at 9:36 PM Marco de Abreu
> > > > 
> > > > wrote:
> > > >
> > > > > Hi Patric,
> > > > >
> > > > > CI should automatically pick up the branch and validate it as
> usual.
> > > > >
> > > > > Best regards,
> > > > > Marco
> > > > >
> > > > > Zhao, Patric  schrieb am Mo., 12. Aug.
> 2019,
> > > 15:22:
> > > > >
> > > > > > It's great works, Tao 
> > > > > >
> > > > > > Regarding the open issue, is there default code owner/maintainer?
> > If
> > > > > > so, he/she will be the right people to look into the issue.
> > > > > > https://github.com/apache/incubator-
> > > > mxnet/blob/master/CODEOWNERS
> > > > > >
> > > > > > Do we have regularly build, run, functionality and performance
> > > > > > testing
> > > > > for
> > > > > > this release?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > --Patric
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Tao Lv 
> > > > > > > Sent: Monday, August 12, 2019 8:59 PM
> > > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > > Subject: Re: [Discussion] MXNet 1.5.1 release
> > > > > > >
> > > > > > > Update:
> > > > > > >
> > > > > > > We're cherry picking fixes from the master to the v1.5.x
> branch.
> > > > > > > Some
> > > > > of
> > > > > > > them are already merged. Please find details on the cwiki page:
> > > > > > >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Pl
> > > > > > > an+a
> > > > > > > nd+Status
> > > > > > >
> > > > > > >
> > > > > > >  There are still 3 opens:
> > > > > > > 1. Nightly test failure on CI (
> > > > > > > https://github.com/apache/incubator-mxnet/issues/15374): The
> > issue
> > > > > > > is
> > > > > > still
> > > > > > > 

Re: [Discuss] MXNet Python 2 Support Deprecation

2019-07-19 Thread Lin Yuan
+1

On Fri, Jul 19, 2019 at 12:03 AM Chaitanya Bapat 
wrote:

> +1 definitely.
>
> Going forward,
> MXNet repo as it stands has ~95,000+ lines of Python code [1]
> OpenEdx has a million (10x) LOC and this mammoth effort of porting from
> Python 2 to 3 is treated as a separate project named Incremental
> Improvement. [2]
> We can take inspiration from them and have a similar effort by calling
> action from the community. Issues can be maintained in a separate JIRA
> board to track high priority tasks.
>
> Also, I can see gluon-nlp adding themselves to the Python3 statement. Once
> the vote passes, one of us could submit a PR to add MXNet as well.
>
> [1] https://codeclimate.com/
> [2]
> https://open.edx.org/blog/python-2-is-ending-we-need-to-move-to-python-3/
>
>
> On Thu, 18 Jul 2019 at 21:39, Kshitij Kalambarkar <
> kshitijkalambar...@gmail.com> wrote:
>
> > +1
> >
> > On Fri, Jul 19, 2019, 04:28 Pedro Larroy 
> > wrote:
> >
> > > Seems 3.6 is a reasonable choice.
> > >
> > > On Thu, Jul 18, 2019 at 2:15 PM Marco de Abreu <
> marco.g.ab...@gmail.com>
> > > wrote:
> > > >
> > > > Looking at EOL is certainly a good idea! I think once we get closer
> to
> > > > deprecation, we can check adoption statistics to make a well-informed
> > > > decision that gives us the most advantages without dropping the ball
> > on a
> > > > majority of users (or supporting a branch that is going EOL soon). A
> > > survey
> > > > from 2018 [1] determined the following distribution:
> > > > 3.5: 11%
> > > > 3.6: 54%
> > > > 3.7: 30%
> > > >
> > > > Deprecation for 3.5 is scheduled for 2020-09-13 [2]. Deprecation for
> > 3.6
> > > is
> > > > scheduled for 2021-12-23 [2].Deprecation for 3.7 is scheduled
> > > > for 2023-06-27 [2].
> > > >
> > > > Following the trend, I'd say that it would be a decision between
> Python
> > > 3.6
> > > > and 3.7. Later on, I'd propose to check recent surveys and also have
> a
> > > > separate thread to determine if there's anything we're missing (e.g.
> a
> > > big
> > > > company being unable to use Python 3.7). What do you think?
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > > > [1]:
> https://www.jetbrains.com/research/python-developers-survey-2018/
> > > > [2]: https://devguide.python.org/#status-of-python-branches
> > > >
> > > > On Thu, Jul 18, 2019 at 9:42 PM Yuan Tang 
> > > wrote:
> > > >
> > > > > I would suggest supporting Python 3.5+ since the earlier versions
> > have
> > > > > reached end-of-life status:
> > > > > https://devguide.python.org/devcycle/#end-of-life-branches
> > > > >
> > > > > On Thu, Jul 18, 2019 at 3:36 PM Pedro Larroy <
> > > pedro.larroy.li...@gmail.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > This would simplify CI, reduce costs and more. I think a followup
> > > > > > question is what would be the mininum Python3 version supported?
> > > > > > Depending on that we might be able to use type annotations for
> > > example
> > > > > > or other features.
> > > > > >
> > > > > > Pedro.
> > > > > >
> > > > > > On Thu, Jul 18, 2019 at 12:07 PM Yuan Tang <
> > terrytangy...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Thu, Jul 18, 2019 at 2:51 PM Yuxi Hu 
> > > wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > On Thu, Jul 18, 2019 at 11:31 AM Tong He <
> hetong...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > >
> > > > > > > > > Tong He
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Jake Lee  于2019年7月18日周四 上午11:29写道:
> > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > > > On Thu, Jul 18, 2019 at 11:27 AM Junru Shao <
> > > > > > junrushao1...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > +1
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jul 18, 2019 at 11:12 AM Anirudh Acharya <
> > > > > > > > > anirudhk...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jul 18, 2019 at 11:03 AM Marco de Abreu <
> > > > > > > > > > marco.g.ab...@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1
> > > > > > > > > > > > >
> > > > > > > > > > > > > -Marco
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sheng Zha  schrieb am Do.,
> 18.
> > > Juli
> > > > > > 2019,
> > > > > > > > > > 19:59:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Dear MXNet community,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'd like to reopen the discussion on deprecating
> > > python2
> > > > > > > > support.
> > > > > > > > > > > This
> > > > > > > > > > > > > > would help modernize the design and engineering
> > > practice
> > > > > in
> > > > > > > > MXNet
> > > > > > > > > > to
> > > > > > > > > > > > help
> > > > > > > > > > > > > > improve speed and 
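The minimum-version discussion above comes down to an import-time guard of roughly the following shape. This is only a sketch: the floor of 3.5 used here is an assumption, since the thread had not yet settled on 3.5 vs 3.6 vs 3.7.

```python
# Hypothetical import-time guard for a project dropping Python 2 support.
# MIN_PYTHON = (3, 5) is an assumed floor, not a decision from the thread.
import sys

MIN_PYTHON = (3, 5)

def check_python_version(version_info=sys.version_info):
    """Raise if the interpreter is older than the supported minimum.

    Tuple comparison works here because sys.version_info compares
    element-wise: (2, 7, 16) < (3, 5) and (3, 6, 0) >= (3, 5).
    """
    if tuple(version_info[:2]) < MIN_PYTHON:
        raise RuntimeError(
            "This package requires Python %d.%d or newer" % MIN_PYTHON)
    return True

# A Python 3.6 interpreter passes the check...
assert check_python_version((3, 6, 0)) is True
# ...while a Python 2.7 interpreter is rejected.
try:
    check_python_version((2, 7, 16))
except RuntimeError:
    pass
else:
    raise AssertionError("expected RuntimeError for Python 2.7")
```

Such a guard lets the error surface immediately at import rather than as an obscure syntax error deeper in the package.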

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-18 Thread Lin Yuan
t; for the
> > > > > > fix.
> > > > > > >
> > > > > > > As for the hybridize with static alloc performance regression.
> > IMO it
> > > > > > does
> > > > > > > not need to be a blocker if we have the following speed order.
> > > > > > > 1.5.0 w/o static > 1.5.0 w/ static  > 1.4.1 w/ static > 1.4.1
> w/o
> > > > static
> > > > > > > and it will be great to know the following to better make a
> > decision
> > > > on
> > > > > > > whether this should block the release.
> > > > > > > 1) if this is a model specific or a general regression.
> > > > > > > 2) if this is platform specific or general (w/ or w/o CUDA, w/
> > or w/o
> > > > > > > MKLDNN)
> > > > > > >
> > > > > > >
> > > > > > > [1]https://github.com/apache/incubator-mxnet/pull/15213
> > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jun 11, 2019 at 1:46 PM Zhi Zhang <
> zhresh...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2019/06/11 18:53:56, Pedro Larroy <
> > pedro.larroy.li...@gmail.com
> > > > >
> > > > > > > > wrote:
> > > > > > > > > The stack trace doesn't seem to come from MXNet, do you
> have
> > more
> > > > > > info?
> > > > > > > > >
> > > > > > > > > On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang <
> > zhresh...@apache.org
> > > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 2019/06/11 17:36:09, Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > A bit more background into this:
> > > > > > > > > > >
> > > > > > > > > > > While tuning a model using LSTM and convolutions we
> find
> > that
> > > > > > using
> > > > > > > > > > > hybridize with static_alloc and static_shape is 15%
> > slower
> > > > in the
> > > > > > > > > > > latest revision vs in version 1.4.1 in which using
> > hybridize
> > > > with
> > > > > > > > > > > static_alloc and static_shape is 10% faster than
> without.
> > > > > > > > > > >
> > > > > > > > > > > Overwall we are still 33% faster when comparing master
> to
> > > > 1.5.
> > > > > > > > > > >
> > > > > > > > > > > Let me know if you think this is a release blocker or
> > not.
> > > > > > > > > > >
> > > > > > > > > > > Pedro.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 10, 2019 at 4:51 PM Pedro Larroy
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > -1
> > > > > > > > > > > >
> > > > > > > > > > > > We found a performance regression vs 1.4 related to
> > > > CachedOp
> > > > > > which
> > > > > > > > > > > > affects Hybrid forward, which we are looking into.
> > > > > > > > > > > >
> > > > > > > > > > > > Pedro.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 10, 2019 at 4:33 PM Lin Yuan <
> > > > apefor...@gmail.com>
> > > > > > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-10 Thread Lin Yuan
-1 (Tentatively until resolved)

I tried to build MXNet 1.5.0 from source and then `pip install horovod`, but
got the following error:

Reproduce:
1) cp make/config.mk .
2) turn on USE_CUDA, USE_CUDNN, USE_NCCL
3) make -j

MXNet can build successfully.

4) pip install horovod


/home/ubuntu/src/incubator-mxnet/python/mxnet/../../include/mkldnn/mkldnn.h:55:28:
fatal error: mkldnn_version.h: No such file or directory
compilation terminated.
INFO: Unable to build MXNet plugin, will skip it.

I did not change any MKLDNN settings in my config.mk. I am building on
DLAMI base 18.0, which is Ubuntu 16.04 with CUDA 10.0.

Thanks,

Lin
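A minimal sketch of the failure mode reported above: mkldnn.h includes mkldnn_version.h, and the latter is normally generated during the MKL-DNN build, so a source tree can expose the first header without the second and the plugin build then fails exactly as in the log. Directory names below are illustrative assumptions, not real MXNet paths.

```python
# Pre-flight check mirroring the reported failure: mkldnn.h present
# but the generated mkldnn_version.h missing. Paths are hypothetical.
import os
import tempfile

def mkldnn_headers_complete(include_dir):
    """True only when both mkldnn.h and mkldnn_version.h exist.

    Because mkldnn.h #includes mkldnn_version.h, having only the first
    reproduces the 'fatal error: mkldnn_version.h' seen above.
    """
    needed = ("mkldnn.h", "mkldnn_version.h")
    return all(os.path.exists(os.path.join(include_dir, h)) for h in needed)

# Demonstration with a throwaway directory standing in for include/mkldnn/:
with tempfile.TemporaryDirectory() as inc:
    open(os.path.join(inc, "mkldnn.h"), "w").close()
    assert not mkldnn_headers_complete(inc)   # version header missing
    open(os.path.join(inc, "mkldnn_version.h"), "w").close()
    assert mkldnn_headers_complete(inc)       # both headers present
```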


On Sat, Jun 8, 2019 at 5:39 PM shiwen hu  wrote:

> +1
>
> Lai Wei  于2019年6月9日周日 上午4:12写道:
>
> > Dear MXNet community,
> >
> > This is the 3-day vote to release Apache MXNet (incubating) version
> 1.5.0.
> > Voting on dev@ will start June 8, 23:59:59(PST)  and close on June 11,
> > 23:59:59.
> >
> > 1) Link to release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes
> >
> > 2) Link to release candidate:
> >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc0
> >
> > 3) Link to source and signatures on apache dist server:
> >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc0/
> >
> >
> > Please remember to TEST first before voting accordingly:
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> >
> > Best Regards
> >
> > Lai
> >
>


Re: [RFC] Support for creation of Large Tensors in MXNet

2019-05-24 Thread Lin Yuan
y this
> issue: https://github.com/apache/incubator-mxnet/issues/13451
> >
> > But as I said before, since we support flatten or reshape
> operators, so it's possible for users to convert a tensor with large
> element size to a tensor with large dimension size. It possibly will cause
> issue there.
> >
> > To cover more cases, MKL-DNN is going to support INT64 dimension
> size in its coming 1.0 major release.
> >
> > -tao
> >
> > -Original Message-
> > From: Lin Yuan [mailto:apefor...@gmail.com]
> > Sent: Tuesday, April 30, 2019 12:56 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [RFC] Support for creation of Large Tensors in MXNet
> >
> > Tao,
> >
> > - what's the max size of dimensionality? Which data type is used
> to define dimensionality (ndims)?
> > We assume the max size of dimensionality is relatively small.
> Hence `int` data type is used to define ndim
> >
> > - what's the max size of each dimension? Which data type is used
> to define dimension size (shape[x])?
> > Currently, we assume the max size of each dimension is not going
> to exceed
> > 2^31 in real applications. Hence the data type is `int32_t`
> >
> > - what's the max size of total elements? Which data type is used
> to define element size (Prod(shape))?
> > We assume the total number of elements in a tensor can be larger
> than 2^32 in some applications such as deep graph library. We use the data
> type `int64_t` to represent the total element size. Currently due to
> performance regression in some operators (such as transpose), we used a
> compiler flag to set this data type to `int32_t` by default. Once we have
> ways to mitigate the performance regression, we will set the default data
> type to `int64_t`, which is part of the effort in this project that Rohit
> proposed.
> >
> > What is the plan in MKLDNN to support large tensors? We may want
> to coordinate the progress since many operators are using MKLDNN
> implementation in CPU now.
> >
> > Many Thanks,
> >
> > Lin
> >
> > On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A 
> wrote:
> >
> > > Thank you for bringing this topic to dev, Rohit.
> > >
> > > Regarding large tensor, can you articulate:
> > > - what's the max size of dimensionality? Which data type is
> used to
> > > define dimensionality (ndims)?
> > > - what's the max size of each dimension? Which data type is
> used to
> > > define dimension size (shape[x])?
> > > - what's the max size of total elements? Which data type is
> used to
> > > define element size (Prod(shape))?
> > >
> > > For me, any of these three can be *large*.
> > >
> > > -Original Message-
> > > From: Srivastava, Rohit Kumar
> > > [mailto:srivastava@buckeyemail.osu.edu]
> > > Sent: Saturday, April 27, 2019 7:33 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: [RFC] Support for creation of Large Tensors in MXNet
> > >
> > > Dear Community,
> > >
> > > Currently MXNet supports creation of Tensors containing up to
> 2^32
> > > elements. However there are cases where tensors of size over 5
> billion
> > > is required
> > >
> > > We plan to support creation of large tensors on MXNet. A
> design
> > > proposal is ready for review:
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
> > >
> > > We will appreciate any help and feedbacks from the community.
> > >
> > > Thank you!
> > >
> > > Rohit
> > >
> >
> >
> >
> >
> >
>


Re: [Announcement] New Committer - Yuxi Hu

2019-05-24 Thread Lin Yuan
Congrats Darren! Well deserved.

On Fri, May 24, 2019 at 6:27 AM Aaron Markham 
wrote:

> Congrats Darren!
>
> On Thu, May 23, 2019, 18:48 Zhao, Patric  wrote:
>
> > Congratulations, Darren :) Thanks for your great works in Horovod.
> >
> > > -Original Message-
> > > From: Chaitanya Bapat [mailto:chai.ba...@gmail.com]
> > > Sent: Friday, May 24, 2019 9:46 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [Announcement] New Committer - Yuxi Hu
> > >
> > > Congratulations Darren!
> > >
> > > On Fri, 24 May, 2019, 12:51 AM Sheng Zha,  wrote:
> > >
> > > > Hi all,
> > > >
> > > > Please join me in welcoming Yuxi (Darren) Hu as a new committer of
> > > > Apache MXNet (incubating)!
> > > >
> > > > Yuxi has been one of the core contributors of Horovod integration in
> > > > MXNet. Along the way, he has been making meaningful contributions to
> > > > improve the mxnet backend, such as introducing API for engine push to
> > > > make it easier to integrate horovod and external operator library.
> > > >
> > > > Welcome, Darren!
> > > >
> > > > -sz
> > > >
> > > >
> >
>


Re: [Announcement] New Committer - Kedar Bellare

2019-05-23 Thread Lin Yuan
Welcome on board!

Lin

On Thu, May 23, 2019 at 9:01 AM Carin Meier  wrote:

> Please join me in welcoming Kedar Belllare https://github.com/kedarbellare
> as
> a new committer.
>
> Kedar has worked on the Clojure package and helped improve it by porting
> the Scala image and infer functionality to Clojure as well as adding
> examples. He also is the main driver of the new Clojure API
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092678
>
> We look forward to his continued involvement and contributions.
>
> - Carin
> on behalf of Apache MXNet PPMC
>


Re: [DISCUSS] 1.5.0 Release Plan

2019-05-23 Thread Lin Yuan
Hi Lai,

One important PR is currently blocked by a flaky TensorRT test:

https://github.com/apache/incubator-mxnet/pull/15041

I have retriggered it several times. If it fails again, I may need the CI
team to help disable this test. It has been reported by multiple people:
https://github.com/apache/incubator-mxnet/issues/14978

Thanks,

Lin

On Wed, May 22, 2019 at 11:38 PM Zhao, Patric  wrote:

> Thanks, Lai.
>
> With the great helps from the community, all PRs listed in the roadmap are
> done :)
>
> https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
>
> Update the status of the below list
>
>  - [1] PR#14713 is almost done and wait for internal validation results
>  - [2] PR#14893 is merged
>  - [3] PR#15031 is merged
>  - [7] PR#15038 new PR to fix the bug in C++ interface, will be merged
> soon after the review.
>
> Feel free to let me know if anything our team can help :)
>
> BR,
>
> --Patric
>
> > -Original Message-
> > From: Lai Wei [mailto:roywei...@gmail.com]
> > Sent: Thursday, May 23, 2019 6:05 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> >
> > Hi @dev,
> >
> > Thanks for working hard for the 1.5 release, since there has been several
> > release blockers (mostly fixed). We are extending the code freeze to
> Friday
> > 05/22/2019. Right now we are tracking the following 5 open
> PRs[1][2][3][4][5]
> > and 1 issue[6]. Please let us know if you need more time.
> >
> > I would like to encourage all downstream projects to test with latest
> MXNet
> > to avoid any incompatibility in the coming 1.5.0 release. If you have any
> > issues that may block the release, please let us know.
> > Thank you very much.
> >
> > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > [6] https://github.com/apache/incubator-mxnet/issues/15034
> >
> >
> > Best Regards
> >
> > Lai
> >
> >
> > On Wed, May 15, 2019 at 9:05 PM Junru Shao 
> > wrote:
> >
> > > Hi folks,
> > >
> > > Here I may have a release blocker for 1.5.0 about implementation of
> > > dynamic shape mechanism, which somehow conflicts with Gluon's
> > deferred
> > > initialization [1].
> > >
> > > [1] https://github.com/dmlc/gluon-nlp/issues/706
> > >
> > > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian <
> > > anirudh2...@gmail.com>
> > > wrote:
> > >
> > > > Hi Lai,
> > > >
> > > > From the discussion I had with Nvidia offline they are targeting on
> > > pushing
> > > > the required changes today.
> > > > Since this is important feature for the release, if this gets
> > > > delayed and cannot  be merged by 05/17/2019, the code freeze date
> > > > may need to be changed.
> > > >
> > > > Anirudh
> > > >
> > > > On Wed, May 15, 2019 at 1:23 AM Lv, Tao A 
> wrote:
> > > >
> > > > > Hi dev,
> > > > >
> > > > > We see there are several github issues [1][2][3][4] about mxnet
> > > > > windows build experience. The team is working intensively
> > > > > [5][6][7] on that to
> > > > fix
> > > > > some problems of MKL-DNN build on windows. We hope these fixes
> > can
> > > catch
> > > > > the code freeze and finally enter the 1.5.0 release.
> > > > >
> > > > > The PR against mshadow (#374) was already merged and MXNet PR
> > > > > #14877 is under review - great thanks to CI team for helping on
> > > > > the MKL
> > > > installation
> > > > > request. PR #14952 is document change according to build logic
> > > > > changes
> > > in
> > > > > PR #14877. So I think these two PRs should be merged
> simultaneously.
> > > > > Currently #14877 is experiencing a CI response problem.
> > > > >
> > > > > Please take your time to have a look at these two PRs. Your
> > > > > comments
> > > and
> > > > > suggestions are highly appreciated.
> > > > >
> > > > > Thanks,
> > > > > -tao
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > > > > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > > > > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > > > > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > > > > [5] https://github.com/apache/incubator-mxnet/pull/14877
> > > > > [6] https://github.com/dmlc/mshadow/pull/374
> > > > > [7] https://github.com/apache/incubator-mxnet/pull/14952
> > > > >
> > > > > -Original Message-
> > > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > > Sent: Wednesday, May 15, 2019 2:57 PM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > > >
> > > > > Hi Anirudh,
> > > > >
> > > > > I see there was an offline disucssion <
> > > > >
> > > >
> > > https://github.com/apache/incubator-
> > mxnet/pull/14173#pullrequestreview
> > > -235846341
> > > > > >
> > > > > and I have updated the AMP 

Re: [Announcement] New Committer - Hao Jin

2019-05-01 Thread Lin Yuan
Congrats!

On Tue, Apr 30, 2019 at 11:28 PM Alex Zai  wrote:

> Congrats Hao!
>
> On Tue, Apr 30, 2019 at 10:53 PM Steffen Rochel 
> wrote:
>
> > congratulation Hao!
> >
> > On Tue, Apr 30, 2019 at 8:05 AM MiraiWK WKCN  wrote:
> >
> > > Congrats Hao! Welcome!
> > >
> > > 
> > > From: Lv, Tao A 
> > > Sent: Tuesday, April 30, 2019 11:00:33 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: RE: [Announcement] New Committer - Hao Jin
> > >
> > > Congratulations Hao!
> > >
> > > -Original Message-
> > > From: Jun Wu [mailto:wujun@gmail.com]
> > > Sent: Tuesday, April 30, 2019 12:29 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: [Announcement] New Committer - Hao Jin
> > >
> > > Please join me in welcoming Hao Jin (https://github.com/haojin2) from
> > AWS
> > > as a new committer.
> > >
> > > Hao has designed and implemented many sophisticated algorithms for
> tensor
> > > operations. His work has greatly expanded the coverage of MXNet
> operator
> > > inventory and enhanced the performance of many operators that are hard
> to
> > > be optimized. Not only that, Hao has been active in advocating MXNet
> > > through providing high-quality translation service for quite a few
> > > technical articles and blog posts.
> > >
> >
>


Re: [Announcement] New Committer - Zhennan Qin

2019-04-30 Thread Lin Yuan
Congrats, Zhennan! Well deserved.

Lin

On Tue, Apr 30, 2019 at 3:07 PM Zhao, Patric  wrote:

> Cong, Zhennan.
>
> Really great works and it makes the MXNet/Quantization flow outstanding
> over the world!
>
> > -Original Message-
> > From: Lv, Tao A [mailto:tao.a...@intel.com]
> > Sent: Tuesday, April 30, 2019 11:01 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: RE: [Announcement] New Committer - Zhennan Qin
> >
> > Congratulations Zhennan!
> >
> > -Original Message-
> > From: Jun Wu [mailto:wujun@gmail.com]
> > Sent: Tuesday, April 30, 2019 12:29 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: [Announcement] New Committer - Zhennan Qin
> >
> > Please join me in welcoming Zhennan Qin (https://github.com/ZhennanQin)
> > from Intel as a new committer.
> >
> > Zhennan is the main author of accelerating MXNet/MKLDNN inference
> > through operator fusion and model quantization. His work has placed MXNet
> > in an advantageous place for inference workloads on Intel CPUs compared
> > with other DL frameworks.
>


Re: [RFC] Support for creation of Large Tensors in MXNet

2019-04-29 Thread Lin Yuan
Tao,

- what's the max size of dimensionality? Which data type is used to define
dimensionality (ndims)?
We assume the max size of dimensionality is relatively small; hence the
`int` data type is used to define `ndim`.

- what's the max size of each dimension? Which data type is used to define
dimension size (shape[x])?
Currently, we assume the max size of each dimension is not going to exceed
2^31 in real applications. Hence the data type is `int32_t`

- what's the max size of total elements? Which data type is used to define
element size (Prod(shape))?
We assume the total number of elements in a tensor can be larger than 2^32
in some applications, such as the deep graph library. We use the data type
`int64_t` to represent the total element size. Currently, due to a performance
regression in some operators (such as transpose), we use a compiler flag
to set this data type to `int32_t` by default. Once we have ways to
mitigate the regression, we will make `int64_t` the default data type,
which is part of the effort in the project that Rohit proposed.

What is the plan in MKL-DNN to support large tensors? We may want to
coordinate the progress, since many operators use the MKL-DNN
implementation on CPU now.

Many Thanks,

Lin

On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A  wrote:

> Thank you for bringing this topic to dev, Rohit.
>
> Regarding large tensor, can you articulate:
> - what's the max size of dimensionality? Which data type is used to define
> dimensionality (ndims)?
> - what's the max size of each dimension? Which data type is used to define
> dimension size (shape[x])?
> - what's the max size of total elements? Which data type is used to define
> element size (Prod(shape))?
>
> For me, any of these three can be *large*.
>
> -Original Message-
> From: Srivastava, Rohit Kumar [mailto:srivastava@buckeyemail.osu.edu]
> Sent: Saturday, April 27, 2019 7:33 AM
> To: dev@mxnet.incubator.apache.org
> Subject: [RFC] Support for creation of Large Tensors in MXNet
>
> Dear Community,
>
> Currently MXNet supports creation of tensors containing up to 2^32
> elements. However, there are cases where tensors with over 5 billion
> elements are required.
>
> We plan to support creation of large tensors on MXNet. A design proposal
> is ready for review:
> https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
>
> We will appreciate any help and feedbacks from the community.
>
> Thank you!
>
> Rohit
>


Re: [QUESTION] mxnet/Tuple vs nnvm/Tuple

2019-04-16 Thread Lin Yuan
Jun,

Thanks! I was also leaning towards your suggestion.
I have updated nnvm::Tuple to mxnet::Tuple for a few remaining places in
MXNet.

Best,

Lin

On Tue, Apr 16, 2019 at 11:35 AM Jun Wu  wrote:

> include/mxnet/tuple.h was first copied from nnvm in this PR
> <https://github.com/apache/incubator-mxnet/pull/14270> so that we can make
> changes on it to support zero-dim and zero-size tensors without affecting
> TVM project. That PR has changed most of the places where nnvm::Tuple and
> nnvm::TShape were used to mxnet::Tuple and mxnet::TShape. If we still see a
> few locations not changed in the current codebase, we should change them to
> use mxnet Tuple as well for better cosmetics. The nnvm/tuple.h can be
> deprecated in MXNet.
>
> On Mon, Apr 15, 2019 at 10:44 PM Lin Yuan  wrote:
>
> > Dear Community,
> >
> > Currently in MXNet there are two Tuple template class defined in
> > mxnet/tuple.h and nnvm/tuple.h respectively. These two templates are
> highly
> > similar, and most parts are duplicated except for a couple of functions.
> > However, they are used interchangeably in the current codebase, which
> > sometimes causes conflicts.
> >
> > Is there any historical reason that we keep two copies of the same
> template
> > class? If not, can we refactor the code to consolidate into one?
> >
> > Thanks!
> >
> > Lin
> >
>


[QUESTION] mxnet/Tuple vs nnvm/Tuple

2019-04-15 Thread Lin Yuan
Dear Community,

Currently in MXNet there are two Tuple template classes, defined in
mxnet/tuple.h and nnvm/tuple.h respectively. These two templates are highly
similar, and most parts are duplicated except for a couple of functions.
However, they are used interchangeably in the current codebase, which
sometimes causes conflicts.

Is there any historical reason why we keep two copies of the same template
class? If not, can we refactor the code to consolidate them into one?

Thanks!

Lin


Re: Fujitsu Breaks ImageNet Record using MXNet (under 75 sec)

2019-04-08 Thread Lin Yuan
Chai,

Thanks for sharing. This is awesome news!

Lin

On Mon, Apr 8, 2019 at 8:48 AM Chaitanya Bapat  wrote:

> Greetings!
>
> Great start to a Monday morning, as I came across this news on Import AI,
> an AI newsletter.
>
> The newsletter talked about Apache MXNet, hence thought of sharing it with
> our community. This seems to be a great achievement worth paying attention
> to.
>
> *75 seconds: How long it takes to train a network against ImageNet:*
> *...Fujitsu Research claims state-of-the-art ImageNet training scheme...*
> Researchers with Fujitsu Laboratories in Japan have further reduced the
> time it takes to train large-scale, supervised learning AI models; their
> approach lets them train a residual network to around 75% accuracy on the
> ImageNet dataset after 74.7 seconds of training time. This is a big leap
> from where we were in 2017 (an hour), and is impressive relative to
> late-2018 performance (around 4 minutes: see issue #121
> <
> https://twitter.us13.list-manage.com/track/click?u=67bd06787e84d73db24fb0aa5=28edafc07a=0b77acb987
> >
> ).
>
> *How they did it: *The researchers trained their system across *2,048 Tesla
> V100 GPUs* via the Amazon-developed MXNet deep learning framework. They
> used a large mini-batch size of 81,920, and also implemented layer-wise
> adaptive scaling (LARS) and a 'warming up' period to increase learning
> efficiency.
>
> *Why it matters:* Training large models on distributed infrastructure is a
> key component of modern AI research, and the reduction in time we've seen
> on ImageNet training is striking - I think this is emblematic of the
> industrialization of AI, as people seek to create systematic approaches to
> efficiently training models across large amounts of computers. This trend
> ultimately leads to a speedup in the rate of research reliant on
> large-scale experimentation, and can unlock new paths of research.
> *  Read more:* Yet Another Accelerated SGD: ResNet-50 Training on ImageNet
> in 74.7 seconds (Arxiv)
> <
> https://twitter.us13.list-manage.com/track/click?u=67bd06787e84d73db24fb0aa5=d2b13c879f=0b77acb987
> >
> .
>
> NVIDIA article -
>
> https://news.developer.nvidia.com/fujitsu-breaks-imagenet-record-with-v100-tensor-core-gpus/
>
> Hope that gives further impetus to strive harder!
> Have a good week!
> Chai
>
>  --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
>
> 
>
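For readers curious about the layer-wise adaptive rate scaling (LARS) mentioned above, here is a minimal sketch of the per-layer update rule in plain Python. This is an illustration of the published idea, not Fujitsu's or MXNet's actual implementation; `trust` stands for the LARS trust coefficient.

```python
import math

def lars_update(weights, grads, lr=0.1, trust=0.001, eps=1e-9):
    """One LARS step for a single layer: scale the learning rate by the
    ratio of the weight norm to the gradient norm, so every layer takes
    a step proportional to its own magnitude."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    local_lr = trust * w_norm / (g_norm + eps)
    return [w - lr * local_lr * g for w, g in zip(weights, grads)]

w = [1.0, 2.0, 2.0]          # ||w|| = 3
g = [0.0, 3.0, 4.0]          # ||g|| = 5
print(lars_update(w, g))
```

The layer-wise scaling is what keeps training stable at very large mini-batch sizes such as the 81,920 used here.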


[RFC] Higher order gradient support in MXNet

2019-04-04 Thread Lin Yuan
Dear Community,

Higher order gradient calculation is required for many applications.
However, MXNet currently supports higher order gradients for only a very
limited number of operators.
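As a reminder of what is being computed, a second-order gradient can be checked numerically with a central finite difference; this plain-Python sketch is independent of MXNet's autograd and is useful for validating any implementation:

```python
def second_derivative(f, x, h=1e-3):
    # Central finite difference: f''(x) ~= (f(x+h) - 2 f(x) + f(x-h)) / h^2
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

f = lambda x: x ** 3             # analytically, f''(x) = 6x
print(second_derivative(f, 2.0)) # approximately 12.0
```

An autograd-based implementation would instead differentiate the backward graph a second time; the numerical check above is a common way to test it.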

We plan to support the higher order gradient calculation in the autograd
package. A design proposal is ready for review:
https://cwiki.apache.org/confluence/display/MXNET/Higher+Order+Gradient+Calculation

We will appreciate any help and feedbacks from the community.

Cheers!

Lin


Re: [DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-22 Thread Lin Yuan
@Junru I fully agree with what you said. What I meant is that we need to
make more customers aware of them.

Lin

On Fri, Mar 22, 2019 at 6:34 PM Junru Shao  wrote:

> @Lin I believe that the way to build a healthy community is to make both
> customers and developers happy. In this case, I feel like the more
> important thing about toolkits is to explain how useful they are to our
> customers, rather than positions, components or anything else.
>
> As I mentioned above, the usefulness comes from two aspects (at least).
>
> 1) they provide state-of-the-art models and training techniques
> out-of-the-box. If our customers want inference only, we have model zoo; If
> our customers want to train on their own dataset, we have awesome training
> tricks enclosed.
>
> 2) it provides exemplary codebase for anyone who wants to use Gluon
> elegantly. It does help a lot for real-world development, compared with
> simplest examples like tutorial.
>
>
> On Fri, Mar 22, 2019 at 6:07 PM Junru Shao 
> wrote:
>
> > Probably we should figure out how to explain MXNet Gluon to customers. In
> > this case, I agree with @Mu that
> >
> > 1) MXNet Gluon provides high-level API like what Keras gives to
> TensorFlow.
> >
> > 2) MXNet Gluon supports hybridization, which unifies both symbolic and
> > imperative programming style.
> >
> > Also, about toolkits, we could mention
> >
> > 3) GluonNLP and GluonCV are two awesome libraries in their respective
> > domain, both of which are built on MXNet Gluon. They not only provide an
> > awesome exemplary codebase for customers to learn the best way to use
> MXNet
> > Gluon, but also come with the state-of-the-art models and training
> > techniques out-of-the-box.
> >
> > Any other ideas?
> >
> >
> > On Fri, Mar 22, 2019 at 5:54 PM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > wrote:
> >
> >> +1 to MXNet Gluon given the feedbacks and explanations from everyone so
> >> far.
> >>
> >> On Fri, Mar 22, 2019 at 5:09 PM Junru Shao 
> >> wrote:
> >> >
> >> > I feel like MXNet Gluon is a good name. You don't lose customers who
> >> have
> >> > been familiar with MXNet, nor lose customers who are used to MXNet
> >> symbolic.
> >> >
> >> > On Fri, Mar 22, 2019 at 5:07 PM Davydenko, Denis <
> >> > dzianis.davydze...@gmail.com> wrote:
> >> >
> >> > > As subject suggests this is a proposal for re-branding of Gluon to
> >> align
> >> > > it with MXNet. One of the common things undertaken for re-branding
> >> > > exercises is renaming. That is the thinking behind my suggesting a new
> >> name
> >> > > for Gluon. I am sincerely curious what the alternatives would be to
> >> rebrand
> >> > > Gluon to align it with MXNet without changing its name.
> >> > >
> >> > >
> >> > > On 3/22/19, 4:57 PM, "Mu Li"  wrote:
> >> > >
> >> > > Are you proposing to rename Gluon? I think Pedro's opinion is
> >> about a
> >> > > better way to communicate what's Gluon and how it's related to
> >> MXNet.
> >> > >
> >> > > On Fri, Mar 22, 2019 at 4:54 PM Davydenko, Denis
> >> > > 
> >> > > wrote:
> >> > >
> >> > > > I support idea of putting brands of MXNet and Gluon closer
> >> together.
> >> > > I
> >> > > > agree with your argument, Mu, but MXNet is quite far away from
> >> TF
> >> > > place at
> >> > > > this time so I don’t know how well that argument is
> transferable
> >> > > from TF
> >> > > > position to MXNet position.
> >> > > >
> >> > > > MXNet Imperative is definitely too restrictive of a name, we
> can
> >> > > come up
> >> > > > with better one... MXNet-M for example, stands for
> >> MXNet-Modified
> >> > > (military
> >> > > > connotation). If naming is the only thing we need to figure
> out
> >> -
> >> > > that is a
> >> > > > good place to be in __
> >> > > >
> >> > > > --
> >> > > > Thanks,
> >> > > > Denis
> >> > > >
> >> > > > On 3/22/19, 4:48 PM, "Mu Li"  wrote:
> >> > 

Re: [DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-22 Thread Lin Yuan
@Junru GluonNLP and GluonCV are definitely awesome toolkits. I feel we
should advertise more about these hidden treasures :)

Today there is a big initiative to publicize MXNet. I feel we should also
bring GluonNLP and GluonCV on the same boat and highlight their tight
relations with MXNet.

My two cents.

Lin



On Fri, Mar 22, 2019 at 6:08 PM Junru Shao  wrote:

> Probably we should figure out how to explain MXNet Gluon to customers. In
> this case, I agree with @Mu that
>
> 1) MXNet Gluon provides high-level API like what Keras gives to TensorFlow.
>
> 2) MXNet Gluon supports hybridization, which unifies both symbolic and
> imperative programming style.
>
> Also, about toolkits, we could mention
>
> 3) GluonNLP and GluonCV are two awesome libraries in their respective
> domain, both of which are built on MXNet Gluon. They not only provide an
> awesome exemplary codebase for customers to learn the best way to use MXNet
> Gluon, but also come with the state-of-the-art models and training
> techniques out-of-the-box.
>
> Any other ideas?
>
>
> On Fri, Mar 22, 2019 at 5:54 PM Pedro Larroy  >
> wrote:
>
> > +1 to MXNet Gluon given the feedbacks and explanations from everyone so
> > far.
> >
> > On Fri, Mar 22, 2019 at 5:09 PM Junru Shao 
> > wrote:
> > >
> > > I feel like MXNet Gluon is a good name. You don't lose customers who
> have
> > > been familiar with MXNet, nor lose customers who are used to MXNet
> > symbolic.
> > >
> > > On Fri, Mar 22, 2019 at 5:07 PM Davydenko, Denis <
> > > dzianis.davydze...@gmail.com> wrote:
> > >
> > > > As subject suggests this is a proposal for re-branding of Gluon to
> > align
> > > > it with MXNet. One of the common things undertaken for re-branding
> > > > exercises is renaming. That is the thinking behind my suggesting a new
> > name
> > > > for Gluon. I am sincerely curious what the alternatives would be to
> rebrand
> > > > Gluon to align it with MXNet without changing its name.
> > > >
> > > >
> > > > On 3/22/19, 4:57 PM, "Mu Li"  wrote:
> > > >
> > > > Are you proposing to rename Gluon? I think Pedro's opinion is
> > about a
> > > > better way to communicate what's Gluon and how it's related to
> > MXNet.
> > > >
> > > > On Fri, Mar 22, 2019 at 4:54 PM Davydenko, Denis
> > > > 
> > > > wrote:
> > > >
> > > > > I support idea of putting brands of MXNet and Gluon closer
> > together.
> > > > I
> > > > > agree with your argument, Mu, but MXNet is quite far away from
> TF
> > > > place at
> > > > > this time so I don’t know how well that argument is
> transferable
> > > > from TF
> > > > > position to MXNet position.
> > > > >
> > > > > MXNet Imperative is definitely too restrictive of a name, we
> can
> > > > come up
> > > > > with better one... MXNet-M for example, stands for
> MXNet-Modified
> > > > (military
> > > > > connotation). If naming is the only thing we need to figure
> out -
> > > > that is a
> > > > > good place to be in __
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Denis
> > > > >
> > > > > On 3/22/19, 4:48 PM, "Mu Li"  wrote:
> > > > >
> > > > > Gluon is about imperative neural network training and data
> > > > loading.
> > > > > ndarray
> > > > > is another large imperative module. Besides, Gluon also
> > supports
> > > > > symbolic
> > > > > execution after hybridizing.  mxnet imperative might not
> be a
> > > > good
> > > > > name for
> > > > > it. Another choice is high-level API, that's how TF talks
> > about
> > > > Keras.
> > > > >
> > > > > On Fri, Mar 22, 2019 at 4:38 PM Yuan Tang <
> > > > terrytangy...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Fri, Mar 22, 2019 at 7:29 PM Lin Yuan <
> > apefor...@gmail.com>
> > > > > wrote:
> > > > > >
> > > >

Re: [DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-22 Thread Lin Yuan
@Junru Thanks for the clarification. Given that we already have courseware
and books with Gluon, it makes sense to brand "MXNet Gluon", with Gluon
being the high-level API of MXNet.

@Tianqi what's the roadmap for GluonNLP/GluonCV? Are they positioned to be
high-level APIs of MXNet, or plug-and-play components that could
potentially be put on top of other frameworks in the future? If the former,
should we always highlight MXNet whenever we advertise GluonNLP?

Thanks

Lin

On Fri, Mar 22, 2019 at 5:41 PM Tianqi Chen 
wrote:

> Change the name gluon will result in a significant problem of backward
> compatibility for many of the current users, and that would be a huge -1
> for the current community.
> One possibility is to do that is to have a clear roadmap of 2.0(which gives
> the message of non-backward compatible) and we can discuss which features
> consolidate, but perhaps that will require a bit more thoughts and
> coordinated effort.
>
> Tianqi
>
> On Fri, Mar 22, 2019 at 5:39 PM Junru Shao 
> wrote:
>
> > @Tianqi For sure GluonCV and GluonNLP should go with the current name. No
> > reason to change.
> >
> > @Lin If customers are interested, I guess we could say they are awesome
> > toolkits built on top of MXNet Gluon API, and perfect illustration to
> write
> > clever and powerful code on the top of it.
> >
>


Re: [DISCUSS] Rebrand Gluon to MXNet imperative or something MXNet.

2019-03-22 Thread Lin Yuan
+1.

Just to give some of my real experience:
1) I advertised a recent GluonNLP blog post, and many responses were "This
seems nice. So is Gluon a new library to replace MXNet?"
2) We visited customers at a unicorn company who showed interest in MXNet,
but none of the engineers knew the relationship between GluonNLP/GluonCV
and MXNet.
3) When integrating MXNet into Horovod and adding examples, I received
comments like "What is Gluon? Is it a new library in addition to MXNet?"

Everyone is talking about PyTorch nowadays, but not Caffe2 anymore, although
the latter is still serving as a backend component. Maybe we should also
double down on one brand?

Lin

On Fri, Mar 22, 2019 at 4:02 PM Pedro Larroy 
wrote:

> Hi dev@
>
> We heard feedback from users that the Gluon name is confusing. Some of
> them don't even know it's MXNet, and the relationship with MXNet is
> unclear.
>
> Would it make sense to rebrand Gluon to just MXNet or MXNet
> imperative? Diluting brands and names is never a good idea.
>
> There's also gluonhq, which is related to JavaFX and adds to the
> confusion; search-engine friendliness is not high either.
>
> Pedro.
>


Re: [Announcement] New Committer - Patric Zhao

2019-03-21 Thread Lin Yuan
Congrats, Patric!

On Thu, Mar 21, 2019 at 10:32 AM Yuxi Hu  wrote:

> Congrats, Patric! Well deserved!
>
> On Wed, Mar 20, 2019 at 1:08 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > Congrats Patric!
> >
> > On Sun, Mar 17, 2019 at 10:34 PM Hagay Lupesko 
> wrote:
> >
> > > Congrats Patric!
> > >
> > > On Fri, Mar 15, 2019 at 7:49 AM Joshua Z. Zhang 
> > > wrote:
> > >
> > > >
> > > >
> > > >
> > > >  Congrats Patrick!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >  Zhi
> > > >
> > > > >
> > > > > On Mar 15, 2019 at 10:46 PM,   > > > marco.g.ab...@gmail.com)>  wrote:
> > > > >
> > > > >
> > > > >
> > > > >  Congratulations, great to have you on board!
> > > > >
> > > > > -Marco
> > > > >
> > > > > Lv, Tao A wrote on Fri., March 15, 2019,
> > 15:38:
> > > > >
> > > > > >  Wow, congratulation Patric!
> > > > > >
> > > > > >  -Original Message-
> > > > > >  From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > > >  Sent: Friday, March 15, 2019 10:25 PM
> > > > > >  To: dev@mxnet.incubator.apache.org
> > > > > >  Cc: patric zhao  
> > > > > >  Subject: Re: [Announcement] New Committer - Patric Zhao
> > > > > >
> > > > > >  Congratulation Patrick!
> > > > > >  Steffen
> > > > > >
> > > > > >  On Fri, Mar 15, 2019 at 5:38 AM Zhao, Patric  <
> > > patric.z...@intel.com>
> > > >
> > > > > >  wrote:
> > > > > >
> > > > > >   >  I am very glad to have this opportunity to contribute to the
> > > > > >   >  Apache/MXNet community :)
> > > > > >   >
> > > > > >   >  Thanks all of the supports from the community and Intel.
> > > > > >   >
> > > > > >   >  BR,
> > > > > >   >
> > > > > >   >  --Patric
> > > > > >   >
> > > > > >   >
> > > > > >   >   >  -Original Message-
> > > > > >   >   >  From: MiraiWK WKCN [mailto:w...@live.cn]
> > > > > >   >   >  Sent: Friday, March 15, 2019 12:52 AM
> > > > > >   >   >  To: dev@mxnet.incubator.apache.org; patric zhao
> > > > > >   >   >   
> > > > > >   >   >  Subject: Re: [Announcement] New Committer - Patric Zhao
> > > > > >   >   >
> > > > > >   >   >  Welcome Peng Zhao!
> > > > > >   >   >  Peng is the AI Tech Leader in Intel Corporation. We have
> > > > good
> > > > > >   >   >  cooperation before. He is very professional and
> > contribute a
> > > > lot to
> > > > > >   >   >  MXNet,
> > > > > >   >  especially deep
> > > > > >   >   >  learning boost on CPU.
> > > > > >   >   >
> > > > > >   >   >  
> > > > > >   >   >  From: Anirudh Subramanian  
> > > > > >   >   >  Sent: Thursday, March 14, 2019 3:54:50 PM
> > > > > >   >   >  To: dev@mxnet.incubator.apache.org; patric zhao
> > > > > >   >   >  Subject: [Announcement] New Committer - Patric Zhao
> > > > > >   >   >
> > > > > >   >   >  Hi all,
> > > > > >   >   >
> > > > > >   >   >  Please join me to welcome Patric Zhao as a new committer
> > of
> > > > Apache
> > > > > >   >   >  (incubating) MXNet!
> > > > > >   >   >
> > > > > >   >   >  Patric has put in great effort around MKLDNN integration
> > > into
> > > > MXNet
> > > > > >   >   >  and
> > > > > >   >  has
> > > > > >   >   >  been involved in features like quantization, graph
> fusion
> > > and
> > > > fused
> > > > > >   >   >  RNN operators for CPU.
> > > > > >   >   >
> > > > > >   >   >  Dev List activity:
> > > > > >   >   >
> > > > > >   >
> > > >
> https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:patric.
> > > > > >   >  zhao
> > > > > >   >   >
> > > > > >   >   >  Issues:
> > > > > >   >   >  https://github.com/apache/incubator-
> > > > > >   >   >
> > > > mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Apengzhao-intel+
> > > > > >   >   >
> > > > > >   >   >  PR Reviews:
> > > > > >   >   >  https://github.com/apache/incubator-
> > > > > >   >   >
> > > > mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Apengzhao-intel
> > > > > >   >   >
> > > > > >   >   >  Proposals involved in:
> > > > > >   >   >
> > > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimi
> > > > > >   >   >  z
> > > > > >   >   >  ation+and+Quantization+based+on+subgraph+and+MKL-DNN
> > > > > >   >   >
> > > > https://cwiki.apache.org/confluence/display/MXNET/Fused+RNN+Operator
> > > > > >   >   >  s
> > > > > >   >   >  +for+CPU
> > > > > >   >   >   <
> > > > https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optim
> > > > > >   >   >  i
> > > > > >   >   >  zation+and+Quantization+based+on+subgraph+and+MKL-DNN>
> > > > > >   >   >
> > > > > >   >   >
> > > > > >   >   >  Thanks,
> > > > > >   >   >  Anirudh
> > > > > >   >
> > > > > >
> > > > >
> > >
> >
>
>
> --
> Yuxi(Darren) Hu, Ph.D.
> Software Development Engineer
> Amazon Web Services
>


Re: [DISCUSS] Process to remove deprecated operators

2019-02-27 Thread Lin Yuan
Agreed. When we deprecate an operator, we should add to the log message
something like: "Operator X is deprecated and will be removed in the
next release. Please use operator Y instead."

Lin

On Wed, Feb 27, 2019 at 10:23 PM Junru Shao  wrote:

> Hi Lin,
>
> I would love to share some immature ideas about deprecating operators. In
> addition to adopting semantic versioning, we should provide informative
> enough error messages for customers to understand how to replace
> deprecated operators with new ones.
>
> Thanks,
> Junru
>
> On Wed, Feb 27, 2019 at 9:30 PM Lin Yuan  wrote:
>
> > Sheng,
> >
> > Thanks for your quick response.
> > If that's the case, we will wait till 2.0 release to remove the
> deprecated
> > operators from code.
> >
> > Best,
> > Lin
> >
> > On Wed, Feb 27, 2019 at 9:06 PM Sheng Zha  wrote:
> >
> > > MXNet follows semantic versioning so we will be able to delete them in
> > the
> > > next major release.
> > >
> > > -sz
> > >
> > > On Wed, Feb 27, 2019 at 8:53 PM Lin Yuan  wrote:
> > >
> > > > Dear Community,
> > > >
> > > > In MXNet there are many legacy operators such as this
> > > > <
> > > >
> > >
> >
> http://mxnet.incubator.apache.org/versions/master/api/python/symbol/symbol.html?highlight=convolution_v1#mxnet.symbol.Convolution_v1
> > > > >
> > > > that have been marked DEPRECATED for several releases. However, these
> > > > operators still exist in our code. This caused a few problems:
> > > >
> > > > 1) Make the codebase bloated and reduce readability
> > > > 2) Increase unnecessary maintenance effort
> > > > 3) Bug prone as some people will look up these legacy code as example
> > > > 4) Cause confusion to end users and make documentation page lengthy
> > > >
> > > > I would like to propose the following process (if there is no
> existing
> > > one)
> > > > to remove deprecate operators from our code base.
> > > >
> > > > 1. Document the deprecated operators/environment variables in the
> > release
> > > > note as well as man pages.
> > > > 2. Limit the life cycle of a deprecated operator/argument to two minor
> > > > releases. For example, if an operator is marked deprecated in the 1.4
> > release,
> > > > it will be removed in 1.6 release.
> > > > 3. If there is some concern raised from customers during 1.4 and 1.5
> > > > release, we can convert the deprecated operator back to current and
> it
> > > will
> > > > be treated as new operator.
> > > > 4. PRs that remove deprecate operators should contain [Cleanup] in
> > title.
> > > >
> > > > Any comment is appreciated.
> > > >
> > > > Lin
> > > >
> > >
> >
>


Re: [DISCUSS] Process to remove deprecated operators

2019-02-27 Thread Lin Yuan
Sheng,

Thanks for your quick response.
If that's the case, we will wait till 2.0 release to remove the deprecated
operators from code.

Best,
Lin

On Wed, Feb 27, 2019 at 9:06 PM Sheng Zha  wrote:

> MXNet follows semantic versioning so we will be able to delete them in the
> next major release.
>
> -sz
>
> On Wed, Feb 27, 2019 at 8:53 PM Lin Yuan  wrote:
>
> > Dear Community,
> >
> > In MXNet there are many legacy operators such as this
> > <
> >
> http://mxnet.incubator.apache.org/versions/master/api/python/symbol/symbol.html?highlight=convolution_v1#mxnet.symbol.Convolution_v1
> > >
> > that have been marked DEPRECATED for several releases. However, these
> > operators still exist in our code. This caused a few problems:
> >
> > 1) Make the codebase bloated and reduce readability
> > 2) Increase unnecessary maintenance effort
> > 3) Bug prone as some people will look up these legacy code as example
> > 4) Cause confusion to end users and make documentation page lengthy
> >
> > I would like to propose the following process (if there is no existing
> one)
> > to remove deprecate operators from our code base.
> >
> > 1. Document the deprecated operators/environment variables in the release
> > note as well as man pages.
> > 2. Limit the life cycle of a deprecated operator/argument to two minor
> > releases. For example, if an operator is marked deprecated in the 1.4 release,
> > it will be removed in 1.6 release.
> > 3. If there is some concern raised from customers during 1.4 and 1.5
> > release, we can convert the deprecated operator back to current and it
> will
> > be treated as new operator.
> > 4. PRs that remove deprecate operators should contain [Cleanup] in title.
> >
> > Any comment is appreciated.
> >
> > Lin
> >
>


[DISCUSS] Process to remove deprecated operators

2019-02-27 Thread Lin Yuan
Dear Community,

In MXNet there are many legacy operators, such as this one, that have been
marked DEPRECATED for several releases. However, these operators still
exist in our code. This causes a few problems:

1) They make the codebase bloated and reduce readability
2) They increase unnecessary maintenance effort
3) They are bug prone, as some people will use this legacy code as an example
4) They cause confusion to end users and make the documentation pages lengthy

I would like to propose the following process (if there is no existing one)
to remove deprecated operators from our code base.

1. Document the deprecated operators/environment variables in the release
notes as well as the man pages.
2. Limit the life cycle of a deprecated operator/argument to two minor
releases. For example, if an operator is marked deprecated in the 1.4 release,
it will be removed in the 1.6 release.
3. If concerns are raised by customers during the 1.4 and 1.5 releases, we
can convert the deprecated operator back to current status, and it will be
treated as a new operator.
4. PRs that remove deprecated operators should contain [Cleanup] in the title.
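As a sketch of how the warning from step 2 could look in practice, using Python's stdlib `warnings` module (the operator names here are hypothetical stand-ins, not MXNet's real API):

```python
import warnings

def convolution_v1(*args, **kwargs):
    """Deprecated alias, kept for two minor releases before removal."""
    warnings.warn(
        "convolution_v1 is deprecated and will be removed in a future "
        "release; use Convolution instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller's line, not this one
    )
    return convolution(*args, **kwargs)  # delegate to the current operator

def convolution(*args, **kwargs):
    return "conv-result"  # stand-in for the real implementation
```

Emitting `DeprecationWarning` this way lets users silence or escalate the message with standard warning filters during the grace period.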

Any comment is appreciated.

Lin


Re: [Announce] Runtime feature detection

2019-02-12 Thread Lin Yuan
Thanks, Pedro, for contributing this long-awaited feature. I can
immediately use it for the Horovod project now.

Bravo!

Lin

On Tue, Feb 12, 2019 at 2:42 AM Pedro Larroy 
wrote:

> An update on this topic, Sheng just merged the refinements to the
> feature detection so it's now a single API call. (
> https://github.com/apache/incubator-mxnet/pull/13964 ). Thank you
> Sheng for the reviews.
>
> Please use this functionality to check for capabilities of MXNet at
> runtime such as Cuda, OpenCV etc. This can simplify tests and
> automation in several places in the code.
>
> Lin Iblis is already preparing Julia support:
> https://github.com/apache/incubator-mxnet/pull/13992
>
> This is a PR that adds documentation on the feature and explains how
> to use it from Python:
> https://github.com/apache/incubator-mxnet/pull/14130
>
> Thanks.
>
> On Fri, Jan 25, 2019 at 7:08 PM Sheng Zha  wrote:
> >
> > Hi Pedro,
> >
> > Happy to help, though I was waiting for PR comments to be addressed.
> Currently the PR is close to complete, with some open comments to be
> resolved.
> >
> > -sz
> >
> > > On Jan 25, 2019, at 9:27 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com> wrote:
> > >
> > > That's Great! There's a PR that we should merge first which
> > > internalizes the enum inside the library as per Sheng's suggestion.
> > >
> > > https://github.com/apache/incubator-mxnet/pull/13964
> > >
> > > @Sheng could we merge the PR? so we can build on top of this feature?
> > > It's badly needed for tests suites etc.
> > > Thanks a lot!
> > >
> > > Pedro.
> > >
> > >
> > >> On Fri, Jan 25, 2019 at 2:22 PM Iblis Lin 
> wrote:
> > >>
> > >> Hi,
> > >>
> > >> I added the Julia binding for it.
> > >> PR is here:
> > >> https://github.com/apache/incubator-mxnet/pull/13992
> > >>
> > >> Iblis Lin
> > >> 林峻頤
> > >>
> > >>> On 1/23/19 12:39 AM, Pedro Larroy wrote:
> > >>> Hi
> > >>>
> > >>> I'm pleased to announce that runtime feature detection has been
> merged
> > >>> in master, thanks to Aaron for the merge and the many reviewers who
> > >>> gave feedback on the PR.  (
> > >>> https://github.com/apache/incubator-mxnet/pull/13549 )
> > >>>
> > >>> As the functionality matures and is exposed through other bindings,
> > >>> please feel free to try and use it to build on it, for example for
> > >>> easier test suite selection depending on what's compiled in the
> > >>> engine.
> > >>>
> > >>> Usage examples:
> > >>>
> > >>> $ ipython
> > >>> In [4]: import mxnet.mxfeatures
> > >>>
> > >>> In [5]: mxnet.mxfeatures.features_enabled()
> > >>> Out[5]:
> > >>> [Feature.CPU_SSE,
> > >>>  Feature.CPU_SSE2,
> > >>>  Feature.CPU_SSE3,
> > >>>  Feature.CPU_SSE4_1,
> > >>>  Feature.CPU_SSE4_2,
> > >>>  Feature.CPU_AVX,
> > >>>  Feature.F16C,
> > >>>  Feature.BLAS_OPEN,
> > >>>  Feature.LAPACK,
> > >>>  Feature.SIGNAL_HANDLER,
> > >>>  Feature.DEBUG]
> > >>>
> > >>> In [6]: mxnet.mxfeatures.features_enabled_str()
> > >>> Out[6]: 'CPU_SSE, CPU_SSE2, CPU_SSE3, CPU_SSE4_1, CPU_SSE4_2,
> CPU_AVX,
> > >>> F16C, BLAS_OPEN, LAPACK, SIGNAL_HANDLER, DEBUG'
> > >>>
> > >>> see also: help(mxnet.mxfeatures)
> > >>>
> > >>> Regards.
> > >>>
>
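One way to build on runtime feature detection is feature-gated test selection, as the announcement suggests. The sketch below uses a stub in place of the real `mxnet.mxfeatures` query (assumed names; with MXNet installed you would consult the library's reported features instead):

```python
import unittest

def is_enabled(name, _enabled=frozenset({"CPU_SSE", "LAPACK", "SIGNAL_HANDLER"})):
    # Stub standing in for MXNet's runtime feature query; the frozenset
    # mimics the set of capabilities compiled into a particular build.
    return name in _enabled

class OperatorTests(unittest.TestCase):
    @unittest.skipUnless(is_enabled("CUDA"), "CUDA not compiled into this build")
    def test_gpu_kernel(self):
        self.fail("would only run on a CUDA-enabled build")

    @unittest.skipUnless(is_enabled("LAPACK"), "LAPACK not compiled in")
    def test_linalg_op(self):
        self.assertTrue(True)
```

With this pattern, one test binary can run against CPU-only, CUDA, or MKL-DNN builds and skip exactly the cases the build cannot support.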


Re: [RESTARTING][VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-11 Thread Lin Yuan
+1 binding
Horovod is going to release its 0.16.0 in the coming week with MXNet
integration. We need to release 1.4.0, which includes all the dependencies
for Horovod integration.

Best,

Lin

On Mon, Feb 11, 2019 at 9:30 PM Steffen Rochel 
wrote:

> Dear community -
> based on Justin's and community feedback I'm suggesting to restart the
> vote.
> Current status:
> binding votes:
> +1: 2 votes (Henri, Jason)
> -1:  1 vote (Luciano)
>
> non-binding:
> +1: 1 vote (Kellen)
>
> The community is investigating feedback from Luciano that the exclusion
> file is too broad and may cause files which can and must have
> Apache license headers not to be checked.
>
> Regards,
> Steffen
>
>
>
>
> On Mon, Feb 11, 2019 at 10:08 AM Hagay Lupesko  wrote:
>
> > Based on Justin's feedback, can we resume the vote instead of cancelling
> > it?
> >
> > On Mon, Feb 11, 2019 at 12:02 AM Justin Mclean  >
> > wrote:
> >
> > > Hi,
> > >
> > > In future don't be so hasty to cancel a release vote; people's minds can
> be
> > > changed, and a -1 is not a veto on a release.
> > >
> > > Thanks,
> > > Justin
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
> >
>


Re: [Announcement] New Committer -- Steffen Rochel

2019-02-05 Thread Lin Yuan
Welcome Steffen!

Lin

On Mon, Feb 4, 2019 at 7:53 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Great news.  Congrats Steffen.
>
> On Mon, Feb 4, 2019, 5:29 PM Thomas DELTEIL  wrote:
>
> > Welcome Steffen!
> >
> > On Mon, Feb 4, 2019, 15:55 Marco de Abreu  wrote:
> >
> > > Welcome!
> > >
> > > Am Di., 5. Feb. 2019, 00:45 hat Chris Olivier 
> > > geschrieben:
> > >
> > > > Dear Community:
> > > >
> > > > Please join me to welcome Steffen Rochel (steffenroc...@gmail.com)
> as
> > a
> > > > new
> > > > committer of Apache (incubating) MXNet!
> > > >
> > > > Steffen has played a role in nearly every MXNet release in the past
> 18
> > > > months, managed several of the wiki pages and has contributed in
> > > expanding
> > > > the community by managing and hosting meetups in different parts of
> the
> > > > world.
> > > >
> > > > -Chris
> > > >
> > >
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-04 Thread Lin Yuan
+1 build from source on MacOS 10.13.6 and tested mxnet-to-coreml converter.

On Mon, Feb 4, 2019 at 9:03 AM Indhu  wrote:

> +1
>
> Build from source and tested few examples from the examples folder.
>
> Thanks,
> Indu
>
>
>
> On Fri, Feb 1, 2019 at 6:21 PM Steffen Rochel 
> wrote:
>
> > Hi Sheng - thanks for the feedback.
> > TVM notice  file is missing as the 1.4.x branch/v1.4.0 release is using
> TVM
> > commit 0f053c8
> > <
> >
> https://github.com/dmlc/tvm/commit/0f053c82a747b4dcdf49570ec87c17e0067b7439
> > >
> >  from Oct 8, 2018, which didn't have the NOTICE file. IMHO, MXNet NOTICE
> > file is consistent with release content.
> > As the release started in 2018, I think it is OK to move forward without
> > updating to 2019.
> >
> > All -
> > thanks to the committers/contributors (Tao, Aaron, Kellen, Aston, Yuxi)
> who
> > tested and provided feedback - we have five +1 votes.
> > As of today, Friday Feb 1st 2019 6pm PST we have two binding votes, one
> +1
> > (Carin), one +0 (Sheng). The vote continues be open waiting for feedback
> > from PMC members.
> > Hope you can spare some time over the weekend to provide feedback.
> >
> > Regards,
> > Steffen
> >
> > On Fri, Feb 1, 2019 at 12:44 AM Marco de Abreu 
> > wrote:
> >
> > > Considering the release process has been started last year and the code
> > tag
> > > has also been based on last year, I'd say that it is not really a big
> > deal.
> > >
> > > -Marco
> > >
> > > Am Fr., 1. Feb. 2019, 09:33 hat Sheng Zha 
> > > geschrieben:
> > >
> > > > I found an awesome checklist for incubator releases [1] so I'm using
> it
> > > > here:
> > > >
> > > > -[Y] Are release files in correct location?
> > > > -[Y] Do release files have the word incubating in their name?
> > > > -[Y] Are the digital signature and hashes correct?
> > > > -[Y] Does DISCLAIMER file exist?
> > > > -[Y] Do LICENSE and NOTICE files exists?
> > > > -[N/A] Is the LICENSE and NOTICE text correct? (sz: did not finish
> > > > checking)
> > > > -[N] Is the NOTICE year correct?
> > > > -[N/A] Un-included software dependencies are not mentioned in LICENSE
> > or
> > > > NOTICE? (sz: did not finish checking)
> > > > -[Y] License information is not mentioned in NOTICE?
> > > > Is there any 3rd party code contained inside the release? If so:
> > > > -[Y] Does the software have a compatible license?
> > > > -[Y] Are all software licenses mentioned in LICENSE?
> > > > -[Y] Is the full text of the licenses (or pointers to it) in LICENSE?
> > > > Is any of this code Apache licensed? Do they have NOTICE files? If
> so:
> > > > -[N] Have relevant parts of those NOTICE files been added to this
> > NOTICE
> > > > file?
> > > > TVM has Apache 2.0 license and its NOTICE hasn't been added to
> MXNet's
> > > > NOTICE file.
> > > > -[Y] Do all source files have ASF headers? (sz: enforced by license
> > > > checker)
> > > > -[Y] Do the contents of the release match with what's tagged in
> version
> > > > control?
> > > > -[N] Are there any unexpected binary files in the release?
> > > > -[Y] Can you compile from source? Are the instruction clear?
> > > >
> > > > Is the issue minor?
> > > > - Unsure. NOTICE year is wrong (it's 2019 now). TVM's NOTICE is
> missing
> > > > from MXNet's NOTICE file.
> > > > Could it possibly be fixed in the next release?
> > > > - Yes
> > > > I vote with:
> > > > +0 not sure if it should be released. Could mentors advise if we
> should
> > > fix
> > > > them before release?
> > > >
> > > > [1] https://wiki.apache.org/incubator/IncubatorReleaseChecklist
> > > >
> > > >
> > > > On Thu, Jan 31, 2019 at 10:56 PM Lv, Tao A 
> wrote:
> > > >
> > > > >
> > > > > +1. Verified below items:
> > > > >
> > > > > 1. Checkout code from tag 1.4.0rc2 and build mkldnn backend
> > > successfully
> > > > > on both cpu and gpu w/ mkl and openblas
> > > > > 2. ResNet50v1 FP32 performance looks good for both latency and
> > > throughput
> > > > > 3. Quantization script works well with ResNet50v1
> > > > > 4. ResNet50v1 INT8 model accuracy looks good
> > > > > 5. ResNet50v1 INT8 model performance speedup looks good for both
> > > latency
> > > > > and throughput
> > > > >
> > > > >
> > > > > -Original Message-
> > > > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > > > Sent: Friday, February 1, 2019 11:45 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > 1.4.0.rc2
> > > > >
> > > > > Great, thanks Steffen!  I added a few key files but missed that
> one.
> > > > >
> > > > > +1 from me.
> > > > >
> > > > > On Thu, Jan 31, 2019 at 9:35 AM Steffen Rochel <
> > > steffenroc...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Kellen - Sergey, the 1.4.0 release co-manager signed the tar
> file.
> > > > > > Please use his public key to validate the asc.
> > > > > > I was able to validate:
> > > > > >
> > > > > > curl https://dist.apache.org/repos/dist/dev/incubator/mxnet/KEYS
> > -o
> > > > > > KEYS
> > > > 

Re: [Announcement] New Committer -- Lin Yuan

2019-02-04 Thread Lin Yuan
Thanks folks! I am looking forward to working with you to make MXNet shine
in 2019!

Best,

Lin

On Sun, Feb 3, 2019 at 4:31 PM Qing Lan  wrote:

> Congrats Lin!
> >
> >
> > Congratulations Lin
> >
> >> On Sat, Feb 2, 2019, 3:27 PM Tianqi Chen  wrote:
> >>
> >> Dear Community:
> >>
> >> Please join me to welcome Lin Yuan(@apeforest) as a new committer of
> >> Apache(incubating) MXNet!
> >>
> >> He has contributed to various improvements, including better
> compatibility
> >> of larger arrays across the codebase.
> >>
> >> Commits:
> >> https://github.com/apache/incubator-mxnet/commits?author=apeforest
> >>
> >>
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+author%3Aapeforest
> >>
> >>
> >> Reviews:
> >> https://github.com/apache/incubator-mxnet/pulls?utf8=%
> >> E2%9C%93=reviewed-by%3Aapeforest
> >>
> >> dev@ activitivity
> >> https://lists.apache.org/list.html?*@mxnet.apache.org:lte=6M:Lin%20Yuan
> >>
> >> Tianqi
> >>
>


Re: Horovod-MXNet Integration

2019-01-30 Thread Lin Yuan
Hi Yuan,

Thanks for your interest. We have just added MXNet support in Horovod and are
working on performance tuning and adding more examples. We are definitely
interested in further extending its support with Kubeflow.

Let's set up some time to have a more detailed discussion.

Best,

Lin

On Wed, Jan 30, 2019 at 7:42 AM Yuan Tang  wrote:

> Hi,
>
> It's great to see MXNet-Horovod integration got merged:
> https://github.com/uber/horovod/pull/542
>
> Is there any future plan for this? I've been working on Kubeflow's
> MPI-Operator (https://github.com/kubeflow/mpi-operator) lately and it
> would
> be interesting to see an example of using Horovod + MXNet + Kubeflow using
> MPI Operator. Feel free to reach out (@terrytangyuan
> <https://github.com/terrytangyuan>) if you encounter any issues.
>
> Best,
> Yuan
>
>
> On Fri, Nov 2, 2018 at 6:51 PM Lin Yuan  wrote:
>
> > Hi Mu,
> >
> > Darren (@yuxihu <https://github.com/yuxihu>) and I have been working on
> > releasing MXNet-Horovod integration in production. We have made some
> > changes on both MXNet and Horovod sides. The changes on MXNet side have
> > mostly been merged and we are working to merge code to horovod repo. We
> > will send a design doc to you for review again next week.
> >
> > Thanks for your feedback,
> >
> > Lin
> >
> > On Wed, Oct 31, 2018 at 12:03 PM Mu Li  wrote:
> >
> > > Thanks for your contribution, Carl.
> > >
> > > I remember I left a comment on the proposal, but today I found it had
> > > disappeared. My suggestion is to try our best not to change the existing
> > > API. The reason is that we would need to change all trainers on the
> > > frontend that use the existing kvstore APIs, which may cause confusion
> > > to users.
> > >
> > > The current proposal wants to add the following 4 APIs into kvstore:
> > >
> > >    - kv.pushpull
> > >    - kv.broadcast
> > >    - kv.local_rank
> > >    - kv.num_local_workers
> > >
> > >
> > > Pushpull can be done with a sequential push and pull; you can do
> > > nothing in push and put all workloads into pushpull. Broadcast can be
> > > implemented by pull.
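The equivalences described above (pushpull as a push followed by a pull, broadcast via pull) can be sketched with a toy in-memory key-value store. This is an illustration only, not MXNet's actual KVStore implementation; the class and its sum-reduce behavior are hypothetical:

```python
class ToyKVStore:
    """Toy in-memory key-value store, for illustration only.

    Demonstrates the equivalences from the thread: pushpull is a push
    followed by a pull, and broadcast can be implemented via pull.
    """

    def __init__(self):
        self._store = {}

    def push(self, key, value):
        # Sum-reduce the pushed value into the stored one.
        self._store[key] = self._store.get(key, 0) + value

    def pull(self, key):
        return self._store[key]

    def pushpull(self, key, value):
        # "Pushpull can be done with a sequential push and pull."
        self.push(key, value)
        return self.pull(key)

    def broadcast(self, key, init_value):
        # "Broadcast can be implemented by pull": set the value once,
        # then every worker pulls the same copy.
        self._store.setdefault(key, init_value)
        return self.pull(key)
```

With this framing, a frontend trainer written against the existing push/pull API keeps working, while pushpull and broadcast become thin conveniences on top.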
> > >
> > > What are local workers? GPUs in a single machine? If so, we can query
> > > them directly.
> > >
> > >
> > > On Fri, Sep 14, 2018 at 4:46 PM Carl Yang  wrote:
> > >
> > > > Hi,
> > > >
> > > > Currently, MXNet distributed can only be done using parameter server.
> > > > Horovod is an open-source distributed training framework that has
> > > > shown 2x speedup compared to TensorFlow using Parameter Server. We
> > > > propose to add Horovod support to MXNet. This will help our users
> > > > achieve goal of linear scalability to 256 GPUs and beyond. Design
> > > > proposal on cwiki:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Horovod-MXNet+Integration
> > > >
> > > > Please feel free to let me know if you have any suggestions or
> > feedback.
> > > >
> > > > Regards,
> > > > Carl
> > > >
> > >
> >
>


Re: Apache MXNet v1.4.0 release status

2019-01-15 Thread Lin Yuan
Hi Steffen,

I would like to ask to include one more PR for 1.4.0.rc1:
https://github.com/apache/incubator-mxnet/pull/13845

This PR exports the exception handling API of MXNet. It is needed by the
Horovod-MXNet integration to throw an exception at the Python level rather
than triggering a C++ abort.
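The pattern such an API enables looks roughly like the standard check-call idiom used by C-backed Python bindings: the C layer returns an error code instead of aborting, and the wrapper raises. The names below (`MXNetError`, `check_call`) are illustrative stand-ins, not MXNet's exact symbols:

```python
class MXNetError(RuntimeError):
    """Hypothetical Python-side exception mirroring a backend failure."""

def check_call(ret_code, get_last_error=lambda: "backend error"):
    # Typical C-binding pattern: the C API returns a nonzero code instead
    # of aborting, and the Python wrapper converts it into an exception.
    if ret_code != 0:
        raise MXNetError(get_last_error())

# A caller such as a Horovod-style integration can then handle backend
# failures as ordinary Python exceptions:
try:
    check_call(-1, lambda: "simulated failure in the engine")
    message = None
except MXNetError as err:
    message = str(err)
```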

Thanks,

Lin


On Tue, Jan 15, 2019 at 2:24 PM Steffen Rochel 
wrote:

> Dear MXNet community -
> Zach & friends made good progress resolving the licensing issues. One more
> PR on 1.4.x branch is expected today.
> The code freeze for 1.4.0.rc1 is Thursday Jan 17th 6pm PST.
> I'm asking the requester to add following PR to 1.4.x branch:
> Tao:
> https://github.com/apache/incubator-mxnet/pull/13882
> Kellen:
> https://github.com/apache/incubator-mxnet/pull/13697
> https://github.com/apache/incubator-mxnet/pull/13188
> https://github.com/apache/incubator-mxnet/pull/13727
> https://github.com/apache/incubator-mxnet/pull/13695
> Pedro:
> https://github.com/apache/incubator-mxnet/pull/13535
>
> If there are additional PR to be considered for 1.4.0.rc1 please send
> request to dev@.
>
> Regards,
> Steffen
>
> On Tue, Jan 8, 2019 at 11:28 AM Qing Lan  wrote:
>
> > Hi all,
> >
> > I added a section F in the document that explains the current
> > statically-linked dependencies we used for the official release. As there
> > are a few licenses under BSD3 and GPL, we need to handle them in our
> > next release. Please take a look and leave any concerns you may have.
> >
> > Thanks,
> > Qing
> >
> > On 1/7/19, 8:33 PM, "kellen sunderland" 
> > wrote:
> >
> > So I see two quick options that should cut down on the dependency
> > licenses
> > required for TRT in the source release.
> >
> > 1: We can simply remove in the release package the submodules for
> onnx
> > in
> > folder
> > incubator-mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party.
> > None of those dependencies are used in the build (I've just verified
> > locally on my machine).
> > 2: We can make a cmake based checkout system and ensure we only
> > checkout
> > the required files when TRT builds are enabled (similar to the
> current
> > mkl-ml setup).
> >
> > I'd suggest option 1 for this release, and that we support option 2
> > for the
> > 1.5 release.
> >
> > On Mon, Jan 7, 2019 at 8:19 PM Lv, Tao A  wrote:
> >
> > > What should I do for the double headers in
> > 3rdparty/mkldnn/src/cpu/xbyak/?
> > >
> > > -tao
> > >
> > > -Original Message-
> > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > Sent: Tuesday, January 8, 2019 10:51 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: Apache MXNet v1.4.0 release status
> > >
> > > Kellen and Tao -
> > > yes, the understanding is that dependencies need to be considered
> > and all
> > > licences referenced to include in top level LICENSE file.
> > > Appreciate your help with it.
> > > Steffen
> > >
> > > On Mon, Jan 7, 2019 at 6:39 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Sorry to hear about the licensing issues.  I was following the
> > general
> > > > vote but I'm still lacking some clarity around what licenses in
> the
> > > > onnx-trt repo need to be surfaced.  I believe onnx-trt is MIT
> > > > licensed, but it includes Onnx as a third party repo which then
> > brings
> > > > in dependencies with a variety of licenses.  The proposal is that
> > we
> > > > look at these on an individual basis and then add them to our top
> > level
> > > LICENSE file right?
> > > >
> > > > An alternative is that we may be able to checkout a smaller
> source
> > > > code dependency tree if we remove a few unneeded ONNX's
> > dependencies
> > > > (pybind and google-bench).  My hope is that this wouldn't affect
> > our
> > > > compilation process and would get us down to two licenses to
> report
> > > > (just Onnx and Onnx-TRT, both MIT).
> > > >
> > > > On Mon, Jan 7, 2019 at 6:07 PM Meghna Baijal
> > > > 
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > > For some more context, these were the last emails I sent on the
> > dev
> > > > > and legal lists requesting help on the open questions  –
> > > > >
> > > > > 1. Question on legal about the CC-By-2.5 <
> > > > >
> > > >
> > http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201805.mbox
> > > > /%3CCAK1xzDe6ECToKt_2cTR_7txQQCwHeYfvxXDfmuGgfA3jaTs=
> > j...@mail.gmail.com
> > > > %3E
> > > > > >
> > > > > 2. Question on dev about googletest file <
> > > > >
> > > >
> > http://mail-archives.apache.org/mod_mbox/mxnet-dev/201804.mbox/%3CCAMG
> > > > gKDC8szdfFqQhhSNpwwT_3zi4LBS7A=u4v7kj4ule44u...@mail.gmail.com
> %3E
> > > > > >
> > > > > 3. General Request for review of the licenses wiki <
> > > > >
> > > >
> > 

Re: [Annoucement] New Committer -- Iblis Lin

2019-01-05 Thread Lin Yuan
Welcome Iblis,

Great to see good Julia support in MXNet!

Lin

On Sat, Jan 5, 2019 at 12:32 PM Marco de Abreu 
wrote:

> Welcome Iblis,
>
> great to have you on board!
>
> -Marco
>
> Am Sa., 5. Jan. 2019, 21:13 hat Carin Meier 
> geschrieben:
>
> > Please join me in welcoming Iblis Lin as a new committer.
> >
> > He has been a long-time contributor to the Julia package, is responsible
> > for bringing it into the main MXNet repo, and is the current maintainer.
> >
> > https://github.com/apache/incubator-mxnet/commits?author=iblis17
> >
> > - Carin Meier
> >
>


Re: [Question] UI change policy in MXNet

2018-12-20 Thread Lin Yuan
Hi Anirudh,

Thanks a lot for your clarifications! I have some followup
questions/comments:

1) Which guideline should we follow when updating the UI in MXNet operators?
A) MXNet follows semantic versioning, so breaking changes to operator
interfaces can be introduced only in major versions.

(Lin:) My question is what style of UI guide we should follow, e.g. naming
convention, usage mode, etc. Something like numpy's style or tensorflow's?

2) Who should approve the UI change?
A) Contributors who may have worked on the operator and/or other
contributors/committers.

(Lin:) Is it too local to rely on contributors to one or a few operators to
decide the UI? How can we ensure consistency of the UI across all
operators in MXNet?

3) In case of backward compatibility, should we favor breaking the backward
compatibility and update the release notes or adding a newer version of the
operator like ***_v2?
A) If the operator interfaces are not compatible, its fine to create
operator with the name "_v2" . In the next major version release, you can
add an alias for newer implementation and deprecate the older one.

(Lin) What if there is already a "_v2"? Do we add "_v3", "_v4" as the project
evolves?
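The alias-and-deprecate approach from answer 3 is commonly implemented by turning the old name into a wrapper that emits a deprecation warning. A minimal sketch (the operator name `softmax` here is purely illustrative, not a proposal for any specific MXNet operator):

```python
import warnings

def softmax_v2(x, axis=-1):
    # Stand-in for the newer implementation.
    return ("v2", x, axis)

def softmax(x, axis=-1):
    # In the next major release, the old name becomes a deprecated alias
    # of the new implementation, giving users a migration window.
    warnings.warn("softmax is deprecated; use softmax_v2 instead",
                  DeprecationWarning, stacklevel=2)
    return softmax_v2(x, axis)
```

Callers of the old name keep working but see a `DeprecationWarning`, and the alias can be removed in the following major release.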

4) Which operator should go to contrib and which be implemented as regular?
A) I think this discussion may help:
https://github.com/apache/incubator-mxnet/pull/5499 . To summarize: contrib
was created for ops for which we provide limited guarantees with respect to
backward compatibility, interface changes, testing etc.

(Lin) This is definitely an informative discussion. It would be better if
we could put this in a more noticeable place for developers.


On Thu, Dec 20, 2018 at 1:39 PM Anirudh Subramanian 
wrote:

> 1) Which guideline should we follow when updating the UI in MXNet
> operators?
> A) MXNet follows semantic versioning, so breaking changes to operator
> interfaces can be introduced only in major versions.
>
> 2) Who should approve the UI change?
> A) Contributors who may have worked on the operator and/or other
> contributors/committers.
>
> 3) In case of backward compatibility, should we favor breaking the backward
> compatibility and update the release notes or adding a newer version of the
> operator like ***_v2?
> A) If the operator interfaces are not compatible, its fine to create
> operator with the name "_v2" . In the next major version release, you can
> add an alias for newer implementation and deprecate the older one.
>
> 4) Which operator should go to contrib and which be implemented as regular?
> A) I think this discussion may help:
> https://github.com/apache/incubator-mxnet/pull/5499 . To summarize:
> contrib
> was created for ops for which we provide limited guarantees with respect to
> backward compatibility, interface changes, testing etc.
>
> Anirudh
>
> On Thu, Dec 20, 2018 at 1:00 PM Lin Yuan  wrote:
>
> > Dear Community,
> >
> > As a contributor, I would like to know the current policy for updating UI
> > of an operator. I understand UI change should be introduced in major
> > release not minor release. However, it is still not quite clear to me
> > regarding the UI change process:
> >
> > 1) Which guideline should we follow when updating the UI in MXNet
> > operators?
> > 2) Who should approve the UI change?
> > 3) In case of backward compatibility, should we favor breaking the
> backward
> > compatibility and update the release notes or adding a newer version of
> the
> > operator like ***_v2?
> > 4) Which operator should go to contrib and which be implemented as
> regular?
> >
> > Any clarification is appreciated and it is helpful to guide PR reviewers
> as
> > well.
> >
> > Merry Christmas to ya'all!
> >
> > Lin
> >
>


[Question] UI change policy in MXNet

2018-12-20 Thread Lin Yuan
Dear Community,

As a contributor, I would like to know the current policy for updating the UI
of an operator. I understand a UI change should be introduced in a major
release, not a minor release. However, the UI change process is still not
quite clear to me:

1) Which guideline should we follow when updating the UI in MXNet operators?
2) Who should approve the UI change?
3) In case of backward compatibility, should we favor breaking the backward
compatibility and update the release notes or adding a newer version of the
operator like ***_v2?
4) Which operator should go to contrib and which be implemented as regular?

Any clarification is appreciated and it is helpful to guide PR reviewers as
well.

Merry Christmas to ya'all!

Lin


Re: [Annoucement] New Committer -- Da Zheng

2018-12-17 Thread Lin Yuan
Congrats!

On Mon, Dec 17, 2018 at 9:19 AM Steffen Rochel 
wrote:

> Congratulation Da!
>
> On Mon, Dec 17, 2018 at 9:02 AM Tianqi Chen  wrote:
>
> > Dear Community:
> >
> > Please join me to welcome Da Zheng as a new committer of the MXNet.
> >
> > Da is the main author of MKL-DNN integration and recently he champions
> the
> > control flow support. He is one of the few "explorer style" contributors
> > in the community, whom we desperately need in this fast-changing
> > deep learning systems landscape.
> >
> > PRs https://github.com/apache/incubator-mxnet/commits?author=zheng-da
> > reviews  *
> >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Azheng-da+
> > <
> >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Azheng-da+
> > >*
> > dev@  https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:da-
> > zheng
> >
> > Tianqi
> >
>


[PROPOSAL] Large tensor support in MXNet

2018-12-02 Thread Lin Yuan
Dear Community,

As some of you may have already encountered, MXNet has a limitation in
supporting tensors with more than ~4.3 billion elements (2^32). The root
cause is that the MXNet backend uses a 32-bit integer as the default integer
data type for both computation and storage in many places.
Lifting this limitation, however, is not as simple as replacing all 32-bit
integers with a larger data type (64-bit integer), as I detail in the design
proposal at
https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support. It
requires a systematic approach to this problem in the MXNet backend as well
as in the APIs of the different language bindings.

I would appreciate your suggestions for solving this problem systematically
and elegantly, as well as your help with supporting the language bindings
other than Python.

Please add your comment in the design proposal or create tickets in the
JIRA epic:  https://issues.apache.org/jira/browse/MXNET-1184

Best Regards,

Lin


Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Lin Yuan
Hi Steffen,

Can we add the following PR to 1.4.0 release:

https://github.com/apache/incubator-mxnet/pull/13452

It's just a Python API returning the header path, so it should not cause any
regressions, but it is required for Horovod to integrate with MXNet. It's
better to have this in a minor release than in a patch release.

Thanks,

Lin

On Thu, Nov 29, 2018 at 6:46 PM Steffen Rochel 
wrote:

> Hi Zhi - thanks for the improvement, which we should consider for 1.4.0.
> However, I don't see any tests with the PR and think it is too risky to add
> changes without tests. I will add your PR to the tracking list, but would
> like to ask you to add functional tests before completing the PR to master
> and v1.4.x branch.
>
> Steffen
>
> On Thu, Nov 29, 2018 at 5:01 PM Joshua Z. Zhang 
> wrote:
>
> > Hi, I would like to bring a critical performance and stability patch for
> > the existing gluon dataloader to 1.4.0:
> > https://github.com/apache/incubator-mxnet/pull/13447 <
> > https://github.com/apache/incubator-mxnet/pull/13447>.
> >
> > This PR is finished, waiting for CI to pass.
> >
> > Steffen, could you help me add that to the tracked list?
> >
> > Best,
> > Zhi
> >
> > > On Nov 29, 2018, at 4:25 PM, Naveen Swamy  wrote:
> > >
> > > the tests are randomly failing in different stages
> > >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-13105/
> > > This PR has failed 8 times so far
> > >
> > > On Thu, Nov 29, 2018 at 3:43 PM Steffen Rochel <
> steffenroc...@gmail.com>
> > > wrote:
> > >
> > >> Pedro - ok. Please add PR to v1.4.x branch after merge to master and
> > please
> > >> update tracking page
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > >>>
> > >> .
> > >> Steffen
> > >>
> > >> On Thu, Nov 29, 2018 at 3:00 PM Pedro Larroy <
> > pedro.larroy.li...@gmail.com
> > >>>
> > >> wrote:
> > >>
> > >>> PR is ready from my side and passes the tests, unless somebody raises
> > >>> any concerns it's good to go.
> > >>> On Thu, Nov 29, 2018 at 9:50 PM Steffen Rochel <
> > steffenroc...@gmail.com>
> > >>> wrote:
> > 
> >  Pedro - added  to 1.4.0 tracking list
> >  <
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > 
> > 
> >  Do you have already ETA?
> >  Steffen
> > 
> >  On Thu, Nov 29, 2018 at 6:13 AM Pedro Larroy <
> > >>> pedro.larroy.li...@gmail.com>
> >  wrote:
> > 
> > > Hi all.
> > >
> > > There are two important issues / fixes that should go in the next
> > > release in my radar:
> > >
> > > 1) https://github.com/apache/incubator-mxnet/pull/13409/files
> > > There is a bug in shape inference on CPU when not using MKL, also
> we
> > > are running activation on CPU via MKL when we compile CUDNN+MKLDNN.
> > > I'm finishing a fix for these issues in the above PR.
> > >
> > > 2) https://github.com/apache/incubator-mxnet/issues/13438
> > > We are seeing crashes due to unsafe setenv in multithreaded code.
> > > Setenv / getenv from multiple threads is not safe and is causing
> > > segfaults. This piece of code (the handlers in pthread_atfork)
> > >> already
> > > caused a very difficult to diagnose hang in a previous release,
> where
> > > a fork inside cudnn would deadlock the engine.
> > >
> > > I would remove setenv from 2) as a mitigation, but we would need to
> > > check for regressions as we could be creating additional threads
> > > inside the engine.
> > >
> > > I would suggest that we address these two major issues before the
> > >> next
> > > release.
> > >
> > > Pedro
> > >
> > >
> > >
> > > On Sun, Nov 25, 2018 at 11:41 PM Steffen Rochel <
> > >>> steffenroc...@gmail.com>
> > > wrote:
> > >>
> > >> Dear MXNet community,
> > >>
> > >> I will be the release manager for the upcoming Apache MXNet 1.4.0
> > > release.
> > >> Sergey Kolychev will be co-managing the release and providing help
> > >>> from
> > > the
> > >> committers side.
> > >> A release candidate will be cut on November 29, 2018 and voting
> > >> will
> > > start
> > >> December 7, 2018. Release notes have been drafted here [1]. If you
> > >>> have
> > > any
> > >> additional features in progress and would like to include it in
> > >> this
> > >> release, please assure they have been merged by November 27, 2018.
> > > Release
> > >> schedule is available here [2].
> > >>
> > >> Feel free to add any other comments/suggestions. Please help to
> > >>> review
> > > and
> > >> merge outstanding PR's and resolve issues impacting the quality of
> > >>> the
> > >> 

Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-29 Thread Lin Yuan
https://github.com/apache/incubator-mxnet/pull/13452 is needed in 1.4.0 to
support Horovod integration project.

Thanks!

Lin


On Thu, Nov 29, 2018 at 1:40 PM Davydenko, Denis <
dzianis.davydze...@gmail.com> wrote:

> I suggest to include this issue into tracked ones for the release:
> https://github.com/apache/incubator-mxnet/issues/12255. It has proven to
> be a problem with MXNet start up time and it will cause even more problems
> down the line with Elastic Training and EIA, where MXNet is a commodity
> rather than a statically running process. Also it already causes noticeable
> issues
> with MMS (MXNet Model Server [1]). MMS users already noticed significant
> lag with MMS start up time, especially on beefy instances like C5.18xl with
> 72 vCPUs. MMS spins up multiple MXNet instances during its start up to
> ensure full utilization of CPU or GPU resources on the host. By default it
> spins up as many MXNet instances as there are cores (either CPU or GPU
> cores) and the bigger the host the more MXNet instances are spun up. And
> the more MXNet instances spun up - the more each instance takes time to
> start. For example, on C5.4xl users reported waiting for as long as 2
> minutes to have just 8 MXNet instances spun up with MXNet 1.3. Same efforts
> with MXNet 1.1 take less than 0.5 sec.
>
> This is quite a significant regression in MXNet when it comes to start up
> experience. I suggest to consider this as a blocker for 1.4.
>
> [1] https://github.com/awslabs/mxnet-model-server
>
> On 11/29/18, 12:51 PM, "Steffen Rochel"  wrote:
>
> added to 1.4.0 tracking list
> <
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> >
> .
> Steffen
>
> On Thu, Nov 29, 2018 at 9:32 AM Zheng, Da 
> wrote:
>
> > Hello Steffen,
> >
> > Can this bug be fixed in 1.4.0 release? It's a significant
> performance
> > regression on sparse matrix multiplication.
> > https://github.com/apache/incubator-mxnet/issues/13449
> >
> > Thanks,
> > Da
> >
> > On 11/26/18, 6:42 AM, "Steffen Rochel" 
> wrote:
> >
> > Dear MXNet community,
> >
> > I will be the release manager for the upcoming Apache MXNet 1.4.0
> > release.
> > Sergey Kolychev will be co-managing the release and providing
> help
> > from the
> > committers side.
> > A release candidate will be cut on November 29, 2018 and voting
> will
> > start
> > December 7, 2018. Release notes have been drafted here [1]. If
> you
> > have any
> > additional features in progress and would like to include it in
> this
> > release, please assure they have been merged by November 27,
> 2018.
> > Release
> > schedule is available here [2].
> >
> > Feel free to add any other comments/suggestions. Please help to
> review
> > and
> > merge outstanding PR's and resolve issues impacting the quality
> of the
> > 1.4.0 release.
> >
> > Regards,
> >
> > Steffen
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >
> > [2]
> >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> >
> >
> >
> >
> > On Tue, Nov 20, 2018 at 7:15 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Spoke too soon[1], looks like others have been adding Turing
> support
> > as
> > > well (thanks to those helping with this).  I believe there's
> still a
> > few
> > > changes we'd have to make to claim support though (mshadow
> CMake
> > changes,
> > > PyPi package creation tweaks).
> > >
> > > 1:
> > >
> > >
> >
> https://github.com/apache/incubator-mxnet/commit/2c3357443ec3d49a11e93c89f278264ce10c2f08
> > >
> > > On Tue, Nov 20, 2018 at 7:00 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Hey Steffen, I'd like to be able to merge this PR for
> version 1.4:
> > > > https://github.com/apache/incubator-mxnet/pull/13310 . It
> fixes a
> > > > regression in master which causes incorrect feature vectors
> to be
> > output
> > > > when using the TensorRT feature.  (Thanks to Nathalie for
> helping
> > me
> > > track
> > > > down the root cause of the issue).   I'm currently blocked
> on a CI
> > issue
> > > I
> > > > haven't seen before, but hope to have it resolved by EOW.
> > > >
> > > > One call-out I would make is that we currently don't support
> Turing
> > > > architecture (sm_75).  I've been slowly trying to add
> support, but
> 

Re: [Question] Difference between "Feature" and "Feature request" labels in Github

2018-11-13 Thread Lin Yuan
Thanks guys for your prompt actions. I am so impressed!

Lin

On Tue, Nov 13, 2018 at 5:33 PM Sheng Zha  wrote:

> I was in the middle of transferring all items labeled with "Feature" to the
> "Feature request" label when the "Feature" label was deleted. I'm not sure
> who deleted the "Feature" label, but it's gone now.
>
> -sz
>
> On Tue, Nov 13, 2018 at 5:05 PM Anirudh Acharya 
> wrote:
>
> > This issue was raised before here -
> >
> >
> https://lists.apache.org/thread.html/3e988e6bd82cb2d69ba20c21bf763952ed22a5732e61f6fba1f89ac8@%3Cdev.mxnet.apache.org%3E
> >
> > We need someone with committer privileges to fix it.
> >
> >
> > Thanks
> > Anirudh
> >
> >
> >
> > On Tue, Nov 13, 2018 at 4:36 PM Lin Yuan  wrote:
> >
> > > Dear Community,
> > >
> > > I often see there are "Feature" and "Feature request" labels in Github
> > > issues. May I know the difference? If they are meant to be the same
> > thing,
> > > can we only keep one of them?
> > >
> > > Thanks,
> > >
> > > Lin
> > >
> >
>


[Question] Difference between "Feature" and "Feature request" labels in Github

2018-11-13 Thread Lin Yuan
Dear Community,

I often see there are "Feature" and "Feature request" labels in Github
issues. May I know the difference? If they are meant to be the same thing,
can we only keep one of them?

Thanks,

Lin


Catch divide-by-zero floating number exception in backend

2018-11-08 Thread Lin Yuan
Dear MXNet Community,

I recently found that NaN errors can sometimes be caused by divide-by-zero
floating-point bugs in the engine backend. By default, however, such an
exception is not thrown. I added a signal trap to catch this error
(https://github.com/apache/incubator-mxnet/pull/13190) and caught a
few exceptions when running the Python unit tests. But this only works on
Linux.

I would like to get more feedback on the best practice to catch such bugs
in the code and if we should enforce such checks in CI. Any comment is
appreciated.

Best Regards,

Lin
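The signal trap in the PR above is backend C++; on the Python test side, numpy offers a similar opt-in mechanism that could surface divide-by-zero in unit tests. A hedged sketch — `checked_divide` is a hypothetical helper for illustration, not an MXNet API:

```python
import numpy as np

def checked_divide(a, b):
    # np.errstate(divide='raise') turns silent inf/NaN results into a
    # FloatingPointError, analogous to trapping SIGFPE in the backend.
    with np.errstate(divide='raise', invalid='raise'):
        return np.asarray(a, dtype=np.float64) / np.asarray(b, dtype=np.float64)

print(checked_divide([1.0, 2.0], [2.0, 4.0]))  # [0.5 0.5]
try:
    checked_divide([1.0], [0.0])
except FloatingPointError as exc:
    print("caught:", exc)
```

Wrapping numerically sensitive test code this way would let CI flag the bug at the offending operation instead of at a NaN assertion much later.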


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Lin Yuan
Hi Anton,

Thanks for helping the release.
The following PRs are needed by customers who want to use deterministic
CUDNN convolution algorithms:

https://github.com/apache/incubator-mxnet/pull/12992
https://github.com/apache/incubator-mxnet/pull/13049

Thanks!

Lin


On Tue, Nov 6, 2018 at 1:51 PM Aaron Markham 
wrote:

> Hi Anton,
> I have the following suggestions for fixes to include in 1.3.1. These each
> have updates to files that will impact docs generation for the 1.3.x
> version of the website's Python API docs:
>
> https://github.com/apache/incubator-mxnet/pull/12879
> https://github.com/apache/incubator-mxnet/pull/12871
> https://github.com/apache/incubator-mxnet/pull/12856
>
> Thanks,
> Aaron
>
> On Tue, Nov 6, 2018 at 1:29 PM Lai Wei  wrote:
>
> > Hi Anton,
> >
> > Thanks for driving this, I would like to include the following fix in
> > 1.3.1:
> > Allow infer shape partial on foreach operator:
> > https://github.com/apache/incubator-mxnet/pull/12471
> >
> > Keras-MXNet needs this functionality to infer shape partially
> > on foreach operator. (Used in RNN operators)
> >
> > Thanks a lot!
> >
> >
> > Best Regards
> > Lai Wei
> >
> >
> >
> > On Tue, Nov 6, 2018 at 10:44 AM Haibin Lin 
> > wrote:
> >
> > > Hi Naveen and Anton,
> > >
> > > Thanks for pointing that out. You are right that these are not critical
> > > fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy 
> wrote:
> > >
> > > > Please note that this is a patch release(1.3.1) to address critical
> > > bugs!,
> > > > For everything else please wait for 1.4.0 which is planned very
> shortly
> > > > after 1.3.1
> > > >
> > > > > On Nov 6, 2018, at 7:17 AM, Anton Chernov 
> > wrote:
> > > > >
> > > > > The following PR's have been created so far:
> > > > >
> > > > > Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13117
> > > > >
> > > > > [MXNET-953] Fix oob memory read (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13118
> > > > >
> > > > > [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13119
> > > > >
> > > > > [MXNET-922] Fix memleak in profiler (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13120
> > > > >
> > > > > Set correct update on kvstore flag in dist_device_sync mode
> (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13121
> > > > >
> > > > > update mshadow (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13122
> > > > >
> > > > > CudnnFind() usage improvements (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13123
> > > > >
> > > > > Fix lazy record io when used with dataloader and multi_worker > 0
> > > > (v1.3.x)
> > > > > https://github.com/apache/incubator-mxnet/pull/13124
> > > > >
> > > > >
> > > > > As stated previously I would be rather opposed to have following
> PR's
> > > it
> > > > in
> > > > > the patch release:
> > > > >
> > > > > Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> > > > > https://github.com/apache/incubator-mxnet/pull/13129
> > > > >
> > > > > sample_like operators (#13034) v1.3.x
> > > > > https://github.com/apache/incubator-mxnet/pull/13130
> > > > >
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > вт, 6 нояб. 2018 г. в 16:06, Anton Chernov :
> > > > >
> > > > >> Hi Haibin,
> > > > >>
> > > > >> I have a few comments regarding the proposed performance
> improvement
> > > > >> changes.
> > > > >>
> > > > >> CUDNN support for LSTM with projection & clipping
> > > > >> https://github.com/apache/incubator-mxnet/pull/13056
> > > > >>
> > > > >> There is no doubt that this change brings value, but I don't see
> it
> > > as a
> > > > >> critical bug fix. I would rather leave it for the next major
> > release.
> > > > >>
> > > > >> sample_like operators
> > > > >> https://github.com/apache/incubator-mxnet/pull/13034
> > > > >>
> > > > >> Even if it's related to performance, this is an addition of
> > > > functionality
> > > > >> and I would also push this to be in the next major release only.
> > > > >>
> > > > >>
> > > > >> Best
> > > > >> Anton
> > > > >>
> > > > >>
> > > > >> вт, 6 нояб. 2018 г. в 15:55, Anton Chernov :
> > > > >>
> > > > >>> Hi Patric,
> > > > >>>
> > > > >>> This change was listed in the 'PR candidates suggested for
> > > > consideration
> > > > >>> for v1.3.1 patch release' section [1].
> > > > >>>
> > > > >>> You are right, I also think that this is not a critical hotfix
> > change
> > > > >>> that should be included into the 1.3.1 patch release.
> > > > >>>
> > > > >>> Thus I'm not making any further efforts to bring it in.
> > > > >>>
> > > > >>> Best
> > > > >>> Anton
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> > > > 

Re: [DISCUSS] Speedup non-code PR in CI

2018-11-06 Thread Lin Yuan
Kellen and Pedro,

Thanks for your pointers. I am not a CI expert, but one naive speedup I can
see is to skip the build and testing cycles when a PR only contains *.md
files. This would make documentation corrections easier and save compute
resources for tests that need them. Any side effects there?

Thanks,

Lin


[DISCUSS] Speedup non-code PR in CI

2018-11-06 Thread Lin Yuan
Dear Community,

I recently submitted a few small PRs with only changes in README files.
However, I noticed they still triggered the full cycle of CI including
build and test on all platforms.

Do we have a plan to speed up this process, maybe skipping non-code related
PRs in CI? Sorry, if this topic has been raised earlier and if not I
appreciate any comments.

Cheers,

Lin
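A minimal sketch of the docs-only check Lin proposes. Assumptions are labeled: the suffix list, the `origin/master` merge base, and the helper names are illustrative, not existing CI code:

```python
import subprocess

# Assumption: these suffixes define "documentation-only" for this sketch.
DOC_SUFFIXES = ('.md', '.rst')

def is_docs_only(changed_files):
    # True only when there is at least one change and every changed
    # path is a documentation file.
    return bool(changed_files) and all(
        f.lower().endswith(DOC_SUFFIXES) for f in changed_files)

def changed_files_against(base='origin/master'):
    # List the files a PR touches relative to the target branch.
    out = subprocess.run(
        ['git', 'diff', '--name-only', base + '...HEAD'],
        capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line]

# Classification on a static file list:
print(is_docs_only(['README.md', 'docs/faq.md']))   # True -> skip builds
print(is_docs_only(['README.md', 'src/op.cc']))     # False -> full CI
```

One side effect worth noting: a docs-only gate must still run any doc-build step that consumes those files, otherwise broken doc builds would slip through.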


Re: Horovod-MXNet Integration

2018-11-02 Thread Lin Yuan
Hi Mu,

Darren (@yuxihu ) and I have been working on
releasing MXNet-Horovod integration in production. We have made some
changes on both MXNet and Horovod sides. The changes on MXNet side have
mostly been merged and we are working to merge code to horovod repo. We
will send a design doc to you for review again next week.

Thanks for your feedback,

Lin

On Wed, Oct 31, 2018 at 12:03 PM Mu Li  wrote:

> Thanks for your contribution, Carl.
>
> I remember I left a comment on the proposal, but today I found it was
> disappeared. My suggestion is trying best to not change the existing API.
> The reason is that we need to change all trainers on the frontend that uses
> the existing kvstore APIs, which may cause confusion to users.
>
> The current proposal wants add the following 4 APIs into kvstore:
>
>
>-
>
>kv.pushpull
>-
>
>kv.broadcast
>-
>
>kv.local_rank
>-
>
>kv.num_local_workers
>
>
> Pushpull can be done with a sequential push and pull, you can do nothing in
> push and put all workloads into pushpull. Broadcast can be implemented by
> pull.
>
> What's local workers? GPUs in the single machine? If so, we can query it
> directly.
>
>
> On Fri, Sep 14, 2018 at 4:46 PM Carl Yang  wrote:
>
> > Hi,
> >
> > Currently, MXNet distributed can only be done using parameter server.
> > Horovod is an open-source distributed training framework that has
> > shown 2x speedup compared to TensorFlow using Parameter Server. We
> > propose to add Horovod support to MXNet. This will help our users
> > achieve goal of linear scalability to 256 GPUs and beyond. Design
> > proposal on cwiki:
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Horovod-MXNet+Integration
> >
> > Please feel free to let me know if you have any suggestions or feedback.
> >
> > Regards,
> > Carl
> >
>
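Mu's observation that `pushpull` can be composed from the existing primitives can be sketched as follows — a toy single-process stand-in, not MXNet's actual kvstore API:

```python
class SequentialPushPull:
    # Stand-in for a kvstore-style interface. 'push' accumulates a
    # gradient into the store; 'pull' reads the current value back.
    def __init__(self):
        self._store = {}

    def push(self, key, value):
        self._store[key] = self._store.get(key, 0) + value

    def pull(self, key):
        return self._store.get(key, 0)

    def pushpull(self, key, value):
        # Mu's point: pushpull is expressible as push followed by pull,
        # so no new API surface is strictly required.
        self.push(key, value)
        return self.pull(key)

kv = SequentialPushPull()
print(kv.pushpull('weight', 1))  # 1
print(kv.pushpull('weight', 2))  # 3
```

A fused `pushpull` can still be faster in a real distributed backend (one network round trip instead of two), which is presumably why Horovod exposes it; the sketch only shows the semantic equivalence.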


Re: [Discussion] Recognise Reviewers, Besides Committers and PMC

2018-10-20 Thread Lin Yuan
+1, sounds like a great idea. We also need a mechanism to identify “good
reviewers”. Maybe we can count the number of :thumbsup: in each review. Or
is there a better way?

On Fri, Oct 19, 2018 at 8:22 PM Tianqi Chen 
wrote:

> Dear MXNet Community:
>
> There is a great discussion going on in terms of lowering the barrier of
> entries and encourage more contribution to the project.  One of the general
> goals is to encourage a broader pool of contributions. I want to make the
> following proposal:
>
> Besides Committers and PMC, let us also recognize Reviewers in the
> community.  This is a "pseudo role" as there is no such official role in
> Apache. But I want to explore the possibility of recognising active
> reviewers for example, by adding a list of names in the contributor list.
> In general, I find it is really helpful to have more code reviews.
> Recognising good reviewers early enables us to find candidate for
> committers, and encourage them to contribute and understand what is the bar
> of code quality that is required to merge the code.
>
> This can provide the community more evidence when recruiting new
> committers. After all committers is about write access to the code and
> understand the consequence of the responsibility -- which is usually can be
> found in high quality reviews.
>
> Please let me know what you think.
> Tianqi
>
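One lightweight way to approximate the :thumbsup: count Lin suggests, assuming review objects shaped like the GitHub REST API's `reactions` summary (a sketch, not an existing tool — whether reactions are attached to reviews or review comments would need checking against the real API):

```python
def count_thumbs_up(reviews):
    # Each review is a dict; we assume a 'reactions' summary whose
    # '+1' field holds the thumbs-up count, as in GitHub's API shape.
    return sum(review.get('reactions', {}).get('+1', 0) for review in reviews)

reviews = [
    {'user': 'alice', 'reactions': {'+1': 3}},
    {'user': 'bob', 'reactions': {'+1': 1, '-1': 1}},
    {'user': 'carol'},  # no reactions recorded
]
print(count_thumbs_up(reviews))  # 4
```

Raw counts favor prolific reviewers over careful ones, so any such metric would likely need normalization (e.g. per review, or weighted by PR size) before being used to identify committer candidates.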


Re: CUDNN algorithm selection failure

2018-10-01 Thread Lin Yuan
I could not reproduce the error on an EC2 g3x8 instance, which makes it hard
to debug. I also suspect it was due to a resource usage limit on the CI
instance.

On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy 
wrote:

> It doesn't look like flakiness to me at first sight. I think it might be
> related to resource usage / allocation / leak in the worst case.
>
> Could be that there was not enough memory GPU memory at the time of test
> execution. But I'm just speculating, hence my original question.
>
> Pedro.
>
> On Mon, Oct 1, 2018 at 8:16 PM Lin Yuan  wrote:
>
> > Hi Pedro,
> >
> > I also got this failure in my PR
> >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11742/27/pipeline
> >
> > I was not able to identify the root cause of it from changelist. Are you
> > suggesting there is some flakiness in the master branch too?
> >
> > Thanks,
> >
> > Lin
> >
> > On Mon, Oct 1, 2018 at 4:55 PM Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > Hi
> > >
> > > I saw this failure on CI:
> > >
> > >
> >
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
> > >
> > > Have you seen other cases where we fail to select the best CUDNN
> > algorithm?
> > > In which circumstances this could happen, and do you think is a good
> idea
> > > to have one selected by default as a last resort?
> > >
> > >
> > > Pedro.
> > >
> >
>
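The "selected by default as a last resort" idea Pedro raises could look like the following hedged sketch — `benchmark` is a hypothetical callable and the algorithm names are made up; this is not MXNet's actual cuDNN selection code:

```python
def select_algorithm(candidates, benchmark, fallback):
    # Time each candidate; skip any that fail (e.g. insufficient GPU
    # memory at selection time). If nothing succeeds, return a
    # known-safe default instead of raising a selection failure.
    best, best_time = None, float('inf')
    for algo in candidates:
        try:
            elapsed = benchmark(algo)
        except RuntimeError:
            continue
        if elapsed < best_time:
            best, best_time = algo, elapsed
    return best if best is not None else fallback

def fake_benchmark(algo):
    # Stand-in for a real timing run; 'fft' simulates an OOM failure.
    timings = {'winograd': 1.2, 'gemm': 0.8}
    if algo not in timings:
        raise RuntimeError('out of memory')
    return timings[algo]

print(select_algorithm(['winograd', 'gemm', 'fft'], fake_benchmark, 'direct'))
print(select_algorithm(['fft'], fake_benchmark, 'direct'))  # falls back
```

The trade-off is that a silent fallback can mask genuine resource problems (as suspected in the CI failure above), so logging a warning on fallback would be important.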


Re: [DISCUSS] Use modernized C++11 range loops uniformly throughout the project

2018-09-28 Thread Lin Yuan
+1

Using range-based for-loop whenever possible improves code readability and
makes code less prone to human error.

I did some preliminary research on Google and did not find any complaint
about its performance drawback. Here is one piece from StackOverflow for
reference:
https://stackoverflow.com/questions/10821756/is-the-ranged-based-for-loop-beneficial-to-performance

Lin

On Fri, Sep 28, 2018 at 7:42 AM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> "Range loops aren’t always the most performant way" Do you have an example
> where there's a perf difference?
>
> "In addition, sometimes you want the index. Or maybe you want to iterate
> backwards, or not start from the first, etc. Maybe you want the iterator
> because you remove it from the list at the bottom of the loop Seems
> like a rule for the sake of having a rule."
>
> I should have been more clear about this point.  If you're using the index
> in the loop, doing reverse iteration, or not iterating from start-to-end
> this inspection is smart enough to realize it and will not suggest
> optimizing that type of loop.  The loops that would be changes are _only_
> the loops which are detected as equivalent to range-loops.  Examples can be
> found here:
> https://clang.llvm.org/extra/clang-tidy/checks/modernize-loop-convert.html
> or you can look at what's been changed in the ref PR.  I've initially set
> our confidence level at 'reasonable' but we could also set to 'safe' which
> would further reduce the number of loops the check would apply to.
>
> -Kellen
>
> On Fri, Sep 28, 2018 at 3:54 PM Chris Olivier 
> wrote:
>
> > -1
> >
> > Range loops aren’t always the most performant way. In addition, sometimes
> > you want the index. Or maybe you want to iterate backwards, or not start
> > from the first, etc. Maybe you want the iterator because you remove it
> from
> > the list at the bottom of the loop Seems like a rule for the sake of
> > having a rule.
> >
> > On Fri, Sep 28, 2018 at 2:12 AM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Hello MXNet devs,
> > >
> > > I'd like to discuss uniformly adopting C++11 range loops in the MXNet
> > > project.  The benefits I see are:
> > >
> > > *  Improved C++ readability (examples below).
> > > *  Consistency with other languages.  The range-loops are quite similar
> > to
> > > loops almost all other programming languages.  Given we're a project
> that
> > > supports many languages this language consistency could be positive for
> > our
> > > community.
> > > * Consistency within the same project.  Currently different authors
> have
> > > different loops styles which hurts codebase readability.
> > > *  Best available performance.  There are often multiple ways to write
> > > loops in C++ with subtle differences in performance and memory usage
> > > between loop methods.  Using range-loops ensures we get the best
> possible
> > > perf using an intuitive loop pattern.
> > > *  Slightly lower chance for bugs / OOB accesses when dealing with
> > indexing
> > > in an array for example.
> > >
> > > If we decide to enable this uniformly throughout the project we can
> > enable
> > > this policy with a simple clang-tidy configuration change.  There would
> > be
> > > no need for reviewers to have to manually provide feedback when someone
> > > uses an older C++ loops style.
> > >
> > > -Kellen
> > >
> > > Reference PR:  https://github.com/apache/incubator-mxnet/pull/12356/
> > > Previous clang-tidy discussion on the list:
> > >
> > >
> >
> https://lists.apache.org/thread.html/b0ae5a9df5dfe0d9074cb2ebe432264db4fa2175b89fa43a5f6e36be@%3Cdev.mxnet.apache.org%3E
> > >
> > > -
> > > Examples:
> > > for (auto axis_iter = param.axis.begin() ; axis_iter!=
> param.axis.end();
> > > ++axis_iter) {
> > > CHECK_LT(*axis_iter, static_cast(ishape.ndim()));
> > > stride_[reverse_index] = ishape[*axis_iter];
> > > ...
> > > -->
> > > for (int axis : param.axis) {
> > > CHECK_LT(axis, static_cast(ishape.ndim()));
> > > stride_[reverse_index] = ishape[axis];
> > > ...
> > > --
> > > for (size_t i = 0; i < in_array.size(); i++) {
> > > auto  = in_array[i];
> > > pre_temp_buf_.emplace_back(nd.shape(), nd.ctx(), true, nd.dtype());
> > > }
> > > -->
> > > for (auto & nd : in_array) {
> > > pre_temp_buf_.emplace_back(nd.shape(), nd.ctx(), true, nd.dtype());
> > > }
> > >
> >
>


Re: [LAZY VOTE] Consolidating developer guide in one place (cwiki preferred)

2018-09-28 Thread Lin Yuan
Hi Aaron,

Thanks a lot for effort. This consolidation will make it more convenient
for developers to find development resource and help to attract more
contributors.

I have also created a story to make it easy for developers to navigate from
mxnet.io: https://issues.apache.org/jira/browse/MXNET-1002

Thanks!

Lin

On Wed, Sep 26, 2018 at 10:24 AM Aaron Markham 
wrote:

> I think the latest feedback has been great. It seems to be mostly user
> level issues though. Installation and usage primarily, with a sprinkle of
> *if that stuff was better then I might be able to contribute*.
>
> I've (with a few other contributors) tackled some of the very direct bits
> of feedback for the website by incremental improvement of the install
> pages, Gluon info, and UX for the API docs.
>
> I've started additional planning for updates by adding an epic with
> specific stories and tasks to Jira for the documentation pipeline (the
> backend part of the website build):
> https://issues.apache.org/jira/browse/MXNET-957
>
> I've also added one that is more specific to the website's content:
> https://issues.apache.org/jira/browse/MXNET-986
> This is where I've captured only two tasks related to transitioning content
> related to "contributing to MXNet" over to the wiki. Any pointers on which
> content to move would help. These could be added as tasks too.
>
> I welcome any suggestions, additions, and contributions to either of these
> epics.
>
> Cheers,
> Aaron
>
> On Wed, Sep 26, 2018, 00:02 Lin Yuan  wrote:
>
> > Hi Aaron,
> >
> > Do we have a resolution for this proposal yet? Recently, there have been
> > many asks for a better documentation for MXNet developers. I think it's a
> > good time that we consolidate the developer documentation in a central
> > place. Any thoughts or plan?
> >
> > Many Thanks,
> >
> > Lin
> >
> > On Tue, Sep 4, 2018 at 1:55 PM Lin Yuan  wrote:
> >
> > > +1
> > >
> > > On Tue, Sep 4, 2018 at 1:46 PM Aaron Markham <
> aaron.s.mark...@gmail.com>
> > > wrote:
> > >
> > >> I'd like to call for a lazy vote on this before proceeding. Already
> had
> > >> some +1s but let's be sure.
> > >>
> > >> The vote is to move developer guide info to cwiki. User guides would
> > >> remain
> > >> on the website.
> > >>
> > >> On Tue, Aug 21, 2018 at 12:53 PM sandeep krishnamurthy <
> > >> sandeep.krishn...@gmail.com> wrote:
> > >>
> > >> > +1
> > >> > Thanks Lin and Aaron. I agree website to cover all user facing
> > >> > documentation and a separate consolidated and organized developer
> > >> focussed
> > >> > docs in one place (cwiki).
> > >> >
> > >> >
> > >> > Note: Permissions on cwiki is currently not well managed with many
> > >> people
> > >> > having full admin rights to edit/create/delete pages. Should be fine
> > for
> > >> > now, but, when we start accumulating many documents and resources,
> we
> > >> > should probably revisit on Delete permissions.
> > >> >
> > >> >
> > >> > On Tue, Aug 21, 2018 at 11:57 AM Lin Yuan 
> > wrote:
> > >> >
> > >> > > Hi Aaron,
> > >> > >
> > >> > > Thanks for your answer. I think it's a very worthwhile effort to
> > move
> > >> all
> > >> > > the developer related content from mxet.io website to a dedicated
> > >> > > developer
> > >> > > site. Would you like to initiate this effort?
> > >> > >
> > >> > > Best,
> > >> > >
> > >> > > Lin
> > >> > >
> > >> > > On Wed, Aug 15, 2018 at 3:47 PM Haibin Lin <
> > haibin.lin@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > +1
> > >> > > >
> > >> > > > On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham <
> > >> > > aaron.s.mark...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Lin, I agree with this organization. If you feel like
> > >> somethings
> > >> > > > should
> > >> > > > > be transitioned from the website to the wiki, I can help with
> > >> that,
> > >> > but
> > >> > > > f

Re: [LAZY VOTE] Consolidating developer guide in one place (cwiki preferred)

2018-09-26 Thread Lin Yuan
Hi Aaron,

Do we have a resolution for this proposal yet? Recently, there have been
many asks for a better documentation for MXNet developers. I think it's a
good time that we consolidate the developer documentation in a central
place. Any thoughts or plan?

Many Thanks,

Lin

On Tue, Sep 4, 2018 at 1:55 PM Lin Yuan  wrote:

> +1
>
> On Tue, Sep 4, 2018 at 1:46 PM Aaron Markham 
> wrote:
>
>> I'd like to call for a lazy vote on this before proceeding. Already had
>> some +1s but let's be sure.
>>
>> The vote is to move developer guide info to cwiki. User guides would
>> remain
>> on the website.
>>
>> On Tue, Aug 21, 2018 at 12:53 PM sandeep krishnamurthy <
>> sandeep.krishn...@gmail.com> wrote:
>>
>> > +1
>> > Thanks Lin and Aaron. I agree website to cover all user facing
>> > documentation and a separate consolidated and organized developer
>> focussed
>> > docs in one place (cwiki).
>> >
>> >
>> > Note: Permissions on cwiki is currently not well managed with many
>> people
>> > having full admin rights to edit/create/delete pages. Should be fine for
>> > now, but, when we start accumulating many documents and resources, we
>> > should probably revisit on Delete permissions.
>> >
>> >
>> > On Tue, Aug 21, 2018 at 11:57 AM Lin Yuan  wrote:
>> >
>> > > Hi Aaron,
>> > >
>> > > Thanks for your answer. I think it's a very worthwhile effort to move
>> all
>> > > the developer related content from mxet.io website to a dedicated
>> > > developer
>> > > site. Would you like to initiate this effort?
>> > >
>> > > Best,
>> > >
>> > > Lin
>> > >
>> > > On Wed, Aug 15, 2018 at 3:47 PM Haibin Lin 
>> > > wrote:
>> > >
>> > > > +1
>> > > >
>> > > > On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham <
>> > > aaron.s.mark...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Lin, I agree with this organization. If you feel like
>> somethings
>> > > > should
>> > > > > be transitioned from the website to the wiki, I can help with
>> that,
>> > but
>> > > > for
>> > > > > the moment I've been suggesting that new developer-focused
>> content be
>> > > > > placed on the wiki.
>> > > > >
>> > > > > On Tue, Aug 14, 2018 at 10:40 AM, Lin Yuan 
>> > > wrote:
>> > > > >
>> > > > > > Dear MXNet community,
>> > > > > >
>> > > > > > As a developer, I noticed we have some developer guide
>> scattered in
>> > > > > > different websites (mxnet.io, cwiki):
>> > > > > >
>> > > > > > E.g.
>> > > > > >
>> > > > > > How to Create New Operators (Layers): [
>> > > > > > https://mxnet.incubator.apache.org/faq/new_op.html]
>> > > > > > A Guide to Implementing Sparse Operators in MXNet Backend [
>> > > > > > https://cwiki.apache.org/confluence/display/MXNET/A+
>> > > > > > Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
>> > > > > > ]
>> > > > > >
>> > > > > > When searching developer guide by keyword, only one of them can
>> be
>> > > > > returned
>> > > > > > on either site.
>> > > > > >
>> > > > > > It will be more convenient for developers if all the developer
>> > guide
>> > > > > > resides on cwiki and all user guide (non-developer) on the
>> > mxnet.io
>> > > > > > website. We can add a link on mxnet.io to refer all developers
>> to
>> > > > cwiki
>> > > > > > for
>> > > > > > guidance.
>> > > > > >
>> > > > > > Any comment is appreciated.
>> > > > > >
>> > > > > > Best Regards,
>> > > > > >
>> > > > > > Lin
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> > Sandeep Krishnamurthy
>> >
>>
>


Re: [DISCUSS] Build OSX builds in CI (possibly with TravisCI).

2018-09-18 Thread Lin Yuan
> > > > > > The job only compiles MXNet on Mac and currently does not run
> unit
> > > > tests
> > > > > -
> > > > > > we expect the overall execution duration to be around 6 minutes
> and
> > > > thus
> > > > > > faster than the full Jenkins pipeline. The status is set to "not
> > > > > required"
> > > > > > which means that it does not block merging if that job fails
> since
> > > the
> > > > > > pipeline is still in beta. But in general, it would be good if
> > > > committers
> > > > > > review the results in case the job shows a failure. Our last
> known
> > > > state
> > > > > is
> > > > > > that the pipeline works properly, but we will keep everybody up
> to
> > > date
> > > > > in
> > > > > > case we get aware of any problems.
> > > > > >
> > > > > > The next step will be integration of Python CPU unit tests. There
> > > will
> > > > > be a
> > > > > > separate email if we got an update on that manner.
> > > > > >
> > > > > > Special thanks to Kellen Sunderland for the contribution of this
> > > Travis
> > > > > CI
> > > > > > pipeline.
> > > > > >
> > > > > > Best regards,
> > > > > > Marco
> > > > > >
> > > > > > On Wed, Sep 5, 2018 at 8:19 PM Tianqi Chen <
> > tqc...@cs.washington.edu
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Alrite, then I think it is fine as long as we can kept up with
> > > build
> > > > > > speed
> > > > > > > without timeout.
> > > > > > >
> > > > > > >
> > > > > > > Tianqi
> > > > > > >
> > > > > > > On Wed, Sep 5, 2018 at 9:14 AM kellen sunderland <
> > > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Travis actually has explicit support for ccache, it's a
> > platform
> > > > > > feature.
> > > > > > > > I've run it and it seems to work quite well.  See for example
> > > this
> > > > > > build:
> > > > > > > >
> > > > > >
> > > >
> > https://travis-ci.org/KellenSunderland/incubator-mxnet/builds/424768656
> > > > > > > >
> > > > > > > > On Wed, Sep 5, 2018 at 7:10 PM Tianqi Chen <
> > > > tqc...@cs.washington.edu
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Travis it self is stateless, which means ccache is not
> likely
> > > > going
> > > > > > to
> > > > > > > > > work. As far as I understand, if jenkins master is in the
> > > public
> > > > > > > domain,
> > > > > > > > > you do not need to setup a vpn to the subset of the master.
> > > > > > > > >
> > > > > > > > > As for versions of MacOS, we are likely going to be fine
> with
> > > one
> > > > > > > > version,
> > > > > > > > > as usually the problems exhibits on mac are similar
> > > > > > > > >
> > > > > > > > > Tianqi
> > > > > > > > > On Wed, Sep 5, 2018 at 9:04 AM kellen sunderland <
> > > > > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > @Tianqi: Yeah there's going to be a lot of trade-offs to
> > > using
> > > > > > > > Travis.  I
> > > > > > > > > > hope we can get it running fast enough with ccache that
> it
> > > > won't
> > > > > > > > timeout
> > > > > > > > > > when running tests, but even that is questionable.  In my
> > > > private
> > > > > > > > testing
> > > > > > > > > > it was running in about 35 minutes and the global timeout
> > for
> > > > > > Travis

Re: [DISCUSS] Build OSX builds in CI (possibly with TravisCI).

2018-09-05 Thread Lin Yuan
Hi Kellen,

Many thanks for your and Marco's effort! I think this is a very crucial
piece to improve MXNet stability.

To add some data points:
1) Customers using the CoreML-to-MXNet converter were blocked for a while
because the converter was broken and no unit test was in place to detect
that.
2) Developers on Mac could not verify their local commits because some unit
tests on master were broken. This wasted time and Jenkins server resources
to detect the failure.
3) Please consider running the CI on Mac OS 10.13, since this is the minimum
Mac OS version that supports CoreML (to support the CoreML-to-MXNet
converter).
Best Regards,

Lin

On Wed, Sep 5, 2018, 3:02 AM kellen sunderland 
wrote:

> I'm bumping this thread as we've recently had our first serious bug on
> MacOS that would have been caught by enabling Travis.
>
> I'm going to do a little experimental work together with Marco with the
> goal of enabling a minimal Travis build that will run python tests.  So far
> I've verified that Travis will in fact find a bug that currently exists in
> master and has been reproduced by MacOS clients.  This indicates to me that
> adding Travis will add value to our CI.
>
> My best guess is that it might take us some iteration before we find a
> scalable way to integrate Travis.  Given this we're going to enable Travis
> in non-blocking mode (i.e. failures are safe to ignore for the time being).
>
> To help mitigate the risk of timeouts, and to remove legacy code I'm going
> to re-create the travis.yml file from scratch.  I think it'll be much less
> confusing if we only have working code related to Travis in our codebase,
> so that contributors won't have to experiment to see what is or isn't
> working.  We've got some great, but slightly out-of-date functionality in
> the legacy .travis.yml file.  I hope we can work together to update the
> legacy features, ensure they work with the current folder structure and
> also make sure the features run within Travis's 45 minute global time
> window.
>
> I'd also like to set expectations that this is strictly a volunteer
> effort.  I'd welcome help from the community for support and maintenance.
> The model downloading caching work particularly stands out to me as
> something I'd like to re-enable again as soon as possible.
>
> -Kellen
>
> On Tue, Jan 9, 2018 at 11:52 AM Marco de Abreu <
> marco.g.ab...@googlemail.com>
> wrote:
>
> > Looks good! +1
> >
> > On Tue, Jan 9, 2018 at 10:24 AM, kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > I think most were in favour of at a minimum creating a clang build so
> > I've
> > > created a PR
> > > https://github.com/apache/incubator-mxnet/pull/9330/commits/
> > > 84089ea14123ebe4d66cc92e82a2d529cfbd8b19.
> > > My hope is this will catch many of the issues blocking OSX builds.  In
> > fact
> > > it already caught one issue.  If you guys are in favour I can remove
> the
> > > WIP and ask that it be merged.
> > >
> > > On Thu, Jan 4, 2018 at 6:29 PM, Chris Olivier 
> > > wrote:
> > >
> > > > Nope, I have been on vacation.
> > > >
> > > > On Thu, Jan 4, 2018 at 9:10 AM, kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > Hope everyone had a good break.  Just wanted to check if there were
> > > > further
> > > > > thoughts on OSX builds.  Chris, did you have time to look into
> > > > virtualizing
> > > > > Mac OS?  Would it make sense for us to put something in place in
> the
> > > > > interim e.g. the clang solution?
> > > > >
> > > > > On Tue, Dec 12, 2017 at 7:59 PM, de Abreu, Marco <
> mab...@amazon.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for looking into this, Chris! No hurries on that one,
> we’ll
> > > look
> > > > > > into it next stage when we add new system- and
> build-configurations
> > > to
> > > > > the
> > > > > > CI.
> > > > > >
> > > > > > On 12.12.17, 19:12, "Chris Olivier" 
> wrote:
> > > > > >
> > > > > > I am on vacation starting Thursday.
> > > > > >
> > > > > > On Tue, Dec 12, 2017 at 9:49 AM kellen sunderland <
> > > > > > kellen.sunderl...@gmail.com> wrote:
> > > > > >
> > > > > > > Absolutely, let's do an investigation and see if it's
> > possible
> > > to
> > > > > > > virtualize.  Would you have time to look into it a bit
> > further?
> > > > > > >
> > > > > > > On Tue, Dec 12, 2017 at 6:47 PM, Chris Olivier <
> > > > > > cjolivie...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Don’t get me wrong, I’m not saying this Mac OS Jenkins
> > > solution
> > > > > is
> > > > > > doable
> > > > > > > > but I feel like we should investigate because the payoff
> > > would
> > > > be
> > > > > > large.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Dec 12, 2017 at 9:38 AM Chris Olivier <
> > > > > > cjolivie...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Apple’s Darwin OS is recently open-sourced.
> > > > > > > > > 
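The clang build discussed above boils down to pointing the toolchain at clang instead of gcc. A minimal sketch, assuming a CMake-based configure step (the exact MXNet flags are not taken from this thread and are illustrative only):

```shell
# Select clang for the build via the standard CC/CXX environment variables.
export CC=clang
export CXX=clang++

# A CMake configure step would then pick these compilers up, e.g.:
#   cmake -DCMAKE_C_COMPILER="$CC" -DCMAKE_CXX_COMPILER="$CXX" ..
# (the cmake line above is shown as a comment, not run here)
echo "configured compiler: $CXX"
```

Running CI with clang in addition to gcc tends to surface the stricter diagnostics that otherwise only show up on OSX, which is the motivation given above.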

Re: [LAZY VOTE] Consolidating developer guide in one place (cwiki preferred)

2018-09-04 Thread Lin Yuan
+1

On Tue, Sep 4, 2018 at 1:46 PM Aaron Markham 
wrote:

> I'd like to call for a lazy vote on this before proceeding. Already had
> some +1s but let's be sure.
>
> The vote is to move developer guide info to cwiki. User guides would remain
> on the website.
>
> On Tue, Aug 21, 2018 at 12:53 PM sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > +1
> > Thanks Lin and Aaron. I agree website to cover all user facing
> > documentation and a separate consolidated and organized developer
> focussed
> > docs in one place (cwiki).
> >
> >
> > Note: Permissions on cwiki is currently not well managed with many people
> > having full admin rights to edit/create/delete pages. Should be fine for
> > now, but, when we start accumulating many documents and resources, we
> > should probably revisit on Delete permissions.
> >
> >
> > On Tue, Aug 21, 2018 at 11:57 AM Lin Yuan  wrote:
> >
> > > Hi Aaron,
> > >
> > > Thanks for your answer. I think it's a very worthwhile effort to move
> all
> > > the developer related content from mxet.io website to a dedicated
> > > developer
> > > site. Would you like to initiate this effort?
> > >
> > > Best,
> > >
> > > Lin
> > >
> > > On Wed, Aug 15, 2018 at 3:47 PM Haibin Lin 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham <
> > > aaron.s.mark...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Lin, I agree with this organization. If you feel like somethings
> > > > should
> > > > > be transitioned from the website to the wiki, I can help with that,
> > but
> > > > for
> > > > > the moment I've been suggesting that new developer-focused content
> be
> > > > > placed on the wiki.
> > > > >
> > > > > On Tue, Aug 14, 2018 at 10:40 AM, Lin Yuan 
> > > wrote:
> > > > >
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > As a developer, I noticed we have some developer guide scattered
> in
> > > > > > different websites (mxnet.io, cwiki):
> > > > > >
> > > > > > E.g.
> > > > > >
> > > > > > How to Create New Operators (Layers): [
> > > > > > https://mxnet.incubator.apache.org/faq/new_op.html]
> > > > > > A Guide to Implementing Sparse Operators in MXNet Backend [
> > > > > > https://cwiki.apache.org/confluence/display/MXNET/A+
> > > > > > Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
> > > > > > ]
> > > > > >
> > > > > > When searching developer guide by keyword, only one of them can
> be
> > > > > returned
> > > > > > on either site.
> > > > > >
> > > > > > It will be more convenient for developers if all the developer
> > guide
> > > > > > resides on cwiki and all user guide (non-developer) on the
> > mxnet.io
> > > > > > website. We can add a link on mxnet.io to refer all developers
> to
> > > > cwiki
> > > > > > for
> > > > > > guidance.
> > > > > >
> > > > > > Any comment is appreciated.
> > > > > >
> > > > > > Best Regards,
> > > > > >
> > > > > > Lin
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>


Re: Propose to discontinue supporting Apache MXNet on Windows 7

2018-09-03 Thread Lin Yuan
> > > > > On the other hand the lack of data should not prevent us from
> > > moving
> > > > > forward and dropping support for outdated OS.
> > > > > In any case we would have to announce dropping a platform
> support
> > > at
> > > > least
> > > > > a release in advance.
> > > > > Steffen
> > > > >
> > > > > On Thu, Aug 30, 2018 at 12:21 PM Sheng Zha <
> zhash...@apache.org>
> > > wrote:
> > > > >
> > > > > > Hi Kellen,
> > > > > >
> > > > > > Thanks for the explanation. Unfortunately, I don't have the
> > > usage data,
> > > > > so
> > > > > > I refrained from voting. If any of the voters have such data
> > I'd
> > > love
> > > > to
> > > > > > see it too.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > On 2018/08/30 14:58:09, kellen sunderland <
> > > kellen.sunderl...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > > I haven't spoken to anyone about the decision (as I'm
> > > currently on an
> > > > > > > island in the med) but to me the quick +1s are likely a
> > result
> > > of
> > > > this
> > > > > > > being a fairly straightforward decision.  The factors that
> > > went into
> > > > my
> > > > > > > thinking were (1) prioritizing growing platforms rather
> than
> > > > shrinking
> > > > > > > platforms (i.e. thinking long term rather than shirt term)
> > and
> > > (2)
> > > > > > earning
> > > > > > > our customers' trust.  Claiming support for a platform when
> > we
> > > can't
> > > > > > > realistically deliver it would lose us trust.  I'd prefer
> to
> > > over
> > > > > deliver
> > > > > > > and under promise when it come to windows 7 for this
> reason.
> > > > > > >
> > > > > > > Now on the flip side one thing I would see as valuable is
> to
> > > try and
> > > > > get
> > > > > > > windows builds working with clang.  This could be
> beneficial
> > > in the
> > > > > sense
> > > > > > > that it would be easy to maintain for mxnet devs and allow
> us
> > > to use
> > > > > > modern
> > > > > > > cpp on older windows machines without using vs 2013(which I
> > > consider
> > > > a
> > > > > > > non-starter with our codebase).
> > > > > > >
> > > > > > > You have peaked my curiousity though Sheng.  How many win7
> > > users does
> > > > > > MXNet
> > > > > > > have relative to macos/Linux?
> > > > > > >
> > > > > > > On Thu, Aug 30, 2018, 8:51 AM Sheng Zha <
> szha@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi Yuan,
> > > > > > > >
> > > > > > > > No problem. This is an issue that's worth having a clear
> > > > definition,
> > > > > so
> > > > > > > > there's nothing wrong about your proposal, and thanks for
> > > bringing
> > > > > > this up.
> > > > > > > >
> > > > > > > > I'm more concerned about the seemingly unanimous votes on
> > > dropping
> > > > > > support
> > > > > > > > on a platform without seeing the supporting evidence that
> > > it's the
> > > > > > right
> > > > > > > > thing. It is as if everyone who participated in the vote
> > are
> > > > already
> > > > > > on the
> > > > > > > > same page, and somehow I'm the only one that's not. But
> the
> > > only
> > > > > > argument I
> > > > > > > > hear so far is that it's technically not straightforward

Re: Propose to discontinue supporting Apache MXNet on Windows 7

2018-08-30 Thread Lin Yuan
Hi Sheng,

Thanks for raising this concern. The problem now is that we cannot even
build MXNet on Windows 7, because the build process requires MS VS 2015 w/
Update 3, which is incompatible with Windows 7. This leaves many Windows
7-related open issues on GitHub without any timely response. In my opinion,
having no response to users' requests is probably even worse than letting
them know the limitation of OS support.

To minimize the impact on current Windows 7 users, we can provide a PyPI
package for Windows 7 in this release but defer bug fixes and feature
enhancements to later Windows OS versions. Based on users' feedback, we can
then officially discontinue Windows 7 support in the next MXNet
release.

I will appreciate your comments.

Lin



On Wed, Aug 29, 2018 at 1:37 PM Sheng Zha  wrote:

> Are any of the votes based on any measure of user impact, if we indeed
> decide not to fix the current problems?
>
> -sz
>
> On Wed, Aug 29, 2018 at 1:29 PM Hagay Lupesko  wrote:
>
> > +1 (non-binding)
> > Thanks for raising this Lin!
> > Are you suggesting to do it as part of MXNet 1.3?
> >
> > On Wed, Aug 29, 2018 at 9:14 AM Srivastava, Rohit Kumar <
> > srivastava@buckeyemail.osu.edu> wrote:
> >
> > > +1
> > >
> > > On 8/29/18, 8:39 AM, "sandeep krishnamurthy" <
> > sandeep.krishn...@gmail.com>
> > > wrote:
> > >
> > > +1 Thanks for bringing this up.
> > >
> > > On Wed, Aug 29, 2018 at 6:38 AM Marco de Abreu
> > >  wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Aug 29, 2018 at 1:08 PM kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Wed, Aug 29, 2018, 1:18 AM Anirudh Acharya <
> > > anirudhk...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 for discontinuing.
> > > > > >
> > > > > > On Tue, Aug 28, 2018 at 4:11 PM Naveen Swamy <
> > mnnav...@gmail.com
> > > >
> > > > wrote:
> > > > > >
> > > > > > > +1 to stop supporting Win7
> > > > > > >
> > > > > > > On Tue, Aug 28, 2018 at 3:54 PM Lin Yuan <
> > apefor...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Dear Community,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Currently, our MXNet installation guide for Windows does
> > not
> > > work
> > > > for
> > > > > > > > Windows 7. e.g. Microsoft Visual Studio 2015 is not
> > > supported on
> > > > > > Windows
> > > > > > > 7
> > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://visualstudio.microsoft.com/vs/support/vs2015/received-error-specified-program-requires-newer-version-windows/
> > > > > > > > >.
> > > > > > > > In addition, MSFT ended “Mainstream” support for Windows
> 7
> > > in 2015
> > > > (
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet
> > > > > > > > ).
> > > > > > > > Therefore, it is not possible for developers to build
> MXNet
> > > and
> > > > > verify
> > > > > > > the
> > > > > > > > fix on Windows 7 platform. Given that there have been
> > several
> > > > issues
> > > > > > > about
> > > > > > > > MXNet error on Windows 7 (issue#9271
> > > > > > > > <https://github.com/apache/incubator-mxnet/issues/9271>,
> > > issue
> > > > #8921
> > > > > > > > <https://github.com/apache/incubator-mxnet/issues/8921>,
> > > issue

Propose to discontinue supporting Apache MXNet on Windows 7

2018-08-28 Thread Lin Yuan
Dear Community,



Currently, our MXNet installation guide for Windows does not work for
Windows 7, e.g. Microsoft Visual Studio 2015 is not supported on Windows 7
<https://visualstudio.microsoft.com/vs/support/vs2015/received-error-specified-program-requires-newer-version-windows/>.
In addition, MSFT ended “Mainstream” support for Windows 7 in 2015 (
https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet).
Therefore, it is not possible for developers to build MXNet and verify the
fix on Windows 7 platform. Given that there have been several issues about
MXNet error on Windows 7 (issue #9271
<https://github.com/apache/incubator-mxnet/issues/9271>, issue #8921
<https://github.com/apache/incubator-mxnet/issues/8921>, issue #11163
<https://github.com/apache/incubator-mxnet/issues/11163>), it will even add
more burden on developers in the future if we were to continue supporting
Windows 7.



I therefore would like to propose that we discontinue the support of MXNet
on Windows 7 in the next release.


Specifically, this means the following required actions:

1) state the discontinuation of Windows 7 support in the release notes

2) update the MXNet webpage wherever a Windows version is mentioned

3) update the open GitHub issues related to Windows 7


Please share your thoughts about this proposal and/or suggest any other
action items missing from the above.


Best Regards,


Lin


Re: build from source instructions

2018-08-28 Thread Lin Yuan
When a user chooses to build from source, it is reasonable to infer that
they want to run the make process and then install the Python package.
The current automated build script is confusing in that it gives no idea
what to do if I want to change some of the source code in MXNet.
Furthermore, building from source on macOS should follow the same or a
similar process as building from source on Linux, since they have the same
shell environment. Having two different build instructions for macOS just
adds confusion.
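For reference, the manual steps that such a script wraps can be listed explicitly. This is a hedged sketch only: the repo URL, make flags, and the editable-install step are assumptions for illustration, not commands taken from the official instructions.

```shell
# Print (not execute) an outline of a manual from-source build on
# Linux/macOS; the make flags and the 'pip install -e' step are assumptions.
build_steps='git clone --recursive https://github.com/apache/incubator-mxnet
cd incubator-mxnet
make -j4 USE_OPENCV=1 USE_BLAS=openblas
cd python && pip install -e .'
printf '%s\n' "$build_steps"
```

Keeping a list like this next to the script would let a developer who changes MXNet source code see exactly which step to re-run: the make, then the editable Python install.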



On Tue, Aug 28, 2018 at 10:44 AM Bhavin Thaker 
wrote:

> The automated build script on macOS was written with the intention to have
> an automated, easy and quick way to build and install MXNet by any user,
> new-bie or advanced. The build script aims to provide repeatability and an
> easy way to test the build instructions.
>
> Without the script, the build instructions had many combinations of
> possibilities which would break for various users and there was no easy way
> to test all the combinations.
>
> I propose that we have both well-written build instructions with
> corresponding automated build script to ensure that the build instructions
> are well-tested.
>
> Please remember that there can be multiple use-cases and user preferences
> to build MXNet.
>
> Bhavin Thaker.
>
> On Tue, Aug 28, 2018 at 10:29 AM Afrooze, Sina  wrote:
>
> > +1 on fully automated scripts being more confusing than helpful. It's
> > difficult to debug any issues when the entire instruction is to run a
> > single script. - Sina
> >
> >
> >
> > On 8/28/18, 9:46 AM, "Lin Yuan"  wrote:
> >
> > Aaron,
> >
> > I agree the installation page is very confusing to me. When I first
> > tried
> > to build MXNet from source on MacOS, I was totally confused about the
> > instruction. Why was it vastly different from building from source on
> > Linux
> > given these two OS have similar shell commands. I feel the automatic
> > scripts on MacOS platform is rather confusing than simplifying.
> >
> > Lin
> >
> > On Mon, Aug 27, 2018 at 9:21 PM Steffen Rochel <
> > steffenroc...@gmail.com>
> > wrote:
> >
> > > Aaron - we should keep instructions how to build from source.
> > Updating and
> > > re-organizing makes sense to me.
> > > Steffen
> > >
> > > On Mon, Aug 27, 2018 at 4:54 PM Aaron Markham <
> > aaron.s.mark...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > > I was looking into the C++ instructions and came across this
> > seemingly
> > > > pretty old page:
> > > > https://mxnet.incubator.apache.org/install/build_from_source
> > > >
> > > > I think it has several inaccuracies as different/updated
> > installation
> > > info
> > > > has been added to different pages.
> > > >
> > > > Should it be deleted?
> > > >
> > > > Or should a specific build from source page be maintained
> > (moving/copying
> > > > info from the other more recently updated pages)?
> > > >
> > > > I'm really thinking that it would be easier to maintain if each
> OS
> > had
> > > its
> > > > own page, Python/pip info had its own page, then bindings had
> > their own
> > > > pages.
> > > >
> > > > Other suggestions?
> > > >
> > > > Cheers,
> > > > Aaron
> > > >
> > >
> >
> >
> >
> >
>


Re: Updating MXNet's Cub

2018-08-28 Thread Lin Yuan
+1

On Tue, Aug 28, 2018 at 12:39 AM Hagay Lupesko  wrote:

> Thanks for the feedback Chris. Will follow up.
>
> On Fri, Aug 24, 2018 at 10:53 AM Chris Olivier 
> wrote:
>
> > +1 for pointing to NVidia's repo for the newer Cub and subsequent
> versions.
> >
> > On Fri, Aug 24, 2018 at 10:01 AM Hagay Lupesko 
> wrote:
> >
> > > Hi all,
> > >
> > >
> > > One of MXNet’s submodule dependencies is a snapshot of Nvidia Cub (
> > > https://github.com/dmlc/cub) – the snapshot is of an older version of
> > Cub
> > > (1.7), while the latest Nvidia Cub release is 1.8.  Note that dmlc/cub
> > has
> > > no customizations of the source Cub repo.
> > >
> > >
> > > I’d like to suggest to update the existing Cub submodule to Nvidia’s
> Cub
> > > repo. Instead of the snapshot, MXNet will be using Nvidia’s repo and
> the
> > > latest release (both repos have the same BSD-3 license, so licensing
> > should
> > > not be an issue).
> > >
> > >
> > > Wanted to get feedback from the community to make sure I'm not missing
> > > anything.
> > >
> > > if there are no objections I'll submit a PR for the change.
> > >
> > >
> > > Cheers,
> > >
> > > Hagay
> > >
> >
>
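Mechanically, the proposed change is an edit to the submodule's URL in .gitmodules, followed by a sync/update. A sketch in a throwaway repository, assuming the submodule lives at 3rdparty/cub (the path is an assumption about MXNet's layout, and NVlabs/cub is taken as NVIDIA's upstream repo):

```shell
# Work in a scratch repo so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .

# Simulate the existing snapshot entry in .gitmodules ...
git config -f .gitmodules submodule.3rdparty/cub.path 3rdparty/cub
git config -f .gitmodules submodule.3rdparty/cub.url https://github.com/dmlc/cub

# ... then repoint it at NVIDIA's repo. In a real checkout this would be
# followed by `git submodule sync` and `git submodule update --init`.
git config -f .gitmodules submodule.3rdparty/cub.url https://github.com/NVlabs/cub

git config -f .gitmodules submodule.3rdparty/cub.url
```

Because both repos carry the same BSD-3 license, the swap is a URL change plus a submodule update; no source customizations need to be carried over.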


Re: build from source instructions

2018-08-28 Thread Lin Yuan
Aaron,

I agree the installation page is very confusing to me. When I first tried
to build MXNet from source on macOS, I was totally confused by the
instructions: why were they vastly different from building from source on
Linux, given these two OSes have similar shell commands? I feel the
automated scripts on the macOS platform confuse rather than simplify.

Lin

On Mon, Aug 27, 2018 at 9:21 PM Steffen Rochel 
wrote:

> Aaron - we should keep instructions how to build from source. Updating and
> re-organizing makes sense to me.
> Steffen
>
> On Mon, Aug 27, 2018 at 4:54 PM Aaron Markham 
> wrote:
>
> > Hello,
> > I was looking into the C++ instructions and came across this seemingly
> > pretty old page:
> > https://mxnet.incubator.apache.org/install/build_from_source
> >
> > I think it has several inaccuracies as different/updated installation
> info
> > has been added to different pages.
> >
> > Should it be deleted?
> >
> > Or should a specific build from source page be maintained (moving/copying
> > info from the other more recently updated pages)?
> >
> > I'm really thinking that it would be easier to maintain if each OS had
> its
> > own page, Python/pip info had its own page, then bindings had their own
> > pages.
> >
> > Other suggestions?
> >
> > Cheers,
> > Aaron
> >
>


Re: Consolidating developer guide in one place (cwiki preferred)

2018-08-21 Thread Lin Yuan
Hi Aaron,

Thanks for your answer. I think it's a very worthwhile effort to move all
the developer related content from mxet.io website to a dedicated developer
site. Would you like to initiate this effort?

Best,

Lin

On Wed, Aug 15, 2018 at 3:47 PM Haibin Lin  wrote:

> +1
>
> On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham 
> wrote:
>
> > Hi Lin, I agree with this organization. If you feel like somethings
> should
> > be transitioned from the website to the wiki, I can help with that, but
> for
> > the moment I've been suggesting that new developer-focused content be
> > placed on the wiki.
> >
> > On Tue, Aug 14, 2018 at 10:40 AM, Lin Yuan  wrote:
> >
> > > Dear MXNet community,
> > >
> > > As a developer, I noticed we have some developer guide scattered in
> > > different websites (mxnet.io, cwiki):
> > >
> > > E.g.
> > >
> > > How to Create New Operators (Layers): [
> > > https://mxnet.incubator.apache.org/faq/new_op.html]
> > > A Guide to Implementing Sparse Operators in MXNet Backend [
> > > https://cwiki.apache.org/confluence/display/MXNET/A+
> > > Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
> > > ]
> > >
> > > When searching developer guide by keyword, only one of them can be
> > returned
> > > on either site.
> > >
> > > It will be more convenient for developers if all the developer guide
> > > resides on cwiki and all user guide (non-developer) on the mxnet.io
> > > website. We can add a link on mxnet.io to refer all developers to
> cwiki
> > > for
> > > guidance.
> > >
> > > Any comment is appreciated.
> > >
> > > Best Regards,
> > >
> > > Lin
> > >
> >
>


Consolidating developer guide in one place (cwiki preferred)

2018-08-14 Thread Lin Yuan
Dear MXNet community,

As a developer, I noticed we have some developer guides scattered across
different websites (mxnet.io, cwiki):

E.g.

How to Create New Operators (Layers): [
https://mxnet.incubator.apache.org/faq/new_op.html]
A Guide to Implementing Sparse Operators in MXNet Backend [
https://cwiki.apache.org/confluence/display/MXNET/A+Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
]

When searching for a developer guide by keyword, only the guides hosted on
that site are returned.

It will be more convenient for developers if all the developer guides
reside on cwiki and all user (non-developer) guides on the mxnet.io
website. We can add a link on mxnet.io referring all developers to cwiki
for guidance.

Any comment is appreciated.

Best Regards,

Lin


Enabling shared filter in JIRA

2018-08-14 Thread Lin Yuan
Dear MXNet Community,

As we are trying to create our Scrum board in JIRA, I noticed that we do
not have permission to create shared filters, even as administrators.
This has prevented us from creating Scrum boards for the different
components of the project.

I would really appreciate it if someone on the Apache Infra team could help
enable shared-filter creation in this project.

Best Regards,

Lin

