Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-11 Thread Hao Jin
+1 (binding)
Built from source and ran all the latest d2l notebooks without problems.
Hao

On Fri, Jan 10, 2020 at 10:03 PM Jun Wu  wrote:

> +1 (binding)
>
> Built from source. Ran all the GPU tests and test_numpy*.py cpu tests
> without problems.
>
> On Fri, Jan 10, 2020 at 9:43 PM Skalicky, Sam 
> wrote:
>
> > We can enable building nightlys for feature branches too.
> >
> > Sam
> >
> > > On Jan 10, 2020, at 7:48 PM, Lin Yuan  wrote:
> > >
> > > We can release one cpu-mkl and one CUDA wheel  for testing various
> > > applications. Other people can build from source if they want other
> > flavors
> > >
> > > Lin
> > >
> > >> On Fri, Jan 10, 2020 at 4:00 PM Karan Jariwala <
> > karankjariw...@gmail.com>
> > >> wrote:
> > >>
> > >> Yes, agree with your point. But we will be requiring  many flavors of
> > pip
> > >> wheel.
> > >>
> > >> MKL/ without MKL
> > >> CUDA/ no CUDA
> > >> Linux/windows/Mac
> > >>
> > >> Thanks,
> > >> Karan
> > >>
> > >> On Fri, Jan 10, 2020 at 3:54 PM Haibin Lin 
> > >> wrote:
> > >>
> > >>> Shall we provide pip wheels for later release votes?
> > >>>
> > >>> Not everyone knows how to build MXNet from source (and building from
> > >> source
> > >>> also takes very long). Providing a pip wheel would lower the bar for
> > >> users
> > >>> who wants to test MXNet and participate in voting.
> > >>>
> > >>> Best,
> > >>> Haibin
> > >>>
> > >>> On Fri, Jan 10, 2020 at 3:50 PM Haibin Lin  >
> > >>> wrote:
> > >>>
> >  +1
> > 
> >  Built from source with USE_CUDA=1 on Ubuntu. Run gluon-nlp unit
> tests
> > >> and
> >  they passed.
> > 
> >  On Fri, Jan 10, 2020 at 3:18 PM Karan Jariwala <
> > >> karankjariw...@gmail.com
> > 
> >  wrote:
> > 
> > > +1
> > >
> > > Tested MXNet with and without MKL-DNN on Ubuntu 16.04 with Horovod
> > >>> 0.18.2.
> > > No regression seen between 1.5.1 and 1.6.0.rc1 when running
> > >>> horovod_MXNet
> > > integration test.
> > >
> > >
> > > Thanks,
> > >
> > > Karan
> > >
> > > On Fri, Jan 10, 2020 at 2:47 PM Markus Weimer 
> > >>> wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> I tested on Ubuntu 18.04 on the Windows Subsystem for Linux.
> > >>
> > >> Tested:
> > >>  * Built from source using the instructions here [0]
> > >>  * Ran the tests in `./build/tests/mxnet_unit_tests`
> > >>  * SHA512 of the archive
> > >>
> > >> Not tested:
> > >>  * Language bindings
> > >>  * CUDA or other GPU acceleration
> > >>  * LICENSE and compliance status
> > >>  * Signature of the archive
> > >>
> > >
> > 
> > >>>
> > >>
> >
>


Re: Adding AMD CPU to CI

2018-11-29 Thread Hao Jin
For CPUs, the supported instruction sets may also vary between different
product lines of the same generation from the same manufacturer (Skylake-SP
versus Skylake).
For the same instruction set, both manufacturers should have a working
hardware implementation; if an implementation did not work, the chip would
not even be considered functional.
If some AMD CPUs only support up to AVX2, they simply function the same way
as an Intel CPU that supports up to AVX2. Performance may vary, but the
capability and behavior of the two chips are the same given the same
machine code.
For AMD GPUs it's a totally different story: AMD GPUs do not share an
instruction set with the NVIDIA ones, so testing on AMD GPUs (if we ever
support them) would definitely add value.
Hao
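The claim above, that two x86 CPUs are interchangeable at the ISA level as long as they advertise the same extensions, can be checked mechanically. Below is a minimal, hypothetical Python sketch (not part of MXNet's CI) that parses the `flags` line of a Linux-style `/proc/cpuinfo` dump and compares which of the extensions mentioned in this thread two machines support; the sample flag strings are made up for illustration:

```python
# Hypothetical sketch: compare the instruction-set extensions two CPUs
# advertise, given the text of their Linux /proc/cpuinfo files.
EXTENSIONS = {"sse2", "avx", "avx2", "avx512f", "f16c"}

def advertised(cpuinfo_text):
    """Return the subset of EXTENSIONS found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return EXTENSIONS & set(line.split(":", 1)[1].split())
    return set()

# Toy excerpts standing in for an Intel and an AMD machine.
intel = "flags\t: fpu sse2 avx f16c avx2 avx512f"
amd = "flags\t: fpu sse2 avx f16c avx2"

print(sorted(advertised(intel) - advertised(amd)))  # → ['avx512f']
```

A CI sanity check in this spirit would only need to confirm that the extensions a build was compiled for appear in the host's advertised set.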

On Thu, Nov 29, 2018 at 8:37 PM Anirudh Subramanian 
wrote:

> Instruction set extensions support like AVX2, AVX512 etc. can vary between
> AMD and Intel and there can also be a time lag between when Intel supports
> it versus when AMD supports it.
> Also, in the future this setup may be useful in case MXNet supports AMD
> GPUs and AWS also happens to have support for it.
>
> Anirudh
>
>
> On Thu, Nov 29, 2018 at 4:29 PM Marco de Abreu
>  wrote:
>
> > I think it's worth a discussion to do a sanity check. While generally
> these
> > instructions are standardized, we also made the experience with ARM that
> > the theory and reality sometimes don't match. Thus, it's always good to
> > check.
> >
> > In the next months we are going to refactor our slave creation processes.
> > Chance Bair has been working on rewriting Windows slaves from scratch (we
> > used images that haven't really been updated for 2 years - we still don't
> > know what was done on them) and they're ready soon. In the following
> > months, we will also port our Ubuntu slaves to the new method (don't
> have a
> > timeline yet). Ideally, the integration of AMD instances will only be a
> > matter of running the same pipeline on a different instance type. In that
> > Case, it should not be a big deal.
> >
> > If there are big differences, that's already a yellow flag for
> > compatibility, but that's unlikely. But in that case, we would have to
> make
> > a more thorough time analysis and whether it's worth the effort. Maybe,
> > somebody else could also lend us a hand and help us with adding AMD
> > support.
> >
> > -Marco
> >
> > On Fri, Nov 30, 2018, 01:22 Hao Jin  wrote:
> >
> > > f16c is also an instruction set supported by both brands' recent CPUs
> > just
> > > like x86, AVX, SSE etc., and any difference in behaviors (quite
> > impossible
> > > to happen or it will be a major defect) would most likely be caused by
> > the
> > > underlying hardware implementation, so still, adding AMD instances is
> not
> > > adding much value here.
> > > Hao
> > >
> > > On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Just looked at the mf16c work and wanted to mention Rahul clearly
> _was_
> > > > thinking about AMD users in that PR.
> > > >
> > > > On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> > > > kellen.sunderl...@gmail.com> wrote:
> > > >
> > > > > From my perspective we're developing a few features like mf16c and
> > > MKLDNN
> > > > > integration specifically for Intel CPUs.  It wouldn't hurt to make
> > sure
> > > > > those changes also run properly on AMD cpus.
> > > > >
> > > > > On Thu, Nov 29, 2018, 3:38 PM Hao Jin  > > > >
> > > > >> I'm a bit confused about why we need extra functionality tests
> just
> > > for
> > > > >> AMD
> > > > >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets
> > as
> > > > the
> > > > >> Intel ones? In the very impossible case that something working on
> > > Intel
> > > > >> CPUs being not functioning on AMD CPUs (or vice versa), it would
> > > mostly
> > > > >> likely be related to the underlying hardware implementation of the
> > > same
> > > > >> ISA, to which we definitely do not have a good solution. So I
> don't
> > > > think
> > > > >> performing extra tests on functional aspect of the system on AMD
> > CPUs
> > > is
> > > > >> adding any values.
> > > > >> Hao
> > > > >>
> > > > >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu
> >  > > >
> > > > >> wrote:
> > > > >>
> > > > >> > +1
> > > > >> >
> > > > >> > On 11/29/18, 2:39 PM, "Alex Zai"  wrote:
> > > > >> >
> > > > >> > What are people's thoughts on having AMD machines tested on
> > the
> > > > CI?
> > > > >> AMD
> > > > >> > machines are now available on AWS.
> > > > >> >
> > > > >> > Best,
> > > > >> > Alex
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>


Re: Adding AMD CPU to CI

2018-11-29 Thread Hao Jin
F16C is also an instruction set extension supported by both brands' recent
CPUs, just like AVX, SSE, etc. Any difference in behavior (highly unlikely;
it would be a major defect) would most likely be caused by the underlying
hardware implementation, so adding AMD instances still would not add much
value here.
Hao
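The reason a vendor difference here would be a defect is that F16C implements IEEE 754 binary16 conversion, whose results are fully specified by the format rather than by the vendor. A small sketch of that conversion semantics, using numpy's `float16` (the same IEEE binary16 format the instruction operates on; this illustrates the arithmetic, not the hardware path, and assumes numpy is available):

```python
import numpy as np

# float32 -> float16 -> float32 round trip: the result is fixed by the
# IEEE 754 binary16 format, so any conforming implementation (Intel,
# AMD, or software emulation) must produce identical bits.
x = np.float32(0.1)
half = np.float16(x)            # rounds to the nearest representable half
back = np.float32(half)        # 0.1 is not exactly representable, so back != x

print(half, back)
# Values representable in 11 significand bits survive the round trip exactly:
assert np.float16(1.5) == 1.5
assert np.float16(2.0 ** -14) == 2.0 ** -14   # smallest normal half
```

Because the rounding is format-defined, a test that passes on one conforming chip passes on all of them, which is the core of the argument above.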

On Thu, Nov 29, 2018 at 7:03 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> Just looked at the mf16c work and wanted to mention Rahul clearly _was_
> thinking about AMD users in that PR.
>
> On Thu, Nov 29, 2018 at 3:46 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
>
> > From my perspective we're developing a few features like mf16c and MKLDNN
> > integration specifically for Intel CPUs.  It wouldn't hurt to make sure
> > those changes also run properly on AMD cpus.
> >
> > On Thu, Nov 29, 2018, 3:38 PM Hao Jin  >
> >> I'm a bit confused about why we need extra functionality tests just for
> >> AMD
> >> CPUs, aren't AMD CPUs supporting roughly the same instruction sets as
> the
> >> Intel ones? In the very impossible case that something working on Intel
> >> CPUs being not functioning on AMD CPUs (or vice versa), it would mostly
> >> likely be related to the underlying hardware implementation of the same
> >> ISA, to which we definitely do not have a good solution. So I don't
> think
> >> performing extra tests on functional aspect of the system on AMD CPUs is
> >> adding any values.
> >> Hao
> >>
> >> On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu 
> >> wrote:
> >>
> >> > +1
> >> >
> >> > On 11/29/18, 2:39 PM, "Alex Zai"  wrote:
> >> >
> >> > What are people's thoughts on having AMD machines tested on the
> CI?
> >> AMD
> >> > machines are now available on AWS.
> >> >
> >> > Best,
> >> > Alex
> >> >
> >> >
> >> >
> >>
> >
>


Re: Adding AMD CPU to CI

2018-11-29 Thread Hao Jin
I'm a bit confused about why we need extra functionality tests just for AMD
CPUs; don't AMD CPUs support roughly the same instruction sets as the Intel
ones? In the very unlikely case that something working on Intel CPUs does
not function on AMD CPUs (or vice versa), it would most likely be caused by
the underlying hardware implementation of the same ISA, for which we
definitely do not have a good solution. So I don't think performing extra
tests of the functional aspects of the system on AMD CPUs adds any value.
Hao

On Thu, Nov 29, 2018 at 5:50 PM Seth, Manu 
wrote:

> +1
>
> On 11/29/18, 2:39 PM, "Alex Zai"  wrote:
>
> What are people's thoughts on having AMD machines tested on the CI? AMD
> machines are now available on AWS.
>
> Best,
> Alex
>
>
>


Re: Request to join developing PR Best practice guidelines

2018-07-27 Thread Hao Jin
Hi Naveen,
Thanks for leading the initiative to improve PRs! I'm just curious about
the intended use of the "PR best practices for authors": is it going to be
a standard the community enforces on all PRs, meaning all PRs must follow
those best practices before they can be merged? Or is it simply a set of
extra tips that we recommend contributors follow, together with their best
judgement? Could you please clarify its purpose a bit here? Thanks!
Best wishes,
Hao Jin

On Thu, Jul 26, 2018 at 11:35 AM, Naveen Swamy  wrote:

> Hi All,
>
> As a part of my job I am looking for ways to improve PR response time,
> participation, and PR quality. On asking a few committers/contributors near
> by(Andrea, Hagay, Mu Li, Steffen) and their feedback was that some PRs do
> not have enough details and only those who collaborate closely can review
> the PR. This could be because developers for various reasons, they might
> forget due to time-pressure, other priorities and the current guidelines
> might not be(IMO) enough.
> I would like to volunteer to enhance the existing PR Checklist created by
> the community members, and I request members to join this effort so we can
> collaborate and come up with a good set of guidelines and propose here on
> the dev@, please reach out to me on Slack or here if you are interested to
> contribute.
>
> Currently I am thinking of splitting this effort into
> 1) PR Best Practices for authors.
> 2) PR Checklist for reviewers.
> 3) Coding Guidelines.
>
> Thanks, Naveen
>


Re: Should MXNet 1.3 contain a buggy version of nn.Embedding backward by default?

2018-07-23 Thread Hao Jin
Hi all,
Some preliminary benchmark results have been shared on the related PR. What
we found is that, on a sample input for which the LargeBatch version is
supposed to perform better, it showed no significant performance gain over
either the new general backward kernel or the AddTakeGrad function, and the
LargeBatch version is deemed buggy based on Leo's reproduction example in
the original issue. I would propose that we delete the LargeBatch version
and use the AddTakeGrad version by default. If there's no obvious
objection, we'll go ahead in that direction.
Hao
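For context on what the kernels in question compute: the gradient of an embedding lookup is a scatter-add into the weight matrix, and correctness hinges on accumulating properly when the same row index appears more than once in a batch. A hypothetical numpy sketch of that semantics (not the MXNet kernel itself; numpy is assumed available):

```python
import numpy as np

# Embedding backward: grad_weight[idx[i]] += grad_out[i] for each i.
# np.add.at performs an unbuffered scatter-add, so repeated indices
# (the hard case for a parallel GPU kernel) accumulate correctly.
vocab, dim = 5, 3
idx = np.array([1, 1, 4])                          # row 1 appears twice
grad_out = np.ones((len(idx), dim), dtype=np.float32)
grad_w = np.zeros((vocab, dim), dtype=np.float32)

np.add.at(grad_w, idx, grad_out)
print(grad_w[1])   # → [2. 2. 2.]: both contributions to row 1 are kept
```

A buggy kernel that races on repeated indices can silently drop one of the two contributions to row 1, which is consistent with the nondeterministic failures reported in the issue.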

On Mon, Jul 23, 2018 at 9:12 PM, Naveen Swamy  wrote:

> If it is buggy, how does it matter if it is performant or not? I am not
> seeing the rationale to make the correct version only opt-in.
>
>
> On Mon, Jul 23, 2018 at 6:47 PM, Leonard Lausen <
> leonard-softw...@lausen.nl>
> wrote:
>
> > Currently the default kernel of nn.Embedding backward is known to be
> > buggy on P3 instances or using Cuda 9.2 (though the issue also occurs on
> > other instances with earlier version of Cuda, but less often).
> >
> > https://github.com/apache/incubator-mxnet/issues/11314
> >
> > There is currently an opt-in for using a bug-free kernel, but it is not
> > the default. However, the bug-free kernel is used by default for shape
> > smaller 16384.
> >
> > Should MXNet ship a more efficient but buggy kernel in v1.3 or use a
> > correct but less efficient kernel by default? As MXNet v1.3 is likely to
> > be used a lot with Cuda 9.2 I believe the default behavior should be
> > changed to use the bug-free but less efficient Kernel. Correctness and
> > providing a good user experience should be No. 1 here (?). Then users
> > that want a faster but buggy backward kernel can still select to do so.
> > Note this only affects the backward pass.
> >
> > Hao did related work on improving the take operator
> > https://github.com/apache/incubator-mxnet/pull/11326
> > https://github.com/apache/incubator-mxnet/pull/11795 which also fixes
> > the issue, but he found it to be only "slightly faster" compared to the
> > bug-free kernel that is currently under opt-in while leading to CI
> > failures on Windows.
> >
> > In my experience, there is no speed difference between the current buggy
> > and
> > opt-in bug-free kernel, but the GPU utilization of the latter is 100%
> > compared
> > to 60% of the former (benchmark script:
> > https://github.com/apache/incubator-mxnet/pull/11795#
> > issuecomment-405808567 )
> >
>


Re: [VOTE] Release MXNet version 1.2.1.RC1

2018-07-12 Thread Hao Jin
+1 Built on Ubuntu with CUDA 9.0 and cuDNN 7, and verified that the sparse
tests pass.
Hao

On Thu, Jul 12, 2018 at 3:01 PM, Sergio Fernández  wrote:

> +1 (binding)
>
> On Mon, Jul 9, 2018, 16:53 Roshani Nagmote 
> wrote:
>
> > Hi all,
> >
> > I would like to propose a vote to release Apache MXNet (incubating)
> version
> > 1.2.1.RC1. Voting will start now (Monday, Jul 9th) and end at 5:50 PM
> > PDT, Thursday, July 12th.
> >
> > Link to release candidate 1.2.1.rc1:
> > *https://github.com/apache/incubator-mxnet/releases/tag/1.2.1.rc1
> > *
> >
> > View this page, click on "Build from Source", and use the source code
> > obtained from 1.2.1.rc1 tag:
> > https://mxnet.incubator.apache.org/install/index.html
> >
> > (Note: The README.md points to the 1.2.1 tag and does not work at the
> > moment.)
> >
> > Please remember to test first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Thanks,
> > Roshani
> >
>


Re: Squash/Merge PRs

2018-07-12 Thread Hao Jin
Hi Aaron,
To be fair, this discussion has nothing to do with the "pride" of SDEs; it
is about what constitutes better software engineering practice for the
project. Breaking a large project into smaller tasks is a good software
engineering practice. This article (https://arxiv.org/pdf/1702.01715.pdf) on
Google's software engineering practice says: "Engineers are encouraged to
keep each individual change small, with larger changes preferably broken
into a series of smaller changes that a reviewer can easily review in one
go." If you have concerns or comments about this practice, we can take the
discussion offline.

On the other hand, an important part of the Apache community spirit is that
"one must interact with others, and share vision and
knowledge" (https://community.apache.org/contributors/). If you have
observed that a majority of contributors have serious problems with their
writing, maybe you could share some tips and hints on how to write better
documentation; that way, not only can many of us in the community grow, but
you would also save time for writing more good documentation and blog posts
for MXNet, don't you think? Or, if you only have to fix the docs for
someone once in a while, I definitely agree that you should be given credit
for that, and that's where the co-author field can be useful, which is
exactly what I was saying in my previous email. Thanks!
Hao

On Thu, Jul 12, 2018 at 8:16 AM, Aaron Markham 
wrote:

>  This is a great discussion, and close to my heart, because I want to
> include more writers and editors in our community, and I'm struggling to
> see how to best manage this. It seems like being the sole contributor on a
> feature is like an engineer's Lone Ranger badge of pride! I feel like it
> should be a red flag.
>
> I work in collaboration with dozens of engineers to improve their
> documentation, explain their features, change flows to improve user
> experience, and sometimes fix bugs and write code. When the PR's docs has
> great coverage, is clear, and not loaded with bad grammar and spelling
> mistakes, I will put comments in a review. But sometimes there needs to be
> a complete rework such that hundreds of review comments isn't feasible, and
> they can't be properly addressed. In a lot of these cases, I commit my
> changes to someone else's fork. Now I know the people I work with on their
> PRs appreciate my help, but when all of this work gets squashed down to one
> commit, I'm pretty much regularly getting squashed out. I'm not sure who
> else is in this boat, but does the community recognize our contributions
> when this is the general mode of operation?
>
> Regarding co-author - I'm not sure how people would feel about me being a
> co-author on their work. Documentation and clarity in your work product is
> important, but people's views on their personal *code* contribution seems
> to be more important than the overall code & feature quality itself when
> docs are part of the package. I feel that if I do follow-on PRs instead of
> fixing things before they are merged, that we would be releasing incorrect,
> incomplete, and inferior product. But, in absence of a better solution,
> maybe that's the pill we have to swallow.
>
> -1 on lots of small PRs (until we expand our range of committers and reduce
> the latency in reviews and merges).
> +1 on improving the quality of commit messages, so we don't have to squash
> & merge all of the time.
> +1 on improving the practice of better commit & merge management so that
> the commit history is concise and meaningful. (I'm guilty of this, and
> certainly need to improve here.)
> +1 on co-author - assuming that's what most everyone thinks is a good
> solution for now. I'm unclear on how this gets managed though.
>
> Cheers,
> Aaron
>
>
> On Thu, Jul 12, 2018 at 5:19 AM, Anton Chernov 
> wrote:
>
> > Unfortunately there has been significant push back for small iterative
> PR's
> > and as a result they have grown substantially involving multiple
> > contributors, multiple commits, sometimes multiple branches to be worked
> > on.
> >
> > I fully agree and support all points that Jin raised:
> >
> > 1) Most contributions should be broken down into smallest possible PR's.
> > 2) If a PR is small enough a single person can complete it.
> > 3) If a majority of PR is done by someone, and there's some minor issue
> > he/she needs help with it could be done in a follow up PR by anybody,
> > including the reviewer.
> >
> > Best regards,
> > Anton
> >
> > On Thu, Jul 12, 2018 at 10:11, Hao Jin  wrote:
> >
> > > +1 for M

Re: Squash/Merge PRs

2018-07-12 Thread Hao Jin
+1 for Marco's proposal on using the co-author field. IMHO the usage of
co-author field should not be that often, consider:
1) If a PR is so big that it needs multiple people to contribute a
substantial part of it, why can't that PR be broken down into smaller PRs
each submitted by single one of them?
2) If a PR is small enough (for example, < 300 lines), why can't a single
person complete it?
3) If a majority of PR is done by someone, and there's some minor issue
he/she needs help with(a small bug, incomplete doc), why can't that be done
through code reviews?
From the above three situations, we can see that collaboration on a single
PR should not be needed very often, though I do agree it can still be
necessary when someone lacks the expertise or knowledge for some required
part of the PR and that knowledge cannot be picked up in a short period of
time.

I do agree that keeping the commit history of a PR clean is very important,
as a messy history can be confusing during review, but that really depends
on personal preference (in my case, I usually "git commit --amend" for
trivial changes and create a new commit for bigger changes, such as a
checkpoint of my whole PR). With growing the community and attracting more
contributors being a high priority for our project, I would prefer that we
not put even more burden on contributors by asking them to manage and
squash the commits themselves just for the infrequent case of multiple
contributors on a single PR.
Best regards,
Hao
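The co-author field discussed above is just a trailer at the end of the squashed commit message. A minimal sketch of the format (the names, emails, and commit subject here are placeholders, and the helper function is hypothetical):

```python
# Sketch of the "Co-authored-by" trailer format that GitHub parses from a
# squashed commit message. Trailers go at the very end of the message body,
# one line per co-author.
def squash_message(subject, body, coauthors):
    """Build a commit message crediting every contributor to the PR."""
    trailers = "\n".join(
        f"Co-authored-by: {name} <{email}>" for name, email in coauthors
    )
    return f"{subject}\n\n{body}\n\n{trailers}\n"

msg = squash_message(
    "Fix take operator gradient",
    "Squashed from 12 WIP commits.",
    [("Jane Doe", "jane@example.com"), ("John Roe", "john@example.com")],
)
print(msg)
```

Whoever clicks "Squash and merge" can paste such trailers into the commit message box, so crediting reviewers who pushed fixes costs only a line per person.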

On Thu, Jul 12, 2018 at 12:35 AM, Marco de Abreu <
marco.g.ab...@googlemail.com.invalid> wrote:

> Hi Naveen,
>
> I'm in favour of the squashing, considering the number of commits in some
> PRs and especially because of some people making commit messages a la "fix"
> "fix" "fix" all the time. Additionally, it gets hard (not impossible, just
> more inconvenient) to determine the atomic states of master - aka, which
> commits are separate from each other. You should consider that intermediary
> commits are unstable (fail CI) and thus it could be very hard to bisect
> failures in future - and the commit history gets cluttered.
>
> As alternative, I'd like to suggest the co-author field for these cases.
> Further documentation is available at
> https://blog.github.com/2018-01-29-commit-together-with-co-authors/.
>
> I definitely agree with the second part. We should all lead by example and
> maintain a high quality by keeping our commit messages clean and
> meaningful. When I receive an email notification that a new commit has been
> added and it only contains "fix" as title, it's not that helpful and also
> it's hard to track the development of a PR overtime. E.g., why has
> something been changed? Was there maybe a bug that we didn't cover with
> tests but the author just hacked something to get it to work but the
> problem still lays somewhere? We won't know that way and it makes it harder
> for us to review.
>
> Best regards,
> Marco
>
>
> Naveen Swamy  wrote on Thu, Jul 12, 2018, 10:09:
>
> > Hi All,
> >
> > I am seeing that maintainers merge PRs into the repo, they are squashing
> > the commits in the PR, which I understand and agree is to keep a sane
> > commit history, however this is causing problem when there are multiple
> > contributors involved on a PR(by contributing to a fork of the repo) this
> > effectively removes credit for multiple contributors involved and shows
> all
> > code as authored by the contributor who created the PR.
> >
> > Can I request maintainers to not squash PRs if there are multiple
> > contributors involved on the PR.
> >
> > Also on the same note, I request contributors(regardless of multiple
> > contributors or not) to keep a clean commit history by squashing the
> > commits and not push all your WIP commits to the PR. this will help us
> keep
> > our commit history clean and meaningful.
> >
> > Let me know your thoughts/better approach or If I have misunderstood how
> > this works.
> >
> > Thanks, Naveen
> >
>


Re: The operator check for Scala Package

2018-06-21 Thread Hao Jin
Can you elaborate on how this change brings "alignment on the modifications
to the operators across language bindings"? It seems this PR generates a
hash from the Scala files rather than from the backend APIs, so how would
it benefit other language bindings?
Hao
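For readers following along: the check under discussion fingerprints operator definitions and flags CI when the fingerprint moves. A minimal, hypothetical sketch of that idea, hashing operator names and parameter lists only, so that doc-only changes would not trip it (this illustrates the mechanism, not the actual implementation in the PR; the toy registry stands in for the C++ backend's operator list):

```python
import hashlib

# Toy operator registry standing in for the C++ backend's operator list.
OPERATORS = {
    "take": ["a", "indices", "axis", "mode"],
    "dot": ["lhs", "rhs", "transpose_a", "transpose_b"],
}

def signature_hash(ops):
    """Stable SHA-256 digest over operator names and parameter lists only."""
    canon = ";".join(
        f"{name}({','.join(params)})" for name, params in sorted(ops.items())
    )
    return hashlib.sha256(canon.encode()).hexdigest()

baseline = signature_hash(OPERATORS)
OPERATORS["take"].append("out_dtype")          # simulated backend API change
assert signature_hash(OPERATORS) != baseline   # the check would flag this
```

Hashing the canonical signature rather than whole source files is what separates real API changes from documentation churn, which is the distinction debated later in this thread.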

On Thu, Jun 21, 2018 at 10:44 AM, Naveen Swamy  wrote:

> Thank you all for your input, I want to reiterate that this work is not to
> burden anyone but to bring alignment on the modifications to the operators
> across language bindings. I agree with Marco/Qing and we'll start off as a
> nightly runs and identify changes, we can discuss further later after we
> gather some data.
>
> On Wed, Jun 20, 2018 at 10:36 PM, Qing Lan  wrote:
>
> > Thanks Jun for your clarification. I think one of the advantages for
> MXNet
> > is it's language binding. Although the number of customers using Scala is
> > great less than Python, it is still one important language we need to
> > support.
> >
> > About the first question, I would say we all in. At least all breaking
> > changes should let all language binding maintainers notified. Then we
> make
> > the changes to make sure we have stable support to the customers. The
> > reason for Scala build failure could be many, so we need to diagnose the
> > problem and see if we need to collaborate and solve the problems
> together.
> > I don't think the other language maintainers and backend must take the
> > responsibilities to diagnose a problems of a language they may not
> familiar
> > with.
> >
> > 2. It depends differently case by case, we cannot bring a unique time for
> > different failures we have. If you mean the operator check, it could
> cause
> > you (time to build scalapkg) + (30 seconds to change the code) + (time to
> > run the CI to make sure it worked) + (send a PR to make that change on
> > master branch). This update should be done by the Scala developers for
> sure.
> >
> > 3. If there is a nightly build failure, maybe we can think of checking
> > that on a weekly basis. Spend 15 mins every week to review the operator
> > changes so we don't miss any important points.
> >
> > Thanks,
> > Qing
> >
> >
> >
> > On 6/20/18, 9:57 PM, "Jun Wu"  wrote:
> >
> > We should reach an agreement on the responsibility/ownership before
> > moving
> > forward.
> >
> > 1. Who will take the ownership of fixing Scala build failure if an
> > operator
> > is changed/added in C++?
> > 2. What is the expected turn around time of fixing the scala build
> > failure.
> > 3. What if we are short of resources of fixing the scala build
> > failure? How
> > do we deal with the failing nightly CI every night?
> >
> > With all being said, there is at least one thing that must be clear,
> we
> > certainly don't want to put the extra burden on the backend
> developers.
> >
> > On Wed, Jun 20, 2018 at 9:39 PM Qing Lan 
> wrote:
> >
> > > Hi All,
> > >
> > > Thank you all for your feedback. This changes do influence a lot on
> > the
> > > PRs related to operator changes. I will take what Marco proposed to
> > place
> > > that in the nightly build rather than every CI we runs.
> > >
> > > Thanks,
> > > Qing
> > >
> > > On 6/20/18, 8:40 PM, "Hao Jin"  wrote:
> > >
> > > I don't think we should keep this hash check thing. As Qing
> > stated
> > > before
> > > on this thread, if there's any change in documentation the hash
> > value
> > > is
> > > also changed and then flagged as a problem, how will that break
> > any
> > > user's
> > > code? I would go for something like Marco's proposal: moving
> > this to an
> > > asynchronous check.
> > > At this very moment, I've got 4 PRs that are all related to
> > "operator
> > > changes", adopting this kind of method is adding extra work for
> > me and
> > > every other contributor whose work involves changes on operator
> > code,
> > > and I
> > > don't think it's a reasonable kind of extra work like tracking
> > PRs on
> > > JIRA.
> > > Hao
> > >
> > > On Wed, Jun 20, 2018 at 8:10 PM, Naveen Swamy <
> > mnnav...@gmail.com>
> > >

Re: The operator check for Scala Package

2018-06-20 Thread Hao Jin
I don't think we should keep this hash check. As Qing stated earlier in
this thread, any change to the documentation also changes the hash value
and gets flagged as a problem, yet how would that break any user's code? I
would go for something like Marco's proposal: moving this to an
asynchronous check.
At this very moment I have 4 PRs that all involve "operator changes";
adopting this kind of method adds extra work for me and every other
contributor whose work touches operator code, and I don't think it's a
reasonable kind of extra work in the way that tracking PRs on JIRA is.
Hao

On Wed, Jun 20, 2018 at 8:10 PM, Naveen Swamy  wrote:

> Agreed, for new APIs we can turn into a nightly job and report on dev@.
> The
> goal here is not to burden anyone but to catch changes on the backend
> before it propagates and break user's code and co-ordinate changes across
> language bindings which IMO is essential, so I would like to keep for
> changes on the existing API.
>
> On Wed, Jun 20, 2018 at 7:58 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com.invalid> wrote:
>
> > I think we should go with an asychronous approach using a nightly job
> that
> > detects the changes and reports them accordingly. We could then send out
> a
> > report if there is a mismatch.
> >
> > I agree with the already stated points that we should not put the burden
> of
> > adding frontend API bindings to somebody who adds a Backend function.
> This
> > is not scalable and rather poses a risk in reduced quality or increased
> > review overhead.
> >
> > The sparse support is the perfect example. I don't know if haibin knows
> > Scala, but he would have had to make the entire mapping for Scala without
> > being very familiar with it (sorry if I'm wrong on this haibin, it's just
> > an example). Imagine we do this for even more APIs - we would basically
> > block ourselves because everyone who makes a change in the Backend has to
> > know a dozen languages.
> >
> > While in a few cases it might be just a few lines, I'd guess that
> > especially for new operators it would require a proper design, coding and
> > review to map an API to the frontend languages. This should rather be
> done
> > by an expert in my opinion.
> >
> > We could define it as a requirement for a release that all APIs have had
> to
> > be transfered before a release can be cut (We already have it as
> > requirement that all nightly jobs have to pass anyways).
> >
> > -Marco
> >
> > Naveen Swamy  wrote on Wed, Jun 20, 2018, 19:50:
> >
> > > I understand the concerns here. However the problem here is that since
> > the
> > > Scala APIs are generated using Macros and we do not(may not ever) have
> > full
> > > test coverage on each of the APIs, we will not know for example if an
> > > operator/API changes on the backend and that propagates to Scala users,
> > > their code might very well fail.  So what we are trying here is to find
> > out
> > > early if there is an operator/API change and avoid breaking user's
> code.
> > >
> > > I think there is value in co-ordinating addition of APIs across
> bindings,
> > > as an example the sparse APIs introduced was(still not) exposed to
> Scala
> > > users. This begs the question of how do we co-ordinate such changes?
> > Should
> > > we turn addition of new APIs also as warnings ?
> > >
> > > I agree that we shouldn't fail on documentation change, may we should
> > turn
> > > that into warnings and make sure to pick it up on the Scala APIs, this
> is
> > > low risk since it documentation does not change.
> > >
> > > Open for suggestions.
> > >
> > >
> > > On Wed, Jun 20, 2018 at 7:32 PM, YiZhi Liu 
> wrote:
> > >
> > > > Hi Qing,
> > > > What is the exact risk of changing/adding operators? Could you
> > > > provide an example? I also feel the approach you proposed is a
> > > > little too heavy for developers, and not quite friendly to new
> > > > contributors.
> > > > On Wed, Jun 20, 2018 at 10:22 AM Haibin Lin <
> haibin.lin@gmail.com>
> > > > wrote:
> > > > >
> > > > > I appreciate the effort and understand the motivation. However, I'm
> > > > > concerned that it basically means merging operator PRs becomes
> > > > > sequential. Developers who work on operators have to update their
> > > > > PRs every time a new operator is merged to master; the burden
> > > > > becomes significant if there are 20 ONNX/sparse operators to add
> > > > > and many PRs are submitted/reviewed in parallel.
> > > > >
> > > > > On Wed, Jun 20, 2018 at 10:13 AM, Qing Lan 
> > > wrote:
> > > > >
> > > > > > Hi Haibin,
> > > > > >
> > > > > > An operator change is any change to an operator on the C++ side.
> > > > > > Which changes trigger the check to fail?
> > > > > >- change the documentation of an operator in C++: Yes
> > > > > >- change documentation such as README.md: No
> > > > > >- add/remove/modify an operator: Yes
> > > > > >- add/remove/modify an operator parameter: Yes
> > > > 

Re: Vote to stop using JIRA

2018-06-08 Thread Hao Jin
+0.5
I'm an SDE on the MXNet Engine team at AWS, and I've been using JIRA for
nearly every single PR (28 out of 29 PRs at this moment) I have created
since I joined the team 3 months ago. I wouldn't say it has been a really
bad experience, but it has definitely not been a good one.
I do agree that we need something like JIRA, and that managing the project
takes effort beyond writing code, but the current user experience with JIRA
really has room for improvement.
The more important question for the community is: is JIRA good for
attracting and retaining open source contributors? A user experience like
JIRA's could drive contributors away if we really enforce it.
In conclusion, I think the use of JIRA should be each person's or team's
own choice: if you do need to manage a development process (like our team
does), continue to use it, but we should no longer enforce or push anyone
to use it.
I'm also attaching my typical workflow for a single task using JIRA with
sprint management and story-point tracking; I think everyone who has used
JIRA so far shares at least part of this experience.
Thanks,
Hao

Appendix: what I need to do when I'm working on a task with JIRA:
1. Before I start working on something:
1) create a JIRA ticket for it, choose the correct type, and give it a
proper name
2) go to the backlog page and add it to a sprint, if you want to use
sprint management
3) go to the sprint management page and assign a story-point value, if you
need that functionality (we recently started doing that)
2. When I start working on the task:
1) dig my ticket up (on the sprint page or the backlog page, or via
search)
2) click "open and start progress" to move it to the "IN PROGRESS" phase.
3. When I finish the code, for every new PR I have to:
1) dig my ticket up
2) copy the ticket number so that I can add it to the PR title
3) make an extra click to move the ticket to the REVIEW phase
4) create a PR on GitHub and paste the "MXNET-xxx" I just copied inside an
extra pair of square brackets "[]"
5) still reference the related GitHub issue in the PR if I'm solving one
of them
4. For every code review or comment I receive on the new PR:
1) the JIRA bot logs 10 minutes on the ticket, no matter what kind of
comment it is
2) JIRA sends me an email for every single one of these logs (so if you
receive 10 code review comments in a single review, you get 10 emails; my
record so far is 17, while GitHub would bundle all of them into one mail)
5. After the PR is merged, I get an email from JIRA saying it is merged
(to the bot, a merge is just another comment on the PR), but I still have to:
1) dig my ticket up
2) manually move it to the DONE phase (the bot does not do that
automatically).
6. The task is considered finished at this point, but any new comment on
the PR triggers the bot once again to log 10 minutes on my ticket, along
with yet another email to my mailbox, even though GitHub already sends an
email for that.
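
As an aside, some of the repetitive steps above (prefixing the PR title with
the ticket key, moving the ticket between phases) could in principle be
scripted. A minimal, purely illustrative Python sketch -- the helper names
are hypothetical, and the request body shape follows the standard JIRA REST
API v2 transitions endpoint (POST /rest/api/2/issue/{key}/transitions);
actual transition IDs are instance-specific:

```python
import re


def pr_title_with_ticket(ticket_key: str, title: str) -> str:
    """Build a GitHub PR title with the JIRA ticket key in square
    brackets, per the convention described above (e.g. '[MXNET-123] ...')."""
    if not re.fullmatch(r"MXNET-\d+", ticket_key):
        raise ValueError(f"unexpected ticket key: {ticket_key!r}")
    return f"[{ticket_key}] {title}"


def transition_payload(transition_id: str) -> dict:
    """Request body for JIRA's REST transition endpoint
    (POST /rest/api/2/issue/{key}/transitions), used to move a ticket
    to e.g. the IN PROGRESS or DONE phase."""
    return {"transition": {"id": transition_id}}


print(pr_title_with_ticket("MXNET-123", "Fix flaky operator test"))
```

Posting `transition_payload(...)` with an authenticated HTTP client against
the issue's transitions endpoint would replace the manual "dig the ticket up
and click" steps; whether that is worth maintaining is exactly the cost this
thread is debating.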

On Fri, Jun 8, 2018 at 2:23 PM, Naveen Swamy  wrote:

> Eric/Mu/Tianqi,
>
> A couple of contributors have used JIRA for the Scala project -- this is
> the first time, so there is some learning for us. We started off with a
> design proposal, followed by a call for contributions. We kept our
> progress open on the dev list, and we found one contributor to help us
> with debugging/testing/Maven package creation; one of them is also
> working on the memory allocation issue in Scala, not as part of their
> day job. This is where I find value in managing the project's features
> on JIRA, its designs on the public wiki, etc. I am not claiming that
> this is perfect; it is a WIP, and as a community we should give it a
> chance and try it out.
>
> I don't think most of us here have even looked at JIRA.
>
> P.S: I am traveling, my response will be delayed.
>
> -Naveen
>
>
> On Fri, Jun 8, 2018 at 12:34 PM, Anirudh  wrote:
>
> > Hi,
> >
> > I am not a big fan of JIRA either. But I would like to raise the issue
> > of committing to a decision after a VOTE has passed. I saw PRs where
> > JIRA was disregarded and requests to link to JIRA were ignored. Because
> > of this, JIRA was not given a fair chance to be tried out as a project
> > management tool and ended up hardly being used. If we don't work on
> > this as a community, VOTEs also tend to lose their meaning.
> >
> > Anirudh
> >
> >
> >
> > On Fri, Jun 8, 2018 at 11:00 AM, Eric Xie  wrote:
> >
> > > I think the number of issues becomes less of a problem if we label
> > > them promptly, which is already improving.
> > >
> > > Thanks,
> > > Eric
> > >
> > > On 2018/06/08 17:55:28, Tianqi Chen  wrote:
> > > > I would suggest we have a separate discussion issue about
> > > > transparency. First of all, I agree that transparency is important.
> > > >
> > > > This can be achieved by public roadmaps and RFCs that are not
> > > > particularly tied to JIRA or GitHub issues. Having a clear guideline
> > > > on th