Re: GitHub Label Bot Design

2018-06-14 Thread Yuelin Zhang
Hi Hen,

I am not using probot. Right now my bot code is running in an AWS Lambda
function. I will ask my manager and my mentors where the bot code will be
committed.


Thanks,
Cathy

On Wed, Jun 13, 2018 at 8:09 AM, Hen  wrote:

> Where will the bot code be committed?
>
> Are you using probot?
>
> On Tue, Jun 12, 2018 at 2:21 PM Marco de Abreu <
> marco.g.ab...@googlemail.com>
> wrote:
>
> > Hello Cathy,
> > that's a great proposal. Thank you!
> >
> > A few comments from my side:
> > - Good idea with the alias. We should have a special email-list for
> > automated reports to prevent spamming dev@.
> > - "Create weekly email to internal team members:" -> email-list
> > - "Part II - Label Bot - Amazon cloudwatch event (a) will trigger lambda
> > function(a) 9am every Monday. " -> Why don't we try to classify them
> > ASAP?
> > - "This bot should have restricted permissions to avoid unexpected
> > operations." -> AFAIK, Apache does not allow bot accounts and we have to
> > use a committer's credentials instead. This is not a big issue since we
> > already do this, but just to keep that in mind.
> >
> > Best regards,
> > Marco
> >
> > On Tue, Jun 12, 2018 at 1:07 PM Yuelin Zhang wrote:
> >
> > > Sorry for the messed up url format.
> > > Please forward to this link: https://tinyurl.com/mxnetbot
> > >
> > >
> > > Thanks,
> > > Cathy
> > >
> > >
> > > On Tue, Jun 12, 2018 at 10:20 AM, Yuelin Zhang <
> > > zhangyuelinch...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > > Currently there are many issues on the Incubator-MXNet repo, and
> > > > > labeling issues can draw contributors' attention to specific areas.
> > > > > Right now, issues are all labelled manually, which is time-consuming,
> > > > > and every time maintainers need to @ a committer to add labels.
> > > > > I am working on this label bot to automate and simplify the issue
> > > > > labeling process and send a weekly report to maintainers. The design
> > > > > proposal is on cwiki:
> > > > > https://cwiki.apache.org/confluence/display/MXNET/Deep+Learning+Based+GitHub+Label+Bot
> > > >
> > > > Please feel free to let me know if you have suggestions/requirements/
> > > > expectations.
> > > >
> > > > Thanks,
> > > > Cathy
> > > >
> > > >
> > > >
> > >
> >
>
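
For readers following the thread, the scheduled flow discussed above (a CloudWatch event invoking a Lambda function that labels issues) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the bot's actual code: the keyword table is a hypothetical stand-in for the deep-learning classifier in the proposal, the label names and the shape of the incoming event are made up, and the handler only plans labels without sending anything. The request builder targets GitHub's "add labels to an issue" REST endpoint.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"

# Hypothetical keyword table: a stand-in for the proposed classifier.
KEYWORD_LABELS = {"scala": "Scala", "windows": "Windows", "doc": "Doc"}


def predict_labels(title):
    """Return sorted label names whose keyword appears in the issue title."""
    text = title.lower()
    return sorted({label for kw, label in KEYWORD_LABELS.items() if kw in text})


def build_label_request(repo, issue_number, labels, token):
    """Build (but do not send) a POST request that adds labels to an issue."""
    url = f"{GITHUB_API}/repos/{repo}/issues/{issue_number}/labels"
    body = json.dumps({"labels": labels}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def lambda_handler(event, context):
    """Entry point in the usual AWS Lambda shape (event, context).

    Assumes, for illustration only, that the event already carries a list
    of issues as {"number": ..., "title": ...} dicts. Dry run: it returns
    the planned labels instead of calling GitHub.
    """
    planned = []
    for issue in event.get("issues", []):
        labels = predict_labels(issue["title"])
        if labels:
            planned.append({"number": issue["number"], "labels": labels})
    return {"planned": planned}
```

Classifying issues as soon as they are opened, as Marco suggests, would mainly change what triggers the handler (for example a webhook instead of a weekly schedule); the labeling step itself could stay the same.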


Re: Update on 1.2.1 release

2018-06-14 Thread Anirudh
Hi all,

We have one last PR before code freeze:
https://github.com/apache/incubator-mxnet/pull/11298

Anirudh

On Thu, Jun 14, 2018 at 11:46 AM, Anirudh  wrote:

> Waiting on CI for the PRs: #11236, #11210, #11267
>
> Other PRs have been merged.
>
> On Wed, Jun 13, 2018 at 10:50 PM, Anirudh  wrote:
>
>> Thanks Tao! Yes, this shouldn't be a blocker for 1.2.1.
>>
>> On Wed, Jun 13, 2018 at 10:46 PM, Lv, Tao A  wrote:
>>
>>>
>>> Yes, #10311 is only in master branch, so I guess it won't impact 1.2.0
>>> branch and block the release of 1.2.1, right?
>>>
>>> A PR (#11273) is submitted to disable the test temporarily and hopefully
>>> it will be fixed soon.
>>>
>>> -tao
>>>
>>> -Original Message-
>>> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
>>> Sent: Thursday, June 14, 2018 1:21 PM
>>> To: dev@mxnet.incubator.apache.org
>>> Subject: Re: Update on 1.2.1 release
>>>
>>> On windows tests a segfault is indicated by "If -764728474728". I have
>>> also seen it happen on Ubuntu, there are probably some links in the issue
>>> (on my phone right now).
>>>
>>> On Wed, Jun 13, 2018 at 22:16, Anirudh wrote:
>>>
>>> > By "segfaulting test" do you mean test_gru_bidirectional? I don't see
>>> > the segfault in the logs. Can you point me to the test?
>>> > Also, this seems to be specific to the master and not in 1.2:
>>> > https://github.com/apache/incubator-mxnet/pull/10311
>>> >
>>> > Anirudh
>>> >
>>> > On Wed, Jun 13, 2018 at 10:00 PM, Marco de Abreu <
>>> > marco.g.ab...@googlemail.com.invalid> wrote:
>>> >
>>> > > I can confirm that this segfaulting test has a big impact.
>>> > >
>>> > > On Wed, Jun 13, 2018 at 9:39 PM Aaron Markham wrote:
>>> > >
>>> > > > I'd keep an eye on this one... Flaky test: test_gru_bidirectional
>>> > > > #11219
>>> > > >
>>> > > > https://github.com/apache/incubator-mxnet/issues/11219
>>> > > >
>>> > > > Just reran several PRs' CI runs that all had the same error!
>>> > > >
>>> > > > On Wed, Jun 13, 2018 at 5:42 PM, Anirudh wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > PRs still in progress: #11127, #11236, #11210, #11054, #11216.
>>> > > > >
>>> > > > > We are currently facing two issues which are delaying the merge
>>> > > > > of some of these PRs:
>>> > > > > 1. Flaky tests for scala API. A PR is already out to disable the
>>> > > > > test: https://github.com/apache/incubator-mxnet/issues/11249
>>> > > > > 2. Builds breaking on windows:
>>> > > > > https://github.com/apache/incubator-mxnet/issues/11265
>>> > > > >
>>> > > > > Anirudh
>>> > > > >
>>> > > > >
>>> > > > > On Tue, Jun 12, 2018 at 11:59 AM, Anirudh wrote:
>>> > > > >
>>> > > > > > Hi all,
>>> > > > > >
>>> > > > > > Here are the PRs that are being tracked for 1.2.1 release:
>>> > > > > >
>>> > > > > > Related to the save_params backwards incompatible change:
>>> > > > > > #11127 (In Progress), #11236 (In Progress), #11210 (In Progress)
>>> > > > > > MKLDNN Fixes: #11212 (In Progress)
>>> > > > > > Cross compilation for armv7: #11054 (In Progress)
>>> > > > > > Scala Inference Memory leak fix: #11216 (In Progress)
>>> > > > > > Docs changes: #11211 (Merged)
>>> > > > > > Inplace RELU Activation, Slice operator perf improvement: #11142
>>> > > > > > (Merged)
>>> > > > > > Use cudnnv7 for depthwise conv: #11233 (Merged)
>>> > > > > >
>>> > > > > > Please let me know if I have missed something.
>>> > > > > >
>>> > > > > > Anirudh
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Re: Ccache errors

2018-06-14 Thread Marco de Abreu
Hello,

there was a bug in master in an infrastructure script, which has now
been resolved. The bug was possible because the script relied on an
environment variable that is set on the Jenkins master. The fix has been
merged at [1]. If you receive the following error in CI, please rebase:

[sanity] Running shell script
+ ci/build.py --docker-registry mxnetci --platform ubuntu_cpu --shm-size 500m /work/runtime_functions.sh sanity_check
Traceback (most recent call last):
  File "ci/build.py", line 347, in <module>
    sys.exit(main())
  File "ci/build.py", line 254, in main
    default=default_ccache_dir(),
  File "ci/build.py", line 127, in default_ccache_dir
    return ccache_dirpython
NameError: name 'ccache_dirpython' is not defined
script returned exit code 1
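
A misspelled name like the one in this traceback only fails when the affected line actually runs, so it can go unnoticed on machines that take a different path through the function. As a rough sketch of a helper of this kind (not the actual ci/build.py code; the use of CCACHE_DIR inside the script and the fallback directory name are assumptions for illustration), each branch returns a name that is defined, and the environment can be injected so both branches are easy to exercise locally:

```python
import os
import tempfile


def default_ccache_dir(environ=None):
    """Return a ccache directory, preferring an explicit CCACHE_DIR.

    `environ` defaults to os.environ; passing a plain dict makes it
    possible to test both branches without touching the real environment.
    """
    env = os.environ if environ is None else environ
    ccache_dir = env.get("CCACHE_DIR")
    if ccache_dir:
        # Branch taken on hosts where the variable is set (for example CI).
        return os.path.abspath(ccache_dir)
    # Fallback for hosts without the variable; the directory name is a
    # hypothetical choice for this sketch.
    return os.path.join(tempfile.gettempdir(), "ci_ccache")
```

Calling the function once with the variable set and once without it covers both return statements, which is the kind of check that catches an undefined name before it reaches CI.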

Best regards,
Marco

[1]: https://github.com/apache/incubator-mxnet/pull/11269

On Thu, Jun 14, 2018 at 8:19 AM Marco de Abreu wrote:

> Hello,
>
> I'd like to explain the recent errors we had in the past 12 hours as part
> of CI. This was caused by my preparation to enable shared ccache volumes.
> Due to a mistake from my side, the option got applied to all runs instead
> of just my PR, causing all runs to fail. I have reverted all changes and
> everything is back to normal.
>
> Please excuse the inconvenience.
>
> Best regards,
> Marco
>


Nightly tests README accurate?

2018-06-14 Thread Indhu
Is the README for the nightly tests accurate? For example,

1. Are tests being run on machines with Intel i7-4790 and 4 Nvidia GTX 970
Tis?
2. Is http://ci.dmlc.ml/ the right place to look for build status?
3. Is the instruction to run on Jenkins correct?

If not, what all needs to be changed in that page?


Re: Update on 1.2.1 release

2018-06-14 Thread Anirudh
Waiting on CI for the PRs: #11236, #11210, #11267

Other PRs have been merged.

On Wed, Jun 13, 2018 at 10:50 PM, Anirudh  wrote:

> Thanks Tao! Yes, this shouldn't be a blocker for 1.2.1.
>
> On Wed, Jun 13, 2018 at 10:46 PM, Lv, Tao A  wrote:
>
>>
>> Yes, #10311 is only in master branch, so I guess it won't impact 1.2.0
>> branch and block the release of 1.2.1, right?
>>
>> A PR (#11273) is submitted to disable the test temporarily and hopefully
>> it will be fixed soon.
>>
>> -tao
>>
>> -Original Message-
>> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
>> Sent: Thursday, June 14, 2018 1:21 PM
>> To: dev@mxnet.incubator.apache.org
>> Subject: Re: Update on 1.2.1 release
>>
>> On windows tests a segfault is indicated by "If -764728474728". I have
>> also seen it happen on Ubuntu, there are probably some links in the issue
>> (on my phone right now).
>>
>> On Wed, Jun 13, 2018 at 22:16, Anirudh wrote:
>>
>> > By "segfaulting test" do you mean test_gru_bidirectional? I don't see
>> > the segfault in the logs. Can you point me to the test?
>> > Also, this seems to be specific to the master and not in 1.2:
>> > https://github.com/apache/incubator-mxnet/pull/10311
>> >
>> > Anirudh
>> >
>> > On Wed, Jun 13, 2018 at 10:00 PM, Marco de Abreu <
>> > marco.g.ab...@googlemail.com.invalid> wrote:
>> >
>> > > I can confirm that this segfaulting test has a big impact.
>> > >
>> > > On Wed, Jun 13, 2018 at 9:39 PM Aaron Markham wrote:
>> > >
>> > > > I'd keep an eye on this one... Flaky test: test_gru_bidirectional
>> > > > #11219
>> > > >
>> > > > https://github.com/apache/incubator-mxnet/issues/11219
>> > > >
>> > > > Just reran several PRs' CI runs that all had the same error!
>> > > >
>> > > > On Wed, Jun 13, 2018 at 5:42 PM, Anirudh wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > PRs still in progress: #11127, #11236, #11210, #11054, #11216.
>> > > > >
>> > > > > We are currently facing two issues which are delaying the merge
>> > > > > of some of these PRs:
>> > > > > 1. Flaky tests for scala API. A PR is already out to disable the
>> > > > > test: https://github.com/apache/incubator-mxnet/issues/11249
>> > > > > 2. Builds breaking on windows:
>> > > > > https://github.com/apache/incubator-mxnet/issues/11265
>> > > > >
>> > > > > Anirudh
>> > > > >
>> > > > >
>> > > > > On Tue, Jun 12, 2018 at 11:59 AM, Anirudh wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > Here are the PRs that are being tracked for 1.2.1 release:
>> > > > > >
>> > > > > > Related to the save_params backwards incompatible change:
>> > > > > > #11127 (In Progress), #11236 (In Progress), #11210 (In Progress)
>> > > > > > MKLDNN Fixes: #11212 (In Progress)
>> > > > > > Cross compilation for armv7: #11054 (In Progress)
>> > > > > > Scala Inference Memory leak fix: #11216 (In Progress)
>> > > > > > Docs changes: #11211 (Merged)
>> > > > > > Inplace RELU Activation, Slice operator perf improvement: #11142
>> > > > > > (Merged)
>> > > > > > Use cudnnv7 for depthwise conv: #11233 (Merged)
>> > > > > >
>> > > > > > Please let me know if I have missed something.
>> > > > > >
>> > > > > > Anirudh
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


Ccache errors

2018-06-14 Thread Marco de Abreu
Hello,

I'd like to explain the recent errors we had in the past 12 hours as part
of CI. This was caused by my preparation to enable shared ccache volumes.
Due to a mistake from my side, the option got applied to all runs instead
of just my PR, causing all runs to fail. I have reverted all changes and
everything is back to normal.

Please excuse the inconvenience.

Best regards,
Marco


Re: Feature branches for ARM and Android

2018-06-14 Thread Anton Chernov
Thank you Thomas for your suggestion. We already did exactly that and now
even have CI verification for PRs to these branches in a public fork. The
main problem, already mentioned by Pedro, is that the changes we are making
are already big and are not going to get smaller over time. The merge
back to origin is going to be not only full of conflicts, but also
challenging to review.

Also, such a PR could not be given a name starting with [MXNET-xxx] that
refers to a single JIRA ticket, since it incorporates a batch of things
tightly bound to each other. It will be a big, inseparable list of tickets
and changes, both specific to the problem and general improvements that
need to be made. This is impossible to cherry-pick or revert separately, and
a completely different branch needs to be maintained for release changes,
general improvements and specific task development.

Some general improvements require such an amount of work (for example some
cmake improvements) that the initial issue can no longer be solved in
a reasonable amount of time, burying both the potentially added value and
WIP improvements.

In general I don't understand the reason for such hard blocking of
contributions. None of the iterative changes proposed have an "unstable
state"; they all bring value as a series of improvements that secure
the progress already made.

Anton

On Wed, Jun 13, 2018 at 22:43, Pedro Larroy wrote:

> The problem is that the process of porting is incremental and requires
> several patches from different collaborators to advance in different areas,
> like build system, infrastructure, code fixes, virtualization. This gets
> difficult when having multiple scattered PRs open. We lost track of which
> changes were in which PR when fixing the ARMv7 port with Anton.
>
> The normal way to operate in these cases, in my experience, is either to use
> a feature branch and collaborate and share patches there, or to integrate the
> patches to move towards the goal in the master branch. The latter is not
> always possible. I think going forward we will try using an integration
> branch in our org: MXNetEdge/incubator-mxnet, which is a public fork. The
> downside is that we should be wary of merging large patches back to master;
> I think we often have problems with large patches that touch too many things.
> Happy to hear different suggestions, as it is always good to find better
> branching patterns and ways of working.
>
> Pedro.
>
> On Wed, Jun 13, 2018 at 8:30 PM Thomas DELTEIL wrote:
>
> > Hi Pedro,
> >
> > Is there a problem with working off a branch in your own fork and issuing
> > a [WIP] PR? This is a pattern I have seen a lot and personally I think it
> > works well, since it also gives some visibility if someone is interested
> > in looking at the progress of the work. You can add the people
> > collaborating with you as collaborators on your own fork, and that way
> > your commits will be run against the CI. Make sure to merge from
> > apache/master and not larroy/master if you have conflicts. Not sure why
> > you got these conflicts otherwise.
> >
> > All the best,
> >
> > Thomas
> >
> > 2018-06-12 23:39 GMT-07:00 Pedro Larroy:
> >
> > > Thanks a lot for creating these branches and proposing the idea, for
> > > the reasons you listed.
> > >
> > >
> > > We tried this week to work with these branches with @lebeg for
> > > Android and Arm support. For the reasons listed below, these branches
> > > are not useful for us, so you can delete them.
> > >
> > > 1. We don't have permissions to commit to these development branches.
> > > 2. They show merge conflicts that have been solved locally before
> > > running CI (?). I'm pretty sure I merged and resolved conflicts locally.
> > > 3. It would also pollute the repository history with continuous merges
> > > to and from these branches. I prefer to have a linear history in master
> > > so changes, regressions and bisecting can be less painful when dealing
> > > with issues.
> > >
> > > I think it is important to share development and integrate small,
> > > incremental patches towards architecture support; unfortunately these
> > > branches can't help us at this stage. We will share our work through
> > > different means and without polluting the project with additional
> > > branches which are not meant for production or general use.
> > >
> > >
> > >
> > >
> > > On Mon, Jun 11, 2018 at 6:20 AM Marco de Abreu <
> > > marco.g.ab...@googlemail.com>
> > > wrote:
> > >
> > > > The problem with regular reviews here is that we might want to keep
> > > > temporary code or hacks as a temporary solution before we finalize
> > > > it. A regular review would have problems with that.
> > > >
> > > > The reason against a fork is the requirement of CI. Since multiple
> > > > people are working on the same branch and we have to file PRs against
> > > > each other, it would cause problems if CI is only triggered after the
> > > > fact.
> > > >
> > > > Ideally, the branch