Re: Release plan - MXNET 1.3

2018-07-31 Thread Roshani Nagmote
Hi,

I have created a wiki page for tracking the MXNet 1.3 release and its timeline.
Please take a look here:
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.3.0+Release+Status

I am still waiting for the following 2 PRs to be merged:
TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482

*Code freeze date is 08/02 (Thursday).* Kindly try to complete your ongoing
work and get these PRs merged.

Thanks,
Roshani



On Mon, Jul 30, 2018 at 1:02 PM Roshani Nagmote 
wrote:

> Hi all,
>
> Here is an update on MXNet 1.3 release:
> I am still waiting for the following PRs to be merged:
>
> TRT integration: https://github.com/apache/incubator-mxnet/pull/11325
> Gluon RNN: https://github.com/apache/incubator-mxnet/pull/11482
> Scala examples:
> https://github.com/apache/incubator-mxnet/pull/11753
> https://github.com/apache/incubator-mxnet/pull/11621
>
> *New code freeze date is: 08/03.* Please try to get your ongoing PRs
> merged by then.
>
> @Pedro, I didn't include your PRs in the tracking list since you said they
> are not critical for now. Please let me know if they need to be included.
> https://github.com/apache/incubator-mxnet/pull/11636
> https://github.com/apache/incubator-mxnet/pull/11562
>
> I have also updated the project proposal cwiki page with the current status
> of the PRs.
> 
>
> Please let me know if I am missing something.
>
> Thanks,
> Roshani
>
>
> On Thu, Jul 26, 2018 at 1:34 PM Pedro Larroy 
> wrote:
>
>> I would like to get these PRs merged:
>>
>> https://github.com/apache/incubator-mxnet/pull/11636
>> https://github.com/apache/incubator-mxnet/pull/11562
>>
>> How much longer until the code freeze?
>>
>> On Thu, Jul 26, 2018 at 1:44 AM Roshani Nagmote <
>> roshaninagmo...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > PRs waiting to be merged for the 1.3 release:
>> > https://github.com/apache/incubator-mxnet/pull/11325
>> >
>> > Are there any other PRs waiting to get merged? Please let me know.
>> >
>> > Release blocker issue:
>> > https://github.com/apache/incubator-mxnet/issues/11853
>> >
>> > @Marco, @Kellen, thanks for bringing up this important topic. I agree with
>> > you, and we (the internal Amazon team) will be working on fixing the
>> > disabled tests. My colleague Hao Jin is currently compiling the list of
>> > disabled tests and will lead the effort to fix them over the next few days.
>> >
>> > Thanks,
>> > Roshani
>> >
>> > On Mon, Jul 23, 2018 at 6:39 PM kellen sunderland <
>> > kellen.sunderl...@gmail.com> wrote:
>> >
>> > > Thanks again for organizing, Roshani. I believe the TensorRT work is ready
>> > > for a merge. Thanks to Marek and all the NVIDIA people for iterating on
>> > > it. If possible, could a committer review it, make sure it meets their
>> > > expectations, and then merge it? The PR is here:
>> > > https://github.com/apache/incubator-mxnet/pull/11325
>> > >
>> > > To Marco's point: I'd recommend we review some of those disabled tests and
>> > > see how likely they are to affect users before we cut a release. Many of
>> > > them are obviously not too important from a user's point of view (e.g.
>> > > downloading a sometimes-offline image in a test). One idea would be to try
>> > > to address as many of the customer-impacting issues as possible between
>> > > code freeze and the RC0 vote.
>> > >
>> > > On Mon, Jul 23, 2018 at 1:23 PM Marco de Abreu
>> > >  wrote:
>> > >
>> > > > Hello Roshani,
>> > > >
>> > > > Frequent releases are good, and in general I support them in order to
>> > > > provide our users with the latest features and improvements. At the
>> > > > moment, however, I'm slightly concerned about the test coverage due to
>> > > > [1]. I want us to be conscious of cutting a release while not all tests
>> > > > are enabled (29 disabled tests [2] as of today). That said, I
>> > > > acknowledge that we have improved a lot lately, thanks to everybody
>> > > > participating in and leading the efforts around improving flaky tests.
>> > > > In retrospect, we could say that these efforts have actually revealed
>> > > > some quite interesting bugs, so the time was well spent and yielded
>> > > > good results.
>> > > >
>> > > > What does the community think about another sprint of improvements
>> > > > around tests, followed by a period of 1-2 weeks during which we observe
>> > > > the failures closely to ensure that no critical paths are impacted? If
>> > > > we are in good shape by then, we could continue the release process
>> > > > and at the same time benefit from giving contributors more lead time
>> > > > to finish their work, so that it gets into the release at the desired
>> > > > quality.
>> > > >
>> > > > 

Seattle meetup

2018-07-31 Thread Marco de Abreu
Hi,

I just found this Meetup invite:
https://www.meetup.com/Apache-MXNet-Seattle-meetup/events/253104242/

Best regards,
Marco


Re: [DISCUSS] improve MXNet Scala release process

2018-07-31 Thread Qing Lan
After an offline discussion, Marco proposed a plan that can help us carry out 3):
1. This job will not be triggered by PR runs, and only committers will be
allowed to run the restricted job.
2. The code run there will only cover the branch you choose. It will be the
committers' responsibility not to merge any credential-grabbing code, however
trivial.
3. Testing this is simple. The restricted job uses an architecture similar to
the current CI. You can send a PR with Dockerfiles, scripts, and Jenkins
configurations, and test-run the job with a mock credential. Finally, please
contact the people working on CI to do a test run; they will take the last
step of merging your change into CI.
4. Marco also addressed the security level of the credentials. Each credential
used in the AWS credential service will be assigned an individual IAM role,
which can only access the credentials assigned to that role and is used only
in the restricted job you have set up.

I would also like to encourage people on this list to join the MXNet Berlin
Office Hours (https://cwiki.apache.org/confluence/display/MXNET/MXNet+Berlin+Office+Hours),
as the people working on improving the CI are there and ready to help.

Thanks,
Qing


On 7/28/18, 11:44 PM, "Qing Lan"  wrote:

Thanks for the feedback, Marco, Naveen, and Sheng.

About 1): the Scala package only bundles the MXNet binary and dynamically
links all the other dependencies. So it does require users to install the same
dependency versions as the build platform, which makes the package hard to
use. Let's collaborate and create a (set of) general CI script(s) that install
the dependencies and link them statically into the package.

About 3): this is indeed a general problem for both Scala and Python
publishing. If there is a good way to store the credentials safely, we can
definitely give automated publishing a go. Thanks again to Marco for the option
provided below; I think we can make use of the restricted slaves and give it a
test run. And to Marco:
1. Will these restricted jobs be triggered on every PR run, or does it just
depend on where you put them (e.g., if I put one in nightly, it will never be
triggered by a PR)? Is there a potential risk of a PR attack (creating a PR to
grab credentials)?

2. How do we make sure the code being run there is under control and cannot be
changed by just anyone?

3. If I want to test this functionality, where is the best place to create the
job and make a test run?

Thanks,
Qing



On 7/27/18, 5:44 PM, "Marco de Abreu" 
 wrote:

Hi all,

about credential management: we already have a solution based on restricted
slaves [1] and AWS Secrets Manager [2] that is generally cleared for
generating binaries and handling credentials. It was designed with continuous
deployment in mind, but we haven't tested it in that setting yet.

To properly assess the requirements, it would be great if we had this
security-critical part outlined for each release pipeline. We could then
check whether our existing solution meets all requirements or whether
further work is necessary.

Best regards,
Marco

[1] https://cwiki.apache.org/confluence/display/MXNET/Restricted+jobs+and+nodes
[2] https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html

On 7/27/18, 5:43 PM, "Sheng Zha"  wrote:

Thanks, Naveen. Once we have clarity on 3), it should be no problem for
Scala to reuse the same solution. For 1), if this is indeed an issue, it
seems that we may have rushed the Scala releases a bit. Are there any user
reports?

-sz

On Fri, Jul 27, 2018 at 5:26 PM, Naveen Swamy  
wrote:

> I collaborate with Qing as part of my day job. To give you a little more
> perspective on the proposed work:
>
> For 1):
> What we found is that users often run into conflicts when the version of a
> dependency they use (CUDA, cuDNN, OpenBLAS, OpenCV, etc.) differs from the
> one we build the MXNet backend with and use in the MXNet Scala package.
> It is also not very straightforward for users to install these
> dependencies themselves. To lower the entry barrier and make everything
> work out of the box, we are thinking of building all these dependencies
> together with MXNet (as a static library) and embedding them in the MXNet
> Scala package. This is also inspired by the work you have done for the
> Apache MXNet pip packages; ideally I would like to reuse some of that work.
>
> Maven does not manage the binaries; you still have to build the binary and

Re: Requesting slack access

2018-07-31 Thread Steffen Rochel
Hi Juan - welcome to the MXNet community. Please check out how to get
involved.
What are you working on?

Regards,
Steffen

On Mon, Jul 30, 2018 at 11:16 PM Juan Vercellone  wrote:

> --
> -- .-
> VERCELLONE, Juan.
> (also known as 1010ad1c97efb4734854b6ffd0899401)
>


Re: Requesting slack access

2018-07-31 Thread Steffen Rochel
Hi Leonardo - welcome to the MXNet community. Please check out how to get
involved.
What are you working on?

Regards,
Steffen


On Tue, Jul 31, 2018 at 1:12 AM leonardo espinosa 
wrote:

> Hi there,
>
> I'm working on some projects using MXNet, and
> perhaps (if I get some extra free time) I'm thinking of
> adding part of my research to the main branch.
>
> Thanks in advance,
>
> Leonardo
>
> ---
> Dr. Leonardo Espinosa
> *about me:*
> http://www.espinosaleal.me
>


Re: Release blocker: non-deterministic forward in gluon

2018-07-31 Thread kellen sunderland
I'd agree that we should have a repeatable process for generating
artifacts.  It would be useful for Apache release reviewers to be able to
double check the results we get in CI, and it would help give a consistent
experience for users.

I'm a little uncomfortable with the idea of generating the actual artifacts
from the CI account.  The CI account is designed to run arbitrary code from
the internet.  Generating a native binary that gets distributed to a bunch
of computers from this account seems like an unnecessary security risk.  We
could very simply run artifact builds in a different environment for which
the entire internet does not have execute permissions.  I'm not strongly
against the idea of releasing from the current CI account, but I think we
should be careful about how tightly we couple these processes.

On Tue, Jul 31, 2018 at 3:35 AM Hagay Lupesko  wrote:

> Thanks, Pedro.
> Good to know you think it is important as well. I hope the community can
> review a proposal on the CWiki soon; that would be great.
>
> On Mon, Jul 30, 2018 at 4:26 AM Pedro Larroy
> wrote:
>
> > Hi Hagay
> >
> > We are aware of this and are working in this direction, which, as you
> > point out, is more desirable.
> > A huge amount of non-trivial work from Sheng has gone into building these
> > distribution packages, and it needs to be adapted for our CI system and
> > taken into consideration.
> >
> > Pedro.
> >
> >
> > On Mon, Jul 30, 2018 at 9:07 AM Hagay Lupesko  wrote:
> >
> > > Thanks, Tong, for root-causing the issue!
> > > Thanks, Sheng, for following up with an updated PyPI package.
> > >
> > > What worries me is that we seem to build the MXNet PyPI distribution
> > > packages with a build config different from the CI, where all of the
> > > tests are running. Looking here [1], it seems that the MXNet CI Ubuntu
> > > build uses libopenblas-dev v0.2.18, while the PyPI build for MXNet 1.2.1
> > > used v0.3.2 (I would imagine for the PyPI distribution?).
> > >
> > > Needless to say, if we don't make sure the PyPI distribution is aligned
> > > with the CI build, similar issues can happen again with other
> > > dependencies. I'd think we want the build configs to be the same, or
> > > better yet, have the PyPI package built from the output produced by the
> > > CI.
> > > Thoughts?
> > >
> > > [1]
> > > https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh
> > >
> > >
> > > On Fri, Jul 27, 2018 at 11:31 AM Sheng Zha  wrote:
> > >
> > > > Tong,
> > > >
> > > > That's great news. I'm glad that the OpenBLAS people are responding so
> > > > quickly. In that case it's probably a better idea to use that version
> > > > instead. The latest OpenBLAS version brings many optimizations for all
> > > > kinds of hardware.
> > > >
> > > > -sz
> > > >
> > > > On Fri, Jul 27, 2018 at 11:10 AM, Tong He
> > > > wrote:
> > > >
> > > > > Hi Sheng,
> > > > >
> > > > > I also opened an issue on the OpenBLAS repo:
> > > > > https://github.com/xianyi/OpenBLAS/issues/1700
> > > > >
> > > > > Having been informed that "0.3.2 should be released this weekend", I
> > > > > tested their develop branch as well, and it seems the new version has
> > > > > fixed the bug.
> > > > >
> > > > > Since OpenBLAS 0.3.2 may also bring performance improvements, I
> > > > > propose that we wait for OpenBLAS 0.3.2 for our pip post release.
> > > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Tong He
> > > > >
> > > > > 2018-07-27 10:54 GMT-07:00 Sheng Zha :
> > > > >
> > > > > > Forgot to mention: the post release version is a pip package version.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > > On Jul 27, 2018, at 10:42 AM, Sheng Zha
> > > > > > > wrote:
> > > > > > >
> > > > > > > In this case we can regard it as a release problem, which is usually
> > > > > > > what post release versions are for. It’s still the same release with a
> > > > > > > different dependency, so no code change is needed.
> > > > > > >
> > > > > > > -sz
> > > > > > >
> > > > > > >
> > > > > > > >> On Jul 27, 2018, at 8:31 AM, Steffen Rochel <steffenroc...@gmail.com>
> > > > > > > >> wrote:
> > > > > > >>
> > > > > > > >> Hi Tong - thanks for root-causing the problem.
> > > > > > > >> Sheng - what is 1.2.1.post0? Shouldn't a patch with a fix be
> > > > > > > >> released as 1.2.2?
> > > > > > >> Steffen
> > > > > > >>
> > > > > > > >>> On Thu, Jul 26, 2018 at 5:33 PM Sheng Zha <szha@gmail.com>
> > > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> Dear users and developers of Apache MXNet (Incubating),
> > > > > > >>>
> > > > > > > >>> Thanks to Tong's dedication, the root cause of this issue was
> > > > > > > >>> identified as instability in OpenBLAS's latest stable version, 0.3.1.
> > > > > > > >>> For details, see Tong's comment
> > > > > > >>> <
> > > > > > >>> 

Requesting slack access

2018-07-31 Thread leonardo espinosa
Hi there,

I'm working on some projects using MXNet, and
perhaps (if I get some extra free time) I'm thinking of
adding part of my research to the main branch.

Thanks in advance,

Leonardo

---
Dr. Leonardo Espinosa
*about me:*
http://www.espinosaleal.me


Requesting slack access

2018-07-31 Thread Juan Vercellone
-- 
-- .-
VERCELLONE, Juan.
(also known as 1010ad1c97efb4734854b6ffd0899401)