Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-12-07 Thread Haibin Lin
I do expect the API to change in the future. Currently @szhengac @zhongyuchen 
and I are exploring APIs for gradient compression with a few algorithms, and we 
may bring the best practices back to MXNet. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16795#issuecomment-562907768

Re: [apache/incubator-mxnet] [RFC] Deferred compute in imperative interface to unify imperative and symbolic interface (#16376)

2019-12-07 Thread Sheng Zha
How's this project going?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16376#issuecomment-562906794

Re: Stopping nightly releases to Pypi

2019-12-07 Thread Sheng Zha
> Heres a set of links for today’s builds
> 
> (Plain mxnet, no mkl no cuda)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-mkl)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-cuXXX)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-cuXXXmkl)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

These links are not utilizing the S3 accelerate feature (i.e. not backed by 
CloudFront edges). Please use repo.mxnet.io instead. The updated links are:
(Plain mxnet, no mkl no cuda)
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
(mxnet-mkl)
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
(mxnet-cuXXX)
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
(mxnet-cuXXXmkl)
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

When updating the installation doc we should use the repo.mxnet.io domain name too.

Best,
-sz

On 2019/12/07 17:39:40, "Skalicky, Sam"  wrote: 
> Hi MXNet Community,
> 
> We have been working on getting nightly builds fixed and made available 
> again. We’ve made another system using AWS CodeBuild & S3 to work around the 
> problems with Jenkins CI, PyPI, etc. It is currently building all the flavors 
> and publishing to an S3 bucket here:
> https://us-west-2.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/?region=us-west-2
> 
> There are folders for each set of nightly builds, try out the wheels starting 
> today 2019-12-07. Builds start at 1:30am PT (9:30am GMT) and arrive in the 
> bucket 30min-2hours later. Inside each folder are the wheels for each flavor 
> of MXNet. Currently we’re only building for linux, builds for windows/Mac 
> will come later.
> 
> If you want to download the wheels easily you can use a URL in the form of:
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist//dist/-1.6.0b-py2.py3-none-manylinux1_x86_64.whl
> 
> Heres a set of links for today’s builds
> 
> (Plain mxnet, no mkl no cuda)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-mkl)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-cuXXX)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-cuXXXmkl)
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> 

Re: [apache/incubator-mxnet] [RFC] Custom Operator Part 2 (#17006)

2019-12-07 Thread JackieWu
Hi @samskalicky, thank you for the contribution!
I have several suggestions.

- custom GPU operators
  1. Provide the CUDA stream in `OpResource`.
  2. Share the same function on CPU and GPU.
  Users can distinguish the context via `MXTensor::dltensor::ctx`.
- Call framework-specific math helpers
  This is important for a custom operator: users may want to call gemm, or even a 
convolution op, inside a custom op.

Thanks.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17006#issuecomment-562898682

Re: Please remove conflicting Open MP version from CMake builds

2019-12-07 Thread Pedro Larroy
Stop disseminating false information:

https://github.com/apache/incubator-mxnet/issues/14979


On Sat, Dec 7, 2019 at 7:04 AM Chris Olivier  wrote:

> -1
>
> mkldnn removed omp5 for licencing issues
> no bugs have actually been traced to the use of llvm openmp. only an assert
> caused by an actual bug in mxnet code. there are suitable workarounds.
>
> over time llvm omp has simply been used as a “catch all” for random
> problems that aren’t related at all (such as getenv race condition in an
> atfork call that isn’t even part of an omp parallel region).
>
> proposal is now and has always been roughly equivalent to the idea of
> “comment out an assert rather than fix the bug it’s reporting”.
>
> Up until very recently, Makefile version of mxnet used libomp5 for YEARS
> and not libgomp, with no issue reported (omp not built in debug mode), so
> the equivalent configuration from CMake mysteriously causing myriads if
> problems has questionable merit and smells more like a hubris situation.
>
> I use tensorflow as well and it links to libomp5 rather than libgomp.
>
> if the assert problem is really a problem, the bug being reported would be
> prioritized and fixed. it should be fixed regardless. all the time spent by
> some CI people trying to remove this could have simply fixed the actual bug
> in a small fraction of the time.
>
>
> On Fri, Dec 6, 2019 at 8:44 PM Lausen, Leonard 
> wrote:
>
> > I think it's reasonable to assume that the Intel MKLDNN team is an
> > "authorative"
> > source about the issue of compilation with OpenMP and the OpenMP runtime
> > library
> > related issues. Thus I suggest we follow the recommendation of Intel
> > MKLDNN team
> > within the MXNet project.
> >
> > Looking through the Intel MKLDNN documentation, I find [1]:
> >
> > > DNNL uses OpenMP runtime library provided by the compiler.
> >
> > as well as
> >
> > > it's important to ensure that only one OpenMP runtime is used
> throughout
> > the
> > > application. Having more than one OpenMP runtime linked to an
> executable
> > may
> > > lead to undefined behavior including incorrect results or crashes.
> >
> > To keep our project maintainable and error free, I thus suggest we follow
> > DNNL
> > and use the OpenMP runtime library provided by the compiler.
> > We have limited ressources and finding the root cause for any bugs
> > resulting
> > from linking multiple OpenMP libraries as currently done is, in my
> > opinion. not
> > a good use of time. We know it's due to undefined behavior and we know
> > it's best
> > practice to use OpenMP runtime library provided by the compiler. So let's
> > just
> > do that.
> >
> > I think given that MKL-DNN has also adopted the "OpenMP runtime library
> > provided
> > by the compiler" approach, this issue is not contentious anymore and
> > qualifies
> > for lazy consensus.
> >
> > Thus if there is no objection within 72 hours (lazy consensus), let's
> drop
> > bundled LLVM OpenMP from master [2]. If we find any issues due to
> > droppeing the
> > bundled LLVM OpenMP, we can always add it back prior to the next release.
> >
> > Best regards
> > Leonard
> >
> > [1]:
> >
> >
> https://github.com/intel/mkl-dnn/blob/433e086bf5d9e5ccfc9ec0b70322f931b6b1921d/doc/build/build_options.md#openmp
> > (This is the updated reference from Anton's previous comment, based on
> the
> > changes in MKLDNN done in the meantime
> >
> https://github.com/apache/incubator-mxnet/pull/12160#issuecomment-415078066
> > )
> > [2]: Alike https://github.com/apache/incubator-mxnet/pull/12160
> >
> >
> > On Fri, 2019-12-06 at 12:16 -0800, Pedro Larroy wrote:
> > > I will try to stay on the sidelines for now since previous
> conversations
> > > about OMP have not been productive here and I have spent way too much
> > time
> > > on this already, I'm not the first one giving up on trying to help with
> > > this topic.
> > >
> > > I would be glad if you guys can work together and find a solution. I
> will
> > > just put my understanding of the big picture hoping that it helps move
> it
> > > forward.
> > >
> > >
> > > Recently the intel omp library which seemed to have the best
> performance
> > of
> > > the 3 was removed from MKL.
> > >
> > > - There's 3 libraries in play, GNU Omp which is shipped with gcc
> (gomp),
> > > LLVM openmp in 3rdparty (llvm-omp), Intel OMP when using MKL, which is
> > > recently removed (iomp)
> > >
> > > - IOMP seems to have the best performance, there's stability issues
> > > producing crashes sometimes but the impact seems relatively small for
> > users
> > > and developers. In general seems linking with a different OMP version
> > that
> > > the one shipped with the compiler is known to cause stability issues
> but
> > > it's done anyway.
> > >
> > > - LLVM-OMP used when building with CMake, not used in the PIP releases
> or
> > > when building with Make. Has stability issues, hangs when running in
> > debug
> > > mode during test execution and produces tons of assertions in debug
> mode.
> > 

Re: Custom C++ Operators

2019-12-07 Thread Marco de Abreu
Awesome project, love it! It really seems easy to use, great job!

-Marco

Skalicky, Sam wrote on Sat., Dec 7, 2019, 19:50:

> Hi MXNet Community,
>
> We have been working on adding support for custom C++ operators for a
> while and are happy to announce that the initial functionality is now
> available for you to try out in the master branch!
>
> CustomOp support in MXNet began with allowing users to write custom
> operators in Python and has been available for years. If you wanted to
> write a high-performance C++ operator you had to do it by adding it to the
> MXNet source code, recompiling a custom version of MXNet, and distributing
> that custom build. The Custom C++ operator support enhances this by
> enabling users to write high-performance C++ operators and compile them
> separately from MXNet. This frees up users from having to recompile MXNet
> from source and makes it easier to add custom operators to suit their needs.
>
> Heres a few pointers to get started:
> 1. Check out the overview in the cwiki [1]
> 2. Check out the PR [2]
> 3. You can try this out using the new nightly builds that are available in
> S3 [3]
> 4. Leave feedback on features to add or things to fix in a followup PR
> here [4]
>
> Credit goes to everyone involved (in no particular order)
> Manu Seth
> Sheng Zha
> Jackie Wu
> Junru Shao
> Ziyi Mu
>
> Special thanks to all the PR reviewers!
>
> Thanks!
> Sam
>
>
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/Dynamic+CustomOp+Support
> [2] https://github.com/apache/incubator-mxnet/pull/15921
> [3]
> https://lists.apache.org/thread.html/0a22e10b290b4ad322ed50024d778c3736b0a772811caea317790732%40%3Cdev.mxnet.apache.org%3E
> [4] https://github.com/apache/incubator-mxnet/issues/17006
>


Custom C++ Operators

2019-12-07 Thread Skalicky, Sam
Hi MXNet Community,

We have been working on adding support for custom C++ operators for a while and 
are happy to announce that the initial functionality is now available for you 
to try out in the master branch!

CustomOp support in MXNet began with allowing users to write custom operators 
in Python and has been available for years. If you wanted to write a 
high-performance C++ operator you had to do it by adding it to the MXNet source 
code, recompiling a custom version of MXNet, and distributing that custom 
build. The Custom C++ operator support enhances this by enabling users to write 
high-performance C++ operators and compile them separately from MXNet. This 
frees up users from having to recompile MXNet from source and makes it easier 
to add custom operators to suit their needs.
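
To illustrate the workflow, here is a minimal usage sketch (my own illustration, 
not part of the announcement): it assumes you have already compiled a custom 
operator library into "libcustomop.so" and that it registers an operator named 
"my_gemm"; both names are hypothetical placeholders.

import mxnet as mx

mx.library.load('libcustomop.so')    # load the separately compiled operator library
a = mx.nd.random.uniform(shape=(2, 3))
b = mx.nd.random.uniform(shape=(3, 2))
print(mx.nd.my_gemm(a, b))           # loaded operators become available under mx.nd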

Here are a few pointers to get started:
1. Check out the overview in the cwiki [1]
2. Check out the PR [2]
3. You can try this out using the new nightly builds that are available in S3 [3]
4. Leave feedback on features to add or things to fix in a follow-up PR here [4]

Credit goes to everyone involved (in no particular order)
Manu Seth
Sheng Zha
Jackie Wu
Junru Shao
Ziyi Mu

Special thanks to all the PR reviewers!

Thanks!
Sam


[1] https://cwiki.apache.org/confluence/display/MXNET/Dynamic+CustomOp+Support
[2] https://github.com/apache/incubator-mxnet/pull/15921
[3] 
https://lists.apache.org/thread.html/0a22e10b290b4ad322ed50024d778c3736b0a772811caea317790732%40%3Cdev.mxnet.apache.org%3E
[4] https://github.com/apache/incubator-mxnet/issues/17006


[apache/incubator-mxnet] [RFC] Custom Operator Part 2 (#17006)

2019-12-07 Thread Sam Skalicky
## Description
Request for comments on the next PR for enhancing custom operator support

Here are some suggestions from the initial PR (Part 1):
- custom GPU operators
- Random number generator resource request
- sparse data types
- migrate the lambda functions in MXLoadLib in src/c_api/c_api.cc to classes 
defined elsewhere
- Documentation: add the "library" Python package to the namespace in the docs 
(https://mxnet.apache.org/api/python/docs/api/)?

## References
- initial PR (Part 1): #15921


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17006

Re: Stopping nightly releases to Pypi

2019-12-07 Thread Marco de Abreu
Could you elaborate on how a non-Amazonian is able to access, maintain and
review the CodeBuild pipeline? How come we've diverged from the
community-agreed standard where the public Jenkins serves the purpose of
testing and releasing MXNet? I'd be curious about the issues you're
encountering with Jenkins CI that led to a non-standard solution.

-Marco


Skalicky, Sam wrote on Sat., Dec 7, 2019, 18:39:

> Hi MXNet Community,
>
> We have been working on getting nightly builds fixed and made available
> again. We’ve made another system using AWS CodeBuild & S3 to work around
> the problems with Jenkins CI, PyPI, etc. It is currently building all the
> flavors and publishing to an S3 bucket here:
>
> https://us-west-2.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/?region=us-west-2
>
> There are folders for each set of nightly builds, try out the wheels
> starting today 2019-12-07. Builds start at 1:30am PT (9:30am GMT) and
> arrive in the bucket 30min-2hours later. Inside each folder are the wheels
> for each flavor of MXNet. Currently we’re only building for linux, builds
> for windows/Mac will come later.
>
> If you want to download the wheels easily you can use a URL in the form of:
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/
> /dist/-1.6.0b-py2.py3-none-manylinux1_x86_64.whl
>
> Heres a set of links for today’s builds
>
> (Plain mxnet, no mkl no cuda)
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-mkl)
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-cuXXX)
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> (mxnet-cuXXXmkl)
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> You can easily install these pip wheels in your system either by
> downloading them to your machine first and then installing by doing:
>
> pip install /path/to/downloaded/wheel.whl
>
> Or you can install directly by just giving the link to pip like this:
>
> pip install
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
>
> Credit goes to everyone involved (in no particular order)
> Rakesh Vasudevan
> Zach Kimberg
> Manu Seth
> Sheng Zha
> Jun Wu
> Pedro Larroy
> Chaitanya Bapat
>
> Thanks!
> Sam
>
>
> On Dec 5, 2019, at 1:16 AM, Lausen, Leonard wrote:
>
> We don't loose pip by hosting on S3. We just don't host nightly releases
> on Pypi
> servers and mirror them to several hundred mirrors immediately after each
> build
> is published which is very expensive for the Pypi project.. People can
> still
> install the nightly builds with pip by specifying the -f option.
>
> Uploading weekly releases to Pypi will reduce the cost for Pypi by ~75%
> [1]. It
> may be acceptable to Pypi, but does it make sense for us? I'm not convinced
> weekly release on Pypi is a good idea. Consider one release is buggy,
> users will
> need to wait for 7 days for a fix. It doesn't provide good user experience.
> If someone has a stronger conviction about the value of weekly releases on
> Pypi,
> that person shall please go ahead and propose it in a separate discussion
> thread.
>
> Currently we don't have generally working nightly builds to Pypi and as a
> matter
> of fact we know that we can't have them due to Pypi's policy and our
> apparent
> need for large binaries. Given this fact and that no objection was raised
> by
> 2019-12-05 

Re: Stopping nightly releases to Pypi

2019-12-07 Thread Skalicky, Sam
Hi MXNet Community,

We have been working on getting nightly builds fixed and made available again. 
We’ve made another system using AWS CodeBuild & S3 to work around the problems 
with Jenkins CI, PyPI, etc. It is currently building all the flavors and 
publishing to an S3 bucket here:
https://us-west-2.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/?region=us-west-2

There are folders for each set of nightly builds; try out the wheels starting 
today, 2019-12-07. Builds start at 1:30am PT (9:30am GMT) and arrive in the 
bucket 30 minutes to 2 hours later. Inside each folder are the wheels for each flavor of 
MXNet. Currently we’re only building for Linux; builds for Windows/Mac will 
come later.

If you want to download the wheels easily you can use a URL in the form of:
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist//dist/-1.6.0b-py2.py3-none-manylinux1_x86_64.whl

Here's a set of links for today’s builds:

(Plain mxnet, no mkl no cuda)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
(mxnet-mkl)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
(mxnet-cuXXX)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
(mxnet-cuXXXmkl)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

You can easily install these pip wheels in your system either by downloading 
them to your machine first and then installing by doing:

pip install /path/to/downloaded/wheel.whl

Or you can install directly by just giving the link to pip like this:

pip install 
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

Credit goes to everyone involved (in no particular order)
Rakesh Vasudevan
Zach Kimberg
Manu Seth
Sheng Zha
Jun Wu
Pedro Larroy
Chaitanya Bapat

Thanks!
Sam


On Dec 5, 2019, at 1:16 AM, Lausen, Leonard wrote:

We don't lose pip by hosting on S3. We just don't host nightly releases on Pypi
servers and mirror them to several hundred mirrors immediately after each build
is published, which is very expensive for the Pypi project. People can still
install the nightly builds with pip by specifying the -f option.

Uploading weekly releases to Pypi will reduce the cost for Pypi by ~75% [1]. It
may be acceptable to Pypi, but does it make sense for us? I'm not convinced
weekly releases on Pypi are a good idea. If one release is buggy, users will
need to wait 7 days for a fix. That doesn't provide a good user experience.
If someone has a stronger conviction about the value of weekly releases on Pypi,
that person shall please go ahead and propose it in a separate discussion
thread.

Currently we don't have generally working nightly builds to Pypi and as a matter
of fact we know that we can't have them due to Pypi's policy and our apparent
need for large binaries. Given this fact and that no objection was raised by
2019-12-05 at 05:42 UTC, I conclude we have lazy consensus on stopping upload
attempts of nightly builds to Pypi.

With consensus established, we can change the CI job to stop trying to upload
the nightly builds and then request Pypi to increase the limit. Then we have one
less blocker for the 1.6 release.

Best regards
Leonard

[1]: Lower cost due to fewer releases, but higher cost due to 500MB -> 800MB
limit increase. Assuming that the limit increase translates into actually larger
binaries.


On Wed, 2019-12-04 at 22:20 +0100, Marco de Abreu wrote:
Are weekly releases an option? It was brought up as concern that we might
lose pip as a pretty common distribution channel where people consume
nightly builds. I don't feel like that concern has been properly addressed
so far.

-Marco

Lausen, Leonard wrote on Wed., Dec 4, 2019, 04:09:

As a simple POC to test distribution, you can try installing MXNet based on
these 3 URLs:

pip install --no-cache-dir


Re: [apache/incubator-mxnet] Failed OpenMP assertion when loading MXNet compiled with DEBUG=1 (#10856)

2019-12-07 Thread Lausen, Leonard
Chris, I'm trying to understand the situation better exactly because I think
this bug is important and I would like to address it. Therefore I asked you a
question, expecting your answer would be helpful to solve this problem.
Unfortunately it seems to me that your answer misses the point of my question.

Let me reiterate the question and provide more background information.

1) It is my understanding that Intel OpenMP and LLVM OpenMP runtime differ
essentially only in the compiler used to compile them [1]

2) It is my understanding that it is generally accepted that GCC does not work
well with anything but gomp. GCC wants to force the use of libgomp at the linker
stage, and this leads to an undefined situation that can cause problems like the
one we observe. Specifically, Stackoverflow describes the problem for Intel OMP
at [2].

3) There is no reason for compiling MXNet + LLVM OpenMP using the GCC compiler.
If we want LLVM OpenMP or Intel OpenMP, we can compile with LLVM. If we want
gomp, we can compile with GCC. Doing anything else seems to be only asking for
trouble. Thus I suggest we always use the compiler-provided OpenMP.

4) You state that my suggestion equals commenting out the assertion instead of
fixing the problem. It is my understanding that the problem only occurs when 2
OpenMP libraries are linked. However, according to [2], linking 2 OpenMP
libraries is a "recipe for disaster". Why do we need to go with the "recipe for
disaster", given the solution I suggest in point 3)?

I fully understand that I don't have anywhere near your experience with OpenMP.
Therefore, please help clarify any specific wrong conclusions or assumptions in
the points above. Keep in mind:

> Those who are asked should be responsive and helpful, within the context of
> our shared goal of improving Apache project code.
https://www.apache.org/foundation/policies/conduct#specific-guidelines

[1]: https://software.intel.com/en-us/forums/intel-c-compiler/topic/793552
[2]: https://stackoverflow.com/a/26149633/2560672

Your previous answer has not addressed my code change suggestion. The code
change is specifically about avoiding having 2 OpenMP runtimes. You have at no
point justified why you think we need to have 2 runtimes at the same time. You
must provide a technical justification showing why having only 1 runtime would
be bad, or your veto is considered invalid according to Apache's rules.

> To prevent vetos from being used capriciously, they must be accompanied by a
> technical justification showing why the change is bad (opens a security
> exposure, negatively affects performance, etc. ). A veto without a
> justification is invalid and has no weight.
https://www.apache.org/foundation/voting.html


Re: [apache/incubator-mxnet] Failed OpenMP assertion when loading MXNet compiled with DEBUG=1 (#10856)

2019-12-07 Thread Chris Olivier
if it is really a problem, then it would be prioritized. all the necessary
info is in that issue (and i already mentioned just yesterday or today on
that ticket) what it was again and it’s like i was talking to no one, as it
has been, simply an immediate revert to “remove the library”.  in the time
wasted on all this, it could have been resolved 100 times over.


I can remove just about every bug from mxnet by turning off ALL of the
features in CMakeLists.txt. no features, no bugs. This is roughly
equivalent to the approach that has been taken so far for 1.5 years, which
is not good engineering practice, and a suggestion that I am surprised to
see championed by a committer.


Here’s another example:


Not too long ago (maybe 8 months?) there was a crash at shutdown in debug
mode in tcmalloc (gperf version of malloc, which is similar to jemalloc)
with an error message about bad pointer to free() or something like that.  At
the time, I didn’t know what caused it and so I did not block its removal.


fast-forward to about two months ago, where I saw the same error in a
different code base. Since it was happening to me, I was in a position to
debug it, so I did, and found that the same small static library was
linked into two different shared objects, and occasionally (depending upon
link order, I presume), a global string variable was created and destroyed
twice, because when linking, both shared object c-runtime init functions
had the same name, so mapped to the same startup routine and global data
address, so when both shared objects initialized, they called the same
address. This caused both a memory leak because the first startup string
memory allocation was discarded by the second call to the constructor and
at shutdown,  an assert in tcmalloc because the same second memory pointer
allocated was freed twice.  When tcmalloc was removed, the assert went away
but the bug, to the best of my knowledge, is still there.  If I knew then
what I know now, I would have asked the bug to be fixed rather than remove
tcmalloc.  Not because of a love for tcmalloc, but because there is
something telling you there is a bug and the bug should be fixed, because
if you just hide the bug (comment out the assert) then it’s likely to cause
other (harder to track down) problems later. So now that bug is probably
still there causing who-knows-what random crashes or undefined behavior.


This is the kind of root causing that should be done and not effectively
commenting out the assert. I believe we should insist on the highest
standards. I understand if a person does CI all day and if they find
something they can do via CI (ie turn off a feature) which makes the
problem go away, then they might feel compelled to champion that option.
Like the saying goes, “When you have a hammer in your hand, everything
looks like a nail”.


But this is not always the best solution for the project. There is a bug,
and it should be fixed because commenting out the assert just hides the bug
from plain view, but the bug remains.  Or sufficient evidence otherwise.


-Chris



On Sat, Dec 7, 2019 at 8:06 AM Leonard Lausen 
wrote:

> It appears to me that this issue only occurs when having multiple openmp
> libraries at runtime. I don't understand why we need to support this
> use-case. MKL-DNN works with whatever openmp runtime is provided by the
> compiler [1].
> If you think this use-case is important, please give some more reasoning.
> If you convince me I'm happy to help to root-cause it.
>
> Otherwise I suggest to follow the simplistic approach of using the compiler's
> openmp runtime. If any specific openmp runtime is needed, then we can
> compile with the associated compiler (GCC, LLVM, Intel Compiler).
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or unsubscribe.
>


Re: Please remove conflicting Open MP version from CMake builds

2019-12-07 Thread Lausen, Leonard
Chris, if you can fix this in a small fraction of the time, please go ahead and do
so. Could you clarify why you think Intel's statement is nonsense or not
applicable? "Because different OpenMP runtimes may not be binary-compatible,
it's important to ensure that only one OpenMP runtime is used throughout the
application."

Why do we need to build LLVM OpenMP with GCC? If we want LLVM OpenMP, why not
just use the LLVM compiler? The LLVM OpenMP in the MXNet repo has not been
updated for 2 years. If we rely on the compiler version, we don't have to
maintain OpenMP in our repo.

Thank you
Leonard

On Sat, 2019-12-07 at 07:03 -0800, Chris Olivier wrote:
> -1
> 
> mkldnn removed omp5 for licencing issues
> no bugs have actually been traced to the use of llvm openmp. only an assert
> caused by an actual bug in mxnet code. there are suitable workarounds.
> 
> over time llvm omp has simply been used as a “catch all” for random
> problems that aren’t related at all (such as getenv race condition in an
> atfork call that isn’t even part of an omp parallel region).
> 
> proposal is now and has always been roughly equivalent to the idea of
> “comment out an assert rather than fix the bug it’s reporting”.
> 
> Up until very recently, Makefile version of mxnet used libomp5 for YEARS
> and not libgomp, with no issue reported (omp not built in debug mode), so
> the equivalent configuration from CMake mysteriously causing myriads if
> problems has questionable merit and smells more like a hubris situation.
> 
> I use tensorflow as well and it links to libomp5 rather than libgomp.
> 
> if the assert problem is really a problem, the bug being reported would be
> prioritized and fixed. it should be fixed regardless. all the time spent by
> some CI people trying to remove this could have simply fixed the actual bug
> in a small fraction of the time.
> 
> 
> On Fri, Dec 6, 2019 at 8:44 PM Lausen, Leonard 
> wrote:
> 
> > I think it's reasonable to assume that the Intel MKLDNN team is an
> > "authorative"
> > source about the issue of compilation with OpenMP and the OpenMP runtime
> > library
> > related issues. Thus I suggest we follow the recommendation of Intel
> > MKLDNN team
> > within the MXNet project.
> > 
> > Looking through the Intel MKLDNN documentation, I find [1]:
> > 
> > > DNNL uses OpenMP runtime library provided by the compiler.
> > 
> > as well as
> > 
> > > it's important to ensure that only one OpenMP runtime is used throughout
> > the
> > > application. Having more than one OpenMP runtime linked to an executable
> > may
> > > lead to undefined behavior including incorrect results or crashes.
> > 
> > To keep our project maintainable and error free, I thus suggest we follow
> > DNNL
> > and use the OpenMP runtime library provided by the compiler.
> > We have limited ressources and finding the root cause for any bugs
> > resulting
> > from linking multiple OpenMP libraries as currently done is, in my
> > opinion. not
> > a good use of time. We know it's due to undefined behavior and we know
> > it's best
> > practice to use OpenMP runtime library provided by the compiler. So let's
> > just
> > do that.
> > 
> > I think given that MKL-DNN has also adopted the "OpenMP runtime library
> > provided
> > by the compiler" approach, this issue is not contentious anymore and
> > qualifies
> > for lazy consensus.
> > 
> > Thus if there is no objection within 72 hours (lazy consensus), let's drop
> > bundled LLVM OpenMP from master [2]. If we find any issues due to
> > droppeing the
> > bundled LLVM OpenMP, we can always add it back prior to the next release.
> > 
> > Best regards
> > Leonard
> > 
> > [1]:
> > 
> > https://github.com/intel/mkl-dnn/blob/433e086bf5d9e5ccfc9ec0b70322f931b6b1921d/doc/build/build_options.md#openmp
> > (This is the updated reference from Anton's previous comment, based on the
> > changes in MKLDNN done in the meantime
> > https://github.com/apache/incubator-mxnet/pull/12160#issuecomment-415078066
> > )
> > [2]: Alike https://github.com/apache/incubator-mxnet/pull/12160
> > 
> > 
> > On Fri, 2019-12-06 at 12:16 -0800, Pedro Larroy wrote:
> > > I will try to stay on the sidelines for now since previous conversations
> > > about OMP have not been productive here and I have spent way too much
> > time
> > > on this already, I'm not the first one giving up on trying to help with
> > > this topic.
> > > 
> > > I would be glad if you guys can work together and find a solution. I will
> > > just put my understanding of the big picture hoping that it helps move it
> > > forward.
> > > 
> > > 
> > > Recently the intel omp library which seemed to have the best performance
> > of
> > > the 3 was removed from MKL.
> > > 
> > > - There's 3 libraries in play, GNU Omp which is shipped with gcc (gomp),
> > > LLVM openmp in 3rdparty (llvm-omp), Intel OMP when using MKL, which is
> > > recently removed (iomp)
> > > 
> > > - IOMP seems to have the best performance, there's stability issues
> > > producing 

Re: Please remove conflicting Open MP version from CMake builds

2019-12-07 Thread Chris Olivier
-1

mkldnn removed omp5 for licensing issues
no bugs have actually been traced to the use of llvm openmp. only an assert
caused by an actual bug in mxnet code. there are suitable workarounds.

over time llvm omp has simply been used as a “catch all” for random
problems that aren’t related at all (such as getenv race condition in an
atfork call that isn’t even part of an omp parallel region).

proposal is now and has always been roughly equivalent to the idea of
“comment out an assert rather than fix the bug it’s reporting”.

Up until very recently, the Makefile version of mxnet used libomp5 for YEARS
and not libgomp, with no issue reported (omp not built in debug mode), so
the equivalent configuration from CMake mysteriously causing myriads of
problems has questionable merit and smells more like a hubris situation.

I use tensorflow as well and it links to libomp5 rather than libgomp.

if the assert problem is really a problem, the bug being reported would be
prioritized and fixed. it should be fixed regardless. all the time spent by
some CI people trying to remove this could have simply fixed the actual bug
in a small fraction of the time.


On Fri, Dec 6, 2019 at 8:44 PM Lausen, Leonard 
wrote:

> I think it's reasonable to assume that the Intel MKLDNN team is an
> "authorative"
> source about the issue of compilation with OpenMP and the OpenMP runtime
> library
> related issues. Thus I suggest we follow the recommendation of Intel
> MKLDNN team
> within the MXNet project.
>
> Looking through the Intel MKLDNN documentation, I find [1]:
>
> > DNNL uses OpenMP runtime library provided by the compiler.
>
> as well as
>
> > it's important to ensure that only one OpenMP runtime is used throughout
> the
> > application. Having more than one OpenMP runtime linked to an executable
> may
> > lead to undefined behavior including incorrect results or crashes.
>
> To keep our project maintainable and error free, I thus suggest we follow
> DNNL
> and use the OpenMP runtime library provided by the compiler.
> We have limited resources, and finding the root cause for any bugs
> resulting
> from linking multiple OpenMP libraries as currently done is, in my
> opinion, not
> a good use of time. We know it's due to undefined behavior and we know
> it's best
> practice to use OpenMP runtime library provided by the compiler. So let's
> just
> do that.
>
> I think given that MKL-DNN has also adopted the "OpenMP runtime library
> provided
> by the compiler" approach, this issue is not contentious anymore and
> qualifies
> for lazy consensus.
>
> Thus if there is no objection within 72 hours (lazy consensus), let's drop
> bundled LLVM OpenMP from master [2]. If we find any issues due to
> dropping the
> bundled LLVM OpenMP, we can always add it back prior to the next release.
>
> Best regards
> Leonard
>
> [1]:
>
> https://github.com/intel/mkl-dnn/blob/433e086bf5d9e5ccfc9ec0b70322f931b6b1921d/doc/build/build_options.md#openmp
> (This is the updated reference from Anton's previous comment, based on the
> changes in MKLDNN done in the meantime
> https://github.com/apache/incubator-mxnet/pull/12160#issuecomment-415078066
> )
> [2]: Alike https://github.com/apache/incubator-mxnet/pull/12160
>
>
> On Fri, 2019-12-06 at 12:16 -0800, Pedro Larroy wrote:
> > I will try to stay on the sidelines for now since previous conversations
> > about OMP have not been productive here and I have spent way too much
> time
> > on this already, I'm not the first one giving up on trying to help with
> > this topic.
> >
> > I would be glad if you guys can work together and find a solution. I will
> > just put my understanding of the big picture hoping that it helps move it
> > forward.
> >
> >
> > Recently the intel omp library which seemed to have the best performance
> of
> > the 3 was removed from MKL.
> >
> > - There's 3 libraries in play, GNU Omp which is shipped with gcc (gomp),
> > LLVM openmp in 3rdparty (llvm-omp), Intel OMP when using MKL, which is
> > recently removed (iomp)
> >
> > - IOMP seems to have the best performance, there's stability issues
> > producing crashes sometimes but the impact seems relatively small for
> users
> > and developers. In general it seems linking with a different OMP version
> than
> > the one shipped with the compiler is known to cause stability issues but
> > it's done anyway.
> >
> > - LLVM-OMP used when building with CMake, not used in the PIP releases or
> > when building with Make. Has stability issues, hangs when running in
> debug
> > mode during test execution and produces tons of assertions in debug mode.
> > Might have some small performance gains but there is no clear cut data
> that
> > showcases significant performance gains.
> >
> > - GOMP is the version shipped with GCC and the PIP wheels without MKL,
> has
> > no stability problems.
> >
> > As a ballpark, IOMP might give 10% performance improvement in some cases.
> >
> > We need to document well how users should tune and configure MXNet when
> > 

[apache/incubator-mxnet] [RFC] [Gluon] Accumulating loss in the forward phase (#17004)

2019-12-07 Thread Xi Wang
## Description

In `tf.keras`, users can call the `add_loss` method to create non-standard 
loss functions (by non-standard, I mean loss functions that take parameters 
other than `y_true` and `y_pred`), e.g. a loss function that involves the input.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer#add_loss

A practical example would be Bayesian Neural Network:
```python
model = tf.keras.Sequential([
  tfp.layers.DenseReparameterization(512, activation=tf.nn.relu),
  tfp.layers.DenseReparameterization(10),
  ])
logits = model(features)
neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
  labels=labels, logits=logits)
kl = sum(model.losses)
loss = neg_log_likelihood + kl
train_op = tf.train.AdamOptimizer().minimize(loss)
```
source: 
https://github.com/tensorflow/probability/blob/r0.8/tensorflow_probability/python/layers/dense_variational.py#L356

In this case, the loss is composed of two parts: the classification error and the 
loss inside `DenseReparameterization` (the KL divergence between the posterior 
and the prior of the weights in each layer, i.e. `model.losses`). This is 
achieved by utilizing the `add_loss` method.

___
However, this feature is currently not supported by Gluon.

In order to implement it, I tried the following code:
```python
# Imports added for completeness; npx.set_np() enables the numpy interface used below.
from mxnet import np, npx
from mxnet.gluon import nn

npx.set_np()


class StochasticBlock(nn.HybridBlock):
  def __init__(self):
    super(StochasticBlock, self).__init__()
    self._losses = []

  def add_loss(self, loss):
    self._losses.append(loss)

  @property
  def losses(self):
    # Collect losses registered on this block and on its children.
    collected_losses = []
    collected_losses.extend(self._losses)
    for child in self._children.values():
      if hasattr(child, '_losses'):
        collected_losses.extend(getattr(child, '_losses'))
    return collected_losses


class DiagGaussian(StochasticBlock):
  def __init__(self):
    super(DiagGaussian, self).__init__()

  def hybrid_forward(self, F, loc, scale):
    log_variance = F.np.log(1e-20 + scale ** 2)
    KL = 0.5 * F.np.sum(1 + log_variance - loc ** 2 - F.np.exp(log_variance),
                        axis=1)
    self.add_loss(KL)
    return F.np.random.normal(loc, scale)


diagGaussian = DiagGaussian()
loc = np.random.uniform(-10, 10, size=(2, 2))
scale = np.random.uniform(size=(2, 2))
diagGaussian.hybridize()
print(diagGaussian(loc, scale))
print(diagGaussian.losses[0])
```
It worked well when not hybridized; otherwise `losses[0]` would become 
`<_Symbol diaggaussian0_multiply_scalar0>` instead of a concrete value.

I am actively looking for other solutions to this problem. A potential 
workaround would be forcing `losses` to be one of the block's outputs (a rough 
sketch is included below). I'm not sure whether it would work in `Sequential`, 
and it's also super not elegant =_=
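
Here is a minimal sketch of that workaround, assuming it is acceptable for 
callers to receive the auxiliary loss as an extra output (the class name 
`DiagGaussianWithLoss` is made up for illustration):
```python
# Rough sketch of the workaround: return the KL term as a second output instead
# of stashing it in a Python list, so it stays a concrete array after hybridization.
from mxnet import np, npx
from mxnet.gluon import nn

npx.set_np()


class DiagGaussianWithLoss(nn.HybridBlock):
  def hybrid_forward(self, F, loc, scale):
    log_variance = F.np.log(1e-20 + scale ** 2)
    kl = 0.5 * F.np.sum(1 + log_variance - loc ** 2 - F.np.exp(log_variance),
                        axis=1)
    sample = F.np.random.normal(loc, scale)
    return sample, kl  # caller receives (sample, auxiliary loss)


block = DiagGaussianWithLoss()
block.hybridize()
loc = np.random.uniform(-10, 10, size=(2, 2))
scale = np.random.uniform(size=(2, 2))
sample, kl = block(loc, scale)  # kl stays a concrete ndarray even when hybridized
```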

Having this feature could bring huge convenience for the implementation of deep 
generative models (such as VAEs).

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17004

Re: Can upgrade windows CI cmake?

2019-12-07 Thread shiwen hu
I tested 3.12.2, 3.13.3, 3.14.2, and 3.15.5.

shiwen hu wrote on Sat, Dec 7, 2019 at 7:28 PM:

> yes.
>
> Lausen, Leonard wrote on Sat, Dec 7, 2019 at 7:20 PM:
>
>> Do you mean starting 3.15.5 it works fine?
>> The image you attached doesn't display on my end.
>>
>> On Dec 7, 2019 19:12, shiwen hu  wrote:
>> [image.png]
>>
>> I tested these versions.  until 3.15.5 is working fine.
>>
>> shiwen hu wrote on Sat, Dec 7, 2019 at 1:24 PM:
>> Now, other problems are solved by modifying CMakeLists.txt.but The
>> command line is too long problem must update cmake.However I don't know
>> which minimum version fixed the problem.I try to do some tests to find out
>> the minimum version.
>>
>> Pedro Larroy wrote on Sat, Dec 7, 2019 at 3:52 AM:
>> CMake shipped with ubuntu has issues when compiling with CUDA on GPU
>> instances.  I wouldn't recommend anything older than 3.12 for Linux GPU
>>
>>
>> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh#L63
>>
>> I don't know about windows CMake version but would make sense to require a
>> newer version.
>>
>> On Thu, Dec 5, 2019 at 7:26 PM Lausen, Leonard > >
>> wrote:
>>
>> > Currently we declare cmake_minimum_required(VERSION 3.0.2)
>> >
>> > I'm in favor of updating our CMake requirement. The main question may be
>> > what
>> > new version to pick as minimum requirement.
>> >
>> > In general, there is the guideline
>> >
>> > > You really should at least use a version of CMake that came out after
>> > your
>> > > compiler, since it needs to know compiler flags, etc, for that
>> version.
>> > And,
>> > > since CMake will dumb itself down to the minimum required version in
>> your
>> > > CMake file, installing a new CMake, even system wide, is pretty safe.
>> You
>> > > should at least install it locally. It's easy (1-2 lines in many
>> cases),
>> > and
>> > > you'll find that 5 minutes of work will save you hundreds of lines and
>> > hours
>> > > of CMakeLists.txt writing, and will be much easier to maintain in the
>> > long
>> > > run.
>> > https://cliutils.gitlab.io/modern-cmake/
>> >
>> > https://cliutils.gitlab.io/modern-cmake/chapters/intro/newcmake.html
>> > gives a
>> > short overview of all the improvements made to CMake over the past 6
>> years.
>> >
>> > It's easy for users to upgrade their cmake version with pip:
>> >   pip install --upgrade --user cmake
>> > Thus it wouldn't be overly problematic to rely on a very recent version
>> of
>> > cmake, if indeed it's required.
>> >
>> > Nevertheless, if an earlier version fixes the problems, let's rather
>> pick
>> > that
>> > one. Did you confirm which version is required to fix the problem?
>> >
>> > For now you could try if the CMake version shipped in the oldest
>> supported
>> > Ubuntu LTS release (Ubuntu 16.04) is fixing your problem (CMake 3.5)? If
>> > not,
>> > please test if CMake version shipped in Ubuntu 18.04 (CMake 3.10) fixes
>> > your
>> > issue.
>> >
>> > Thanks
>> > Leonard
>> >
>> > On Fri, 2019-12-06 at 08:45 +0800, shiwen hu wrote:
>> > > i am send a pr  https://github.com/apache/incubator-mxnet/pull/16980
>> to
>> > > change windows build system.but now ci cmake version seems to be a
>> bug.
>> > > can't to compile.can upgrade to 3.16.0?
>> >
>>
>>


Re: Can upgrade windows CI cmake?

2019-12-07 Thread shiwen hu
yes.

Lausen, Leonard wrote on Sat, Dec 7, 2019 at 7:20 PM:

> Do you mean starting 3.15.5 it works fine?
> The image you attached doesn't display on my end.
>
> On Dec 7, 2019 19:12, shiwen hu  wrote:
> [image.png]
>
> I tested these versions.  until 3.15.5 is working fine.
>
> shiwen hu wrote on Sat, Dec 7, 2019 at 1:24 PM:
> Now, other problems are solved by modifying CMakeLists.txt.but The command
> line is too long problem must update cmake.However I don't know which
> minimum version fixed the problem.I try to do some tests to find out the
> minimum version.
>
> Pedro Larroy wrote on Sat, Dec 7, 2019 at 3:52 AM:
> CMake shipped with ubuntu has issues when compiling with CUDA on GPU
> instances.  I wouldn't recommend anything older than 3.12 for Linux GPU
>
>
> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh#L63
>
> I don't know about windows CMake version but would make sense to require a
> newer version.
>
> On Thu, Dec 5, 2019 at 7:26 PM Lausen, Leonard 
> wrote:
>
> > Currently we declare cmake_minimum_required(VERSION 3.0.2)
> >
> > I'm in favor of updating our CMake requirement. The main question may be
> > what
> > new version to pick as minimum requirement.
> >
> > In general, there is the guideline
> >
> > > You really should at least use a version of CMake that came out after
> > your
> > > compiler, since it needs to know compiler flags, etc, for that version.
> > And,
> > > since CMake will dumb itself down to the minimum required version in
> your
> > > CMake file, installing a new CMake, even system wide, is pretty safe.
> You
> > > should at least install it locally. It's easy (1-2 lines in many
> cases),
> > and
> > > you'll find that 5 minutes of work will save you hundreds of lines and
> > hours
> > > of CMakeLists.txt writing, and will be much easier to maintain in the
> > long
> > > run.
> > https://cliutils.gitlab.io/modern-cmake/
> >
> > https://cliutils.gitlab.io/modern-cmake/chapters/intro/newcmake.html
> > gives a
> > short overview of all the improvements made to CMake over the past 6
> years.
> >
> > It's easy for users to upgrade their cmake version with pip:
> >   pip install --upgrade --user cmake
> > Thus it wouldn't be overly problematic to rely on a very recent version
> of
> > cmake, if indeed it's required.
> >
> > Nevertheless, if an earlier version fixes the problems, let's rather pick
> > that
> > one. Did you confirm which version is required to fix the problem?
> >
> > For now you could try if the CMake version shipped in the oldest
> supported
> > Ubuntu LTS release (Ubuntu 16.04) is fixing your problem (CMake 3.5)? If
> > not,
> > please test if CMake version shipped in Ubuntu 18.04 (CMake 3.10) fixes
> > your
> > issue.
> >
> > Thanks
> > Leonard
> >
> > On Fri, 2019-12-06 at 08:45 +0800, shiwen hu wrote:
> > > i am send a pr  https://github.com/apache/incubator-mxnet/pull/16980
> to
> > > change windows build system.but now ci cmake version seems to be a bug.
> > > can't to compile.can upgrade to 3.16.0?
> >
>
>


Re: Can upgrade windows CI cmake?

2019-12-07 Thread Lausen, Leonard
Do you mean that starting with 3.15.5 it works fine?
The image you attached doesn't display on my end.

On Dec 7, 2019 19:12, shiwen hu  wrote:
[image.png]

I tested these versions.  until 3.15.5 is working fine.

shiwen hu wrote on Sat, Dec 7, 2019 at 1:24 PM:
Now, other problems are solved by modifying CMakeLists.txt, but the "command line 
is too long" problem requires updating cmake. However, I don't know which minimum 
version fixed the problem. I will do some tests to find out the minimum 
version.

Pedro Larroy wrote on Sat, Dec 7, 2019 at 3:52 AM:
CMake shipped with ubuntu has issues when compiling with CUDA on GPU
instances.  I wouldn't recommend anything older than 3.12 for Linux GPU

https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh#L63

I don't know about windows CMake version but would make sense to require a
newer version.

On Thu, Dec 5, 2019 at 7:26 PM Lausen, Leonard 
wrote:

> Currently we declare cmake_minimum_required(VERSION 3.0.2)
>
> I'm in favor of updating our CMake requirement. The main question may be
> what
> new version to pick as minimum requirement.
>
> In general, there is the guideline
>
> > You really should at least use a version of CMake that came out after
> your
> > compiler, since it needs to know compiler flags, etc, for that version.
> And,
> > since CMake will dumb itself down to the minimum required version in your
> > CMake file, installing a new CMake, even system wide, is pretty safe. You
> > should at least install it locally. It's easy (1-2 lines in many cases),
> and
> > you'll find that 5 minutes of work will save you hundreds of lines and
> hours
> > of CMakeLists.txt writing, and will be much easier to maintain in the
> long
> > run.
> https://cliutils.gitlab.io/modern-cmake/
>
> https://cliutils.gitlab.io/modern-cmake/chapters/intro/newcmake.html
> gives a
> short overview of all the improvements made to CMake over the past 6 years.
>
> It's easy for users to upgrade their cmake version with pip:
>   pip install --upgrade --user cmake
> Thus it wouldn't be overly problematic to rely on a very recent version of
> cmake, if indeed it's required.
>
> Nevertheless, if an earlier version fixes the problems, let's rather pick
> that
> one. Did you confirm which version is required to fix the problem?
>
> For now you could try if the CMake version shipped in the oldest supported
> Ubuntu LTS release (Ubuntu 16.04) is fixing your problem (CMake 3.5)? If
> not,
> please test if CMake version shipped in Ubuntu 18.04 (CMake 3.10) fixes
> your
> issue.
>
> Thanks
> Leonard
>
> On Fri, 2019-12-06 at 08:45 +0800, shiwen hu wrote:
> > i am send a pr  https://github.com/apache/incubator-mxnet/pull/16980 to
> > change windows build system.but now ci cmake version seems to be a bug.
> > can't to compile.can upgrade to 3.16.0?
>



Re: Can upgrade windows CI cmake?

2019-12-07 Thread shiwen hu
[image: image.png]

I tested these versions.  until 3.15.5 is working fine.

shiwen hu wrote on Sat, Dec 7, 2019 at 1:24 PM:

> Now, other problems are solved by modifying CMakeLists.txt.but The command
> line is too long problem must update cmake.However I don't know which
> minimum version fixed the problem.I try to do some tests to find out the
> minimum version.
>
> Pedro Larroy wrote on Sat, Dec 7, 2019 at 3:52 AM:
>
>> CMake shipped with ubuntu has issues when compiling with CUDA on GPU
>> instances.  I wouldn't recommend anything older than 3.12 for Linux GPU
>>
>>
>> https://github.com/apache/incubator-mxnet/blob/master/ci/docker/install/ubuntu_core.sh#L63
>>
>> I don't know about windows CMake version but would make sense to require a
>> newer version.
>>
>> On Thu, Dec 5, 2019 at 7:26 PM Lausen, Leonard > >
>> wrote:
>>
>> > Currently we declare cmake_minimum_required(VERSION 3.0.2)
>> >
>> > I'm in favor of updating our CMake requirement. The main question may be
>> > what
>> > new version to pick as minimum requirement.
>> >
>> > In general, there is the guideline
>> >
>> > > You really should at least use a version of CMake that came out after
>> > your
>> > > compiler, since it needs to know compiler flags, etc, for that
>> version.
>> > And,
>> > > since CMake will dumb itself down to the minimum required version in
>> your
>> > > CMake file, installing a new CMake, even system wide, is pretty safe.
>> You
>> > > should at least install it locally. It's easy (1-2 lines in many
>> cases),
>> > and
>> > > you'll find that 5 minutes of work will save you hundreds of lines and
>> > hours
>> > > of CMakeLists.txt writing, and will be much easier to maintain in the
>> > long
>> > > run.
>> > https://cliutils.gitlab.io/modern-cmake/
>> >
>> > https://cliutils.gitlab.io/modern-cmake/chapters/intro/newcmake.html
>> > gives a
>> > short overview of all the improvements made to CMake over the past 6
>> years.
>> >
>> > It's easy for users to upgrade their cmake version with pip:
>> >   pip install --upgrade --user cmake
>> > Thus it wouldn't be overly problematic to rely on a very recent version
>> of
>> > cmake, if indeed it's required.
>> >
>> > Nevertheless, if an earlier version fixes the problems, let's rather
>> pick
>> > that
>> > one. Did you confirm which version is required to fix the problem?
>> >
>> > For now you could try if the CMake version shipped in the oldest
>> supported
>> > Ubuntu LTS release (Ubuntu 16.04) is fixing your problem (CMake 3.5)? If
>> > not,
>> > please test if CMake version shipped in Ubuntu 18.04 (CMake 3.10) fixes
>> > your
>> > issue.
>> >
>> > Thanks
>> > Leonard
>> >
>> > On Fri, 2019-12-06 at 08:45 +0800, shiwen hu wrote:
>> > > i am send a pr  https://github.com/apache/incubator-mxnet/pull/16980
>> to
>> > > change windows build system.but now ci cmake version seems to be a
>> bug.
>> > > can't to compile.can upgrade to 3.16.0?
>> >
>>
>