Re: jemalloc 5 incompatibility

2020-01-19 Thread Marco de Abreu
Building as part of cmake once upstream is stable sounds like a good
approach. Thanks for catching this issue early on!

-Marco

Lausen, Leonard wrote on Sun, Jan 19, 2020, 11:20:

> As of jemalloc 5, the default jemalloc build cannot be used in libraries that
> are dlopened. However, libmxnet.so is dlopened by Python (ctypes). To use
> MXNet with jemalloc 5, users must not link to the system libjemalloc.so but
> must instead link to a libjemalloc compiled with special parameters that allow
> dlopen to work. See
> https://github.com/jemalloc/jemalloc/issues/937
>
> jemalloc 5 is distributed as part of Ubuntu 18.10 and higher, as well as
> Debian Stable. Users on these systems will be unable to use MXNet after
> compiling with USE_JEMALLOC=ON (the default setting) on systems with
> libjemalloc-dev installed.
>
> Thus in https://github.com/apache/incubator-mxnet/pull/17324 I suggest
> disabling jemalloc by default in the source build. Auto-detecting the version
> of jemalloc is not helpful, because over time fewer and fewer systems will
> come with a working version of jemalloc.
>
> Please go ahead and approve the PR if you agree.
>
> The better solution is to build jemalloc as part of our build. See
> https://github.com/apache/incubator-mxnet/pull/17121
> But as the CMake build of jemalloc is not yet integrated upstream, this
> approach currently relies on a development branch of jemalloc. Thus it was
> suggested to revisit the approach once CMake integration is stable upstream.
>


Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues

2020-01-19 Thread Pedro Larroy
-1

I think it is brittle to have a source release that needs network
connectivity to build. The network is always in flux. Source archives that
need to download too many dependencies to build will end up broken over
time. I would expect the source to build with a reasonable set of well-known
system dependencies.


On Friday, January 17, 2020, Marco de Abreu wrote:
> I agree with Tianqi. We may change our build system, but this won't free us
> from the necessity to validate the licenses of our dependencies.
>
> The question at this point is whether we are allowed to differentiate:
> hold our main source to the strict standards while treating the third party
> folder as a dependency, where we only have to verify that the projects are
> licensed with an Apache-compatible license.
>
> At the moment, the project already treats them differently: our license
> checks exclude third party. I think this is where the disparity is coming
> from. I'd recommend we discuss with Apache how we can handle this
> situation: package third party code for user convenience while limiting
> responsibility.
>
> In the end, we still have to ensure that everything is licensed properly,
> so maybe we should try to align both processes to match the real world
> instead of changing the real world to match the process.
>
> -Marco
>
> Tianqi Chen wrote on Fri, Jan 17, 2020, 20:44:
>
>> I don't have an opinion, but would like to list the pros and cons of doing
>> so.
>>
>> The pro of doing so is that it indeed simplifies the release process, as
>> these additional dependencies become category-B level dependencies as in
>> https://www.apache.org/legal/resolved.html
>>
>> The con of doing so is that it puts an additional burden on the users of the
>> software to check the licenses of these dependencies. In some sense,
>> including this information in the license actually gives an extra level of
>> transparency.
>>
>> The copyright messages in some of the dependencies are a bit unfortunate.
>> One potential way to run the check is to write a Python script that goes
>> through the files, detects lines containing "Copyright", cross-matches them
>> against the license file, and adds any that are missing.
>>
>> Note that good models to follow are
>> - hadoop: https://github.com/apache/hadoop/tree/trunk/licenses
>> - flink: https://github.com/apache/flink
>>
>> Each of these repos has a licenses folder that contains the licenses, with
>> references pointing to them.
>>
>> I am not a lawyer, but it seems the case for ps-lite can be resolved as long
>> as we can confirm these files follow Apache-2.0, as
>> https://www.apache.org/licenses/LICENSE-2.0 only requires us to redistribute
>> the license and anything in the NOTICE, but we do not have the obligation
>> to list all the copyright messages in the source content.
>>
>> TQ
>>
>> On Fri, Jan 17, 2020 at 11:10 AM Yuan Tang wrote:
>>
>> > +1
>> >
>> > On Fri, Jan 17, 2020 at 1:59 PM Chris Olivier wrote:
>> >
>> > > +1
>> > >
>> > > On Fri, Jan 17, 2020 at 10:19 AM Lausen, Leonard wrote:
>> > >
>> > > > Dear MXNet community,
>> > > >
>> > > > as per a recent mail on gene...@incubator.apache.org [1], there are a
>> > > > number of licensing issues in MXNet 1.6rc1. Based on anecdotal evidence
>> > > > I believe there has been no release so far without any licensing
>> > > > issues, which is a blocker to MXNet graduating from its incubating
>> > > > status. One contributing factor is that we bundle 3rdparty source code
>> > > > in our releases [2].
>> > > >
>> > > > One key factor is that 3rdparty projects don't always enforce licensing
>> > > > best practice in the way we do. For example, 3rdparty/ps-lite doesn't
>> > > > enforce license headers in the source files, and there has been
>> > > > confusion about the license of recent contributions by ByteDance
>> > > > (see [1]).
>> > > >
>> > > > To avoid such licensing issues in MXNet releases, a simple solution is
>> > > > to stop distributing the 3rdparty code in our source releases. Instead,
>> > > > we can adapt our build system to download 3rdparty code as part of the
>> > > > build configuration process. CMake makes this very easy with the
>> > > > FetchContent module [3].
>> > > >
>> > > > For development purposes involving changes to the 3rdparty source, or
>> > > > for build systems that can't access the internet, there are easy means
>> > > > of specifying the location of local sources (instead of downloading),
>> > > > via the FETCHCONTENT_SOURCE_DIR_ variable [4].
>> > > >
>> > > > Would there be any concerns about such an approach? Obviously it can
>> > > > only be fully implemented once the CMake build system is feature
>> > > > complete and the Makefile build can be dropped. (Note that the Makefile
>> > > > build is being deprecated and removed as part of the MXNet 2 roadmap
>> > > > [5].)
>> > > >
>> > > > Best regards
>> > > > Leonard
>> > > >
>>
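
The copyright check Tianqi describes in the quoted message above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the function names, the directory layout, and the idea of
comparing against a single license text are hypothetical and not taken from
any existing MXNet tooling.

```python
import os
import re

# Case-insensitive match for the word "Copyright" anywhere in a line.
COPYRIGHT_RE = re.compile(r"copyright", re.IGNORECASE)


def collect_copyright_lines(root):
    """Map each file under root to the copyright statements found in it.

    Each statement is the text from the word "Copyright" to the end of
    the line, so leading comment markers such as "//" or "#" are dropped.
    """
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            hits = []
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for line in handle:
                        match = COPYRIGHT_RE.search(line)
                        if match:
                            hits.append(line[match.start():].strip())
            except OSError:
                continue  # unreadable file: skip it in this sketch
            if hits:
                found[os.path.relpath(path, root)] = hits
    return found


def missing_from_license(found, license_text):
    """Return the sorted statements that do not appear verbatim in license_text."""
    unique = {line for hits in found.values() for line in hits}
    return sorted(line for line in unique if line not in license_text)
```

Calling collect_copyright_lines on a hypothetical 3rdparty/ps-lite checkout and
passing the result to missing_from_license together with the contents of the
LICENSE file would list the statements still to be added. Verbatim matching is
the simplest possible cross-match and would need loosening for real use.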

jemalloc 5 incompatibility

2020-01-19 Thread Lausen, Leonard
As of jemalloc 5, the default jemalloc build cannot be used in libraries that are
dlopened. However, libmxnet.so is dlopened by Python (ctypes). To use MXNet with
jemalloc 5, users must not link to the system libjemalloc.so but must instead link
to a libjemalloc compiled with special parameters that allow dlopen to work. See
https://github.com/jemalloc/jemalloc/issues/937
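
To check whether a given build is affected, one can dlopen the library
directly through ctypes, the same mechanism mentioned above. A minimal sketch;
the helper name try_dlopen and the example path are hypothetical, and the exact
error text depends on the platform's dynamic loader.

```python
import ctypes


def try_dlopen(path):
    """Try to load a shared library the way Python's ctypes does (dlopen).

    Returns (True, None) on success, or (False, message) if the dynamic
    loader refuses to load the library.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # Hypothetical path; point this at the libmxnet.so that was just built.
    ok, error = try_dlopen("./lib/libmxnet.so")
    print("loaded" if ok else f"dlopen failed: {error}")
```

jemalloc's INSTALL documentation describes a --disable-initial-exec-tls
configure option intended to allow loading via dlopen; that this is the
"special parameter" meant here is an assumption.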

jemalloc 5 is distributed as part of Ubuntu 18.10 and higher, as well as Debian
Stable. Users on these systems will be unable to use MXNet after compiling with
USE_JEMALLOC=ON (the default setting) on systems with libjemalloc-dev installed.

Thus in https://github.com/apache/incubator-mxnet/pull/17324 I suggest
disabling jemalloc by default in the source build. Auto-detecting the version of
jemalloc is not helpful, because over time fewer and fewer systems will come with
a working version of jemalloc.

Please go ahead and approve the PR if you agree.

The better solution is to build jemalloc as part of our build. See
https://github.com/apache/incubator-mxnet/pull/17121
But as the CMake build of jemalloc is not yet integrated upstream, this approach
currently relies on a development branch of jemalloc. Thus it was suggested to
revisit the approach once CMake integration is stable upstream.