Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
On Mon, Jan 20, 2020 at 1:17 PM Lausen, Leonard wrote:
> If I don't misremember, our mentor Markus Weimer suggested at KDD 2019 in a
> conversation that it's easiest to stop bundling non-ASF 3rdparty code.

You do not misremember :-) If possible, it is easiest for everyone involved to capture the dependencies in a clean way for the build system to download.

That being said, I came to this conclusion in the relatively sane space of Java / Maven projects. I believe that this isn't the state of the art in C++, and might be difficult to do. Has anyone looked into whether the dependencies are or could be made available e.g. as vcpkg packages? [0]

The other aspect to consider is situations where the code mxnet depends on is actually also managed by people in the mxnet community. I believe a lot of that was the case when mxnet started in the incubator from dmlc. For those cases, one could consider inverting the dependency relationship: The ASF / ASL is designed to be the universal donor of software. Hence, it is unlikely that depending on any ASF code would be a problem for current dependencies of the DMLC code.

Markus

[0]: https://github.com/Microsoft/vcpkg
Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
Quote from Tianqi:
> The pro of doing so is that it indeed simplifies the release process, as
> these additional dependencies becomes category-B level dependencies as in
> https://www.apache.org/legal/resolved.html

Why would the dependencies become category-B level? It seems all licensing considerations only apply to ASF distributions (source or convenience binary). Category-B level software is software that can't be included in the source distribution but may be included in ASF convenience binaries.
https://www.apache.org/legal/resolved.html#binary-only-inclusion-condition

I believe we currently do not have ASF convenience binaries (though there are some convenience binaries unrelated to ASF published on PyPI and S3 buckets).

With respect to the source distribution, no licensing considerations apply to non-bundled dependencies: "LICENSE and NOTICE MUST NOT provide unnecessary information about materials which are not bundled in the package, such as separately downloaded dependencies."
http://www.apache.org/legal/release-policy.html#licensing-documentation

> The con of doing so is that it brings additional burden to the users of the
> software to check the license of these dependencies, in some sense,
> including these information in the
> license actually gives an extra level of transparency.

I agree. However, given the licensing issues at every release, we haven't been successful in providing this transparency yet.

Quote from Marco:
> The question at this point is whether we are allowed to differentiate
> between our main-source and hold it to the strict standards while treating
> the third party folder as dependency, where we only have to verify that the
> projects are licensed with an Apache compatible license.

I don't think so. If it's in the source distribution, it must be appropriately declared in LICENSE and NOTICE.

> At the moment, the project already treats them different: our license
> checks exclude third party. I think this is where the disparity is coming
> from.

Indeed, we currently don't check 3rdparty code in an automated way. I'm not sure how big the overhead of maintaining more fine-grained excludes is (compared to the current approach of excluding everything in 3rdparty). When writing the mail, I assumed the overhead would be significant.

> I'd recommend we discuss with Apache how we can handle this
> situation: package third party code for user convenience while limiting
> responsibility. In the end, we still have to ensure that everything is
> licensed properly, so maybe we should try to align both processes to match the
> real world instead of changing the real world to match the process.

What do you mean by "changing the real world to match the process"? How would we "align both processes to match the real world"? As a PPMC member, would you be able to ask the ASF for a recommendation? If I don't misremember, our mentor Markus Weimer suggested at KDD 2019 in a conversation that it's easiest to stop bundling non-ASF 3rdparty code.

Quote from Pedro:
> Source archives that need to download too many dependencies to build will end
> up broken with time. I would expect source to build with a reasonable set of
> well known system dependencies.

Yes, it's a valid consideration.

Best regards
Leonard

On Fri, 2020-01-17 at 22:04 +0100, Marco de Abreu wrote:
> I agree with Tianqi. We may change our build system, but this won't free us
> from the necessity to validate the licenses of our dependencies.
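The trade-off between excluding all of 3rdparty and maintaining fine-grained excludes can be made concrete with a small header check. The following is a minimal sketch under stated assumptions, not the project's actual license check: the marker strings, the exclude pattern, the file suffixes and the function name `files_missing_header` are all chosen for illustration only.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumption: a file counts as "licensed" if one of these phrases appears
# near its top. A real check would recognise many more license headers.
KNOWN_LICENSE_MARKERS = (
    "Licensed to the Apache Software Foundation (ASF)",
    "Licensed under the Apache License, Version 2.0",
)

# Hypothetical fine-grained excludes, relative to the repository root.
# The coarse alternative discussed in the thread would be ["3rdparty/*"].
EXCLUDES = ["3rdparty/ps-lite/*"]


def files_missing_header(root, excludes=EXCLUDES, suffixes=(".h", ".cc", ".py")):
    """Return repository-relative paths of source files without a known header."""
    root = Path(root)
    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        # fnmatch's "*" also matches "/", so one pattern covers a whole subtree.
        if any(fnmatch(rel, pattern) for pattern in excludes):
            continue
        # Only the beginning of the file is inspected for a header.
        head = path.read_text(encoding="utf-8", errors="replace")[:2048]
        if not any(marker in head for marker in KNOWN_LICENSE_MARKERS):
            missing.append(rel)
    return missing
```

With a coarse `["3rdparty/*"]` pattern nothing under 3rdparty is ever reported; with per-project patterns every newly added 3rdparty subtree is checked until someone explicitly excludes it, which is where the maintenance overhead would come from.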
Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
-1

I think it is brittle to download a piece of source code that needs network connectivity to build. The network is always in flux.

Source archives that need to download too many dependencies to build will end up broken with time. I would expect source to build with a reasonable set of well-known system dependencies.

On Friday, January 17, 2020, Marco de Abreu wrote:
> I agree with Tianqi. We may change our build system, but this won't free us
> from the necessity to validate the licenses of our dependencies.
Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
I agree with Tianqi. We may change our build system, but this won't free us from the necessity to validate the licenses of our dependencies.

The question at this point is whether we are allowed to differentiate between our main source, holding it to the strict standards, and the third party folder, treating it as a dependency where we only have to verify that the projects are licensed with an Apache-compatible license.

At the moment, the project already treats them differently: our license checks exclude third party. I think this is where the disparity is coming from. I'd recommend we discuss with Apache how we can handle this situation: package third party code for user convenience while limiting responsibility.

In the end, we still have to ensure that everything is licensed properly, so maybe we should try to align both processes to match the real world instead of changing the real world to match the process.

-Marco

Tianqi Chen wrote on Fri, Jan 17, 2020, 20:44:
> I don't have an opinion, but would like to list pros and cons of doing so.
Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
I don't have an opinion, but would like to list the pros and cons of doing so.

The pro of doing so is that it indeed simplifies the release process, as these additional dependencies become category-B level dependencies as in https://www.apache.org/legal/resolved.html

The con of doing so is that it puts an additional burden on the users of the software to check the licenses of these dependencies. In some sense, including this information in the license actually gives an extra level of transparency.

The copyright messages in some of the dependencies are a bit unfortunate. One potential way to run the check is to write a Python script that goes through the files, detects the Copyright lines, cross-matches them and adds them.

Note that good models to follow are
- hadoop: https://github.com/apache/hadoop/tree/trunk/licenses
- flink: https://github.com/apache/flink

Each of the repos has a licenses folder that contains the licenses, and things point to them.

I am not a lawyer, but the case for ps-lite seems resolvable as long as we can confirm these files follow Apache-2.0, as https://www.apache.org/licenses/LICENSE-2.0 only requires us to redistribute the license and anything in the NOTICE, but we do not have the obligation to list all the copyright messages in the source content.

TQ

On Fri, Jan 17, 2020 at 11:10 AM Yuan Tang wrote:
> +1
>
> On Fri, Jan 17, 2020 at 1:59 PM Chris Olivier wrote:
> > +1
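The Python script suggested in this message could look roughly like the following. This is a minimal sketch under assumptions that are not stated in the thread: "cross match" is taken to mean comparing the copyright statements found in the sources against the text of a LICENSE file, and the regular expression, the file suffixes and both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: a copyright statement is a line containing "Copyright",
# optionally "(c)", and a four-digit year somewhere after it.
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\(c\)\s*)?(.*\d{4}.*)", re.IGNORECASE)


def find_copyright_lines(root, suffixes=(".h", ".cc", ".cpp", ".py")):
    """Map each distinct copyright statement to the files that contain it."""
    found = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            match = COPYRIGHT_RE.search(line)
            if match:
                # Collapse whitespace and drop trailing comment characters.
                statement = " ".join(match.group(1).split()).rstrip("*/ ")
                found.setdefault(statement, []).append(str(path))
    return found


def missing_from_license(found, license_text):
    """Return the copyright statements that do not appear in the license text."""
    normalized = " ".join(license_text.split()).lower()
    return sorted(s for s in found if s.lower() not in normalized)
```

The statements returned by `missing_from_license` are candidates to review by hand before adding anything to LICENSE or NOTICE; the sketch only reports them and does not edit any file.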
Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
+1

On Fri, Jan 17, 2020 at 1:59 PM Chris Olivier wrote:
> +1
Re: Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
+1

On Fri, Jan 17, 2020 at 10:19 AM Lausen, Leonard wrote:
> To avoid such licensing issues in MXNet releases a simple solution is to
> stop distributing the 3rdparty code in our source releases.
Stop redistributing source code of 3rdparty dependencies to avoid licensing issues
Dear MXNet community,

as per recent mail on gene...@incubator.apache.org [1] there are a number of licensing issues in MXNet 1.6rc1. Based on anecdotal evidence I believe there has been no release so far without any licensing issues, which is a blocker to MXNet graduating from its incubating status. One contributing factor is that we bundle 3rdparty source code in our releases [2].

One key factor is that 3rdparty projects don't always enforce licensing best practices in the way we do. For example, 3rdparty/ps-lite doesn't enforce license headers in the source files and there has been confusion about the license of recent contributions by ByteDance (see [1]).

To avoid such licensing issues in MXNet releases, a simple solution is to stop distributing the 3rdparty code in our source releases. Instead, we can adapt our build system to download 3rdparty code as part of the build configuration process. CMake makes this very easy with the FetchContent module [3].

For development purposes involving changes to the 3rdparty source, or for build systems that can't access the internet, there are easy means for specifying the location of local sources (instead of downloading), via the FETCHCONTENT_SOURCE_DIR_ variable [4].

Would there be any concerns about such an approach? Obviously it can only be fully implemented as soon as the CMake build system is feature complete and the Makefile build can be dropped. (Note that the Makefile build is being deprecated and removed as part of the MXNet 2 roadmap [5].)

Best regards
Leonard

[1]: https://lists.apache.org/thread.html/rb83ff64bdac464df2f0cf2fe8fb4c6b9d3b8fa62b645763dc606045f%40%3Cgeneral.incubator.apache.org%3E
[2]: See the .tar.gz files at https://incubator.apache.org/clutch/mxnet.html
[3]: https://cmake.org/cmake/help/latest/module/FetchContent.html
[4]: https://cmake.org/pipermail/cmake/2019-June/069709.html
[5]: https://github.com/apache/incubator-mxnet/issues/16167
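The lookup order proposed in this mail (use a local source directory if one is specified, otherwise download at configure time) can be illustrated outside of CMake. The following is a sketch in Python, not CMake code and not a description of FetchContent's implementation; the `MXNET_SOURCE_DIR_<NAME>` environment variable, the cache layout and the function name `resolve_source` are hypothetical. The checksum comparison is an addition that is not part of the proposal; it is one possible way to notice when a downloaded archive has changed.

```python
import hashlib
import os
import urllib.request
from pathlib import Path


def resolve_source(name, url, sha256, cache_dir, env=os.environ):
    """Return a local directory override or a checksum-verified archive path."""
    # 1. A local override wins, so offline builds and development on the
    #    dependency itself never touch the network (hypothetical variable).
    override = env.get(f"MXNET_SOURCE_DIR_{name.upper().replace('-', '_')}")
    if override:
        return Path(override)

    # 2. Otherwise download the archive once and keep it in a cache directory.
    archive = Path(cache_dir) / f"{name}.tar.gz"
    if not archive.exists():
        archive.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response:
            archive.write_bytes(response.read())

    # 3. Refuse archives whose content differs from the pinned checksum.
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    if digest != sha256:
        raise ValueError(f"{name}: expected sha256 {sha256}, got {digest}")
    return archive
```

In this sketch the source release would carry only the URL and checksum of each dependency, while the 3rdparty code itself stays outside the release archive.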