Re: LICENSE and NOTICE file content
Thanks Justin. Can you submit a pull request?

On Thu, Jun 21, 2018 at 8:10 PM Justin Mclean wrote:

> Hi,
>
> We’ve recently had a number of incubating projects copy your LICENSE and NOTICE files as they see Spark as a popular project and they are a little sad when the IPMC votes -1 on their releases.
>
> Now I'm not on your PMC, don’t know your project's history and there may be valid reasons for the current LICENSE and NOTICE contents so take this as some friendly advice, you can choose to ignore it or not act on it. Looking at your latest source release (2.3.1), I can see there seems to be too much information in LICENSE and especially NOTICE for a source release. It may be that the LICENSE and NOTICE is intended for the binary release? [1] But even if that is the case it also seems to be missing a couple of licenses for bundled software.
>
> But in general my alarm bells start ringing because:
> - Category B licenses are listed (which shouldn't be in a source release)
> - License information is listed in NOTICE when it should be in LICENSE
> - Dependencies are listed rather than what is actually bundled
>
> Taking a look at the release I can see this 3rd party code bundled:
>
> MIT licensed (some is dual licensed):
> dagre-d3
> datatables
> jquery cookies
> SortTable
> Modernizr
> matchMedia polyfill*
> respond*
> dataTables bootstrap*
> jQuery
> jQuery datatables*
> graphlib-dot
> jquery block UI
> anchorJS
> jsonFormatter
>
> Apache licensed:
> vis.js*
> bootstrap*
> bootstrap-tooltip*
> toposort.py*
> TimSort*
> LimitedInputStream.java*
>
> BSD licensed:
> d3
> cloudpickle
> join*
>
> Python licensed:
> heapq3
>
> CC0 licensed:
> ./data/mllib/images/kittens/29.5.a_b_EGDP022204.jpg*
>
> * Are currently missing from license
>
> So that would end up with a number of licenses in LICENSE but nothing added to a boilerplate NOTICE file. The ALv2 licensed items don’t have NOTICE files so there is no impact there. I could of course have missed something and could be wrong for a number of reasons but I cannot see how the above makes the NOTICE file 667 lines long :-)
>
> I also noticed some compiled code in the source release which probably shouldn’t be there. [2]
> spark-2.3.1/core/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/SPARK-21101-1.0.jar
> spark-2.3.1/sql/hive/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-contrib-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.10.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> spark-2.3.1/sql/hive-thriftserver/src/test/resources/TestUDTF.jar
>
> Thanks,
> Justin
>
> PS please cc me on replies as I’m not subscribed to your mailing list
>
> 1. http://www.apache.org/dev/licensing-howto.html#binary
>
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
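The starred entries in the list above are bundled components that LICENSE does not mention. A gap like that could be checked mechanically; the following is only a minimal sketch, assuming the reviewer already has a hand-made list of bundled component names. The function name `missing_from_license` and the sample LICENSE sentence in the usage lines are hypothetical and not part of any Spark or ASF tooling.

```python
import re


def missing_from_license(bundled, license_text):
    """Return the bundled component names that the LICENSE text never mentions."""
    haystack = license_text.lower()
    missing = []
    for name in bundled:
        # Whole-name, case-insensitive match, so that "d3" is not
        # considered covered just because "dagre-d3" appears.
        pattern = r"(?<![\w.-])" + re.escape(name.lower()) + r"(?![\w-])"
        if not re.search(pattern, haystack):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Component names are taken from the list above; the LICENSE
    # sentence is a made-up sample, not Spark's real LICENSE.
    sample_license = "This product bundles dagre-d3 (MIT)."
    print(missing_from_license(["d3", "dagre-d3", "vis.js"], sample_license))
```

A name match only shows that a component is mentioned; whether the correct license text is attached still needs a manual read.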
Re: LICENSE and NOTICE file content
Hi,

Here you go [1]. That is however only for the source; regarding the convenience binary (which I’ve not checked), the LICENSE and NOTICE are very likely to be different.

It turns out the Android project does have a NOTICE file and that had an effect on the Spark one.

Thanks,
Justin

1. https://github.com/apache/spark/pull/21610
Re: LICENSE and NOTICE file content
Hi,

The PR was just for the LICENSE and NOTICE; you may still want to look at the JAR issue.

Thanks,
Justin
Re: LICENSE and NOTICE file content
On Thu, Jun 21, 2018 at 10:10 PM Justin Mclean wrote:

> Now I'm not on your PMC, don’t know your project's history and there may be valid reasons for the current LICENSE and NOTICE contents so take this as some friendly advice, you can choose to ignore it or not act on it. Looking at your latest source release (2.3.1), I can see there seems to be too much information in LICENSE and especially NOTICE for a source release. It may be that the LICENSE and NOTICE is intended for the binary release? [1] But even if that is the case it also seems to be missing a couple of licenses for bundled software.

Yes, there's just one set, and it's really for the binary distribution. I don't think using it as the LICENSE and NOTICE for the source distro technically aligns with policy, even if it's not wrong from a license standpoint (i.e. it's not great to say the source distro includes foo when it doesn't, but it's not illegal). Let me take that point to your PR to see if there's a simple way to get that one right at last.

> But in general my alarm bells start ringing because:
> - Category B licenses are listed (which shouldn't be in a source release)

I think this is an artifact of the above. I'm not aware of Cat B source in Spark but it's possible it slipped in. Point out where you see it if so.

> - License information is listed in NOTICE when it should be in LICENSE

While I think I got this right a long time ago, a) things can change, and b) I might have missed something. What in particular? (can reply on the PR)

> - Dependencies are listed rather than what is actually bundled

Same as above I think; this is needed for the binary release.

> * Are currently missing from license

All possibly missed, or added by those who didn't understand the licensing implication. I'll look at the PR.

> I also noticed some compiled code in the source release which probably shouldn’t be there. [2]
> spark-2.3.1/core/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/SPARK-21101-1.0.jar
> spark-2.3.1/sql/hive/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-contrib-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.10.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> spark-2.3.1/sql/hive-thriftserver/src/test/resources/TestUDTF.jar

These should be in the source release. They're not project code per se but files that test JAR handling.
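Compiled files like the JARs listed above could be located in an unpacked source release with a short script. This is a rough sketch, not a tool used by the Spark project: the helper name `find_compiled_files` and the suffix list are assumptions chosen for illustration.

```python
import os

# Assumed set of "compiled" suffixes; extend as needed for a given project.
COMPILED_SUFFIXES = (".jar", ".class", ".so", ".dll")


def find_compiled_files(root):
    """Return paths (relative to root) of files that look like compiled artifacts."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(COMPILED_SUFFIXES):
                full_path = os.path.join(dirpath, filename)
                hits.append(os.path.relpath(full_path, root))
    return sorted(hits)


if __name__ == "__main__":
    import sys

    # Usage: python find_compiled.py <path-to-unpacked-source-release>
    for path in find_compiled_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

The script only reports candidates by file name; whether a given file is acceptable in a source release remains a policy question for the reviewers.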
Re: LICENSE and NOTICE file content
Hi,

> Yes, there's just one set, and it's really for the binary distribution.

See [1]; it’s a good idea to have a different LICENSE and NOTICE for source and binary (and lots of other projects do this).

>> - License information is listed in NOTICE when it should be in LICENSE
>
> While I think I got this right a long time ago, a) things can change, and b) might have missed something. What in particular? (can reply on the PR)

The CDDL, CPL, MPL license lists and ALv2 headers at bottom.

>> - Dependencies are listed rather than what is actually bundled
>
> Same as above I think; this is needed for the binary release.

I would find it surprising that (for instance) JUnit is bundled in the binary release but I’ve not looked so it could be correct.

>> I also noticed some compiled code in the source release which probably shouldn’t be there. [2]
>> spark-2.3.1/core/src/test/resources/TestUDTF.jar
>> spark-2.3.1/sql/hive/src/test/resources/SPARK-21101-1.0.jar
>> spark-2.3.1/sql/hive/src/test/resources/TestUDTF.jar
>> spark-2.3.1/sql/hive/src/test/resources/hive-contrib-0.13.1.jar
>> spark-2.3.1/sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
>> spark-2.3.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
>> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.10.jar
>> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
>> spark-2.3.1/sql/hive-thriftserver/src/test/resources/TestUDTF.jar
>
> These should be in the source release. They're not project code per se but files that test JAR handling.

IMO they shouldn’t, as there is no exception for test code vs project code. If your release contains compiled code then it fails the OSI definition of open source for starters [2]. It also makes it rather hard to correctly review a release. This has come up a few times on various lists and every time I have seen the same answer, i.e. it’s not allowed. I’d suggest you ask on legal discuss for advice re this.

Thanks,
Justin

1. http://www.apache.org/dev/licensing-howto.html#binary
2. https://opensource.org/osd
Re: LICENSE and NOTICE file content
On Sat, Jun 23, 2018 at 4:47 AM Justin Mclean wrote:

> See [1]; it’s a good idea to have a different LICENSE and NOTICE for source and binary (and lots of other projects do this).

Agree, this just never happened after I got the initial big overhaul of the LICENSE/NOTICE in place that got things to "not technically violating licenses". I'll take on trying to update them accordingly.

>> While I think I got this right a long time ago, a) things can change, and b) might have missed something. What in particular? (can reply on the PR)
>
> The CDDL, CPL, MPL license lists and ALv2 headers at bottom.

CDDL, CPL and MPL are Cat B (looking at http://www.apache.org/legal/resolved.html#category-b here). The reciprocity requires notice, and so I would think NOTICE is the right place? The listing is to comply with this guideline:

"Please include the URL to the product's homepage in the prominent label. An appropriate and prominent label is a label the user will read while learning about the distribution - for example in a README. Please also ensure to comply with any attribution/notice requirements in the specific license in question."

ALv2 headers don't belong there. Those look like they were added incorrectly a while ago.

> I would find it surprising that (for instance) JUnit is bundled in the binary release but I’ve not looked so it could be correct.

If it's on this list, it's because it turned up one day when I dumped the transitive non-test dependencies of Spark using the Maven plugins. Someone out there may have a (bogus) non-test dependency on JUnit. It may no longer be the case, as I do not see any JUnit classes distributed in the binary distro of Spark now. This should be removed I believe.

> IMO they shouldn’t, as there is no exception for test code vs project code. If your release contains compiled code then it fails the OSI definition of open source for starters [2]. It also makes it rather hard to correctly review a release. This has come up a few times on various lists and every time I have seen the same answer, i.e. it’s not allowed. I’d suggest you ask on legal discuss for advice re this.

It's not test code; test code would indeed have to be distributed as source as well. They are binary blobs, if you like, needed by test code, that happen to be JARs here and not JPEGs or .docx files or something. These help test handling of JAR files.
Re: LICENSE and NOTICE file content
Hi,

>> The CDDL, CPL, MPL license lists and ALv2 headers at bottom.
>
> CDDL, CPL and MPL are Cat B (looking at http://www.apache.org/legal/resolved.html#category-b here). The reciprocity requires notice, and so I would think NOTICE is the right place? The listing is to comply with this guideline:
>
> "Please include the URL to the product's homepage in the prominent label. An appropriate and prominent label is a label the user will read while learning about the distribution - for example in a README. Please also ensure to comply with any attribution/notice requirements in the specific license in question."

NOTICE is not the right place for attribution; the license information usually includes attribution (via the copyright line) and that info should go in LICENSE. It’s often thought that “attribution notice requirements” need to go in NOTICE but they don’t. If you carefully read [1] you will see that if it is already covered by LICENSE there’s no need to add it to NOTICE.

> If it's on this list, it's because it turned up one day when I dumped the transitive non-test dependencies of Spark using the Maven plugins. Someone out there may have a (bogus) non-test dependency on JUnit.

Dependencies are not listed in LICENSE / NOTICE, only things that are actually bundled.

> It's not test code; test code would indeed have to be distributed as source as well. They are binary blobs, if you like, needed by test code, that happen to be JARs here and not JPEGs or .docx files or something. These help test handling of JAR files.

Which IMO is still not allowed in a source release, but as I said it would be best for you to check on legal discuss.

Thanks,
Justin

1. https://www.apache.org/legal/resolved.html#required-third-party-notices
Re: LICENSE and NOTICE file content
On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean wrote:

> Hi,
>
> NOTICE is not the right place for attribution; the license information usually includes attribution (via the copyright line) and that info should go in LICENSE. It’s often thought that “attribution notice requirements” need to go in NOTICE but they don’t. If you carefully read [1] you will see that if it is already covered by LICENSE there’s no need to add it to NOTICE.

Good pointer, it does suggest LICENSE for Cat B notices. In my overhaul I'll just move the lists to LICENSE.

> Dependencies are not listed in LICENSE / NOTICE, only things that are actually bundled.

Here, it is the compile/runtime dependencies that are the very things that are bundled, in the assembly in the binary release. That's what drives almost all of the license issue here. This is why I start with this list of transitive dependencies, as it will be exactly what's bundled. There is no good reason JUnit should be in that list, but it can be if some project did accidentally mark it non-test scope. Right now I do not see it, so either it's no longer needed or it was an error in the first place. It'll go away.
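The idea of starting from the transitive compile/runtime dependencies can be sketched as a filter over a dependency listing. This is a hedged illustration only: it assumes input lines shaped like `group:artifact:type:version:scope` (the shape commonly printed by Maven's `dependency:list` goal, optionally prefixed with `[INFO]`), and the function name `bundled_candidates` and the `org.example` coordinates in the usage lines are hypothetical.

```python
def bundled_candidates(lines):
    """Return sorted group:artifact pairs whose scope is compile or runtime."""
    kept = set()
    for line in lines:
        coords = line.replace("[INFO]", "").strip().split(":")
        # Assumed shape: group:artifact:type[:classifier]:version:scope.
        # Anything shorter (headers, blank lines) is skipped.
        if len(coords) < 5:
            continue
        scope = coords[-1]
        if scope in ("compile", "runtime"):
            kept.add(":".join(coords[:2]))
    return sorted(kept)


if __name__ == "__main__":
    sample = [
        "[INFO] The following files have been resolved:",
        "[INFO]    org.example:lib-a:jar:1.0:compile",
        "[INFO]    junit:junit:jar:4.12:test",
    ]
    print(bundled_candidates(sample))
```

A test-scoped JUnit entry is dropped by this filter; it would only show up if some module declared it with a non-test scope, which is the situation described above.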
Re: LICENSE and NOTICE file content
@legal-discuss, brief recap:

In Spark's test source code and release, there are some JAR files which exist to test handling of JAR files. Example: TestSerDe.jar in https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files

Justin raises the legitimate question: these don't belong in a source release, do they?

My operating theory had been that they are more like binary blobs w.r.t. Spark, like a test JPEG or data file, and are not the compiled version of any test code in Spark. They need to exist in order to run the tests from a source release. So it's not quite a case of shipping compiled Spark code in a source release.

I can imagine three opinions:

1) It's OK.
2) It's OK, but you need to include the source code to even those test JAR files somewhere
3) It's not fine, and the toolchain has to separately build these from source first automatically

I found https://markmail.org/thread/nf3lsdy5m3c3ovbr on legal-discuss previously, which seems to incline towards 2. I'm also inclined towards 2, as 3 is probably relatively tricky in practice even though that's a nice-to-have.

I'd welcome opinions on this one.

Sean

On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean wrote:

>> It's not test code; test code would indeed have to be distributed as source as well. They are binary blobs, if you like, needed by test code, that happen to be JARs here and not JPEGs or .docx files or something. These help test handling of JAR files.
>
> Which IMO is still not allowed in a source release, but as I said it would be best for you to check on legal discuss.
Re: LICENSE and NOTICE file content
I am not an official answer person, but IMO, the first question is: “Is the source for TestSerDe.jar ‘open source’ under an ALv2-compatible license?”.

If “yes”, then supply the source in the source release and not the JAR. One of the reasons for “no compiled code in a source release” is that it is very difficult to verify that compiled code is “correct” and not corrupted, infected with a virus, etc.

If “no”, then treat as a 3rd-party dependency. Which may mean you can’t use it or need to treat it as optional, or a runtime dependency.

The related question is: How do folks modify this JAR? If it was a JPEG, there are plenty of JPEG modification tools. There really aren’t JAR modification tools that modify a JAR's internal .class files; you really should use the source files. I am still surprised/puzzled by the answer in the thread you linked to. It still seems in both cases that a “binary” is being supplied for “convenience”. IMO, there should be very few, if any, things in an Apache source repo that are “unmodifiable”.

The “workaround” of renaming the .jar or .class files to something else so it isn’t seen as executable code seems like it still doesn’t fully meet the spirit of an open source release, either, but better than shipping executable code in a source package.

On the other hand, I would not hold up a release for an issue like this. Fix it in some future release.

My 2 cents,
-Alex

From: Sean Owen
Reply-To: "legal-disc...@apache.org"
Date: Monday, June 25, 2018 at 7:34 AM
To: "legal-disc...@apache.org"
Cc: "jus...@classsoftware.com", "dev@spark.apache.org"
Subject: Re: LICENSE and NOTICE file content

@legal-discuss, brief recap:

In Spark's test source code and release, there are some JAR files which exist to test handling of JAR files. Example: TestSerDe.jar in https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files

Justin raises the legitimate question: these don't belong in a source release, do they?

My operating theory had been that they are more like binary blobs w.r.t. Spark, like a test JPEG or data file, and are not the compiled version of any test code in Spark. They need to exist in order to run the tests from a source release. So it's not quite a case of shipping compiled Spark code in a source release.

I can imagine three opinions:

1) It's OK.
2) It's OK, but you need to include the source code to even those test JAR files somewhere
3) It's not fine, and the toolchain has to separately build these from source first automatically

I found https://markmail.org/thread/nf3lsdy5m3c3ovbr on legal-discuss previously, which seems to incline towards 2. I'm also inclined towards 2, as 3 is probably relatively tricky in practice even though that's a nice-to-have.

I'd welcome opinions on this one.

Sean

On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean <jus...@classsoftware.com> wrote:

> It's not test code; test code would indeed have to be distributed as source as well. They are binary blobs, if you like, needed by test code, that happen to be JARs here and not JPEGs or .docx files or something. These help test handling of JAR files.

Which IMO is still not allowed in a source release, but as I said it would be best for you to check on legal discuss.
Re: LICENSE and NOTICE file content
Yes, the code in there is ALv2 licensed; it appears to be either created for Spark or copied from Hive.

Yes, irrespective of the policy issue, it's important to be able to recreate these JARs somehow, and I don't think we have the source in the repo for all of them (at least, the ones that originate from Spark). That much seems like a must-do. After that, it seems worth figuring out just how hard it is to build these artifacts from source. If it's easy, great. If not, either the test can be removed or we figure out just how hard a requirement this is.

On Mon, Jun 25, 2018 at 11:34 AM Alex Harui wrote:

> I am not an official answer person, but IMO, the first question is: “Is the source for TestSerDe.jar ‘open source’ under an ALv2-compatible license?”.
>
> If “yes”, then supply the source in the source release and not the JAR. One of the reasons for “no compiled code in a source release” is that it is very difficult to verify that compiled code is “correct” and not corrupted, infected with a virus, etc.
>
> If “no”, then treat as a 3rd-party dependency. Which may mean you can’t use it or need to treat it as optional, or a runtime dependency.
>
> The related question is: How do folks modify this JAR? If it was a JPEG, there are plenty of JPEG modification tools. There really aren’t JAR modification tools that modify a JAR's internal .class files; you really should use the source files. I am still surprised/puzzled by the answer in the thread you linked to. It still seems in both cases that a “binary” is being supplied for “convenience”. IMO, there should be very few, if any, things in an Apache source repo that are “unmodifiable”.
>
> The “workaround” of renaming the .jar or .class files to something else so it isn’t seen as executable code seems like it still doesn’t fully meet the spirit of an open source release, either, but better than shipping executable code in a source package.
>
> On the other hand, I would not hold up a release for an issue like this. Fix it in some future release.
>
> My 2 cents,
> -Alex
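As a first step towards knowing whether the source for these test JARs is in the repo, one could compare the `.class` entries of a JAR against a source directory. The following is only a sketch under stated assumptions: a JAR is read as a zip archive, each top-level class is assumed to live in a `.java` or `.scala` file of the same name and package path (true for public Java classes, not guaranteed for Scala), and the function name `classes_without_source` is hypothetical.

```python
import os
import zipfile


def classes_without_source(jar_path, source_root):
    """Return .class entries in the JAR with no same-named source file under source_root."""
    missing = []
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            if not entry.endswith(".class"):
                continue
            # Nested and anonymous classes (Outer$Inner.class, Outer$1.class)
            # are compiled from the file of the outer class.
            base = entry[: -len(".class")].split("$", 1)[0]
            candidates = (os.path.join(source_root, base + ext) for ext in (".java", ".scala"))
            if not any(os.path.exists(path) for path in candidates):
                missing.append(entry)
    return sorted(missing)


if __name__ == "__main__":
    import sys

    # Usage: python check_jar_sources.py <some.jar> <source-root>
    if len(sys.argv) == 3:
        for entry in classes_without_source(sys.argv[1], sys.argv[2]):
            print(entry)
```

An empty result does not prove the JAR was built from that source; it only shows that a plausible source file exists for every class.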