Re: LICENSE and NOTICE file content

2018-06-21 Thread Reynold Xin
Thanks Justin. Can you submit a pull request?

On Thu, Jun 21, 2018 at 8:10 PM Justin Mclean 
wrote:

> Hi,
>
> We’ve recently had a number of incubating projects copy your LICENSE and
> NOTICE files as they see Spark as a popular project and they are a little
> sad when the IPMC votes -1 on their releases.
>
> Now I'm not on your PMC, don’t know your project’s history and there may be
> valid reasons for the current LICENSE and NOTICE contents, so take this as
> some friendly advice; you can choose to ignore it or not act on it. Looking
> at your latest source release (2.3.1), there seems to be too much
> information in LICENSE and especially NOTICE for a source release. It may
> be that the LICENSE and NOTICE are intended for the binary release? [1] But
> even if that is the case, they also seem to be missing a couple of licenses
> for bundled software.
>
> But in general my alarm bells start ringing because:
> - Category B licenses are listed (which shouldn't be in a source release)
> - License information is listed in NOTICE when it should be in LICENSE
> - Dependencies are listed rather than what is actually bundled
>
> Taking a look at the release I can see this 3rd party code bundled:
>
> MIT licensed (some is dual licensed):
> dagre-d3
> datatables
> jquery cookies
> SortTable
> Modernizr
> matchMedia polyfill*
> respond*
> dataTables bootstrap*
> jQuery
> jQuery datatables*
> graphlib-dot
> jquery block UI
> anchorJS
> jsonFormatter
>
> Apache licensed:
> vis.js*
> bootstrap*
> bootstrap-tooltip*
> toposort.py*
> TimSort*
> LimitedInputStream.java*
>
> BSD licensed:
> d3
> cloudpickle
> join*
>
> Python licensed
> heapq3
>
> CC0 licensed:
> ./data/mllib/images/kittens/29.5.a_b_EGDP022204.jpg*
>
> * Currently missing from LICENSE
>
> So that would end up with a number of licenses in LICENSE but nothing
> added to a boilerplate NOTICE file. The ALv2-licensed items don’t have
> NOTICE files, so there’s no impact there. I could of course have missed
> something and could be wrong for a number of reasons, but I cannot see how
> the above makes the NOTICE file 667 lines long :-)
>
> I also noticed some compiled code in the source release which probably
> shouldn’t be there. [2]
> spark-2.3.1/core/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/SPARK-21101-1.0.jar
> spark-2.3.1/sql/hive/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-contrib-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.10.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> spark-2.3.1/sql/hive-thriftserver/src/test/resources/TestUDTF.jar
>
> Thanks,
> Justin
>
> PS please cc me on replies as I’m not subscribed to your mailing list
>
> 1. http://www.apache.org/dev/licensing-howto.html#binary
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
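Justin's list of JARs above came from inspecting the unpacked 2.3.1 source release. A check like this can be partly automated. The following is a minimal Python sketch, not an ASF or Spark tool; it assumes the release has been unpacked to a local directory and treats a hypothetical set of file extensions as "compiled code":

```python
from pathlib import Path

# Extensions treated as compiled/binary code. This set is an assumption for
# illustration; extend it to suit the project being reviewed.
COMPILED_EXTENSIONS = {".jar", ".class", ".so", ".dll", ".pyc"}


def find_compiled_artifacts(release_root: str) -> list[str]:
    """Return sorted paths, relative to release_root, of files whose
    extension suggests compiled code."""
    root = Path(release_root)
    hits = []
    for path in root.rglob("*"):
        # Compare extensions case-insensitively so "Foo.JAR" is also caught.
        if path.is_file() and path.suffix.lower() in COMPILED_EXTENSIONS:
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

For example, `find_compiled_artifacts("spark-2.3.1")` would be run against the unpacked tree. Because it only looks at file names, it would miss a JAR renamed to another extension (the workaround Alex mentions later in the thread); inspecting file contents would be needed to catch that.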


Re: LICENSE and NOTICE file content

2018-06-21 Thread Justin Mclean
Hi,

Here you go [1]. That is, however, only for the source; for the convenience binary
(which I’ve not checked) the LICENSE and NOTICE are very likely to be different.

It turns out the Android project does have a NOTICE file, and that had an effect
on the Spark one.

Thanks,
Justin

1. https://github.com/apache/spark/pull/21610



Re: LICENSE and NOTICE file content

2018-06-21 Thread Justin Mclean
Hi,

The PR was just for the LICENSE and NOTICE; you may still want to look at the
JAR issue.

Thanks,
Justin



Re: LICENSE and NOTICE file content

2018-06-23 Thread Sean Owen
On Thu, Jun 21, 2018 at 10:10 PM Justin Mclean 
wrote:

> Now I'm not on your PMC, don’t know your project’s history and there may be
> valid reasons for the current LICENSE and NOTICE contents, so take this as
> some friendly advice; you can choose to ignore it or not act on it. Looking
> at your latest source release (2.3.1), there seems to be too much
> information in LICENSE and especially NOTICE for a source release. It may
> be that the LICENSE and NOTICE are intended for the binary release? [1] But
> even if that is the case, they also seem to be missing a couple of licenses
> for bundled software.
>

Yes, there's just one set, and it's really for the binary distribution. I
don't think using it as the LICENSE and NOTICE for the source distro is
technically aligned with policy, even if it's not wrong from a license
standpoint (i.e. it's not great to say the source distro includes foo when it
doesn't, but it's not illegal). Let me take that point to your PR to see if
there's a simple way to get that one right at last.


>
> But in general my alarm bells start ringing because:
> - Category B licenses are listed (which shouldn't be in a source release)
>

I think this is an artifact of the above. I'm not aware of Cat B source in
Spark but it's possible it slipped in. Point out where you see it if so.


> - License information is listed in NOTICE when it should be in LICENSE
>

While I think I got this right a long time ago, a) things can change, and
b) might have missed something. What in particular? (can reply on the PR)


> - Dependencies are listed rather than what is actually bundled
>

Same as above I think; this is needed for the binary release.


>
>
> * Currently missing from LICENSE
>

All possibly missed, or added by those who didn't understand the licensing
implication. I'll look at the PR.


>
> I also noticed some compiled code in the source release which probably
> shouldn’t be there. [2]
> spark-2.3.1/core/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/SPARK-21101-1.0.jar
> spark-2.3.1/sql/hive/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-contrib-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.10.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> spark-2.3.1/sql/hive-thriftserver/src/test/resources/TestUDTF.jar
>

These should be in the source release. They're not project code per se but
files that test JAR handling.


Re: LICENSE and NOTICE file content

2018-06-23 Thread Justin Mclean
Hi,

> Yes, there's just one set, and it's really for the binary distribution.

See [1]: it’s a good idea to have a different LICENSE and NOTICE for source and
binary (and lots of other projects do this).

> - License information is listed in NOTICE when it should be in LICENSE
> 
> While I think I got this right a long time ago, a) things can change, and b) 
> might have missed something. What in particular? (can reply on the PR)

The CDDL, CPL and MPL license lists, and the ALv2 headers at the bottom.

> - Dependencies are listed rather than what is actually bundled
> 
> Same as above I think; this is needed for the binary release.

I would find it surprising if (for instance) JUnit were bundled in the binary
release, but I’ve not looked so it could be correct.

> I also noticed some compiled code in the source release which probably 
> shouldn’t be there. [2]
> spark-2.3.1/core/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/SPARK-21101-1.0.jar
> spark-2.3.1/sql/hive/src/test/resources/TestUDTF.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-contrib-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
> spark-2.3.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.10.jar
> spark-2.3.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> spark-2.3.1/sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> 
> These should be in the source release. They're not project code per se but 
> files that test JAR handling.

IMO they shouldn’t, as there’s no exception for test code vs project code. If your
release contains compiled code then it fails the OSI definition of open source
for starters [2]. It also makes it rather hard to correctly review a release.
This has come up a few times on various lists and every time I’ve seen the same
answer, i.e. it’s not allowed. I’d suggest you ask on legal discuss for advice re
this.

Thanks,
Justin

1. http://www.apache.org/dev/licensing-howto.html#binary
2. https://opensource.org/osd

Re: LICENSE and NOTICE file content

2018-06-23 Thread Sean Owen
On Sat, Jun 23, 2018 at 4:47 AM Justin Mclean 
wrote:

> See [1] it’s a good idea to have a different LICENSE and NOTICE for source
> and binary (and lots of other projects do this).
>

Agree, this just never happened after I got the initial big overhaul of the
LICENSE/NOTICE in place that got things to "not technically violating
licenses". I'll take on trying to update them accordingly.


>
> While I think I got this right a long time ago, a) things can change, and
> b) might have missed something. What in particular? (can reply on the PR)
>
>
> The CDDL, CPL, MPL  license lists and ALv2 headers at bottom.
>

CDDL, CPL and MPL are Cat B (looking at
http://www.apache.org/legal/resolved.html#category-b here). The reciprocity
requires notice, and so I would think NOTICE is the right place? The
listing is to comply with this guideline:

"Please include the URL to the product's homepage in the prominent label.
An appropriate and prominent label is a label the user will read while
learning about the distribution - for example in a README. Please also
ensure to comply with any attribution/notice requirements in the specific
license in question."

ALv2 headers don't belong there. Those look like they were added
incorrectly a while ago.



> I would find it surprising that (for instance) JUnit is bundled in the
> binary release but I’ve not looked so it could be correct.
>

If it's on this list, it's because it turned up one day when I dumped the
transitive non-test dependencies of Spark using the Maven plugins. Someone
out there may have a (bogus) non-test dependency on JUnit. It may no longer
be the case, as I do not see any junit classes distributed in the binary
distro of Spark now. This should be removed I believe.



> IMO they shouldn’t, as there’s no exception for test code vs project code. If
> your release contains compiled code then it fails the OSI definition of
> open source for starters [2]. It also makes it rather hard to correctly
> review a release. This has come up a few times on various lists and every
> time I’ve seen the same answer, i.e. it’s not allowed. I’d suggest you ask on
> legal discuss for advice re this.
>

It's not test code; test code would indeed have to be distributed as source
as well. They are binary blobs, if you like, needed by test code, that
happen to be JARs here and not JPEGs or .docx files or something. These
help test handling of JAR files.


Re: LICENSE and NOTICE file content

2018-06-23 Thread Justin Mclean
Hi,

> The CDDL, CPL, MPL  license lists and ALv2 headers at bottom.
> 
> CDDL, CPL and MPL are Cat B (looking at 
> http://www.apache.org/legal/resolved.html#category-b here). The reciprocity 
> requires notice, and so I would think NOTICE is the right place? The listing 
> is to comply with this guideline:
> 
> "Please include the URL to the product's homepage in the prominent label. An 
> appropriate and prominent label is a label the user will read while learning 
> about the distribution - for example in a README. Please also ensure to 
> comply with any attribution/notice requirements in the specific license in 
> question.”

NOTICE is not the right place for attribution; the license information usually
includes attribution (via the copyright line) and that info should go in
LICENSE. It’s often thought that “attribution notice requirements” need to go
in NOTICE, but they don’t. If you carefully read [1] you’ll see that if it’s already
covered by LICENSE there’s no need to add it to NOTICE.

> If it's on this list, it's because it turned up one day when I dumped the 
> transitive non-test dependencies of Spark using the Maven plugins. Someone 
> out there may have a (bogus) non-test dependency on Junit.

Dependencies are not listed in LICENSE / NOTICE, only things that are actually
bundled.

> It's not test code; test code would indeed have to be distributed as source 
> as well. They are binary blobs, if you like, needed by test code, that happen 
> to be JARs here and not JPEGs or .docx files or something. These help test 
> handling of JAR files.

Which IMO is still not allowed in a source release, but as I said it would be 
best for you to check on legal discuss.

Thanks,
Justin

1. https://www.apache.org/legal/resolved.html#required-third-party-notices





Re: LICENSE and NOTICE file content

2018-06-23 Thread Sean Owen
On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean 
wrote:

> Hi,
>
> NOTICE is not the right place for attribution; the license information
> usually includes attribution (via the copyright line) and that info should
> go in LICENSE. It’s often thought that “attribution notice requirements”
> need to go in NOTICE, but they don’t. If you carefully read [1] you’ll see that
> if it’s already covered by LICENSE there’s no need to add it to NOTICE.
>

Good pointer, it does suggest LICENSE for Cat B notices. In my overhaul
I'll just move the lists to LICENSE.



> Dependencies are not listed in LICENSE / NOTICE, only things that are
> actually bundled.
>

Here, it is the compile/runtime dependencies that are the very things that
are bundled, in the assembly in the binary release. That's what drives
almost all of the license issues here. This is why I start with this list of
transitive dependencies, as it will be exactly what's bundled.

There is no good reason JUnit should be in that list, but can be if some
project did accidentally mark it non-test scope. Right now I do not see it,
so either it's no longer needed or was an error in the first place. It'll
go away.
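Sean's approach of starting from the transitive non-test dependencies can be sketched as a filter over a dependency listing. This minimal Python sketch assumes each relevant input line carries a Maven coordinate of the form `groupId:artifactId:type[:classifier]:version:scope`, optionally prefixed with `[INFO]`, roughly as the Maven dependency plugin's listing prints it; check the exact shape against your own build output. It keeps `compile` and `runtime` scope, which the thread identifies as what lands in the binary assembly, and drops everything else, including `test` and, as an assumption here, `provided`:

```python
# Scopes assumed to end up in the binary assembly; test and provided are excluded.
BUNDLED_SCOPES = {"compile", "runtime"}


def bundled_artifacts(lines):
    """Return sorted 'groupId:artifactId' strings for coordinates in a
    bundled scope. Non-coordinate lines (headers, blanks) are skipped."""
    result = set()
    for line in lines:
        # Drop a leading Maven log prefix if present, then surrounding whitespace.
        coord = line.replace("[INFO]", "", 1).strip()
        parts = coord.split(":")
        if len(parts) < 5:
            continue  # not a groupId:artifactId:type:version:scope coordinate
        # The scope is the last field; anything after whitespace (e.g. module
        # info appended by some plugin versions) is ignored.
        scope_field = parts[-1].split()
        if not scope_field:
            continue
        if scope_field[0] in BUNDLED_SCOPES:
            result.add(f"{parts[0]}:{parts[1]}")
    return sorted(result)
```

A test-only library such as JUnit appears in the result only if some module declares it with a non-test scope, which is consistent with how Sean explains it ended up on the list.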


Re: LICENSE and NOTICE file content

2018-06-25 Thread Sean Owen
@legal-discuss, brief recap:

In Spark's test source code and release, there are some JAR files which
exist to test handling of JAR files. Example: TestSerDe.jar in
https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files


Justin raises the legitimate question: these don't belong in a source
release, do they?

My operating theory had been that they are more like binary blobs w.r.t.
Spark, like a test JPEG or data file, and are not the compiled version of
any test code in Spark. They need to exist in order to run the tests from a
source release. So it's not quite a case of shipping compiled Spark code in
a source release.

I can imagine three opinions:

1) It's OK.
2) It's OK, but you need to include the source code to even those test JAR
files somewhere
3) It's not fine, and the toolchain has to separately build these from
source first automatically

I found https://markmail.org/thread/nf3lsdy5m3c3ovbr on legal-discuss
previously, which seems to incline towards 2.

I'm also inclined towards 2, as 3 is probably relatively tricky in practice
even though that's a nice-to-have.

I'd welcome opinions on this one.

Sean


On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean 
wrote:

> > It's not test code; test code would indeed have to be distributed as
> > source as well. They are binary blobs, if you like, needed by test code,
> > that happen to be JARs here and not JPEGs or .docx files or something.
> > These help test handling of JAR files.
>
> Which IMO is still not allowed in a source release, but as I said it would
> be best for you to check on legal discuss.
>
>


Re: LICENSE and NOTICE file content

2018-06-25 Thread Alex Harui
I am not an official answer person, but IMO, the first question is:  “Is the 
source for TestSerDe.jar ‘open source’ under an ALv2-compatible license?”.

If “yes”, then supply the source in the source release and not the JAR.  One of 
the reasons for “no compiled code in a source release” is that it is very 
difficult to verify that compiled code is “correct” and not corrupted, infected 
with a virus, etc.

If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t use it 
or need to treat it as optional, or a runtime dependency.

The related question is:  How do folks modify this JAR?  If it was a JPEG, 
there are plenty of JPEG modification tools.  There really aren’t JAR 
modification tools that modify JARs internal .class files, you really should 
use the source files.  I am still surprised/puzzled by the answer in the thread 
you linked to.  It still seems in both cases that a “binary” is being supplied 
for “convenience”.  IMO, there should be very few, if any, things in an Apache 
source repo that are “unmodifiable”.

The “workaround” of renaming the .jar or .class files to something else so it 
isn’t seen as executable code seems like it still doesn’t fully meet the spirit 
of an open source release, either, but better than shipping executable code in 
a source package.

On the other hand, I would not hold up a release for an issue like this.  Fix 
it in some future release.

My 2 cents,
-Alex

From: Sean Owen 
Reply-To: "legal-disc...@apache.org" 
Date: Monday, June 25, 2018 at 7:34 AM
To: "legal-disc...@apache.org" 
Cc: "jus...@classsoftware.com" , 
"dev@spark.apache.org" 
Subject: Re: LICENSE and NOTICE file content

@legal-discuss, brief recap:

In Spark's test source code and release, there are some JAR files which exist 
to test handling of JAR files. Example: TestSerDe.jar in 
https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files

Justin raises the legitimate question: these don't belong in a source release, 
do they?

My operating theory had been that they are more like binary blobs w.r.t. Spark, 
like a test JPEG or data file, and are not the compiled version of any test 
code in Spark. They need to exist in order to run the tests from a source 
release. So it's not quite a case of shipping compiled Spark code in a source 
release.

I can imagine three opinions:

1) It's OK.
2) It's OK, but you need to include the source code to even those test JAR 
files somewhere
3) It's not fine, and the toolchain has to separately build these from source 
first automatically

I found https://markmail.org/thread/nf3lsdy5m3c3ovbr on legal-discuss
previously, which seems to incline towards 2.

I'm also inclined towards 2, as 3 is probably relatively tricky in practice 
even though that's a nice-to-have.

I'd welcome opinions on this one.

Sean


On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean
wrote:
> It's not test code; test code would indeed have to be distributed as source 
> as well. They are binary blobs, if you like, needed by test code, that happen 
> to be JARs here and not JPEGs or .docx files or something. These help test 
> handling of JAR files.

Which IMO is still not allowed in a source release, but as I said it would be 
best for you to check on legal discuss.


Re: LICENSE and NOTICE file content

2018-06-25 Thread Sean Owen
Yes, the code in there is ALv2 licensed; it appears to be either created for
Spark or copied from Hive. Yes, irrespective of the policy issue, it's
important to be able to recreate these JARs somehow, and I don't think we
have the source in the repo for all of them (at least, the ones that
originate from Spark). That much seems like a must-do.

After that, seems worth figuring out just how hard it is to build these
artifacts from source. If it's easy, great. If not, either the test can be
removed or we figure out just how hard a requirement this is.
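Being able to recreate the JARs also gives reviewers a way to address Alex's verification concern: rebuild a JAR from its source and compare it with the checked-in copy. The minimal Python sketch below compares two JARs (which are ZIP archives) entry by entry, ignoring timestamps and other ZIP metadata. It assumes the rebuilt JAR already exists; it does not compile anything, and class files generally only match byte for byte when the same compiler version and flags are used. Skipping `META-INF/MANIFEST.MF` by default is also an assumption, since build tools often write their own details into it:

```python
import hashlib
import zipfile


def entry_digests(jar_path):
    """Map each file entry name in a JAR (a ZIP archive) to the SHA-256 of
    its uncompressed contents. Directory entries are skipped."""
    digests = {}
    with zipfile.ZipFile(jar_path) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue
            digests[info.filename] = hashlib.sha256(jar.read(info)).hexdigest()
    return digests


def jar_differences(checked_in, rebuilt, ignore=("META-INF/MANIFEST.MF",)):
    """Return sorted entry names that are missing from one JAR or whose
    contents differ. Only names and bytes are compared, not timestamps."""
    a, b = entry_digests(checked_in), entry_digests(rebuilt)
    names = (set(a) | set(b)) - set(ignore)
    return sorted(n for n in names if a.get(n) != b.get(n))
```

An empty result means the checked-in JAR carries the same class and resource bytes as the rebuild, which is the kind of evidence a reviewer cannot get from the binary alone.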

On Mon, Jun 25, 2018 at 11:34 AM Alex Harui 
wrote:

> I am not an official answer person, but IMO, the first question is:  “Is
> the source for TestSerDe.jar ‘open source’ under an ALv2-compatible
> license?”.
>
> If “yes”, then supply the source in the source release and not the JAR.
> One of the reasons for “no compiled code in a source release” is that it is
> very difficult to verify that compiled code is “correct” and not corrupted,
> infected with a virus, etc.
>
> If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t
> use it or need to treat it as optional, or a runtime dependency.
>
> The related question is:  How do folks modify this JAR?  If it was a JPEG,
> there are plenty of JPEG modification tools.  There really aren’t JAR
> modification tools that modify JARs internal .class files, you really
> should use the source files.  I am still surprised/puzzled by the answer in
> the thread you linked to.  It still seems in both cases that a “binary” is
> being supplied for “convenience”.  IMO, there should be very few, if any,
> things in an Apache source repo that are “unmodifiable”.
>
> The “workaround” of renaming the .jar or .class files to something else so
> it isn’t seen as executable code seems like it still doesn’t fully meet the
> spirit of an open source release, either, but better than shipping
> executable code in a source package.
>
> On the other hand, I would not hold up a release for an issue like this.
> Fix it in some future release.
>
> My 2 cents,
>
> -Alex