Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-09 Thread Philippe Ombredanne
On Sun, Apr 9, 2017 at 9:20 PM, Luis Villa  wrote:
> What's the "right" level to scan at? Top-level project-declared LICENSE
> file? Or per-file throughout the tree? (Note that often those two measures
> don't agree with each other.)

My opinion is that the right approach is to scan at both levels and, if
needed, surface any inconsistencies or contradictions. Scanning only the
simpler top-level project-declared LICENSE or COPYING file is not enough: in
my experience at scale, it too often yields incomplete or inaccurate data.
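
A minimal sketch of that two-level comparison, assuming the scan results have
already been reduced to one declared license identifier plus a mapping from
file path to detected identifiers. The function names and the sample data are
hypothetical and are not the output format of ScanCode or any other tool:

```python
from collections import defaultdict


def find_license_conflicts(declared, per_file):
    """Group files whose detected licenses differ from the declared one.

    declared: license identifier read from the top-level LICENSE/COPYING file.
    per_file: dict mapping a file path to the set of identifiers detected in it.
    Both are hypothetical, already-normalized scanner results.
    """
    conflicts = defaultdict(list)
    for path in sorted(per_file):
        # Any identifier other than the declared one is worth surfacing.
        for license_id in sorted(per_file[path] - {declared}):
            conflicts[license_id].append(path)
    return dict(conflicts)


def find_files_without_license(per_file):
    """List files where the scan detected no license at all (missing data)."""
    return sorted(path for path, detected in per_file.items() if not detected)


# Made-up scan results for illustration only.
EXAMPLE_SCAN = {
    "src/main.c": {"MIT"},
    "src/util.c": set(),
    "src/vendor/zlib.c": {"Zlib"},
    "docs/index.md": {"MIT", "CC-BY-4.0"},
}

if __name__ == "__main__":
    print(find_license_conflicts("MIT", EXAMPLE_SCAN))
    print(find_files_without_license(EXAMPLE_SCAN))
```

Files with no detection are reported separately because they are missing
data rather than a contradiction of the declared license.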

That said, I am the maintainer of the open source ScanCode toolkit, a
fresh take to build a better mousetrap for license scanning:

https://github.com/nexB/scancode-toolkit

My goal is simple:
I want the licensing of all open source code to be a solved problem,
not a question mark, i.e. working towards 100% licensing clarity and
eventually ensuring that no piece of existing open source code raises
licensing questions for a user or aspiring user.

For that I would like to scan it **all**... and set up a community peer
review site so we can help every open source project add, refine, or clean up
any missing, incomplete, inaccurate, or contradictory licensing. Or at least
make the data open and available for anyone to query.

The main drag is, as always, resource availability (human time, network
bandwidth, and computing power) to fetch and scan everything from every
package manager, forge, SourceForge, GitHub, etc., which represents
a significant[sic] number of terabytes.
This could become less of an issue on the fetch side when softwareheritage.org
is fully operational. But still.

If anyone is interested in this, please contact me!
-- 
Cordially
Philippe Ombredanne
___
License-discuss mailing list
License-discuss@opensource.org
https://lists.opensource.org/cgi-bin/mailman/listinfo/license-discuss


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-09 Thread Luis Villa
On Sun, Apr 9, 2017 at 11:57 AM Philippe Ombredanne 
wrote:

> > On Thu, Apr 6, 2017 at 6:19 PM Philippe Ombredanne  >
> > wrote:
> >>
> >> On Thu, Apr 6, 2017 at 5:21 PM, Luis Villa  wrote:
> >> > On Tue, Jan 10, 2017, 11:07 AM Luis Villa  wrote:
> >> >>
> >> >> Hey, all-
> >> >> I promised some board members a summary of my investigation in '12-'13
> >> >> into updating, supplementing, or replacing the "popular licenses" list.
> >> >> Here goes.
> >> [...]
> >> > Yet another (inevitably flawed) data set:
> >> > https://libraries.io/licenses
> >>
> >> With the merit that all the underlying code is FLOSS.
> >>
> >> Another possible source --always biased-- could be Debian's popcon and
> >> some cross ref with debsources.
>
>
> On Fri, Apr 7, 2017 at 11:54 AM, Andrew Nesbitt 
> wrote:
> > "inevitably flawed", would be great to get some feedback on how/why it's
> > flawed so I can improve it?
> >
> > System level package managers are in the pipeline for the end of the year,
> > but there are so many fewer packages there that I can't see it moving the
> > needle much
>
> Andrew: my comment on "inevitably flawed" was to echo Luis's point that any
> open source license popularity contest is likely to be flawed and biased one
> way or another regardless of the data set that is considered as a basis.
>

Hi, Andrew-
For some reason your email never made it through to me; just saw Philippe's
response coming through.

I added "inevitably flawed" as a shorthand to fend off the basic critiques
that always accompany mentions of surveys on this list. The primary
critiques are:

   - What's the "right" set of data sources to draw from? By deciding to
   include (or leave out) any particular repo, you inevitably impact license
   popularity, and also inevitably you can't include them all.
   - What's the "right" metric for popularity? projects? files? LOCs? usage
   of projects? For example, if two projects are of the same complexity, but
   one is widely used and the other hardly used at all, should they count the
   same? What if one is very simple, the other very complex?
   - What about unmaintained/old code?
   - What's the "right" level to scan at? Top-level project-declared
   LICENSE file? Or per-file throughout the tree? (Note that often those two
   measures don't agree with each other.)
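
To make the metric question concrete, here is a small sketch, assuming a
made-up list of projects with hypothetical file and line counts, showing that
the same data yields a different license ranking depending on what is counted.
The project names and figures are invented for illustration:

```python
from collections import Counter


def rank_licenses(projects, weight):
    """Rank license identifiers by the summed weight of their projects.

    weight is a function mapping one project record to a number, e.g.
    1 per project, its file count, or its lines of code.
    """
    totals = Counter()
    for project in projects:
        totals[project["license"]] += weight(project)
    # Highest total first; ties broken alphabetically for a stable order.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [license_id for license_id, _ in ordered]


# Invented sample data: three tiny projects, two mid-size, one very large.
PROJECTS = [
    {"name": "tiny-a", "license": "MIT", "files": 3, "loc": 200},
    {"name": "tiny-b", "license": "MIT", "files": 5, "loc": 400},
    {"name": "tiny-c", "license": "MIT", "files": 2, "loc": 100},
    {"name": "mid-a", "license": "Apache-2.0", "files": 300, "loc": 60000},
    {"name": "mid-b", "license": "Apache-2.0", "files": 250, "loc": 40000},
    {"name": "huge", "license": "GPL-2.0", "files": 5000, "loc": 2000000},
]

if __name__ == "__main__":
    print("by project:", rank_licenses(PROJECTS, lambda p: 1))
    print("by files:  ", rank_licenses(PROJECTS, lambda p: p["files"]))
    print("by LOC:    ", rank_licenses(PROJECTS, lambda p: p["loc"]))
```

Counting projects puts MIT first here, while counting files or lines puts
GPL-2.0 first, which is the kind of disagreement the questions above point at.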

I feel that there are no right or wrong answers to these questions;
different surveys have different purposes. But others disagree: every time
we discuss this subject here, someone pops up and says "no, this service
does it wrong, they should do X instead". Because no service can please
everyone, I know yours displeases someone :)

[There is also a question as to whether or not proprietary methodologies
should be completely ignored, or taken as another data point with an
appropriate grain of salt. I don't think that question is possible to
answer for OSI yet, but it probably does have right/wrong answers.]

Hope that clarifies where I was coming from- would be happy to chat
whenever if this doesn't.

Luis
-- 

*Luis Villa: Open Law and Strategy *
*+1-415-938-4552*


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-09 Thread Philippe Ombredanne
On Fri, Apr 7, 2017 at 8:14 PM, Smith, McCoy  wrote:
> But I think that at some point it would be helpful for there to be a
> resource for people to sift through all the licenses on the list to
> understand what they do and don’t do.

You may also consider this https://enterprise.dejacode.com/licenses/
The conditions of every OSI license (and more) have been carefully tagged, as
seen here:
https://enterprise.dejacode.com/licenses/Demo/apache-2.0/#license-conditions
(disclosure: this is a product of my company)
-- 
Cordially
Philippe Ombredanne


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-09 Thread Philippe Ombredanne
> On Thu, Apr 6, 2017 at 6:19 PM Philippe Ombredanne 
> wrote:
>>
>> On Thu, Apr 6, 2017 at 5:21 PM, Luis Villa  wrote:
>> > On Tue, Jan 10, 2017, 11:07 AM Luis Villa  wrote:
>> >>
>> >> Hey, all-
>> >> I promised some board members a summary of my investigation in '12-'13
>> >> into updating, supplementing, or replacing the "popular licenses" list.
>> >> Here
>> >> goes.
>> [...]
>> > Yet another (inevitably flawed) data set:
>> > https://libraries.io/licenses
>>
>> With the merit that all the underlying code is FLOSS.
>>
>> Another possible source --always biased-- could be Debian's popcon and
>> some cross ref with debsources.


On Fri, Apr 7, 2017 at 11:54 AM, Andrew Nesbitt  wrote:
> "inevitably flawed", would be great to get some feedback on how/why it's
> flawed so I can improve it?
>
> System level package managers are in the pipeline for the end of the year,
> but there are so many fewer packages there that I can't see it moving the needle
> much

Andrew: my comment on "inevitably flawed" was to echo Luis's point that any
open source license popularity contest is likely to be flawed and biased one
way or another regardless of the data set that is considered as a basis.

That was not a reflection on any flaw in libraries.io, which rocks!
Please accept my apologies if it came across that way.

-- 
Cordially
Philippe Ombredanne


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-07 Thread Smith, McCoy
I don’t think that works as a wizard, and the analysis of licenses on that site 
is pretty high level (i.e., I’m not sure it would tell you, say, the 
differences between the multiple “weak copyleft” licenses on the OSI list so 
that one could decide which one might be best for one’s particular project – 
which is what I think Larry was suggesting might be helpful).

From: License-discuss [mailto:license-discuss-boun...@opensource.org] On Behalf 
Of Christopher Sean Morrison
Sent: Friday, April 07, 2017 11:32 AM
To: license-discuss@opensource.org
Subject: Re: [License-discuss] notes on a systematic approach to "popular" 
licenses


On Apr 7, 2017, at 2:14 PM, Smith, McCoy <mccoy.sm...@intel.com> wrote:

But I think that at some point it would be helpful for there to be a resource 
for people to sift through all the licenses on the list to understand what they 
do and don’t do.

Isn’t that exactly what https://tldrlegal.com does?  They even have the 
OSI-approved ones marked and sorted by popularity (as determined by eyeballs on 
their site):  https://tldrlegal.com/licenses/tags/OSI-Approved

Cheers!
Sean



Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-07 Thread John Cowan
On Fri, Apr 7, 2017 at 2:32 PM, Christopher Sean Morrison 
wrote:

> Isn’t that exactly what https://tldrlegal.com does?  They even have the
> OSI-approved ones marked and sorted by popularity (as determined by
> eyeballs on their site):  https://tldrlegal.com/licenses/tags/OSI-Approved
>

For people who like opinionated wizards, there's also mine, currently
hosted at
.  It asks you questions about what
you want your license to do, and then steers you to the 3-clause BSD, the
Apache 2.0, the GPL 2.0, or the LGPL 2.0 licenses.

-- 
John Cowan          http://vrici.lojban.org/~cowan        co...@ccil.org
"But I am the real Strider, fortunately," he said, looking down at them
with his face softened by a sudden smile.  "I am Aragorn son of Arathorn,
and if by life or death I can save you, I will."


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-07 Thread Christopher Sean Morrison

> On Apr 7, 2017, at 2:14 PM, Smith, McCoy  wrote:
> 
> But I think that at some point it would be helpful for there to be a resource 
> for people to sift through all the licenses on the list to understand what 
> they do and don’t do.

Isn’t that exactly what https://tldrlegal.com  does?  
They even have the OSI-approved ones marked and sorted by popularity (as 
determined by eyeballs on their site):  
https://tldrlegal.com/licenses/tags/OSI-Approved 


Cheers!
Sean



Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-07 Thread Smith, McCoy
What Larry is describing is similar to a project that at one point was being 
put together in part by Professor Urban back when she was at USC law:  A 
licensing wizard for use in selecting an open source license from the existing 
OSI list.  That project is described in the licensing proliferation committee 
report that came out about 10 years ago:  
https://opensource.org/proliferation-report

I thought that at some point this project was launched, but it may never have
been.  There may have been some concerns at the time (as there likely would
be if it were revived or redone) about the extent to which such a tool might be
providing legal advice.

But I think that at some point it would be helpful for there to be a resource 
for people to sift through all the licenses on the list to understand what they 
do and don’t do.




From: License-discuss [mailto:license-discuss-boun...@opensource.org] On Behalf 
Of Lawrence Rosen
Sent: Thursday, April 06, 2017 9:40 AM
To: license-discuss@opensource.org
Cc: Lawrence Rosen <lro...@rosenlaw.com>
Subject: Re: [License-discuss] notes on a systematic approach to "popular" 
licenses

Richard Fontana wrote:
> Interesting but at first glance the data seems too unreliable to be of any 
> use. I started checking the identified projects under the so-called Clear BSD 
> license (the FSF-free, never-OSI-submitted BSD variant that explicitly 
> excludes patent licenses) and the ones I looked at were all spurious matches.

Luis is noting that the current OSI list of "popular" licenses is unreliable 
also. Let's not do nothing about it.
Popularity is important only for social media starlets.

More important for us would be a list that describes the fundamental areas 
where each license differs from the others. Give licensors a reason to select a 
license, and give licensees a reason to understand its risks and benefits. 
Don't limit those descriptions to 2 sentences or to arbitrary classifications. 
Stating explicitly in this OSD list that certain licenses are "popular" on 
Black Duck or other lists may be helpful but not determinative.

Yes, that license list is now long. If that length problem is the sole reason 
that you list certain licenses first in a shorter "recommended" list, do so 
explicitly but with appropriate caveats not to trust those recommendations.

The alternative to that kind of limited but precise legal analysis is that new 
proposed licenses will be rejected or discussed to death simply because they 
aren't popular. They should only be rejected if (1) they don't contain anything 
legally new (non-proliferation), or (2) they don't satisfy the OSD (not open 
source).

/Larry


From: License-discuss [mailto:license-discuss-boun...@opensource.org] On Behalf 
Of Richard Fontana
Sent: Thursday, April 6, 2017 8:51 AM
To: license-discuss@opensource.org
Subject: Re: [License-discuss] notes on a systematic approach to "popular" 
licenses

Interesting but at first glance the data seems too unreliable to be of any use. 
I started checking the identified projects under the so-called Clear BSD 
license (the FSF-free, never-OSI-submitted BSD variant that explicitly excludes 
patent licenses) and the ones I looked at were all spurious matches.

Richard



On Thu, Apr 6, 2017, at 11:21 AM, Luis Villa wrote:
Yet another (inevitably flawed) data set:
https://libraries.io/licenses

On Tue, Jan 10, 2017, 11:07 AM Luis Villa <l...@lu.is> wrote:
[Apparently I got unsubscribed at some point, so if you've sent an email here 
in recent months seeking my feedback, please resend.]

Hey, all-
I promised some board members a summary of my investigation in '12-'13 into 
updating, supplementing, or replacing the "popular licenses" list. Here goes.

tl;dr
I think OSI should have a data-driven short license list with a replicable and 
transparent methodology, supplemented by a new-and-good(?) list that captures 
licenses that aren't yet popular but are high quality and have some substantial 
improvement that advances the goals of OSI.

Purposes of non-comprehensive lists
If you Google "open source licenses", OSI pages are the top two hits. 
Historically, those pages were not very helpful unless you already knew 
something about open source. Having a shorter "top" list can help make the OSI 
website more useful to newcomers by suggesting a starting place for their 
exploration and education about open source.

In addition, third parties often look to OSI as a trusted (neutral?) source for 
"top" or "best" licenses that they can incorporate into products. (The full 
OSI-approved list is not practical for many applications.) For example, if OSI 
had an up-to-date short list, it might have been the basis for GitHub's license 
chooser.

A list that is purely based on popularity would freeze open source in a 
parti

Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-06 Thread Philippe Ombredanne
On Thu, Apr 6, 2017 at 5:21 PM, Luis Villa  wrote:
> On Tue, Jan 10, 2017, 11:07 AM Luis Villa  wrote:
>>
>> Hey, all-
>> I promised some board members a summary of my investigation in '12-'13
>> into updating, supplementing, or replacing the "popular licenses" list. Here
>> goes.
[...]
> Yet another (inevitably flawed) data set:
> https://libraries.io/licenses

With the merit that all the underlying code is FLOSS.

Another possible source --always biased-- could be Debian's popcon and
some cross ref with debsources.

-- 
Cordially
Philippe Ombredanne


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-06 Thread Lawrence Rosen
Richard Fontana wrote:

> Interesting but at first glance the data seems too unreliable to be of any 
> use. I started checking the identified projects under the so-called Clear BSD 
> license (the FSF-free, never-OSI-submitted BSD variant that explicitly 
> excludes patent licenses) and the ones I looked at were all spurious matches. 

 

Luis is noting that the current OSI list of "popular" licenses is unreliable 
also. Let's not do nothing about it. 

Popularity is important only for social media starlets.

 

More important for us would be a list that describes the fundamental areas 
where each license differs from the others. Give licensors a reason to select a 
license, and give licensees a reason to understand its risks and benefits. 
Don't limit those descriptions to 2 sentences or to arbitrary classifications. 
Stating explicitly in this OSD list that certain licenses are "popular" on 
Black Duck or other lists may be helpful but not determinative.

 

Yes, that license list is now long. If that length problem is the sole reason 
that you list certain licenses first in a shorter "recommended" list, do so 
explicitly but with appropriate caveats not to trust those recommendations.

 

The alternative to that kind of limited but precise legal analysis is that new 
proposed licenses will be rejected or discussed to death simply because they 
aren't popular. They should only be rejected if (1) they don't contain anything 
legally new (non-proliferation), or (2) they don't satisfy the OSD (not open 
source).

 

/Larry

 

 

From: License-discuss [mailto:license-discuss-boun...@opensource.org] On Behalf 
Of Richard Fontana
Sent: Thursday, April 6, 2017 8:51 AM
To: license-discuss@opensource.org
Subject: Re: [License-discuss] notes on a systematic approach to "popular" 
licenses

 

Interesting but at first glance the data seems too unreliable to be of any use. 
I started checking the identified projects under the so-called Clear BSD 
license (the FSF-free, never-OSI-submitted BSD variant that explicitly excludes 
patent licenses) and the ones I looked at were all spurious matches. 

 

Richard

 

 

 

On Thu, Apr 6, 2017, at 11:21 AM, Luis Villa wrote:

Yet another (inevitably flawed) data set: 

https://libraries.io/licenses

 

On Tue, Jan 10, 2017, 11:07 AM Luis Villa <l...@lu.is> wrote:

[Apparently I got unsubscribed at some point, so if you've sent an email here 
in recent months seeking my feedback, please resend.]

 

Hey, all-

I promised some board members a summary of my investigation in '12-'13 into 
updating, supplementing, or replacing the "popular licenses" list. Here goes.

 

tl;dr

I think OSI should have a data-driven short license list with a replicable and 
transparent methodology, supplemented by a new-and-good(?) list that captures 
licenses that aren't yet popular but are high quality and have some substantial 
improvement that advances the goals of OSI.

 

Purposes of non-comprehensive lists

If you Google "open source licenses", OSI pages are the top two hits. 
Historically, those pages were not very helpful unless you already knew 
something about open source. Having a shorter "top" list can help make the OSI 
website more useful to newcomers by suggesting a starting place for their 
exploration and education about open source.  

 

In addition, third parties often look to OSI as a trusted (neutral?) source for 
"top" or "best" licenses that they can incorporate into products. (The full 
OSI-approved list is not practical for many applications.) For example, if OSI 
had an up-to-date short list, it might have been the basis for GitHub's license 
chooser.

A list that is purely based on popularity would freeze open source in a 
particular time, likely making it hard for new licenses with important 
innovations to get adoption. However, a list based on more subjective criteria 
is hard to create and update.

Past attempts

The proliferation report attempted to address this problem by categorizing 
existing licenses. These categories were, intentionally or not, seen as the 
"popular or strong communities list" and "everything else". Without a process 
or clear set of criteria to update the "popular" list, however, it became 
frozen in time. It is now difficult to credibly recommend the list to newcomers 
or third parties (MPL 1.1 is deprecated; no mention of Blackduck #4 GPL v3; 
etc.).

There was also substantial work done towards a license "chooser" or "wizard". 
However, this runs into some of the same problems - either the chooser is 
opinionated (and so pisses off people, and potentially locks the licenses in 
time) or is borderline-useless for newcomers (because it still requires 
substantial additional research after using it).

Data-driven "popular" list

With a

Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-06 Thread Richard Fontana
Interesting but at first glance the data seems too unreliable to be of
any use. I started checking the identified projects under the so-called
Clear BSD license (the FSF-free, never-OSI-submitted BSD variant that
explicitly excludes patent licenses) and the ones I looked at were all
spurious matches.


Richard







On Thu, Apr 6, 2017, at 11:21 AM, Luis Villa wrote:

> Yet another (inevitably flawed) data set:
> https://libraries.io/licenses
>
> On Tue, Jan 10, 2017, 11:07 AM Luis Villa  wrote:
>> [Apparently I got unsubscribed at some point, so if you've sent an
>> email here in recent months seeking my feedback, please resend.]
>>
>> Hey, all-
>> I promised some board members a summary of my investigation in
>> '12-'13 into updating, supplementing, or replacing the "popular
>> licenses" list. Here goes.
>>
>> *tl;dr*
>> I think OSI should have a data-driven short license list with a
>> replicable and transparent methodology, supplemented by a
>> new-and-good(?) list that captures licenses that aren't yet popular but
>> are high quality and have some substantial improvement that advances
>> the goals of OSI.
>>
>> *Purposes of non-comprehensive lists*
>> If you Google "open source licenses", OSI pages are the top two hits.
>> Historically, those pages were not very helpful unless you already
>> knew something about open source. Having a shorter "top" list can
>> help make the OSI website more useful to newcomers by suggesting a
>> starting place for their exploration and education about open source.
>>
>> In addition, third parties often look to OSI as a trusted (neutral?)
>> source for "top" or "best" licenses that they can incorporate into
>> products. (The full OSI-approved list is not practical for many
>> applications.) For example, if OSI had an up-to-date short list, it
>> might have been the basis for GitHub's license chooser.
>>
>> A list that is purely based on popularity would freeze open source in
>> a particular time, likely making it hard for new licenses with
>> important innovations to get adoption. However, a list based on more
>> subjective criteria is hard to create and update.
>>
>> *Past attempts*
>>
>> The proliferation report attempted to address this problem by
>> categorizing existing licenses. These categories were,
>> intentionally or not, seen as the "popular or strong communities
>> list" and "everything else". Without a process or clear set of
>> criteria to update the "popular" list, however, it became frozen in
>> time. It is now difficult to credibly recommend the list to
>> newcomers or third parties (MPL 1.1 is deprecated; no mention of
>> Blackduck #4 GPL v3; etc.).
>>
>> There was also substantial work done towards a license "chooser" or
>> "wizard". However, this runs into some of the same problems - either
>> the chooser is opinionated (and so pisses off people, and potentially
>> locks the licenses in time) or is borderline-useless for newcomers
>> (because it still requires substantial additional research after
>> using it).
>>
>> *Data-driven "popular" list*
>>
>> With all that in mind, I think that OSI needs a (mostly) data-driven
>> "popular" shortlist, based on a scan of public code + application of
>> (mostly?) objective rules to the outcome of that scan.
>>
>> To maintain OSI's reputation as being (reasonably) neutral and
>> independent, OSI should probably avoid basing this on third-party
>> license surveys (e.g., Black Duck[1]) unless their methodologies and
>> data sources are well-documented. Ideally someone will write code so
>> that the "survey" can be run by OSI and reproduced by others.
>>
>> Hard decisions on how to collect and "process" the data will include:
>>
>>  * *choice of data sources:* What data sources are drawn on? Key
>>    Linux distros? GitHub? per-language repos like maven, cpan, npm,
>>    etc?
>>  * *what are you counting?* Projects? (May favor small, throwaway
>>    projects?) Lines of code? (May favor the largest, most complex
>>    projects?) ... ?
>>  * *which license tools?* Some scanners are more aggressive in trying
>>    to identify *something*, while others prefer accuracy over
>>    comprehensiveness. In 2013 there was no good answer to this, but
>>    my understanding is that fossology now has three different
>>    scanners, so for OSI's purposes it may be sufficient to take those
>>    three and average.
>>    * Could throw in Black Duck or other non-transparent surveys as a
>>      fourth, fifth, etc.?
>>  * *new versions?* If a new version exists but isn't widely adopted
>>    yet, how does the list reflect that? e.g., MPL 1.1 still shows up
>>    in Black Duck's survey; should OSI replace 1.1 with 2.0 in the
>>    "processed" list? What about GPL v2 v. v3? BSD/MIT v. UPL?
>>  * *gaps/"mistakes":* What happens when the board thinks the data is
>>    incorrect? :) e.g., should ISC be listed?
>> Part of why we didn't go very far in 2013 is because there are no
>> great answers for these - different 

Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-04-06 Thread Luis Villa
Yet another (inevitably flawed) data set:
https://libraries.io/licenses

On Tue, Jan 10, 2017, 11:07 AM Luis Villa  wrote:

> [Apparently I got unsubscribed at some point, so if you've sent an email
> here in recent months seeking my feedback, please resend.]
>
> Hey, all-
> I promised some board members a summary of my investigation in '12-'13
> into updating, supplementing, or replacing the "popular licenses" list.
> Here goes.
>
>
> *tl;dr*
> I think OSI should have a data-driven short license list with a
> replicable and transparent methodology, supplemented by a new-and-good(?)
> list that captures licenses that aren't yet popular but are high quality
> and have some substantial improvement that advances the goals of OSI.
>
>
> *Purposes of non-comprehensive lists*
> If you Google "open source licenses", OSI pages are the top two hits.
> Historically, those pages were not very helpful unless you already knew
> something about open source. Having a shorter "top" list can help make the
> OSI website more useful to newcomers by suggesting a starting place for
> their exploration and education about open source.
>
> In addition, third parties often look to OSI as a trusted (neutral?)
> source for "top" or "best" licenses that they can incorporate into
> products. (The full OSI-approved list is not practical for many
> applications.) For example, if OSI had an up-to-date short list, it might
> have been the basis for GitHub's license chooser.
>
> A list that is purely based on popularity would freeze open source in a
> particular time, likely making it hard for new licenses with important
> innovations to get adoption. However, a list based on more subjective
> criteria is hard to create and update.
>
> *Past attempts*
>
> The proliferation report attempted to address this problem by categorizing
> existing licenses. These categories were, intentionally or not, seen as the
> "popular or strong communities list" and "everything else". Without a
> process or clear set of criteria to update the "popular" list, however, it
> became frozen in time. It is now difficult to credibly recommend the list
> to newcomers or third parties (MPL 1.1 is deprecated; no mention of
> Blackduck #4 GPL v3; etc.).
>
> There was also substantial work done towards a license "chooser" or
> "wizard". However, this runs into some of the same problems - either the
> chooser is opinionated (and so pisses off people, and potentially locks the
> licenses in time) or is borderline-useless for newcomers (because it still
> requires substantial additional research after using it).
>
> *Data-driven "popular" list*
>
> With all that in mind, I think that OSI needs a (mostly) data-driven
> "popular" shortlist, based on a scan of public code + application of
> (mostly?) objective rules to the outcome of that scan.
>
> To maintain OSI's reputation as being (reasonably) neutral and
> independent, OSI should probably avoid basing this on third-party license
> surveys (e.g., Black Duck
> ) unless
> their methodologies and data sources are well-documented. Ideally someone
> will write code so that the "survey" can be run by OSI and reproduced by
> others.
>
> Hard decisions on how to collect and "process" the data will include:
>
>    - *choice of data sources:* What data sources are drawn on? Key Linux
>    distros? GitHub? per-language repos like maven, cpan, npm, etc?
>    - *what are you counting?* Projects? (May favor small, throwaway
>    projects?) Lines of code? (May favor the largest, most complex projects?)
>    ... ?
>    - *which license tools?* Some scanners are more aggressive in trying
>    to identify *something*, while others prefer accuracy over
>    comprehensiveness. In 2013 there was no good answer to this, but my
>    understanding is that fossology now has three different scanners, so for
>    OSI's purposes it may be sufficient to take those three and average.
>       - Could throw in Black Duck or other non-transparent surveys as a
>       fourth, fifth, etc.?
>    - *new versions?* If a new version exists but isn't widely adopted
>    yet, how does the list reflect that? e.g., MPL 1.1 still shows up in Black
>    Duck's survey; should OSI replace 1.1 with 2.0 in the "processed" list?
>    What about GPL v2 v. v3? BSD/MIT v. UPL?
>    - *gaps/"mistakes":* What happens when the board thinks the data is
>    incorrect? :) e.g., should ISC be listed?
>
> Part of why we didn't go very far in 2013 is because there are no great
> answers for these - different answers will reflect different values, and
> have different engineering impact. They're all hard choices for the board,
> the developers, hopefully license-discuss, and perhaps a broader community.
>
> Hat tip: Daniel German was invaluable to me in thinking through these
> questions.
>
> *Supplementing with high-quality, value-adding options*
> To encourage progress, while still avoiding proliferation, 

Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-01-25 Thread Stefano Zacchiroli
On Mon, Jan 23, 2017 at 07:04:40PM +, Luis Villa wrote:
>- Top 10 open source licenses
>
> 
>from WhiteSource. Top 5 are same as Black Duck, but BlackDuck has Perl at
>#6 and ISC at #7 (despite being deprecated by ISC!) and MS-PL doesn't make
>the top 10; WhiteSource doesn't have ISC or Perl and has MS-PL at #7.

For the record, and unless I'm missing something, this seems to be at
the same level of "scientificity" as the yearly report by Black Duck: we
don't know what's in the database of "over 3M open source components and
70M source files", we don't know what they count to produce the pie
charts (files?  "components"? popularity? etc.), nor do we have access to
the code used to do the counting.

I'd be glad to be proven wrong and pointed to all the details (data,
source code, etc.) that would allow independent verification of the
results of that (and/or similar) studies.

Cheers.
-- 
Stefano Zacchiroli . z...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader . OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-01-23 Thread Luis Villa
On Tue, Jan 10, 2017 at 1:56 PM Richard Fontana 
wrote:

On Tue, Jan 10, 2017 at 04:07:53PM +, Luis Villa wrote:

> *Supplementing with high-quality, value-adding options*
> To encourage progress, while still avoiding proliferation, I'd suggest a
> second list of licenses that are good but not (yet?) popular.



I like the general idea, and I suppose it corresponds to what the OSI
was trying to do with the partial updating of the 2006 popular list. I
would rather have #3 be "must be determined to be well drafted and of high
quality" without giving specifics (despite the additional
subjectivity this would introduce).


I'd still suggest pushing for collaboratively drafted, which could include
"merely" incorporating substantial feedback from license-review, as in the
case of UPL, but ideally would rise to the GPL/MPL level of community
discussion.

Perhaps where there has not been a non-OSI public discussion, "well drafted
and high quality" could implicitly or explicitly be part of the OSI
discussion and evaluation.


Looking at the whole history of
open source licensing, it is hard to make the case that involvement of
an attorney is a likely indicator of higher quality. :)


Touché ;)

> If a new license meets #1, but not #3 and #4, then OSI's formal policy
> should be to approve, but bury it in one of the other proliferation list
> groups. (Those groups are actually quite good, and should be fairly
> non-controversial — once you have a good policy for what gets in the more
> "favored" groups.) I don't think a new "deprecated" group is necessary -
> the proliferation categories are basically a good list of that already.

I actually think we should take a fresh look at these proliferation
categories.


Interesting! Any changes in particular? No objection to that in principle,
but they've always seemed decent to me.


A bigger problem is that ... OSI came to be a place where one would bring
licenses that are not being used yet -- which in some cases could mean
licenses that never end up being used.


Interesting, and obviously correct, observation. Getting out of that trap
is a bit of a catch-22, though: much of the reason anyone cares about OSI
at all is that it is seen as a bit of a "seal of approval" for
unusual/unused licenses. If you say "no, we don't approve until used" then
there is very little incentive for anyone to bother to bring the license
later, once adoption has occurred. Perhaps that is not a problem, or
perhaps that is solved (lessened?) by making clear that there is "this
passes our lowest bars" v. "this is actually recommended".

Two relevant reads since I started this initial thread:

   - Redmonk; interesting on a number of fronts but perhaps most
   interestingly noting that the weak copyleft family has somewhat dropped
   out of the middle between permissive/strong copyleft.
   - Top 10 open source licenses from WhiteSource. Top 5 are same as Black
   Duck, but Black Duck has Perl at #6 and ISC at #7 (despite being
   deprecated by ISC!) and MS-PL doesn't make the top 10; WhiteSource
   doesn't have ISC or Perl and has MS-PL at #7.

Luis


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-01-11 Thread John Cowan
On Tue, Jan 10, 2017 at 4:05 PM, Richard Fontana 
wrote:

> I had thought it might be preferable to return to the original
> "popular list" and just make clear that it is the product of a
> now-distant point in time, but I now believe this solution would
> probably be seen by many as worse than the current approach.
>

Perhaps we should add a few  images to the page.

In fact, I don't see how any list can be seen as anything but
prejudiced, and any data-driven list will merely reflect the
prejudices and fashions, not even of today, but of the past,
since open-source software once published rarely disappears.

-- 
 John Cowan  http://vrici.lojban.org/~cowan  co...@ccil.org
Using RELAX NG compact syntax to develop schemas is one of the simple
pleasures in life  --Jeni Tennison


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-01-10 Thread Henrik Ingo
Luis

Thanks for keeping this discussion alive. My comments:

As for popular licenses, I generally agree with your suggestions. I
would also argue that coming up with a list of de-facto most popular
licenses shouldn't be as bitterly controversial as you're prepared
for, or as the history of this discussion might suggest.

The fact is, whatever method you choose, you should roughly expect to
see an exponentially decreasing curve. There would typically arise
some threshold, above which the most popular items are clearly ahead
of everyone else. (e.g. the 20/80 Pareto rule, etc)

Purely for the sake of this discussion, if we look at Black Duck's
list: https://www.blackducksoftware.com/top-open-source-licenses

...I can see 2-4 such thresholds:
Threshold 1: The 3rd license is 7% ahead of the 4th
Threshold 2: The 4th license is 3% ahead of the 5th
Threshold 3: The 8th license is 2% ahead of the 9th
Threshold 4: Anything at least 1% or higher

Of course, from the above set, people would still lobby for a
threshold including their favorite license or excluding one they
dislike. But it is nevertheless a limited set to argue about, that
arises naturally from the data.
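
To make the threshold idea concrete, here is a minimal Python sketch. The
share numbers are invented for illustration (they are NOT Black Duck's
figures); they are only chosen so that the gaps fall at the same ranks as
thresholds 1-3 above. The names find_thresholds and shortlist are
hypothetical helpers, not part of any existing tool.

```python
# Hypothetical shares (percent), sorted from most to least popular.
# Invented numbers for illustration only; not taken from any real survey.
SHARES = [
    ("License A", 20.0),
    ("License B", 19.0),
    ("License C", 18.0),
    ("License D", 11.0),
    ("License E", 8.0),
    ("License F", 7.0),
    ("License G", 6.0),
    ("License H", 5.0),
    ("License I", 3.0),
    ("License J", 2.0),
]


def find_thresholds(shares, min_gap):
    """Return (rank, gap) pairs where an item leads the next by >= min_gap.

    rank is 1-based, so (3, 7.0) means the 3rd item is 7 points ahead
    of the 4th. Assumes shares is already sorted in descending order.
    """
    cuts = []
    for i in range(len(shares) - 1):
        gap = shares[i][1] - shares[i + 1][1]
        if gap >= min_gap:
            cuts.append((i + 1, gap))
    return cuts


def shortlist(shares, rank):
    """Names of the items at or above a chosen threshold rank."""
    return [name for name, _ in shares[:rank]]


if __name__ == "__main__":
    for rank, gap in find_thresholds(SHARES, min_gap=2.0):
        print(f"cut after rank {rank} (gap {gap} points):",
              shortlist(SHARES, rank))
```

The choice of min_gap is itself a judgment call, so this narrows the
argument to a handful of candidate cut-offs rather than settling it.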

Furthermore, your suggestion of also listing new licenses should help
ease the pressure. For example, nobody could dispute that the
set above the tightest threshold - MIT, GPLv2 and Apache - contains the
most popular licenses. They also happen to include exactly the 3 types
people most often look for: a short and permissive "BSD style"
license, a strong copyleft license, and a long permissive license
that includes, for example, a clear patent license. So even if someone might
want to argue for a list that is longer than 3, this "emerging out of
the data" threshold already landed us in a very useful place.

Now, one could of course criticize this and say that a list of popular
licenses must include at least GPLv3 and BSD (or LGPL, or something
else). But GPLv3 could in this case be featured in the list of new
licenses, with the explanation that it is an update to GPLv2, but has
not yet overtaken it in popularity. BSD on the other hand - based on
this data - is perhaps not worthy of being on a most popular list? The
data clearly suggests that the MIT license is the most popular one
among this family, even if "BSD style" is the common name for the
category you hear most often.

In fact, adding the concept of license families might again help ease
some pressure from this discussion, and also be genuinely useful. So
for example, the entry for GPLv2 should list all other licenses in the
GPL family (both v2 and v3), and the entry for MIT could then link to
a list of other BSD-style licenses. I don't know if the Apache License
has such siblings at the moment?
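
As a rough illustration of what a "family" entry could look like as data,
here is a small Python sketch. The grouping and the FAMILIES, family_of
and siblings names are assumptions made for the example; only the license
names themselves come from this thread.

```python
# Illustrative grouping only: each family has one "featured" (most popular)
# member plus the full list of members. Which license belongs where is an
# assumption for this sketch, not an agreed classification.
FAMILIES = {
    "GPL": {"featured": "GPLv2", "members": ["GPLv2", "GPLv3"]},
    "BSD-style": {"featured": "MIT", "members": ["MIT", "BSD"]},
    "Apache": {"featured": "Apache", "members": ["Apache"]},
}


def family_of(license_name):
    """Return the name of the family containing license_name, or None."""
    for family, entry in FAMILIES.items():
        if license_name in entry["members"]:
            return family
    return None


def siblings(license_name):
    """Other licenses in the same family (empty if none or unknown)."""
    family = family_of(license_name)
    if family is None:
        return []
    return [m for m in FAMILIES[family]["members"] if m != license_name]
```

With this shape, the popular list only needs to show the "featured" entry
of each family, and siblings() gives the links to display underneath it.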



A few points on the list for new licenses. I think your idea and
criteria are sound. Perhaps a nice addition to your proposal would be
to provide some context by digging through historical statistics: How
many new licenses have been approved in the last, say, 10 years, that
weren't legacy, redundant, special purpose, etc? Perhaps the number is
small enough that they could all be added to such a list?

I would add then an expiration date to this list, which could be for
example 10 years. The point of the expiration date of course would be
that a license on the "new list" should become a popular license
within that time, or if it doesn't, it will no longer be featured.
(There's a correlation between how low the threshold is for the
popular list and the expiration date for the new list.)
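
The expiration rule could be sketched roughly as follows. This is only an
illustration: list_status and its parameters are hypothetical names, and
the 10-year default is just the example figure mentioned above.

```python
def list_status(approved_year, current_year, on_popular_list, expiry_years=10):
    """Classify a license under the proposed expiration rule.

    - "popular": it has made it onto the data-driven popular list
    - "new":     not popular yet, but still within the expiry window
    - "retired": the window has passed without it becoming popular
    """
    if on_popular_list:
        return "popular"
    if current_year - approved_year < expiry_years:
        return "new"
    return "retired"
```

For example, a license approved in 2012 that is not yet popular would
still be "new" in 2017, while one approved in 2005 would be "retired".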

henrik




On Tue, Jan 10, 2017 at 6:07 PM, Luis Villa  wrote:
> [Apparently I got unsubscribed at some point, so if you've sent an email
> here in recent months seeking my feedback, please resend.]
>
> Hey, all-
> I promised some board members a summary of my investigation in '12-'13 into
> updating, supplementing, or replacing the "popular licenses" list. Here
> goes.
>
> tl;dr
> I think OSI should have a data-driven short license list with a replicable
> and transparent methodology, supplemented by a new-and-good(?) list that
> captures licenses that aren't yet popular but are high quality and have some
> substantial improvement that advances the goals of OSI.
>
> Purposes of non-comprehensive lists
> If you Google "open source licenses", OSI pages are the top two hits.
> Historically, those pages were not very helpful unless you already knew
> something about open source. Having a shorter "top" list can help make the
> OSI website more useful to newcomers by suggesting a starting place for
> their exploration and education about open source.
>
> In addition, third parties often look to OSI as a trusted (neutral?) source
> for "top" or "best" licenses that they can incorporate into products. (The
> full OSI-approved list is not practical for many applications.) For example,
> if OSI had an up-to-date short list, it might have been the basis for
> GitHub's license chooser.
>
> A list that is purely based on 

Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-01-10 Thread Richard Fontana
On Tue, Jan 10, 2017 at 04:07:53PM +, Luis Villa wrote:

> With all that in mind, I think that OSI needs a (mostly) data-driven
> "popular" shortlist, based on a scan of public code + application of
> (mostly?) objective rules to the outcome of that scan.
> 
> To maintain OSI's reputation as being (reasonably) neutral and independent,
> OSI should probably avoid basing this on third-party license surveys
> (e.g., Black Duck) unless their methodologies and data sources are
> well-documented. Ideally someone will write code so that the "survey" can
> be run by OSI and reproduced by others.

+1

> *Supplementing with high-quality, value-adding options*
> To encourage progress, while still avoiding proliferation, I'd suggest a
> second list of licenses that are good but not (yet?) popular. "Good" would
> be defined as something like:
> 
>    1. meets the OSD
>    2. isn't on the data-driven popularity list
>    3. drafted by an attorney (at minimum) or by a collaborative, public
>    drafting process with clear support from a sponsoring-maintaining
>    organization (ideal)
>    4. has a new "feature" that is firmly in keeping with the overall goals
>    of open source and can be concisely explained in a few sentences (e.g., for
>    UPL, "GPL-compatible permissive license with explicit patent grant")
>       1. but not "just for a particular community" - has to be at least
>       plausibly applicable to most open source projects
>       2. this is unavoidably subjective; suggest having it fall to the
>       board with pre-discussion on license-review.

> #4 allows for some innovation (and OSI support of such innovation) while #3
> applies a quality filter. (Both #3 and #4 have anti-proliferation effects.)
> Hopefully licenses that meet #3 and #4 would eventually move into #2, but
> you could imagine placing a time limit on this list; if you're not in the
> top 10 most popular within five years, then you get retired? But not sure
> that's a good idea at all - just throwing it out as one option.

I like the general idea, and I suppose it corresponds to what the OSI
was trying to do with the partial updating of the 2006 popular list. I
would rather have #3 be "must be determined to be well drafted and of
high quality" without giving specifics (despite the additional
subjectivity this would introduce). Looking at the whole history of
open source licensing, it is hard to make the case that involvement of
an attorney is a likely indicator of higher quality. :)

> If a new license meets #1, but not #3 and #4, then OSI's formal policy
> should be to approve, but bury it in one of the other proliferation list
> groups. (Those groups are actually quite good, and should be fairly
> non-controversial — once you have a good policy for what gets in the more
> "favored" groups.) I don't think a new "deprecated" group is necessary -
> the proliferation categories are basically a good list of that already.

I actually think we should take a fresh look at these proliferation
categories.

>- With SPDX and Fedora providing more comprehensive lists of FOSS
>licenses, it might make sense for OSI to link to those as "extended"
>resources, to reduce pressure from obscure license authors to get their
>license approved.

A bigger problem is that somehow, and quite early on, the OSI came to
be seen as an organization that encouraged "experimental
licenses". (Not entirely a problem - the good part of this is that it
actively encouraged creativity and advancement in the field of open
source licensing.) One characteristic of the SPDX, Fedora, Debian [to
the extent an actual 'list' of DFSG-compatible licenses exists, which
I'm not sure of], and the FSF lists is that they deal with the actual
real world, for the most part: they consider licenses that are really
being used. OSI came to be a place where one would bring licenses that
are not being used yet -- which in some cases could mean licenses that
never end up being used.

SPDX and Fedora are thus not really going to reduce pressure for
obscure license authors unless those license authors actually see
their licenses in real use.

Richard


Re: [License-discuss] notes on a systematic approach to "popular" licenses

2017-01-10 Thread Richard Fontana
On Tue, Jan 10, 2017 at 04:07:53PM +, Luis Villa wrote:

> The proliferation report attempted to address this problem by categorizing
> existing licenses. These categories were, intentionally or not, seen as the
> "popular or strong communities list" and "everything else". Without a
> process or clear set of criteria to update the "popular" list, however, it
> became frozen in time. It is now difficult to credibly recommend the list
> to newcomers or third parties (MPL 1.1 is deprecated; no mention of
> Blackduck #4 GPL v3; etc.).
[...]
 
>- I don't recommend merely updating the existing "popular and..." list
>through a subjective or one-time process. The politics of that will be
>messy, and without a documented, mostly-objective, data-driven method,
>it'll again become an outdated mess.

Luis, I agree.

I just want to point out something I've said privately (and I think
publicly as well, though not in the last few years), which is that the current
version of the "popular or strong communities list" is in my opinion a
mess. It takes the original (flawed IMO) ~2006 list and does the
following:

* Changes MPL 1.1 to MPL 2.0 (which of course didn't exist in 2006 and
  which is significantly different from MPL 1.1)

* Ignores, in contrast to its treatment of MPL, the existence of
  significantly different OSI-approved versions of the GPL and LGPL

* Ignores the fact that CDDL's current license steward has for several
  years had a minor (1.1) update which has not been submitted for OSI
  approval

I had thought it might be preferable to return to the original
"popular list" and just make clear that it is the product of a
now-distant point in time, but I now believe this solution would
probably be seen by many as worse than the current approach.

Richard




[License-discuss] notes on a systematic approach to "popular" licenses

2017-01-10 Thread Luis Villa
[Apparently I got unsubscribed at some point, so if you've sent an email
here in recent months seeking my feedback, please resend.]

Hey, all-
I promised some board members a summary of my investigation in '12-'13 into
updating, supplementing, or replacing the "popular licenses" list. Here
goes.


*tl;dr*
I think OSI should have a data-driven short license list with a replicable
and transparent methodology, supplemented by a new-and-good(?) list that
captures licenses that aren't yet popular but are high quality and have
some substantial improvement that advances the goals of OSI.


*Purposes of non-comprehensive lists*
If you Google "open source licenses", OSI pages are the top two hits.
Historically, those pages were not very helpful unless you already knew
something about open source. Having a shorter "top" list can help make the
OSI website more useful to newcomers by suggesting a starting place for
their exploration and education about open source.

In addition, third parties often look to OSI as a trusted (neutral?) source
for "top" or "best" licenses that they can incorporate into products. (The
full OSI-approved list is not practical for many applications.) For
example, if OSI had an up-to-date short list, it might have been the basis
for GitHub's license chooser.

A list that is purely based on popularity would freeze open source in a
particular time, likely making it hard for new licenses with important
innovations to get adoption. However, a list based on more subjective
criteria is hard to create and update.

*Past attempts*

The proliferation report attempted to address this problem by categorizing
existing licenses. These categories were, intentionally or not, seen as the
"popular or strong communities list" and "everything else". Without a
process or clear set of criteria to update the "popular" list, however, it
became frozen in time. It is now difficult to credibly recommend the list
to newcomers or third parties (MPL 1.1 is deprecated; no mention of
Blackduck #4 GPL v3; etc.).

There was also substantial work done towards a license "chooser" or
"wizard". However, this runs into some of the same problems - either the
chooser is opinionated (and so pisses off people, and potentially locks the
licenses in time) or is borderline-useless for newcomers (because it still
requires substantial additional research after using it).

*Data-driven "popular" list*

With all that in mind, I think that OSI needs a (mostly) data-driven
"popular" shortlist, based on a scan of public code + application of
(mostly?) objective rules to the outcome of that scan.

To maintain OSI's reputation as being (reasonably) neutral and independent,
OSI should probably avoid basing this on third-party license surveys
(e.g., Black Duck) unless their methodologies and data sources are
well-documented. Ideally someone will write code so that the "survey" can
be run by OSI and reproduced by others.

Hard decisions on how to collect and "process" the data will include:

   - *choice of data sources:* What data sources are drawn on? Key Linux
   distros? GitHub? per-language repos like maven, cpan, npm, etc?
   - *what are you counting?* Projects? (May favor small, throwaway
   projects?) Lines of code? (May favor the largest, most complex projects?)
   ... ?
   - *which license tools?* Some scanners are more aggressive in trying to
   identify *something*, while others prefer accuracy over
   comprehensiveness. In 2013 there was no good answer to this, but my
   understanding is that fossology now has three different scanners, so for
   OSI's purposes it may be sufficient to take those three and average.
      - Could throw in Black Duck or other non-transparent surveys as a
      fourth, fifth, etc.?
   - *new versions?* If a new version exists but isn't widely adopted
   yet, how does the list reflect that? e.g., MPL 1.1 still shows up in Black
   Duck's survey; should OSI replace 1.1 with 2.0 in the "processed" list?
   What about GPL v2 v. v3? BSD/MIT v. UPL?
   - *gaps/"mistakes":* What happens when the board thinks the data is
   incorrect? :) e.g., should ISC be listed?
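
To show how much two of these choices can matter, here is a minimal Python
sketch. Everything in it is hypothetical: the SCAN records are made-up
data, and shares and average_shares are illustrative helpers, not the
output or API of fossology or any other scanner.

```python
from collections import Counter

# Hypothetical scan results: (project, detected license, lines of code).
# Made-up data, chosen only to show how the counting unit changes the ranking.
SCAN = [
    ("tiny-1", "MIT", 200),
    ("tiny-2", "MIT", 150),
    ("tiny-3", "MIT", 300),
    ("big-1", "GPLv2", 900_000),
    ("big-2", "Apache", 400_000),
]


def shares(records, weight):
    """Percentage share per license; weight(loc) decides what is counted."""
    totals = Counter()
    for _project, license_name, loc in records:
        totals[license_name] += weight(loc)
    grand_total = sum(totals.values())
    return {name: 100.0 * n / grand_total for name, n in totals.items()}


def average_shares(*surveys):
    """Average per-license shares across several scanners or surveys.

    A license missing from one survey is treated as 0% there, which is
    itself a policy choice rather than a neutral default.
    """
    names = set().union(*surveys)
    return {n: sum(s.get(n, 0.0) for s in surveys) / len(surveys)
            for n in names}


by_project = shares(SCAN, lambda loc: 1)    # one vote per project
by_loc = shares(SCAN, lambda loc: loc)      # one vote per line of code

if __name__ == "__main__":
    print("top by project:", max(by_project, key=by_project.get))
    print("top by lines of code:", max(by_loc, key=by_loc.get))
```

On this toy data, counting projects puts MIT first while counting lines of
code puts GPLv2 first, which is the kind of divergence the board would
have to take a position on before any numbers are published.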

Part of why we didn't go very far in 2013 is because there are no great
answers for these - different answers will reflect different values, and
have different engineering impact. They're all hard choices for the board,
the developers, hopefully license-discuss, and perhaps a broader community.

Hat tip: Daniel German was invaluable to me in thinking through these
questions.

*Supplementing with high-quality, value-adding options*
To encourage progress, while still avoiding proliferation, I'd suggest a
second list of licenses that are good but not (yet?) popular. "Good" would
be defined as something like:

   1. meets the OSD
   2. isn't on the data-driven popularity list
   3. drafted by an attorney (at minimum) or by a collaborative, public
   drafting process with clear support from a