On Jul 30, 2011, at 3:58 AM, Peter Rice wrote:

> Quoted in full for the benefit of the debian-med list who missed the original
> posting
>
> On 29/07/2011 21:35, Adam Sjøgren wrote:
>> On Fri, 29 Jul 2011 09:39:46 +0100, Peter wrote:
>>
>>> It might make things clearer if someone from Debian could explain:
>>
>> (I am not from Debian, but here is my take on it anyway:)
>>
>>> (a) why a Creative Commons licence is an issue for you
>>
>> One of the fundamental software freedoms is the freedom to change the
>> software¹.
>>
>> The Debian Free Software Guidelines' definition of free software
>> includes this freedom².
>>
>> So the "No Derivatives" variants of the Creative Commons licenses aren't
>> free by the DFSG definition.
>>
>> (The GNU Free Documentation License on documents with invariant sections
>> is considered non-free by DFSG-standards as well, even if the invariant
>> sections are things that nobody would want to change.)
>>
>> When a project of volunteers packages 29000+ packages, I think
>> making a judgement call on whether it is okay that the license of a
>> couple of files does not live up to the guidelines is nigh impossible.
>
>> The answer to "Why would you want to?" is, because you might need to.
>>
>> It is more obvious with programs and code than it is with database
>> entries, granted - but I guess the equivalent problem would be that the
>> licensor didn't want to fix a problem in such a database, and that
>> problem made the programs using it malfunction. It would be a pain if
>> you weren't allowed to fix the problem and distribute the fixed data
>> yourself, say, if "upstream" didn't want to include the fix for some
>> reason or another; maybe they happened to turn sour on the world/you -
>> stranger things have happened.
>>
>> So, nobody is probably ever going to exercise that freedom in this
>> specific case, I think, but ignoring some of the freedoms in special
>> cases is infeasible for a project such as Debian.
>>
>> This is just me trying to explain how I understand it, so take it with a
>> grain of salt, and swing by debian-legal³ for the experts.
>
> A specific example might help. About 5 years ago a release of the UniProt
> database (as plain text files) broke the Wisconsin (GCG) sequence analysis
> package. They introduced extremely long lines in a data file that everyone
> assumed had lines of at most 80 characters.
>
> As GCG was closed source, the fix required a change to the UniProt files to
> either wrap or truncate the 'offending' records.
>
> The fix was not to distribute a change to the data of course, but to write
> and distribute a simple Perl script that wrapped the long records.
>
> That was not a licensing issue - the content stays the same, the format is
> changed, no changed data is distributed. But it does illustrate that the
> database licensing does not prevent 'fixing' a database.
>
>>> (b) why you appear to consider a copy of a whole or part of a public
>>> biological database as part of an "operating system"
>>
>> They are part of a package which is included in the Debian GNU/Linux
>> free operating system.
>
> I expect there are many problems that arise if data ... and documentation ...
> are considered to be software. For EMBOSS we didn't officially specify a
> license for the documentation but other packages probably do. It still
> worries me that some of our documentation files officially include GPL
> licensed (EMBOSS) source code but I did not like any of the alternative
> documentation licenses.

I don't understand the logic behind why data would be considered software,
unless one is using a very fuzzy definition of 'software'. Is this strictly
a packaging issue, e.g. any data packaged with source makes it 'software'?
Or just the fact that such data is licensed? Would a package of just
data/docs (no code) be allowed?

>> (I personally think it would make sense to change to a Creative Commons
>> license that allows derivative works - Uniprot and others are going to
>> be the canonical source for the data anyway, so nothing will be lost by
>> them by doing that, as far as I can see.)
>
> Unlikely. The no-derivatives version is specifically there to prevent
> derivatives - for example Debian distributing a modified UniProt without
> permission.
>
> The ontologies are similar, but do allow for the use case of importing terms
> from one ontology into another if the ontology name is changed (and
> preferably if cross-references to the original are provided). Again, the need
> is to protect the integrity of the original ontology content so references to
> a GO term or a UniProt entry are clearly defined.
>
> This is essential for many of the public bioinformatics databases. Data and
> software are not the same in this context. I am curious whether documentation
> licensing raises any issues.
>
> Just my 2c worth
>
> Peter Rice
> EMBOSS Team

Maybe the best solution is to just package any data separately? We have
talked about setting up a 'biodata' repository for common datasets from all
the Bio* projects.

Feel free to skip the rest of this, but:

<my_2c>
I agree with Peter's point: Uniprot and other databases license data this
way for very good (and well-intentioned) reasons. For the Bio* languages
there are instances where we use such data as a fallback in case a newer
version isn't immediately available (REBase and SO come to mind, and I think
we have others), so we are likely in the same boat as EMBOSS.

I had a long screed here, but I found some original sources for the
discussion re: Uniprot and use of Creative Commons licensing that state the
reasoning for why this is in place:

http://wiki.creativecommons.org/Case_Studies/Uniprot
http://eric.jain.name/2006/02/07/uniprot-creative-commons/
http://sciencecommons.org/resources/faq/databases/
http://sciencecommons.org/resources/faq/database-protocol/

Note there is now a 'Database Protocol' (last link) that recommends a
different license; that page nicely summarizes the history of the whole
Creative Commons licensing affair and the issues of using a Creative Commons
license re: databases, mainly due to the issue Peter mentioned above, that
databases != software. Uniprot doesn't use this as of yet (so it doesn't
solve the problem at hand), but it's possible this may change.
</my_2c>

chris

Christopher Fields
Senior Research Scientist
National Center for Supercomputing Applications
Institute for Genomic Biology
University of Illinois Urbana-Champaign
1206 W. Gregory Dr., MC-195
Urbana, IL 61801

_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss