Re: Biological data being used by an unpublished research paper is considered proprietary
On Mon, Sep 16, 2013 at 12:59:11PM +0100, Peter Rice wrote:
> On 16/09/2013 11:31, Faheem Mitha wrote:
>> This is really not Debian-related, except insofar as the software in question is something that might have been in Debian one day. I talked about that with people on debian-med recently. So, it is technically off-topic.

> I posted a reply on stackexchange with instructions to get the data from the EBI SRS server.

> However, I have run into this issue before in the context of biological database entries and Debian so it may be worth discussing here.

> There were objections to including SwissProt entries in the example data for the EMBOSS package because the licensing of SwissProt does not allow them to be edited. That was resolved by agreeing that scientific facts should not be edited so that the files could be accepted as part of a Debian package even though they could not be changed. A fine compromise I feel.

Hopefully, this is a misstatement of the actual rationale for including this data in Debian, because it is *not* acceptable to have packages in main containing data that we are not allowed to modify.

The real rationale is surely that, because facts are *not governed by copyright*, any licensing claim over this data is ignorable.

-- 
Steve Langasek, Debian Developer, Ubuntu Developer
Give me a lever long enough and a Free OS to set it on, and I can move the world.
http://www.debian.org/
slanga...@ubuntu.com vor...@debian.org
Re: Biological data being used by an unpublished research paper is considered proprietary
Hi Steve,

On Wed, 25 Sep 2013, Steve Langasek wrote:
> On Mon, Sep 16, 2013 at 12:59:11PM +0100, Peter Rice wrote:
>> On 16/09/2013 11:31, Faheem Mitha wrote:
>>> This is really not Debian-related, except insofar as the software in question is something that might have been in Debian one day. I talked about that with people on debian-med recently. So, it is technically off-topic.

>> I posted a reply on stackexchange with instructions to get the data from the EBI SRS server.

>> However, I have run into this issue before in the context of biological database entries and Debian so it may be worth discussing here.

>> There were objections to including SwissProt entries in the example data for the EMBOSS package because the licensing of SwissProt does not allow them to be edited. That was resolved by agreeing that scientific facts should not be edited so that the files could be accepted as part of a Debian package even though they could not be changed. A fine compromise I feel.

> Hopefully, this is a misstatement of the actual rationale for including this data in Debian, because it is *not* acceptable to have packages in main containing data that we are not allowed to modify.

Well, I suppose you can modify the data, but then it won't be the same data. :-)

> The real rationale is surely that, because facts are *not governed by copyright*, any licensing claim over this data is ignorable.

So, biological data is not actually copyrightable? Can you (or anyone else) give me relevant documentation about that? Apparently it may vary by jurisdiction. Does anyone know the rules in the EU, which seems to be what is relevant here, since the servers in question are in Europe?

For the record, I've gone ahead and removed the data from my repository, because I wasn't sure whether the person telling me not to distribute it had the right to do so or not. I've added a script to download the data, and will document it. It is not really a big deal either way, but if I had some definite information I could for example email this person back with that information.

I wonder if debian-legal would be a better place to ask this. I haven't asked them.

Regards, Faheem

> -- 
> Steve Langasek, Debian Developer, Ubuntu Developer
> Give me a lever long enough and a Free OS to set it on, and I can move the world.
> http://www.debian.org/
> slanga...@ubuntu.com vor...@debian.org

-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/alpine.deb.2.02.1309252213170.3...@orwell.homelinux.org
Re: Biological data being used by an unpublished research paper is considered proprietary
]] Steve Langasek

> The real rationale is surely that, because facts are *not governed by copyright*, any licensing claim over this data is ignorable.

Copyrights are not the only type of «IP» that may require licensing. Database rights exist in Europe, for instance.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

Archive: http://lists.debian.org/m2bo3gsp3o@rahvafeir.err.no
Re: Biological data being used by an unpublished research paper is considered proprietary
On Mon, 16 Sep 2013, Peter Rice wrote:
> However, I have run into this issue before in the context of biological database entries and Debian so it may be worth discussing here.

> There were objections to including SwissProt entries in the example data for the EMBOSS package because the licensing of SwissProt does not allow them to be edited. That was resolved by agreeing that scientific facts should not be edited so that the files could be accepted as part of a Debian package even though they could not be changed. A fine compromise I feel.

On Thu, Sep 19, 2013 at 01:50:48AM +0530, Faheem Mitha wrote:
> So, what license did these files go into Debian as?

Hello Faheem and Peter,

the license page of the UniProt consortium now underlines that the CC-ND license applies only to the copyrightable parts of its databases.

  We have chosen to apply the Creative Commons Attribution-NoDerivs License to all copyrightable parts of our databases. This means that you are free to copy, distribute, display and make commercial use of these databases in all legislations, provided you give us credit. However, if you intend to distribute a modified version of one of our databases, you must ask us for permission first.

  http://www.uniprot.org/help/license

Since facts cannot be copyrighted, I think that the current consensus within Debian is that the copyright statements in the records apply to the whole database and not to the records taken in isolation. This means that, in theory, copyright law does not forbid changing the sequence in individual records distributed separately from the database. In practice, there may be other reasons not to do so in a misleading way, and I would put ethics at the top of that list.

Have a nice day,

Charles

-- 
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

Archive: http://lists.debian.org/20130922061426.ga...@falafel.plessy.net
Re: Biological data being used by an unpublished research paper is considered proprietary
Hi Peter,

Thank you for your very helpful answer. Seriously, it is rare to get such a good answer on such a topic. I actually read your response on academia.sx before I saw your email, and I should have guessed such a good reason would have come from a Debian person. Also, I see you registered the same day as your answer. :-)

I'm keeping debian-devel and debian-med cc'd for now, because I do have some general questions about biological data licensing. If the lists want me to go away, just say so.

Since you posted your answer publicly, I'm assuming you don't mind if I quote it. I recommend you post your answer to the Debian lists, since there is no guarantee that academia.sx will be around forever.

See responses inline. I'm afraid there are a lot of questions, but I really can't pass up the opportunity to get some answers for once. Sorry about that. If you don't want to answer my questions (and let's face it, you probably don't) perhaps you can suggest some suitable mailing list(s)/forum(s)?

On Mon, 16 Sep 2013, Peter Rice wrote:
> On 16/09/2013 11:31, Faheem Mitha wrote:
>> Hi,
>> This is really not Debian-related, except insofar as the software in question is something that might have been in Debian one day. I talked about that with people on debian-med recently. So, it is technically off-topic.

> I posted a reply on stackexchange with instructions to get the data from the EBI SRS server.

> However, I have run into this issue before in the context of biological database entries and Debian so it may be worth discussing here.

> There were objections to including SwissProt entries in the example data for the EMBOSS package because the licensing of SwissProt does not allow them to be edited. That was resolved by agreeing that scientific facts should not be edited so that the files could be accepted as part of a Debian package even though they could not be changed. A fine compromise I feel.

So, what license did these files go into Debian as?
> regards,
> Peter Rice
> EMBOSS team

> The copyright is probably on the full database release flatfile and the formatted entries ... you will find similar conditions for UniProt/SwissProt so it is not so unusual.

Yes, but I'm not trying to download their entire database, just a small portion of it.

> The restrictions on scripts are common to prevent server performance hits from a large number of requests.

Is such a restriction legally enforceable? I don't see how one can distinguish between a human user downloading using, say, curl, and a script using curl with random pauses between downloads. Or is acceding to such a request just a matter of common courtesy?

> You can simply invite reviewers to download the data from some other server, for example from the EBI SRS server. The URL for entry A00673 would be
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[IMGTLIGM-ID:a00673]+-view+FastaSeqs+-ascii

Wow, that works for me! Cool. I've tried before to download data from other biological data web services, but have always fallen down confused at the complexity of the sites and the multiplicity of their options. IMGT is practically the only such site I have found that I was able to navigate without getting brain fever.

http://www.ebi.ac.uk/miriam/main/collections/MIR:0287

So a few possibly dumb questions.

Question 1: Is there no general agreement on the licensing of biological data such as the kind we are talking about? This seems strange. Aren't such data biological facts, as you put it in your message? To me, it makes as much sense as trying to treat the list of prime numbers or any other such mathematical facts as proprietary information. Specifically, I don't understand how IMGT can claim to own this data, to the extent of forbidding its redistribution. They didn't produce this data themselves, did they?

Question 2: It looks like EBI is hosting a copy of the IMGT database. Is that right? Also, there are a lot of different kinds of accession numbers.
Which accession numbers is IMGT using here? Also, do you know of other servers that have the same data?

> You can also use a list of accessions, for example A00673 or A01650
> http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[IMGTLIGM-ID:a00673|a01650]+-view+FastaSeqs+-ascii

> If downloading many entries you should pause between requests, but putting lists into the URLs may reduce it to few enough not to cause a problem. I doubt EBI would be upset by 200 requests - they would be concerned about thousands.

This is *really* useful. I see each of these list requests produces one FASTA file with multiple sequences in it. I think this is a better way to go than producing hundreds of FASTA files, each containing a single sequence, as I have been doing. Also, unlike IMGT, one just downloads a FASTA file directly, without having to trim off HTML stuff. I suspect that each request corresponds at the backend to a SQL query, and if so, I'm sure the system would prefer one larger SQL query to
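The batching idea discussed in this message can be sketched in Python as follows. This is only a minimal illustration, not the script actually used for the paper: the URL pattern is copied from the examples quoted above, while the function names, the batch size of 50, the 5 second pause, and the lowercasing of accessions (to match the quoted URLs) are all assumptions made for the example. Whether the SRS server still answers at that address is not assumed either.

```python
import time
import urllib.request

# URL pattern copied from the message above. Everything else in this
# sketch (names, batch size, pause length) is hypothetical.
SRS_BASE = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"


def build_srs_url(accessions):
    """Return one SRS query URL covering every accession in the list."""
    if not accessions:
        raise ValueError("need at least one accession")
    # The quoted URLs use lowercase IDs joined with '|'; mirror that here.
    ids = "|".join(acc.lower() for acc in accessions)
    return f"{SRS_BASE}?[IMGTLIGM-ID:{ids}]+-view+FastaSeqs+-ascii"


def batched(accessions, batch_size):
    """Split the accession list into consecutive chunks of batch_size."""
    return [accessions[i:i + batch_size]
            for i in range(0, len(accessions), batch_size)]


def fetch_fasta(accessions, batch_size=50, pause_seconds=5.0):
    """Fetch FASTA text for all accessions, one HTTP request per batch."""
    chunks = batched(accessions, batch_size)
    parts = []
    for index, chunk in enumerate(chunks):
        with urllib.request.urlopen(build_srs_url(chunk)) as response:
            parts.append(response.read().decode("ascii", errors="replace"))
        if index < len(chunks) - 1:
            time.sleep(pause_seconds)  # pause between requests, as advised
    return "".join(parts)
```

With around 200 accessions and a batch size of 50, `fetch_fasta` would issue four requests instead of 200, which is the reduction the quoted advice is aiming for.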
Biological data being used by an unpublished research paper is considered proprietary
Hi,

This is really not Debian-related, except insofar as the software in question is something that might have been in Debian one day. I talked about that with people on debian-med recently. So, it is technically off-topic. However, I thought that maybe people on these lists would have some input on the matter. People in Debian are very experienced in matters of copyright and licensing, and people in debian-med presumably know something about copyright/licensing of biological data.

I posted the following to academia.stackexchange.com:

http://academia.stackexchange.com/q/12718/285

As I write there is one reply.

Summary of my SE question:

1) A distributor of biological data is claiming proprietary ownership of the data. This runs contrary to what I know about such data. Can anyone comment?

2) The distributor also says a script to download the (200) data files is prohibited. Saying I cannot use a script to download the data (curl in my case) is IMO downright bizarre. Is expecting a user to download 200 files manually reasonable, and how would the server tell the difference anyway? They're all just http requests.

Please CC me on any reply. Thanks.

Regards, Faheem

# http://academia.stackexchange.com/q/12718/285 #

This question may be too specialist to be on-topic here. In which case, please feel free to transfer it to another SE site, or close, as appropriate.

I am planning to publish an applied statistics paper. This paper develops an algorithm and then applies this algorithm to some data. I obtained most of this data from the site http://www.imgt.org. The data I am using are immunoglobulin and T cell receptor nucleotide sequences, in the form of FASTA files. I'm using around 200 of these. Here is a [random example][1] of the data I am using (click on [6 Sequence (FASTA format)] to get the FASTA file).

Now, I have a problem.
In [Warranty Disclaimer and Copyright Notice](http://www.imgt.org/Warranty.html), it is written:

> The IMGT® software and data are provided as a service to the scientific community to be used only for research and educational purposes. Individuals may print or save portions of IMGT® for their own personal use. Any other use of IMGT® material need prior written permission of the IMGT director and of the legal institutions (CNRS and Université Montpellier 2).

I just heard from Prof. Marie-Paule Lefranc and she replied:

> I have no objection that the data you retrieved for your work from IMGT/LIGM-DB be made available to the reviewers, but unfortunately we cannot authorize a script or a distribution of the IMGT/LIGM-DB files with your code to the users. You can provide the users with the list of the IMGT/LIGM-DB accession numbers you used, with the source of the data clearly identified: (IMGT/LIGM-DB version number) and reference to NAR 2006.

Well, this just made my life more difficult. To start with, I'm puzzled by this. Isn't biological data like this public domain? Is it really possible to treat immunoglobulin and T cell receptor nucleotide sequence data as proprietary information? I just wrote back and asked Prof. Lefranc what license the data was published under, which I had not done earlier.

Additionally, how does one make data available to reviewers and not to users? That is awkward, to say the least.

##
Re: Biological data being used by an unpublished research paper is considered proprietary
I am not a lawyer but I don't think facts are copyrightable. In some jurisdictions there are database rights and copyright on collections of facts (like phone books) that could apply here. I suggest you consult the lawyers for your research institute for the legal situation in your jurisdiction.

The script thing sounds silly and easy to work around: fake the user-agent and put a few seconds between requests.

-- 
bye, pabs
http://wiki.debian.org/PaulWise

Archive: http://lists.debian.org/caktje6ets4u6oykxfmoe3u1dofmo8jjehiwu_sgcagv500o...@mail.gmail.com
Re: Biological data being used by an unpublished research paper is considered proprietary
On 16/09/2013 11:31, Faheem Mitha wrote:
> Hi,
> This is really not Debian-related, except insofar as the software in question is something that might have been in Debian one day. I talked about that with people on debian-med recently. So, it is technically off-topic.

I posted a reply on stackexchange with instructions to get the data from the EBI SRS server.

However, I have run into this issue before in the context of biological database entries and Debian so it may be worth discussing here.

There were objections to including SwissProt entries in the example data for the EMBOSS package because the licensing of SwissProt does not allow them to be edited. That was resolved by agreeing that scientific facts should not be edited so that the files could be accepted as part of a Debian package even though they could not be changed. A fine compromise I feel.

regards,

Peter Rice
EMBOSS team

Archive: http://lists.debian.org/5236f28f.2020...@yahoo.co.uk
Re: Biological data being used by an unpublished research paper is considered proprietary
It looks like both you and the site you wish to access are based in France, so please forgive this US-centric intrusion.

Under US law, it may be the case that violating website terms of service is a felony crime with jail time attached.

https://www.eff.org/deeplinks/2013/01/rebooting-computer-crime-law-part-1-no-prison-time-for-violating-terms-of-service

Based on the scenario you describe, and if the communication passed through a system under US jurisdiction, you might be in violation of this stupid law.

Jeff
Re: Biological data being used by an unpublished research paper is considered proprietary
Jeff Epler writes:
> Under US law, it may be the case that violating website terms of service is a felony crime with jail time attached.

The USA Federal courts have made it clear that this is not the case. As far as I know there have been no convictions or even prosecutions under this theory. A casual reading of the legislative history of this (execrable) law indicates that it was not the intent of Congress to criminalize such things as violation of TOS.

-- 
John Hasler
jhas...@newsguy.com
Elmwood, WI USA

Archive: http://lists.debian.org/87li2wyeuh@thumper.dhh.gt.org