Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-25 Thread Steve Langasek
On Mon, Sep 16, 2013 at 12:59:11PM +0100, Peter Rice wrote:
 On 16/09/2013 11:31, Faheem Mitha wrote:
 This is really not Debian-related, except insofar as the software in
 question is something that might have been in Debian one day. I talked
 about that with people on debian-med recently. So, it is technically
 off-topic.

 I posted a reply on stackexchange with instructions to get the data
 from the EBI SRS server.

 However, I have run into this issue before in the context of
 biological database entries and Debian so it may be worth discussing
 here. There were objections to including SwissProt entries in the
 example data for the EMBOSS package because the licensing of
 SwissProt does not allow them to be edited. That was resolved by
 agreeing that scientific facts should not be edited so that the
 files could be accepted as part of a Debian package even though they
 could not be changed. A fine compromise I feel.

Hopefully, this is a misstatement of the actual rationale for including this
data in Debian, because it is *not* acceptable to have packages in main
containing data that we are not allowed to modify.

The real rationale is surely that, because facts are *not governed by
copyright*, any licensing claim over this data is ignorable.

-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
Ubuntu Developer   http://www.debian.org/
slanga...@ubuntu.com vor...@debian.org




Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-25 Thread Faheem Mitha


Hi Steve,

On Wed, 25 Sep 2013, Steve Langasek wrote:


On Mon, Sep 16, 2013 at 12:59:11PM +0100, Peter Rice wrote:



On 16/09/2013 11:31, Faheem Mitha wrote:


This is really not Debian-related, except insofar as the software in 
question is something that might have been in Debian one day. I talked 
about that with people on debian-med recently. So, it is technically 
off-topic.


I posted a reply on stackexchange with instructions to get the data 
from the EBI SRS server.


However, I have run into this issue before in the context of biological 
database entries and Debian so it may be worth discussing here. There 
were objections to including SwissProt entries in the example data for 
the EMBOSS package because the licensing of SwissProt does not allow 
them to be edited. That was resolved by agreeing that scientific facts 
should not be edited so that the files could be accepted as part of a 
Debian package even though they could not be changed. A fine compromise 
I feel.


Hopefully, this is a misstatement of the actual rationale for including 
this data in Debian, because it is *not* acceptable to have packages in 
main containing data that we are not allowed to modify.


Well, I suppose you can modify the data, but then it won't be the same 
data. :-)


The real rationale is surely that, because facts are *not governed by 
copyright*, any licensing claim over this data is ignorable.


So, biological data is not actually copyrightable? Can you (or anyone 
else) give me relevant documentation about that? Apparently it may vary by 
jurisdiction. Does anyone know the rules in the EU, which seems to be what 
is relevant here, since the servers in question are in Europe?


For the record, I've gone ahead and removed the data from my repository, 
because I wasn't sure whether the person telling me not to distribute it 
had the right to do so or not. I've added a script to download the data, 
and will document it. It is not really a big deal either way, but if I had 
some definite information I could for example email this person back with 
that information.


I wonder if debian-legal would be a better place to ask this. I haven't 
asked them.


   Regards, Faheem





--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/alpine.deb.2.02.1309252213170.3...@orwell.homelinux.org



Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-25 Thread Tollef Fog Heen
]] Steve Langasek 

 The real rationale is surely that, because facts are *not governed by
 copyright*, any licensing claim over this data is ignorable.

Copyrights are not the only type of «IP» that may require
licensing. Database rights exist in Europe, for instance.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are





Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-22 Thread Charles Plessy
 On Mon, 16 Sep 2013, Peter Rice wrote:
 
 However, I have run into this issue before in the context of
 biological database entries and Debian so it may be worth discussing
 here. There were objections to including SwissProt entries in the
 example data for the EMBOSS package because the licensing of
 SwissProt does not allow them to be edited.  That was resolved by
 agreeing that scientific facts should not be edited so that the
 files could be accepted as part of a Debian package even though they
 could not be changed. A fine compromise I feel.

Le Thu, Sep 19, 2013 at 01:50:48AM +0530, Faheem Mitha a écrit :
 
 So, what license did these files go into Debian as?

Hello Faheem and Peter,

the license page of the UniProt consortium now underlines that the CC-ND
license applies only to the copyrightable parts of its databases.

We have chosen to apply the Creative Commons Attribution-NoDerivs License to
all copyrightable parts of our databases. This means that you are free to
copy, distribute, display and make commercial use of these databases in all
legislations, provided you give us credit. However, if you intend to
distribute a modified version of one of our databases, you must ask us for
permission first.

http://www.uniprot.org/help/license

Since facts cannot be copyrighted, I think that the current consensus within
Debian is that the copyright statements in the records apply to the whole
database and not to the records taken in isolation.  This means that in theory,
the copyright law does not forbid changing the sequence in individual records
distributed separately from the database.  In practice, there may be other
reasons, and I would list ethics on the top of the list, to not do so in a
misleading way.

Have a nice day,

Charles

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan





Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-18 Thread Faheem Mitha


Hi Peter,

Thank you for your very helpful answer. Seriously, it is rare to get
such a good answer on such a topic. I actually read your response on
academia.sx before I saw your email, and I should have guessed such
a good answer would have come from a Debian person. Also, I see you
registered the same day as your answer. :-)

I'm keeping debian-devel and debian-med cc'd for now, because I do
have some general questions about biological data licensing. If the
lists want me to go away, just say so.

Since you posted your answer publicly, I'm assuming you don't mind if
I quote it. I recommend you post your answer to the Debian lists,
since there is no guarantee that academia.sx will be around forever.

See responses inline. I'm afraid there are a lot of questions, but I
really can't pass up the opportunity to get some answers for
once. Sorry about that.

If you don't want to answer my questions (and let's face it, you
probably don't) perhaps you can suggest some suitable mailing
list(s)/forum(s)?

On Mon, 16 Sep 2013, Peter Rice wrote:


On 16/09/2013 11:31, Faheem Mitha wrote:



Hi,



This is really not Debian-related, except insofar as the software in
question is something that might have been in Debian one day. I talked
about that with people on debian-med recently. So, it is technically
off-topic.



I posted a reply on stackexchange with instructions to get the data
from the EBI SRS server.



However, I have run into this issue before in the context of
biological database entries and Debian so it may be worth discussing
here. There were objections to including SwissProt entries in the
example data for the EMBOSS package because the licensing of
SwissProt does not allow them to be edited.  That was resolved by
agreeing that scientific facts should not be edited so that the
files could be accepted as part of a Debian package even though they
could not be changed. A fine compromise I feel.


So, what license did these files go into Debian as?


regards,

Peter Rice
EMBOSS team



The copyright is probably on the full database release flatfile and
the formatted entries ... you will find similar conditions for
UniProt/SwissProt so it is not so unusual.


Yes, but I'm not trying to download their entire database, just a
small portion of it.


The restrictions on scripts are common to prevent server performance
hits from a large number of requests.


Is such a restriction legally enforceable? I don't see how one can
distinguish between a human user downloading using say curl, and a
script using curl with random pauses between downloads. Or is acceding
to such a request just a matter of common courtesy?


You can simply invite reviewers to download the data from some other
server, for example from the EBI SRS server. The URL for entry
A00673 would be



http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[IMGTLIGM-ID:a00673]+-view+FastaSeqs+-ascii;


Wow, that works for me! Cool. I've tried before to download data from
other biological data web services, but have always fallen down
confused at the complexity of the sites and the multiplicity of their
options. IMGT is practically the only such site I have found which I
found I was able to navigate without getting brain fever.

http://www.ebi.ac.uk/miriam/main/collections/MIR:0287

So a few possibly dumb questions.

Question 1: Is there no general agreement on the licensing of
biological data such as the kind we are talking about? This seems
strange. Aren't such data biological facts, as you put it in your
message? To me, it makes as much sense to try to treat the list of
prime numbers or any other such mathematical facts as proprietary
information.

Specifically, I don't understand how IMGT can claim to own this data,
to the extent of forbidding its redistribution. They didn't produce
this data themselves, did they?

Question 2: It looks like EBI is hosting a copy of the IMGT
database. Is that right?

Also, there are a lot of different kinds of accession numbers. Which
accession numbers is IMGT using here?

Also, do you know of other servers that have the same data?


You can also use a list of accessions, for example A00673 or A01650



http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?[IMGTLIGM-ID:a00673|a01650]+-view+FastaSeqs+-ascii



If downloading many entries you should pause between requests, but
putting lists into the URLs may reduce it to few enough not to cause
a problem. I doubt EBI would be upset by 200 requests - they would
be concerned about thousands.
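
The batching and pausing suggested above can be sketched in Python. This is only a minimal sketch and was not posted in the thread: the URL pattern is copied from the two example URLs quoted above, while the function names, the batch size, the pause length, and the lower-casing of accessions (the examples happen to use lower case) are assumptions made for illustration. It also assumes the server still answers at that address.

```python
import time
import urllib.request

# Base URL and query pattern copied from the example URLs quoted in the thread.
SRS_BASE = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz"


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_url(accessions):
    """Build one wgetz URL for a group of accessions, joined with '|'."""
    # Lower-casing is an assumption based on the quoted examples.
    ids = "|".join(acc.lower() for acc in accessions)
    return f"{SRS_BASE}?[IMGTLIGM-ID:{ids}]+-view+FastaSeqs+-ascii"


def fetch_all(accessions, batch_size=20, pause_seconds=5.0):
    """Fetch FASTA text one batch at a time, sleeping between requests.

    batch_size and pause_seconds are hypothetical defaults, not values
    recommended by EBI or IMGT.
    """
    results = []
    batches = list(chunked(accessions, batch_size))
    for index, batch in enumerate(batches):
        with urllib.request.urlopen(build_url(batch)) as response:
            results.append(response.read().decode("ascii", errors="replace"))
        if index < len(batches) - 1:
            time.sleep(pause_seconds)  # be polite: pause before the next request
    return results
```

With 200 accessions and a batch size of 20, this would issue 10 requests rather than 200, which is the point of putting lists into the URLs.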


This is *really* useful. I see each of these list requests produces
one FASTA file with multiple sequences in it. I think this is a
better way to go than producing hundreds of FASTA files, each
containing a single sequence, as I have been doing. Also, unlike IMGT,
one just downloads a FASTA file directly, without having to trim off
HTML stuff. I suspect that each request corresponds at the backend to
a SQL query, and if so, I'm sure the system would prefer one larger
SQL query to 
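
A multi-sequence FASTA file of the kind described above can be split back into individual records with a few lines of Python. This is a minimal sketch, not part of the original message: the function name and the sample records are hypothetical, and it assumes the plain FASTA layout in which each record starts with a header line beginning with ">".

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical sample data, not real IMGT/LIGM-DB entries.
example = ">seq1 hypothetical record\nACGT\nTTGA\n>seq2 hypothetical record\nGGCC\n"
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```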

Biological data being used by an unpublished research paper is considered proprietary

2013-09-16 Thread Faheem Mitha


Hi,

This is really not Debian-related, except insofar as the software in 
question is something that might have been in Debian one day. I talked 
about that with people on debian-med recently. So, it is technically 
off-topic.


However, I thought that maybe people on these lists would have some input 
on the matter. People in Debian are very experienced in matters of 
copyright and licensing, and people in debian-med presumably know 
something about copyright/licensing of biological data.


I posted the following to 
academia.stackexchange.com, http://academia.stackexchange.com/q/12718/285


As I write there is one reply.

Summary of my SE question:

1) A distributor of biological data is claiming proprietary ownership of 
the data. This runs contrary to what I know about such data. Can anyone 
comment?


2) The distributor also says a script to download the (200) data files is 
prohibited. Saying I cannot use a script to download the data (curl in my 
case) is IMO downright bizarre. Is expecting a user to download 200 
files manually reasonable, and how would the server tell the difference 
anyway? They're all just http requests.


Please CC me on any reply. Thanks.
 Regards, Faheem

#
http://academia.stackexchange.com/q/12718/285
#

This question may be too specialist to be on-topic here. In which
case, please feel free to transfer it to another SE site, or close, as
appropriate.

I am planning to publish an applied statistics paper. This paper develops 
an algorithm and then applies this algorithm to some data. I obtained most 
of this data from the site http://www.imgt.org. The data I am using are 
immunoglobulin and T cell receptor nucleotide sequences, in the form of 
FASTA files. I'm using around 200 of these.


Here is a [random example][1] of the data I am using (click on [6 
Sequence (FASTA format)] to get the FASTA file).


Now, I have a problem. In [Warranty Disclaimer and Copyright 
Notice](http://www.imgt.org/Warranty.html), is written



The IMGT® software and data are provided as a service to the
scientific community to be used only for research and educational
purposes. Individuals may print or save portions of IMGT® for their
own personal use. Any other use of IMGT® material need prior written
permission of the IMGT director and of the legal institutions (CNRS
and Université Montpellier 2).


I just heard from Prof. Marie-Paule Lefranc and she replied:


I have no objection that the data you retrieved for your work from
IMGT/LIGM-DB be made available to the reviewers, but unfortunately we
cannot authorize a script or a distribution of the IMGT/LIGM-DB files
with your code to the users.



You can provide the users with the list of the IMGT/LIGM-DB accession
numbers you used, with the source of the data clearly identified:
(IMGT/LIGM-DB version number) and reference to NAR 2006.


Well, this just made my life more difficult. To start with, I'm
puzzled by this. Isn't biological data like this public domain? Is it
really possible to treat immunoglobulin and T cell receptor nucleotide
sequence data as proprietary information?

I just wrote back and asked Prof. Lefranc what license the data was
published under, which I had not done earlier.

Additionally, how does one make data available to reviewers and not to
users? That is awkward, to say the least.

##

Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-16 Thread Paul Wise
I am not a lawyer but I don't think facts are copyrightable. In some
jurisdictions there are database rights and copyright on collections
of facts (like phone books) that could apply here. I suggest you
consult the lawyers for your research institute for the legal
situation in your jurisdiction.

The script thing sounds silly and easy to work-around - fake the
user-agent and put a few seconds between requests.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise





Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-16 Thread Peter Rice

On 16/09/2013 11:31, Faheem Mitha wrote:


Hi,

This is really not Debian-related, except insofar as the software in
question is something that might have been in Debian one day. I talked
about that with people on debian-med recently. So, it is technically
off-topic.


I posted a reply on stackexchange with instructions to get the data from 
the EBI SRS server.


However, I have run into this issue before in the context of biological 
database entries and Debian so it may be worth discussing here. There 
were objections to including SwissProt entries in the example data for 
the EMBOSS package because the licensing of SwissProt does not allow 
them to be edited. That was resolved by agreeing that scientific facts 
should not be edited so that the files could be accepted as part of a 
Debian package even though they could not be changed. A fine compromise 
I feel.


regards,

Peter Rice
EMBOSS team





Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-16 Thread Jeff Epler
It looks like both you and the site you wish to access are based in
France, so please forgive this US-centric intrusion.

Under US law, it may be the case that violating website terms of service
is a felony crime with jail time attached.
https://www.eff.org/deeplinks/2013/01/rebooting-computer-crime-law-part-1-no-prison-time-for-violating-terms-of-service

Based on the scenario you describe, and if the communication passed
through a system under US jurisdiction, you might be in violation of
this stupid law.

Jeff




Re: Biological data being used by an unpublished research paper is considered proprietary

2013-09-16 Thread John Hasler
Jeff Epler writes:
 Under US law, it may be the case that violating website terms of
 service is a felony crime with jail time attached.

The USA Federal courts have made it clear that this is not the case.  As
far as I know there have been no convictions or even prosecutions under
this theory.  A casual reading of the legislative history of this
(execrable) law indicates that it was not the intent of Congress to
criminalize such things as violation of TOS.
-- 
John Hasler 
jhas...@newsguy.com
Elmwood, WI USA

