On Aug 27, 2012, at 5:37 PM, Craig James wrote:
> I'm not a lawyer, but ... There is a difference between the BindDB
> data and what the users enter. The terms under which the data are
> licensed have nothing to do with who owns the user-entered queries. I
> looked around the BindDB web site and couldn't find a privacy policy
> anywhere.
The hardest part about getting that data set was finding a place which 1) had the data and 2) was completely willing to release it. Michael Gilson specifically said it was no problem, that there was no assurance of privacy on the site, and that they had no concerns in releasing the data to me for this project.

> If I were a user, I'd assume that what I entered was
> private unless the site's privacy policy explicitly said otherwise.

Why would you assume that? Every company I've worked for or consulted for has specifically said that internal structures are never to be sent out of the organization, except where certain agreements, which spell out what can be done with the data, are in place. This was true even when I was doing bioinformatics work in 1998, so it's nothing new.

There are very few limits on what a US organization can do with your data. Privacy policies exist because the organization is voluntarily limiting what it will do with your information, in exchange for more trust, information, or money from you.

The limits I know of apply to personal information. SMILES strings are not personal, nor covered under copyright (that I can tell), so I know of no legal restriction to prevent BindingDB from doing what they did.

There are of course non-legal reasons. eMolecules, as a search provider, would not want to do this because you all know that you might engender distrust among your clients. After all, people will violate the corporate guidelines against revealing internal data on public sites. So even if legal, I can see why you would not want to do this.

But BindingDB is supported by NIH grant R01GM070064, and not financially by the users of the site. Hence I believe they are more buffered from the handful of people who might protest. Of course, to protest, those people would then need to reveal to their in-house people that they broke the policy...

There are two well-known examples of publicly released query sets which ended up with troubles.
Both were problematic because the anonymized data could still be de-anonymized to reveal personal information. These are:

  http://en.wikipedia.org/wiki/Netflix_Prize#Privacy_concerns
  http://en.wikipedia.org/wiki/AOL_search_data_leak

Neither is relevant to the BindingDB data. I don't even know the year when the queries were done, much less the source IP address. (Although the data looks to be time ordered, so there might be some hint of time information.)

> There may be a legal precedent somewhere: if a web site has no policy,
> does everything the user types (or draws) automatically become public
> domain? I doubt it.

You are confusing privacy with copyright. If I sketch a structure in Marvin, which Marvin converts to a SMILES string, then do I own the copyright to that SMILES string? No, I don't believe I do. Copyright doesn't cover that case, just as copyright doesn't cover trademarks or personal names. Do you think that the SMILES strings "exhibit the minimal creativity required for copyright protection"?

They don't contain medical information restricted under HIPAA. They don't disclose video tape rental or sale records or the like, so they aren't covered under the Video Privacy Protection Act. And so on. Nothing I know of makes this private or restricted information.

Oh, and here's another example. I published information about some of the search queries people used to get to my web site:

  http://www.dalkescientific.com/writings/diary/archive/2007/12/23/navel_gazing.html

I am not the only one who does this. Surely these queries are not covered under copyright or privacy protection. I have no privacy statement on my web site. Do you believe the contents of the "referer" line, which your browser by default sends to each and every server, are required under law to be treated as private by the people who run the server?

> The very fact that someone entered a particular structure can be
> highly revealing, even if you don't know who submitted the structure.

Yes.
Which is why you're not supposed to submit such structures to untrusted sites. And by default, every site is untrusted.

> Before you release this data (if you haven't already), you might want
> to ensure that the users of BindDB understood that their queries might
> some day become public. At eMolecules, we have an absolute policy
> that no query will ever be revealed. Without it, we would be blocked
> at every major pharma and biotech company in the world. This isn't
> speculation ... they've told us so (and in several cases, actually
> blocked us until their legal department was assured of our policy and
> reputation).

Yes, but these are very different circumstances. You want your pharma clients to come to your site and pay you money. BindingDB wants to collect and distribute binding data, and make more data publicly available.

In any case, if I read you correctly, since BindingDB doesn't have an established policy, shouldn't the major pharmas and biotechs already be blocking access to their site?

So what's the problem? Who is going to get mad? What are the possible consequences to me or to BindingDB? What are the advantages to either of us for retracting those data sets?

Cheers,

Andrew
da...@dalkescientific.com

_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss