Dear Finn,

What you are doing sounds very interesting -- and you are not alone in wishing to extract and aggregate data from scientific publications so that it can be distributed openly. For example, Peter Murray-Rust's Chem-Informatics group at the University of Cambridge does similar things in chemistry (see e.g. the Crystal Eye project [1]).

[1]: http://www.ckan.net/package/read/crystal-eye

Regarding your specific questions:

1. I think you are right that your material may count as a 'Database' once aggregated.

2. The key question concerns the original data in the publications. In my view (but I should warn IANAL) it is doubtful that the small sets of 'factual data' in individual papers are copyrightable or protected as a DB (and even if they were, your extraction might count as 'small'). However, I would also note that some publishers do explicitly claim control of data (e.g. the American Chemical Society, I believe).

So you've got two options here:

1. Continue as you are. Even if eventually someone gets upset you can either argue with them then or take material out (this is the 'seek forgiveness not permission' approach). One advantage of this route is that you already have a useful tool by the time any debate starts, which can make quite a difference to the outcome ...

2. You can try and tell publishers what you are doing in advance and see what happens.

At this point I should mention a project being developed by the Working Group on Open Data in Science here at the Foundation (<http://okfn.org/wiki/wg/science>). Entitled 'Is It Open Data', it's a service to make it easy for people (scientists especially) to make enquiries to publishers (and others) about the openness of the scientific data they hold -- and to record publicly the results of those efforts.

You can see the current (very alpha) version here:

<http://isitopen.ckan.net/>

The FAQ/Guide may be particularly relevant given your situation:

<http://isitopen.ckan.net/guide/>

Regards,

Rufus

2009/2/26 Finn Aarup Nielsen <[email protected]>:
> I have recently started a wiki with scientific data and text in
> neuroscience: Brede Wiki, http://neuro.imm.dtu.dk/wiki/
>
> I started with triple licenses of the share-alike type: GPL, GFDL and
> CC-by-sa: http://neuro.imm.dtu.dk/wiki/Brede:Copyrights
>
> After browsing the Open Knowledge web-site and following links
> to licenses it seems to me that the situation is more complex.
>
> I manually extract data from scientific articles (more precisely results
> from statistical analyses in neuroimaging experiments) and encode them in
> MediaWiki templates so my Brede Wiki now contains content like
> "{{Brain volume | n = 1 | region = Left hippocampus | mean = 0.940 |
> std = 0.208 | unit = cm3 | group = Major depression patients }}",
> see the wiki page:
>
> http://neuro.imm.dtu.dk/wiki/Hippocampal_volume_reduction_in_major_depression
>
> Such data would typically be found in tables of the scientific paper. Some
> of the papers are CC-by, but most are copyrighted by commercial
> publishers.
>
> I have thought that such data would be "facts" or "measurement" on nature,
> not be subjected to copyright, but that I and other wiki-contributors
> would be able to gain Database Rights when they become aggregated with
> other results. I have seen the "Open Database License" which seems
> appropriate for distributing the database. However, it is not clear to me
> whether "left hippocampus 0.940" constitutes a copyrightable entity (a
> creative work?) belonging to the publisher or it can be considered a fact
> falling in under "Open Data Commons - Factual info licence".
> The worst case would be that publishers regard this data as under their
> copyright and regard its presentation on the web-site and its copylefted
> distribution as a violation.
>
> Any thought on this? I have just seen that there is some discussion in the
> paper:
> http://events.linkeddata.org/ldow2008/papers/08-miller-styles-open-data-commons.pdf
>
>
> /Finn
> ___________________________________________________________________
>
> Finn Aarup Nielsen, DTU Informatics, Denmark
> Lundbeck Foundation Center for Integrated Molecular Brain Imaging
> http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/
> ___________________________________________________________________

_______________________________________________
okfn-discuss mailing list
[email protected]
http://lists.okfn.org/cgi-bin/mailman/listinfo/okfn-discuss
