Hi Peter,

If many knowledge projects are advancing our knowledge through the means that you have described, surely there are others besides the one you started yesterday? Can you provide a list or literature review of such studies?
My OA APC study uses data from different sources that do not have a common set of terms: dataverse.scholarsportal.info/dataverse

If we had to restrict data collection to CC-BY licensed works, this research could not be done; and to the extent it could be done, publishers who do not want us to study them could easily opt out by not using CC-BY licenses on the pages where this information is found. In other words, CC-BY licenses raise issues for data collection and analysis.

I would like to note some methodological concerns with the approach described by PMC (automatically gathering data from tables). Taking data from different studies without fully accounting for differences in methods (e.g. definition or measurement) could easily lead to false conclusions. Worse, such false conclusions would be highly replicable, leading to false confidence in results; i.e. anyone could repeat the same mistakes and come to the same conclusion of unknown external validity.

For the 2016/17 OA APC dataset I am adding a "providence" column because the data in the 2016 APC column comes from different researchers with some differences in data collection. Even within a single dataset, analysis requires understanding whether one is comparing apples with apples or McIntoshes with Spartans. Automating data analysis without full comprehension of the data strikes me as problematic.

best,

Heather Morrison

-------- Original message --------
From: Peter Murray-Rust <pm...@cam.ac.uk>
Date: 2017-01-24 4:27 AM (GMT-05:00)
To: "Global Open Access List (Successor of AmSci)" <goal@eprints.org>
Subject: Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

There are many activities where CC BY or a more liberal licence (CC 0) is the only way that modern science can be done. Many knowledge-based projects in science, technology, and medicine use thousands of documents a day to extract and publish science.
(We started one yesterday at https://github.com/ContentMine/cm-ucl/ to extract data from tables in PDF. This will aim to analyse 1000 papers per day, a limit set by the licences; if we were allowed we could index 10,000 papers/day in all disciplines.)

To do reproducible science it is critical that the raw data (in this case scientific articles) are made publicly available so that others can reproduce the work. Any friction such as writing to the author, reading a non-standard licence, etc. makes the project impossible.

We are often limited to using the Open subset (CC BY) in EuropePMC. We cannot afford to put a single CC NC, CC ND, or "unlicensed freely available" manuscript in the repository in case we are sent a take-down notice. That would destroy the whole experiment.

These experiments are part of the science of the future. If we had been allowed to use them it is likely that the Ebola outbreak in Liberia would have been predicted (the Liberian government's assessment, not mine). Whether it would have been prevented we don't know, but at least it would not have been impeded by copyright and paywalls.

Put simply: unless the scientific material is CC BY or CC 0 we cannot use it for knowledge-driven STM. I have estimated that the opportunity cost of this can run into billions of dollars.

Repositories do not work for science. They are fragmented, non-interoperable and covered with prohibitions on automatic re-use. I have not met scientists who are systematically using institutional repositories for data mining.

It seems that the desires of the arts and humanities are in direct conflict with the needs of STM. I note that there are few scientists posting on this list. Maybe this division should be recognised and the STM community should continue with its own policies of CC BY and the rest use whatever commonality they can achieve.

There are no simple solutions where the law is concerned. Only CC BY gives certainty.
CC NC and CC ND may be valuable for A+H but they are very difficult to operate in any area of endeavour.

I was told 12 years ago on this list that I should be patient and the Green program would deliver universal access and then I could start mining the literature. I have been patient but it hasn't happened. I am told that OpenAIRE still doesn't expose full-text. We should recognize it and look for alternative solutions.

On Mon, Jan 23, 2017 at 7:55 PM, Heather Morrison <heather.morri...@uottawa.ca> wrote:

With all due respect to the people who created and shared the "how open is it" spectrum tool, I find some of the underlying assumptions to be problematic.

For example, the extreme of closed access assumes that having to pay subscriptions, membership, pay per view etc. is the far end of closed. My perspective is that the opposite of open is closure of knowledge: climate change denied, climate scientists muzzled, fired or harassed, climate change science defunded, climate data taken down and destroyed, deliberate spread of misinformation. This is not a moot point. This end of the spectrum is a reality today, one that is far more concerning for many researchers than paywalls (not that I support paywalls).

Fair use is listed in a row named closed access. I argue that fair use / fair dealing is essential to academic work and journalism, and must apply to all works, not just those that can be subject to academic OA policy.

There is an underlying assumption about the importance and value of re-use / remix that omits any discussion of the pros, cons, or desirability of re-use / remix; I argue that this is a discussion we should be having. Earlier today I mentioned some of the potential pitfalls. Now I would like to add two potential pitfalls: mistranslation and errors in instructions for dangerous procedures.
There are dangers of poor published translations to knowledge per se (i.e. they introduce errors) and to the author's reputation, i.e. an author could easily be indirectly misquoted due to a poor translation. There are good reasons why some authors and journals hesitate to grant downstream translation permissions. Reader-side translations (e.g. automated translation tools) are not the same as downstream published translations, although readers should be made aware of the current limitations of automated translation.

If people are copying instructions for potentially dangerous procedures (surgery, chemicals, engineering techniques), and they are not at least as expert as the original author, it might be in everyone's best interests if downstream readers are not invited and encouraged to manipulate the text, images, etc.

In creative works, e.g. to prepare a horror flick, by all means take this and that, mix it together and create something new and intriguing. I am not convinced that the same arguments ought to apply to works that might guide procedures in a real hospital operating room.

I suggest the "how open is it" spectrum is a useful exercise that has served a purpose for some but not a canon for all to adhere to.

best,

Heather Morrison

-------- Original message --------
From: David Prosser <david.pros...@rluk.ac.uk>
Date: 2017-01-23 2:16 PM (GMT-05:00)
To: "Global Open Access List (Successor of AmSci)" <goal@eprints.org>
Subject: Re: [GOAL] How much of the content in open repositories is able to meet the definition of open access?

I rather like the ‘How open is it?’ tool that approaches this as a spectrum:

http://sparcopen.org/our-work/howopenisit/

I may be quite ‘hard line’, but I acknowledge that by moving along the spectrum a paper, monograph, piece of data (or whatever) becomes more open - and more open is better than less open.
If the funders have gone to the far end of the spectrum it is perhaps because they feel that the greatest benefits are there, not because they have been convinced that they have to follow the strict, ‘hard line’ definition of open access.

David

On 23 Jan 2017, at 18:30, Richard Poynder <richard.poyn...@gmail.com> wrote:

Hi Marc,

You say: "I certainly qualify as an OA advocate, and as such: I don’t equate OA with CC BY (or any CC license); in fact, I’m a little bit tired of discussions about what 'being OA' means."

I hear you, but I think the key point here is that OA advocates (perhaps not you, but OA advocates) are successfully convincing a growing number of research funders (e.g. Wellcome Trust, RCUK, Ford Foundation, Hewlett Foundation, Gates Foundation etc.) that CC BY is the only acceptable form of open access.

So however tired you and Stevan might be of discussing it, I believe there are important implications and consequences flowing from that.

Richard Poynder

On 23 January 2017 at 16:31, Couture Marc <marc.cout...@teluq.ca> wrote:

Hi all,

Just to be clear, my position on the basic issue here.

I certainly qualify as an OA advocate, and as such:

- I don’t equate OA with CC BY (or any CC license); in fact, I’m a little bit tired of discussions about what “being OA” means.
- I work to help increase the proportion of gratis OA, still much too low.
- I try to convince my colleagues that CC BY is the best way to disseminate scientific/scholarly works and make them useful.

I favour CC BY over the restricted versions (mainly -NC) because I find the arguments about potentially unwanted or devious uses far less compelling than those about the advantages of unrestricted uses and the drawbacks of restrictions that can be much more stringent than they seem at first glance.

Like Stevan said, OA advocates are indeed a plurality. The opposite would bother me.
Marc Couture

_______________________________________________
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal

--
Richard Poynder
www.richardpoynder.co.uk

--
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069