Re: Validation of posted archives
On Thu, 22 Mar 2001 Greg Kuperberg g...@math.ucdavis.edu wrote:

> On Wed, Mar 21, 2001 at 06:18:42PM -0500, Albert Henderson wrote:
>
>> In short, I would not be so sure that LANL's service is not filled with rubbish.
>
> It takes some chutzpah for an outsider to speculate that established, self-respecting authors are writing rubbish. Indeed there is no good reason to speculate at all, since it's all out in the open. For example, here are the 19 articles (+ 5 cross-listings) in the geometric topology category in the math arXiv in February:
>
> http://front.math.ucdavis.edu/math.GT/0102
>
> Which ones are rubbish? The one by Alexander Dranishnikov? The two by Stavros Garoufalidis and Jerome Levine? The one by Hugh Morton? I know these people. Whatever shortcoming of their work you have in mind, I'd be happy to let them know.

Time will tell. All papers can't be wonderful. In the classic article on the value of comprehensive reviews, Conyers Herring reported his study of published articles in solid state physics. Only half retained value after 5 years. Some were found to be in error or duplicating other work. Other studies of the literature report similar results. A task force at McGill rejected the majority of studies it reviewed as bad science. There is a distribution of quality in every field, of course -- a social phenomenon.

Take heart. Herring recognized that the literature is not all garbage: there is a lot of gold. He also pointed out that primary papers can be distilled to a tenth of their original bulk in reviews. (Physics Today 21(9):27-33, Sept. 1968)

Best wishes,
Albert Henderson
70244.1...@compuserve.com
Re: Validation of posted archives
On Tue, 27 Mar 2001, Albert Henderson wrote:

> Time will tell. All papers can't be wonderful. In the classic article on the value of comprehensive reviews, Conyers Herring reported his study of published articles in solid state physics. Only half retained value after 5 years. Some were found to be in error or duplicating other work. Other studies of the literature report similar results. A task force at McGill rejected the majority of studies it reviewed as bad science. There is a distribution of quality in every field, of course -- a social phenomenon. Take heart. Herring recognized...

Color this herring red.

Have we forgotten what's at issue here? Albert was claiming that the Archive literature is not of the quality of the published literature. Shall I count the many sources of piscine fragrance here? Two will do:

Red Herring #1 [Albert's]: Albert replies with data on the (low) quality of the PUBLISHED literature. Irrelevant! We were talking about alleged quality differences between the for-free Archives and the for-fee journals. We will settle for the exact same (low) quality literature in both, thank you very much!

Red Herring #2 [Greg's]: Greg was touting the quality of Archive contents, forgetting (again) that those papers are the very same ones that eventually appear in journals. It's not unrefereed self-publishing; it is the self-archiving of both pre-refereeing preprints and post-refereeing (published) postprints.
Can we get back to something more important, namely, the self-serving editorial in Science (to which I am currently busy penning a reply):

http://www.sciencemag.org/cgi/content/full/291/5512/2318b

Stevan Harnad                     har...@cogsci.soton.ac.uk
Professor of Cognitive Science    har...@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
Computer Science                  fax:   +44 23-80 592-865
University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free access to the refereed journal literature online is available at the American Scientist September Forum (98 99 00 01):

http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html

You may join the list at the site above.

Discussion can be posted to: american-scientist-open-access-fo...@amsci.org
Re: Validation of posted archives
(I posted this Wednesday but it does not seem to have appeared. Here it is belatedly. -- SH)

Date: Wed, 21 Mar 2001 18:19:14 +0000 (GMT)
From: Stevan Harnad har...@cogprints.soton.ac.uk
To: September 1998 American Scientist Forum american-scientist-open-access-fo...@listserver.sigmaxi.org
Subject: Re: Validation of posted archives

On Wed, 21 Mar 2001, Guillermo Julio Padron Gonzalez wrote:

> The name of a journal is part of the validation of a published paper. We all use the rigorousness of the peer review and the editorial criteria of the journals to judge the validity of a published paper. I agree that there can be exceptions, but they are just that: exceptions. It is clear that nobody has the time or the willingness to dive into each paper to find out whether it is the final version of a validated paper or just electronic garbage. The fact is that a non-administered archiving system may cause a proliferation of non-validated, duplicated, misleading and even fraudulent information on the web, and there will be no way to identify the valid information, so readers will go to validating sites, e.g. the publisher's site. Unless OAI included some kind of validation...

You are COMPLETELY on the wrong track.

I am in Gatwick Airport, headed for a meeting in Florence, so can only give the briefest of replies:

There are currently at least 20K+ refereed journals, with at least 2,000,000 refereed articles annually (this estimate could be as much as an order of magnitude too low!).

Most of those 2,000,000+ refereed articles are currently inaccessible to most researchers, across disciplines, all over the planet, including the authors of those 2,000,000 articles. (Read my words carefully, I am weighing them as I write them: I said most of those articles are inaccessible to most of their potential readers.)
I am not speaking for the Open Archives Initiative (OAI), which is much broader than what we are discussing (it is providing a convention for metadata tagging that will make all OAI-compliant Archives interoperable, whether or not their texts are refereed, whether or not their texts are free, whether or not their texts are journal material; and so far OAI is for metadata, not the full texts themselves).

I am speaking only for a SUBSET of the Open Archives Initiative, namely, the Self-Archiving Initiative. It is for this initiative that the eprints.org free software for creating OAI-compliant Eprint Archives was created. The objective of this initiative is to free a SUBSET of the world's on-paper and on-line literature, a TINY subset, but precisely the one I mentioned above: those 2,000,000+ annual articles in the world's 20K+ refereed journals.

Now follow the logic: To MOST of the planet's would-be users of that literature, MOST of those articles are currently inaccessible, because they or their institutions cannot afford to pay the Subscription/Site-License/Pay-Per-View (S/L/P) fees that would give them access to it all. It is for these would-be users that the author/institution self-archiving is being done, and also for the authors of all that literature, who lose a vast quantity of potential impact for their research findings because it is inaccessible to so many of its would-be users (readers, citers, replicators, extenders).

For this vast population, a free, author-self-archived corpus NOW would be an incalculable benefit. The authentication/validation/protection you speak about can come later, once the 2,000,000+ papers are up there, online and free, for all these currently disenfranchised potential users. This is NOT the time to worry about such things.

Kindly see the paper I linked in my prior reply. The link is to short answers to precisely questions like yours, questions that have been repeatedly raised and replied to in this Forum.
They are prima facie questions. It is natural to raise them. They are answered in that paper by number.

And let me close with the Los Alamos Lemma: about 50,000 Physicists have already had the good sense to self-archive 150,000 of their papers to date in the Physics Archive without worrying for one microsecond about the concern you raised. Look at the colossal usage figures for that Archive, and ask yourself whether all those users (for 10 years now) would have been better off without access until the day when the problem you raise has been solved in advance.

The Los Alamos Lemma is that any worry or objection that did not hold back the Physicists from self-archiving should not be holding back any of the rest of the disciplines either. Amen.

I must fly to Florence now.

Stevan Harnad
Validation of posted archives
Validation/authentication seems to be an area where sociology rather than technology has been relied on.

Might I point out to this forum that about a year ago, my accumulated conventional publication list since around 1995 had mostly migrated to (or had always been in) electronic form, and I decided to download all of the available reprint files from the various journals (for which, I add, we had site licenses to do so). This amounted to around 25 Acrobat files (yes, they were all Acrobat). I decided to see if any of them were validatable in a digital sense, using a technology that Adobe have in fact built into Acrobat via so-called X.509 certificates. None were. Indeed, what validation (really authentication, see below) there was was often associated with the production company, which is of course a sub-contractor to the publisher. Most of the Acrobat files also had no security settings, i.e. they were readily editable. Several publishers I phoned admitted that no form of digital authentication was being applied; worse, they seemed unaware that it could be applied.

Whilst I am prepared to believe any current problem with validation and authenticity is tiny, we all thought that about computer viruses ten years ago. Few would think so now.

I might add that two of our most recent articles have been in XML form, and that these have in fact been digitally signed as both authentic and valid (see below) using X.509 certificates. To make the point, my X.509 certificate is attached to this email to prove its authenticity! The article mentioned above is in fact destined to appear as supplemental data rather than as the primary published article, but by signing it we ensure that it can at least be authenticated as coming from us, as created on a given date and unchanged since, and furthermore as valid XML.
I have alluded above to the difference in meaning between validation and authentication, since I suspect the two words are sometimes used interchangeably.

Authentication is the ability to verify that a document/assertion has been created by the authority to whom it is attributed and that it has not been corrupted since its creation.

Validation is the ability to show that a specified validation process has been correctly carried out; for example, that the carbon valencies in a specified molecule are all four, or that, say, an XML document has the correct form. The latter is of course far more significant to science in the long term, but also far more difficult to implement.

--
Henry Rzepa. +44 (0)20 7594 5774 (Office) +44 (0870) 132-3747 (eFax)
Dept. Chemistry, Imperial College, London, SW7 2AY, UK.
http://www.ch.ic.ac.uk/rzepa/

Attachment: smime.p7s (Description: S/MIME Cryptographic Signature)
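Rzepa's distinction between authentication and validation can be made concrete with a short sketch. This is an editorial illustration, not his signing workflow. Several things are assumed:

- A bare SHA-256 digest stands in for an X.509 signature. A real signature would also bind the digest to the signer's certificate, which this code does not attempt.
- The molecule markup is a simplified, namespace-free format loosely modelled on CML. The element and attribute names are assumptions.

Authentication here asks only whether the bytes are unchanged. Validation asks whether a stated check passes: well-formed XML, and four bonds per carbon.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(document: bytes) -> str:
    """SHA-256 fingerprint recorded at 'signing' time (stand-in for an X.509 signature)."""
    return hashlib.sha256(document).hexdigest()


def is_authentic(document: bytes, recorded_digest: str) -> bool:
    """Authentication, integrity half only: the bytes are unchanged since signing.
    Proving *who* signed would need a certificate-backed signature, omitted here."""
    return digest(document) == recorded_digest


def is_well_formed_xml(document: bytes) -> bool:
    """Validation of form: does the document parse as XML at all?"""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def carbon_valences_ok(document: bytes) -> bool:
    """Validation of content: every carbon atom has a total bond order of exactly 4.
    Assumes explicit hydrogens and the simplified atom/bond markup shown below."""
    root = ET.fromstring(document)
    atoms = list(root.iter("atom"))
    total_order = {atom.get("id"): 0 for atom in atoms}
    for bond in root.iter("bond"):
        first, second = bond.get("atomRefs2").split()
        order = int(bond.get("order", "1"))
        total_order[first] += order
        total_order[second] += order
    return all(total_order[atom.get("id")] == 4
               for atom in atoms if atom.get("elementType") == "C")


# Methane with explicit hydrogens, in the assumed CML-like markup.
methane = b"""<molecule>
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
    <bond atomRefs2="a1 a4" order="1"/>
    <bond atomRefs2="a1 a5" order="1"/>
  </bondArray>
</molecule>"""

if __name__ == "__main__":
    recorded = digest(methane)
    # Tampering: turn one C-H single bond into a double bond.
    tampered = methane.replace(b'"a1 a5" order="1"', b'"a1 a5" order="2"')
    print(is_authentic(methane, recorded), carbon_valences_ok(methane))   # True True
    print(is_authentic(tampered, recorded), is_well_formed_xml(tampered),
          carbon_valences_ok(tampered))                                   # False True False
```

The tampered copy is still well-formed XML, so a check of form alone would pass it. Only the digest comparison and the content rule (the valence check) catch the change. This echoes the remark that validating content is the harder, and more valuable, step.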
Re: Validation of posted archives
When I said validation, I meant the process by which the information is validated, and so far the best way to validate information in scholarly publishing is peer review. My doubts were in the field of mistakes/misconduct: that the reader would find too much information that is duplicated, inaccurate, or fraudulent. Anyway, your point adds another edge to this discussion. In the end, it would be great if people could readily know the level of legitimacy and reliability of a piece of information posted on the Internet.

Guillermo

-----Original Message-----
From: Rzepa, Henry [mailto:h.rz...@ic.ac.uk]
Sent: Thursday, 22 March, 2001 03:23 AM
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Subject: Validation of posted archives
Re: Validation of posted archives
Date: Wed, 21 Mar 2001 09:26:46 -0500
From: Guillermo Julio Padron Gonzalez guillermo.pad...@cigb.edu.cu

> From your mail I see there is no administrator of files posted on the Internet using OAI. Maybe it has already been discussed in the Forum (I am sorry if it indeed was), but I have two questions:
>
> 1. How can a reader differentiate a non-validated (non-peer-reviewed) archive from a validated, peer-reviewed version?

There is a metadata category refereed vs. unrefereed. Also Journal Name, etc.

http://www.eprints.org/

> 2. How can this system avoid the possibility of charlatans posting their non-peer-reviewed or even rejected papers using OAI? As far as I know, any person outside the journal/publishers' sites can post them.

It is self-archiving, so in principle I can post someone else's article as my own (plagiarism), or can post my own and call it refereed when it is not, or can post an inaccurate version of the final draft. All this is easily monitored and checked, if anyone wants to set up a system to do so, but it is not necessary!

The archive of record for refereed papers, for the time being, is the publisher's paper version, in libraries the world over. The self-archived version is merely FREEING these papers online, for one and all. Peer review continues to be implemented by journals. If one retrieves an unrefereed paper, caveat emptor. And the incentive to plagiarize or to misclassify one's own work does not have much force behind it. See:

http://www.ecs.soton.ac.uk/~harnad/Tp/resolution.htm#8
Stevan Harnad
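Harnad's answer to the first question can be sketched as a filter over archive records. His answer is a refereed/unrefereed metadata category plus the journal name.

The `EprintRecord` type, its field names and the sample records are all hypothetical; they are not the eprints.org schema. The refereed status is simply whatever the self-archiving author declared, so the sketch only sorts records. It sets aside any record that claims to be refereed but names no journal, as one example of the kind of case that could be monitored and checked.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class EprintRecord:
    identifier: str
    title: str
    refereed: bool           # author-declared status (hypothetical field name)
    journal: Optional[str]   # journal name, if the author supplied one


def sort_by_status(records: Iterable[EprintRecord]
                   ) -> Tuple[List[EprintRecord], List[EprintRecord], List[EprintRecord]]:
    """Split records into (refereed, unrefereed, needs_checking)."""
    refereed: List[EprintRecord] = []
    unrefereed: List[EprintRecord] = []
    needs_checking: List[EprintRecord] = []
    for record in records:
        if not record.refereed:
            unrefereed.append(record)        # caveat emptor
        elif record.journal:
            refereed.append(record)          # claim can be checked against the journal
        else:
            needs_checking.append(record)    # claims refereed but names no journal
    return refereed, unrefereed, needs_checking


if __name__ == "__main__":
    archive = [
        EprintRecord("ep:1", "A refereed postprint", True, "Journal of Examples"),
        EprintRecord("ep:2", "A preprint", False, None),
        EprintRecord("ep:3", "A dubious claim", True, None),
    ]
    ok, raw, dubious = sort_by_status(archive)
    print([r.identifier for r in ok], [r.identifier for r in raw],
          [r.identifier for r in dubious])   # ['ep:1'] ['ep:2'] ['ep:3']
```

The flag proves nothing by itself. The point is only that the metadata gives a reader, or a checking service, something concrete to test against the journal of record.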
Re: Validation of posted archives
Dear Stevan:

The name of a journal is part of the validation of a published paper. We all use the rigorousness of the peer review and the editorial criteria of the journals to judge the validity of a published paper. I agree that there can be exceptions, but they are just that: exceptions. It is clear that nobody has the time or the willingness to dive into each paper to find out whether it is the final version of a validated paper or just electronic garbage. The fact is that a non-administered archiving system may cause a proliferation of non-validated, duplicated, misleading and even fraudulent information on the web, and there will be no way to identify the valid information, so readers will go to validating sites, e.g. the publisher's site. Unless OAI included some kind of validation...

Regards,
Guillermo

-----Original Message-----
From: Stevan Harnad [mailto:har...@coglit.ecs.soton.ac.uk]
Sent: Wednesday, 21 March, 2001 10:36 AM
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Subject: Re: Validation of posted archives
Re: Validation of posted archives
On Wed, 21 Mar 2001, Guillermo Julio Padron Gonzalez wrote:

> The name of a journal is part of the validation of a published paper. We all use the rigorousness of the peer review and the editorial criteria of the journals to judge the validity of a published paper. I agree that there can be exceptions, but they are just that: exceptions. It is clear that nobody has the time or the willingness to dive into each paper to find out whether it is the final version of a validated paper or just electronic garbage. The fact is that a non-administered archiving system may cause a proliferation of non-validated, duplicated, misleading and even fraudulent information on the web, and there will be no way to identify the valid information, so readers will go to validating sites, e.g. the publisher's site. Unless OAI included some kind of validation...

I hope you do not mind my adding to this discussion. If I may, let me clear up what is perhaps a confusion about the OAI protocol: OAI is a protocol for the distribution of metadata, much as TCP/IP is a protocol used by the Internet to distribute information. I would no more expect OAI to provide me with guarantees about the content than I would expect TCP/IP to guarantee the content of this email. (As an aside, OAI does not provide any facility for the distribution of full-text papers; it can merely distribute 'pointers' to papers.) Therefore the validation, or otherwise, of papers and their heritage rests with the application(s) that use OAI.

As an example of an Open Archive that has had ample opportunity to be filled with rubbish (correct me if I am quoting wrong), arXiv has, in its ten years, only had to delete 2 papers out of 160,000. This would suggest that either arXiv has a very efficient staff or this is not really a problem (or, as I suspect, both).
Your suggestion does seem a rational one to me (and something like it indeed currently exists between arXiv and the APS; I believe the APS will accept submissions using arXiv papers): that there are archives of preprints which are then picked up by validating services (i.e. publishers), which then repackage the archives into validated subject/editorial content. It would then be your choice whether to use the e-print server or the packaged (and pay-for) service of publishers, and naturally the effect of the publisher service would be to improve the e-print content (... the invisible hand of peer review).

All the best,
Tim Brody.
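What a harvester actually receives shows Tim Brody's point clearly: OAI carries metadata and pointers, never the papers themselves. The sketch below parses a hand-written response. Its namespaces follow OAI-PMH 2.0 with unqualified Dublin Core, a revision published after this thread (2002). The record identifier, title and example.org URL are invented.

Nothing in the response vouches for the paper's content. The harvester gets an identifier, a title and a link; any validation has to happen in whatever application consumes them.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; identifiers and URLs are hypothetical.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
          <dc:identifier>http://example.org/papers/0001.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_pointers(response: str) -> List[Tuple[str, str, str]]:
    """Return (OAI identifier, title, pointer to full text) for each record.
    Only metadata is parsed; the full text is never fetched."""
    root = ET.fromstring(response)
    results = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pointer = record.findtext(".//dc:identifier", namespaces=NS)
        results.append((oai_id, title, pointer))
    return results


if __name__ == "__main__":
    for oai_id, title, pointer in harvest_pointers(SAMPLE_RESPONSE):
        print(oai_id, "|", title, "|", pointer)
```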
Re: Validation of posted archives
On Wed, Mar 21, 2001 at 06:11:11PM +0000, Tim Brody wrote:

> As an example of an Open Archive that has had ample opportunity to be filled with rubbish (correct me if I am quoting wrong), arXiv has, in its ten years, only had to delete 2 papers out of 160,000. This would suggest that either arXiv has a very efficient staff or this is not really a problem (or, as I suspect, both).

I would say that maintaining minimum standards in the arXiv is a problem, but one that has been solved. The arXiv mainly works on the self-respect system. Most serious authors have respect for their own work, and the arXiv takes certain steps to reinforce this principle.

First, in order to register, an author should provide some evidence that someone in the research community will be interested in his or her submissions. Any real academic affiliation is sufficient, and the arXiv assumes one if the author has an academic e-mail address. It's not perfect, since occasionally someone obtains an academic e-mail address without any affiliation whatsoever, but it works pretty well. Otherwise the author needs a letter of reference from someone in the research community. It doesn't have to be anything like a letter of recommendation for a job, just some kind of expression of interest in the author's work.

Second, the component archives are moderated, mainly for the purpose of fixing misclassifications. At this level the arXiv doesn't keep out work that is merely wrong or weak; it might still be relevant to research. But there are separate categories for submissions that defy classification, and that includes material that makes no sense to the moderators.

Third, submissions are irrevocable. You can always submit a new version, but all versions remain available. So you have to live with your mistakes; the best you can do is submit a withdrawal notice asking people not to read previous versions.

This three-tier system has nothing to do with the handful of deleted submissions.
Deleted submissions are things like conference announcements, garbled files, and duplicates.

Besides that, I'm not sure what the original poster had in mind as rubbish. Maybe half of the articles in the math arXiv are ho-hum works that would never interest me. But that's not the same as rubbish; most of these papers are legitimate but boring. Maybe 5% are so lame that I would be embarrassed to have my name on them. But even most of these are on-topic and publishable. On the other hand, the arXiv does have some excellent papers that will never be published, or that have even been rejected by a journal. I think it's important to put peer review after permanent archival.

> Your suggestion does seem a rational one to me (and something like it indeed currently exists between arXiv and the APS; I believe the APS will accept submissions using arXiv papers): that there are archives of preprints which are then picked up by validating services (i.e. publishers), which then repackage the archives into validated subject/editorial content.

This is a major controversy surrounding the arXiv right now. It is important for journals or other vehicles of peer review to validate research papers. But why bother repackaging them?

--
  /\    Greg Kuperberg (UC Davis)
 /  \
 \  /   Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/      * All the math that's fit to e-print *
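Greg Kuperberg's third tier amounts to an append-only store: submissions are irrevocable, and a withdrawal is just one more version. A minimal sketch follows.

The class name, method names and paper identifier are hypothetical; this is not how the arXiv is actually implemented. The sketch models only what an author can do. It deliberately offers no way to overwrite or delete a version, and leaves out administrative deletions of the kind Greg mentions (announcements, garbled files, duplicates).

```python
from typing import Dict, List, Optional


class AppendOnlyArchive:
    """Authors may add versions but never overwrite or delete earlier ones."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def submit(self, paper_id: str, text: str) -> int:
        """First submission; returns version number 1."""
        if paper_id in self._versions:
            raise ValueError(f"{paper_id} already exists; submit a new version instead")
        self._versions[paper_id] = [text]
        return 1

    def new_version(self, paper_id: str, text: str) -> int:
        """Append a replacement; earlier versions stay retrievable."""
        self._versions[paper_id].append(text)
        return len(self._versions[paper_id])

    def withdraw(self, paper_id: str, reason: str) -> int:
        """A withdrawal is itself just another version: a notice, not a deletion."""
        return self.new_version(paper_id, f"WITHDRAWN: {reason}")

    def get(self, paper_id: str, version: Optional[int] = None) -> str:
        """Fetch a given version (1-based), or the latest if none is given."""
        versions = self._versions[paper_id]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise IndexError(f"{paper_id} has no version {version}")
        return versions[version - 1]


if __name__ == "__main__":
    archive = AppendOnlyArchive()
    archive.submit("example/0001", "Proof of a conjecture")
    archive.withdraw("example/0001", "gap in Lemma 2")
    print(archive.get("example/0001"))     # WITHDRAWN: gap in Lemma 2
    print(archive.get("example/0001", 1))  # Proof of a conjecture
```

Because there is no delete method, "living with your mistakes" is enforced by the interface itself rather than by policy.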
Re: Validation of posted archives
On Wed, 21 Mar 2001 Tim Brody tdb...@ecs.soton.ac.uk wrote:

> On Wed, 21 Mar 2001, Guillermo Julio Padron Gonzalez wrote:
>
>> The name of a journal is part of the validation of a published paper. We all use the rigorousness of the peer review and the editorial criteria of the journals to judge the validity of a published paper. I agree that there can be exceptions, but they are just that: exceptions. It is clear that nobody has the time or the willingness to dive into each paper to find out whether it is the final version of a validated paper or just electronic garbage. The fact is that a non-administered archiving system may cause a proliferation of non-validated, duplicated, misleading and even fraudulent information on the web, and there will be no way to identify the valid information, so readers will go to validating sites, e.g. the publisher's site. Unless OAI included some kind of validation...
>
> I hope you do not mind my adding to this discussion. If I may, let me clear up what is perhaps a confusion about the OAI protocol: OAI is a protocol for the distribution of metadata, much as TCP/IP is a protocol used by the Internet to distribute information. I would no more expect OAI to provide me with guarantees about the content than I would expect TCP/IP to guarantee the content of this email. (As an aside, OAI does not provide any facility for the distribution of full-text papers; it can merely distribute 'pointers' to papers.) Therefore the validation, or otherwise, of papers and their heritage rests with the application(s) that use OAI.
>
> As an example of an Open Archive that has had ample opportunity to be filled with rubbish (correct me if I am quoting wrong), arXiv has, in its ten years, only had to delete 2 papers out of 160,000. This would suggest that either arXiv has a very efficient staff or this is not really a problem (or, as I suspect, both).

The LANL server is undoubtedly efficient, but probably not effective in screening out useless material. Mathematical proofs validate much of its content but contribute little to usefulness. Moreover, the peer-reviewed journals in physics have a much higher acceptance rate than journals in other fields. In short, I would not be so sure that LANL's service is not filled with rubbish.

More important, physics and mathematics are far removed from topics useful to quacks who promise to treat everything from aching backs to zodiacal destiny. LANL's most effective feature perhaps is its use of XXX -- an insignia that keeps out children who are protected by parental controls from Internet peril.

Best wishes,
Albert Henderson
70244.1...@compuserve.com