Re: eprints and authentication
Why should a pdf be locked? Getting away from the idea that work is always on paper says to me that it should not be read-only *at the user end*. The emerging means of authentication described by Adrian should be an excellent way forward, but why the need to lock as well? I ask because for projects such as ours, which involves adding third-party reference links to pdf documents, locking is not insurmountable but is against the principle of what we are trying to demonstrate. I prefer to use the term signed. This authenticates the document (or a fragment of it) but does not prevent others from re-using it (although the original signature is now invalidated if they do quote it with changes). A document can be signed many times by many people of course. I am convinced that as we move into an information anywhere and from anywhere era, the need to know which bit came from where and when becomes essential. -- Henry Rzepa. +44 (0)20 7594 5774 (Office) +44 (0)20 7594 5804 (Fax) Dept. Chemistry, Imperial College, London, SW7 2AY, UK. http://www.ch.ic.ac.uk/rzepa/
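A minimal sketch of the signed-but-not-locked idea above. Python's standard library has no public-key signatures, so a keyed hash (HMAC) stands in for a real signature scheme; that substitution is an assumption of this sketch, and the signer names, keys, and sample text are all invented for illustration.

```python
import hashlib
import hmac


def sign(fragment: bytes, key: bytes) -> str:
    """Return a hex tag that authenticates exactly these bytes under this key."""
    return hmac.new(key, fragment, hashlib.sha256).hexdigest()


def verify(fragment: bytes, key: bytes, tag: str) -> bool:
    """True only if the fragment is byte-for-byte what the key holder signed."""
    return hmac.compare_digest(sign(fragment, key), tag)


# Hypothetical signers and keys, invented for this sketch.
keys = {"author": b"author-secret", "annotator": b"annotator-secret"}

fragment = b"A paragraph from an eprint."

# The same fragment can carry many signatures from many people.
signatures = {who: sign(fragment, key) for who, key in keys.items()}

# Nothing is locked: anyone can copy the fragment and change it ...
edited = fragment + b" [reference link added]"

# ... but the original signatures no longer verify against the changed copy.
original_ok = all(verify(fragment, keys[w], signatures[w]) for w in keys)
edited_ok = any(verify(edited, keys[w], signatures[w]) for w in keys)
```

The fragment stays freely re-usable; only the claim "this is what the signer saw" is lost when the bytes change.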
Re: Central vs. Distributed Archives
On Wed, 8 Nov 2000, Greg Kuperberg wrote:

> While libraries certainly should help preserve e-prints, I do not trust any one library, nor any other sole institution, to archive material single-handedly. Any caretaker can lose or destroy a unique copy of any document... That is why it is important to redundantly and openly mirror an archive and not just allow third-party searches. The arXiv has 18 mirror sites on six continents

Who is disagreeing with this? All requisite redundancy is just as desirable, and feasible, and inevitable, with institution-based distributed archiving as with discipline-based archiving.

I think there is an incorrect analogy at the heart of Greg's frequent use of the term fragmented in speaking about the institution-based approach to self-archiving: I think Greg continues to equate (1) archiving with publishing, and (2) institutional digital collections with localized books-on-shelves (ripe for a Library-of-Alexandria catastrophe; hence his example of the lost/destroyed unique document). And (3) (unrefereed, unpublished) PREprints continue to be treated as the paradigm for it all, whereas it is much more informative and representative to see it in terms of (refereed, published) POSTprints: We are, after all, aiming at freeing the REFEREED literature -- with the prepublication embryological stages merely an added bonus, rather than the focus of it all.

So, to summarize: Whilst our refereed papers are already, as they are, safely in the hands of journals and libraries, blissfully mirrored (though unblissfully unfree), we need not fret about Alexandria. Freeing a postprint (sic) via self-archiving (whether central or institutional, interoperable or not) is a bonus, a plus, a freebie, a way to make it accessible to those multitudes worldwide who cannot access it because of the S/L/P firewalls surrounding the safe, Alexandria versions.
It is inviting Zeno's Paralysis (again) to say: Keep waiting till you have an Alexandria-proof centralized, mirrored, redundant arXiv-style Archive to self-archive them in before you dare to self-archive your (already safely mirrored) postprints.

Nay! Release them from their hostagehood behind obsolete, impact-blocking, and completely surmountable access barriers online today through self-archiving, addict fellow-researchers the world over to that new, free form of access to it all, and the redundancies and mirrors will come tomorrow, in plenty of time to keep the freed corpus aloft in the skies. (And nothing is at risk: the firewalled version remains as safe -- from catastrophic loss as well as illicit access -- as it ever was.)

If that is now transparent for postprints, it should be equally transparent that the same applies to preprints: They are destined to become postprints (hence secure, for the above reasons) anyway. Being available online early is a bonus; a freebie. Moreover, it is a bonus that has no prior history of enjoying the safe/secure status of postprints anyway: access to preprints was always restricted and evanescent, destined to be superseded by the secure postprint once it was available. Now the redundancy and mirroring that will be accorded the freed postprint corpus, once it is freed, will also be inherited by the preprint corpus.

So there is nothing to lose, and everything to be gained, by self-archiving all preprints and postprints now, in either the centralized OAI-compliant (http://www.openarchives.org) archives like arXiv (http://arXiv.org), or in institutional OAI-compliant archives, like Eprints (http://www.eprints.org). Ignore Cassandras: Preservation problems are eminently soluble, once the goods are up there: the real problem now is how to get researchers to put them up there, at long last. Central archives have gone part of the distance but are proving too slow.
Institutional archives are natural allies in hastening us on the road to the optimal and inevitable.

> As a rule, it is better for web sites to share the same archive than to each have fragments. It is better for Oxford and Cambridge to each have all of Shakespeare's plays than for Oxford to have only the comedies and Cambridge to have only the tragedies. That is why I favor shared interoperability, which is in some ways centralized, to fragmented interoperability, which is optimistically called decentralized. Massive redundancy is one of the few strengths of the existing paper-based system;

I am not an expert on digital storage, coding or preservation, but I am not at all sure that Greg is technically right above (and I'm certain that the Oxford/Cambridge hard-copy analogy is fallacious). I would like to hear from specialists in localized vs. distributed digital coding, redundancy, etc. -- bearing in mind that in the case of the refereed literature, this is all moot anyway, because free access now is infinitely preferable to no access, no matter how short-lived it risks being. The locus classicus is still safely ensconced behind the toll
Re: Exponential growth
On Wed, 8 Nov 2000, Greg Kuperberg wrote:

> Maybe you want to say more conservatively that new submissions should be superlinear, i.e., concave up.

Yes, yes, that's it. (And that's: new self-archived eprint (whether pre- or post-), NOT new submission. Submission is for journals. Self-archiving is better described as a deposit.)

> And maybe instead of asymptotics you are interested in the short term. In that case the right way to say it is that you open archiving should grow faster in the near term.

Yes, it should go concave up, steeply, until the entire (finite) current refereed corpus is up there, online and free. And I do mean steeply. There is no reason it should not all have been up there, freed, yesterday, so certainly no reason to drag it out for another decade.

As to asymptotics: I am referring to the current refereed corpus; this annual corpus is finite though also itself growing somewhat annually, but not nearly so fast as to require my refining the shape of the curve: the sharp concave up covers it all...

Stevan Harnad                     har...@cogsci.soton.ac.uk
Professor of Cognitive Science    har...@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
Computer Science                  fax:   +44 23-80 592-865
University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM

NOTE: A complete archive of the ongoing discussion of providing free access to the refereed journal literature online is available at the American Scientist September Forum (98 99 00):

http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html

You may join the list at the site above. Discussion can be posted to: american-scientist-open-access-fo...@amsci.org
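A small sketch of what "concave up" means for a series of counts sampled at equal intervals: every second difference is positive, whereas linear growth gives second differences of zero. The two series below are invented for illustration and are not archive statistics; the function names are likewise hypothetical.

```python
def second_differences(series):
    """Discrete analogue of the second derivative for equally spaced samples."""
    return [
        series[i + 2] - 2 * series[i + 1] + series[i]
        for i in range(len(series) - 2)
    ]


def is_concave_up(series):
    """True if every second difference is strictly positive (superlinear growth)."""
    diffs = second_differences(series)
    return len(diffs) > 0 and all(d > 0 for d in diffs)


# Invented sample data: counts per period.
linear = [100, 200, 300, 400, 500]        # constant increase of 100 each period
superlinear = [100, 210, 340, 490, 660]   # the increase itself grows by 20 each period

linear_is_concave_up = is_concave_up(linear)
superlinear_is_concave_up = is_concave_up(superlinear)
```

A one-time jump followed by a steeper straight line would still fail this check everywhere except at the jump itself, which is the distinction the exchange above turns on.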
Re: Central vs. Distributed Archives
Greg Kuperberg writes:

> But I disagree entirely with the claim that distributed interoperability has never been tried before. It has been tried several times, whole-heartedly with these two projects:
>
> MPRESS - mathnet.preprints.org
> NCSTRL - ncstrl.org
>
> And it has been a factor in many other projects, including Hypatia and the AMS preprint server. Some of these projects are more successful than others, but *all* of them suffer from inconstancy of the underlying archives.

The largest project that has been done with distributed interoperability is RePEc. RePEc catalogs 11 items now. While there is the occasional case of an archive becoming obsolete, out of about 140 archives I think 5 have been made obsolete, i.e. have been moved to a place outside the original archive maintainer's control. Thus, while it is a problem, it is not a major one. It is by far outweighed by other advantages, such as distributed costs, minimum quality control, and wide community participation.

Cheers,

Thomas Krichel
http://openlib.org/home/krichel
RePEc:per:1965-06-05:thomas_krichel
2000-10-05 to 2001-01-06: Institute for Economic Research / Hitotsubashi University
2-1 Naka / Kunitachi / Tokyo 186-8603 / Japan / +81(0)42 580 8349
tho...@micro.ier.hit-u.ac.jp
Re: Central vs. Distributed Archives
Steve, I think you misunderstand Greg's concern (and mine). We do not disagree with what you want to do; we want to add to it. We are assuming, I think, that something similar to the plan you advocate will be the basic process. I do not think it enough to say distributed=secure. It's only the first step to security. In addition to being distributed, there also needs to be a reliable caretaker--not just to do the housekeeping, but to ensure that the archive is kept compatible with changing technology.

I suggested that the archives be organized redundantly both by discipline and by university (and possibly by geographic/political entity, as well as what anyone wants to do). There are undoubtedly well-organized academic departments that can do this. There are also academic departments that cannot be relied on to do this right, because of size, interest, or finances. The same goes for professional societies. Certainly no individual can be relied on: all humans are mortal. All of this goes as well for refereed as for unrefereed, preprint as for reprint, officially published as for unpublished.

As a librarian, I do not assume it is good enough that our refereed papers are already, as they are, safely in the hands of journals and libraries, ... There are very few library copies of many journals, and though there is excellent backup from national libraries, even their collections are incomplete. The literature published up to now will be much more secure when it too has been digitized and placed on free publicly available mirrored servers, with all the additional precautions. Besides security, this will also make them generally available with all the additional advantages of plans such as yours.
Re: Central vs. Distributed Archives
Greg:

> As a rule, it is better for web sites to share the same archive than to each have fragments. It is better for Oxford and Cambridge to each have all of Shakespeare's plays than for Oxford to have only the comedies and Cambridge to have only the tragedies. That is why I favor shared interoperability, which is in some ways centralized, to fragmented interoperability, which is optimistically called decentralized. Massive redundancy is one of the few strengths of the existing paper-based system;

Stevan:

> I am not an expert on digital storage, coding or preservation, but I am not at all sure that Greg is technically right above (and I'm certain that the Oxford/Cambridge hard-copy analogy is fallacious). I would like to hear from specialists in localized vs. distributed digital coding, redundancy, etc. -- bearing in mind that in the case of the

If I may, I would like to separate the political issues from the technical.

Political: There is a fear that a decentralised system will result in no overall responsibility for archive continuity. But, equally, a centralised body can decide that a system is no longer useful or is too expensive to be free - what happens if XXX goes pay-per-view? What rights do mirrors have to store XXX if they are told to remove their archive?

Technical: The fear is that there will be only one copy of a paper, stored in an institution's department or library, and if that archive is lost that paper disappears into digital oblivion. Data storage is very cheap - there is little difference between storing 1 or 100 copies. Oxford and Cambridge could farm all world physics archives and store their contents. This is not currently done because Open Archives include pay-per-view archives, where only the abstract can be farmed - and hence there is no provision for farming of texts.
I may also point out that there are already archives that perform distributed mirroring - math arXiv is primarily made up of papers that have been archived elsewhere (judging by the lack of associated metadata and updates).

Tim Brody
Computer Science, University of Southampton
email: tdb...@soton.ac.uk
Web: http://www.ecs.soton.ac.uk/~tdb198/
Re: Central vs. Distributed Archives
On Thu, 9 Nov 2000, David Goodman wrote:

> Steve, I think you misunderstand Greg's concern (and mine). We do not disagree with what you want to do; we want to add to it. We are assuming, I think, that something similar to the plan you advocate will be the basic process. I do not think it enough to say distributed = secure. It's only the first step to security. In addition to being distributed, there also needs to be a reliable caretaker--not just to do the housekeeping, but to ensure that the archive is kept compatible with changing technology.

I agree completely. I didn't say distributed = secure (there's a lot more to security than that). I said being freely accessible now, in distributed institutional Eprint archives, is a powerful new way to complement being freely accessible in centralized Eprint archives, which are still growing much too slowly. It should not be delayed for one moment by security concerns, not one moment.

> I suggested that the archives be organized redundantly both by discipline and by university (and possibly by geographic/political entity, as well as what anyone wants to do).

Again, complete agreement.

> There are undoubtedly well-organized academic departments that can do this. There are also academic departments that cannot be relied on to do this right, because of size, interest, or finances. The same goes for professional societies. Certainly no individual can be relied on: all humans are mortal. All of this goes as well for refereed as for unrefereed, preprint as for reprint, officially published as for unpublished.

Agreed, and digital librarians are clearly the pertinent experts.

> As a librarian, I do not assume it is good enough that our refereed papers are already, as they are, safely in the hands of journals and libraries, ...

Yes, but let us not again mix up agendas.
There could have been -- independent of any movement to free the refereed literature online -- a movement to increase the security of the on-paper corpus (both papers and books) on-line. That's fine, desirable, but unrelated to this Forum's agenda, which is to FREE the refereed corpus online. Concerns about strengthening the paper literature's current security should not be wrapped into the freeing (now!) initiative for the refereed literature; nor should freeing (now!) be made in any way conditional on first meeting a priori security concerns.

Although it is an oversimplification, it is best to treat the freeing initiative as a pure freebie, a windfall, over and above what we have already. We are talking about archiving, not publishing, an extra version of what is already published (on-paper). This face-valid, immediate goal should be kept as distinct from preservation concerns as it should be kept from peer-review-reform concerns (likewise worthy, but orthogonal, and indeed even at cross-purposes if yoked in any way to the freeing initiative).

> There are very few library copies of many journals, and though there is excellent backup from national libraries, even their collections are incomplete. The literature published up to now will be much more secure when it too has been digitized and placed on free publicly available mirrored servers, with all the additional precautions. Besides security, this will also make them generally available with all the additional advantages of plans such as yours.

David, the securing issue is a separate one from the freeing! The material on the shelves now is not free; nor is it, let us agree, as secure as it might be. Increasing its security by distributed digital back-up is one thing (and need not be freely accessible either); freeing it online is quite another. Please, please keep these two separate or you will only encourage more Zeno's Paralysis!
Stevan Harnad
Re: Central vs. Distributed Archives
On Thu, Nov 09, 2000 at 11:16:11AM +, Stevan Harnad wrote:

> Nay! Release them from their hostagehood behind obsolete, impact-blocking, and completely surmountable access barriers online today through self-archiving, addict fellow-researchers the world over to that new, free form of access to it all, and the redundancies and mirrors will come tomorrow, in plenty of time to keep the freed corpus aloft in the skies.

Entirely aside from whether your proposals are the best ones, you have previously described them as being nothing other than the Ginsparg model. Well I think of myself as devoted to the Ginsparg model, but my interpretation of it is significantly different from the one that you give here. In 1997 my thinking was much more like yours, but three years of direct experience with the arXiv has changed it. My creed is, build a large, integrated, immortal archive now, and the e-prints will come tomorrow.

I won't insist that this approach is right for your discipline, because maybe you know your own community better than I do. But I do feel strongly that it is right for my discipline. And I can't speak for Paul Ginsparg either, but I would be surprised if he contradicted me outright, since he has influenced my thinking a great deal through direct correspondence.

In general your liberation terminology doesn't sit so well with me. I do hint at liberation terminology from time to time; in fact the name of my front end, Front for the Mathematics arXiv, is a deliberate allusion. If the math arXiv is revolutionary, I would liken it to the American revolution. We are building a new system on new territory and letting immigrants come. I see a lot of Alexander Hamilton in our approach, and somewhat less of Thomas Jefferson. Your comments have some character of Jefferson, but very little of Hamilton, and often they sound almost Marxist. I might compare your overall vision to the Communards of Paris. But hey, you could be right in your own society.
You have also correctly picked up that I don't accept the dichotomy between preprints and postprints. My view is that the preprint and the postprint are Tweedledum and Tweedledee. But that is a topic for another posting.

--
Greg Kuperberg (UC Davis)
Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
* All the math that's fit to e-print *
Re: Central vs. Distributed Archives
On Thu, Nov 09, 2000 at 05:58:14PM +, Tim Brody wrote:

> I may also point out that there are already archives that perform distributed mirroring - math arXiv is primarily made up of papers that have been archived elsewhere (judging by the lack of associated meta data and updates).

I don't understand this comment. Most of the papers in the math arXiv are eventually published, and many are in preprint series of one sort or another. However I conjecture that at least half of the submissions in the most recent three months are not on any other web site, not even on a home page. And for those that are not published or not yet published, the arXiv is the only project that explicitly promises to keep them permanently.

--
Greg Kuperberg (UC Davis)
Re: Central vs. Distributed Archives
On Thu, Nov 09, 2000 at 07:16:47PM +, Stevan Harnad wrote:

> I don't think sublinear or linear growth is right for your discipline (maths) either...

Of course more growth is better than less. Several of us (both the arXiv staff led by Paul Ginsparg and the math advisory committee chaired by Dave Morrison, on which I serve) have worked hard to accelerate the growth of the math arXiv. I can report a partial victory. The archives that we glued together were at best growing linearly with a low slope and were showing some signs of sublinearity. After we put them together there was a discontinuous increase in new submissions, and linear growth commenced with a higher slope. I don't have a chart but the numbers are there at

http://front.math.ucdavis.edu/math

After we had changed so much, I was surprised that growth was still linear. (Paul Ginsparg wasn't surprised.) I now believe that linear growth in e-prints is inherent. But both the discontinuity and the one-time change in slope were heartening. That is a realistic goal when you change the system.

--
Greg Kuperberg (UC Davis)