On Sun, 31 Oct 2004, Prof. Tom Wilson wrote:

> Quoting Joseph Halpern <halp...@cs.cornell.edu>:
>
>jh> My guess is that CS researchers will typically not put their
>jh> papers on university servers unless required to do so, simply because of
>jh> laziness.

It is true of just about *all* researchers that they will typically not put their papers on any server unless they are required to do so (laziness). If the problem of achieving 100% OA were merely the problem of getting those who already self-archive in some way or other (i.e., those who are not lazy) to do it in some other way (be it central disciplinary server, institutional server, departmental server, or home page) then we would not need a self-archiving mandate at all, and we would be almost there!

It is important to keep this reality in mind in what follows, otherwise all we are doing is meditating on our favorite way to self-archive, rather than solving the problem of getting the non-self-archivers to self-archive, so we can reach 100% OA. I would suggest setting aside for the moment those who already self-archive, and how they do it, and focussing on those who do not (the lazy ones).

>jh> There's less overhead in putting a paper on your home page
>jh> than there is in putting it on a university server and authors know that,
>jh> once it's on citeseer, their paper is easily accessible (and, I would
>jh> guess, more likely to be seen than on a university server).

(1) The number of keystrokes it takes to self-archive a paper on one's home-page may be a few (not many) fewer, but that is not the point: The real problem (and the relevant laziness) is that of those who are *not* doing those keystrokes *at all*, not that of those who are doing too few!

(2) Since the advent of the OAI protocol (1999), OAIster and citebase, there is no difference whatsoever in either ease of accessibility or likelihood of being seen, between a paper in an OAI archive (whether institutional, disciplinary, or departmental) and a paper harvested by citeseer. If anything, the advantage is the other way (because citeseer is not OAI-compliant).

(3) Let us not mix up (i) the fact that citeseer is a (harvested) central disciplinary archive that happens to be quite *populated* with (ii) other facts about citeseer (such as that it is central, disciplinary, or in the CS field).

(4) The salient feature of citeseer is that it is *harvested*. If citeseer trawled for self-archived full-texts in physics or biology -- or even (surprisingly!) social science -- instead of computer science, it would be populated too. (Possibly not as populated as in computer science, but one can't be sure of that either.) Our webwide trawls for OA full-texts using ISI-based citations in Biology and Social Science are currently generating a hit rate of 10-15%.

    http://opcit.eprints.org/oacitation-biblio.html
    http://citebase.eprints.org/isi_study/
    http://www.crsc.uqam.ca/lab/chawki/ch.htm

(5) Hence what is really being compared here is not institutional versus disciplinary archives, but harvested versus non-harvested (full-text) archives.

(6) So let us not compare apples and oranges: The right comparison is whether the probability (and rate) of reaching 100% OA is higher (a) if authors do fewer keystrokes and we instead design more full-text trawlers and harvesters like citeseer, or (b) if authors do a few more keystrokes (to make their full-texts OAI-compliant) and then OAIster (etc.) can just harvest their metadata, as they were designed to do.

(7) And this is entirely independent of whether self-archiving needs to be mandated in order to ensure that we reach 100% OA soon enough.

(8) What is certain is that if OAI-compliant self-archiving is to be mandated, it is institutions that are in the natural position to implement the mandate and monitor compliance (probably at the departmental level), for it is institutions (and not disciplines) that share with their own researchers the benefits of maximising research impact, and the costs of losing research impact.

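For readers who have not seen what metadata harvesting from an OAI-compliant archive amounts to on the harvester's side, here is a minimal sketch in Python, not anyone's production harvester. It parses the kind of XML an archive returns for a ListRecords request in the oai_dc format. The sample response, the archive.example.org identifiers and the function name harvest_metadata are invented for illustration; only the namespace URIs come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response body, of the kind an archive returns for
# ?verb=ListRecords&metadataPrefix=oai_dc (all values are invented).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:101</identifier>
        <datestamp>2004-10-31</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Postprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:subject>Computer Science</dc:subject>
          <dc:identifier>http://archive.example.org/101/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest_metadata(xml_text):
    """Return one dict per record: OAI identifier plus Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Records flagged as deleted carry a header but no metadata.
            continue
        records.append({
            "oai_id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            "subjects": [e.text for e in dc.findall("dc:subject", NS)],
            "links": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return records


if __name__ == "__main__":
    for rec in harvest_metadata(SAMPLE_RESPONSE):
        print(rec["oai_id"], rec["titles"], rec["links"])
```

The harvester collects only the metadata (title, creator, subject, a link to the full-text); the full-text itself stays wherever the author deposited it. A real harvester would additionally fetch the response over HTTP and follow resumption tokens, both omitted here.
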
Tom Wilson replies (to Joe Halpern)

> ...perhaps loyalty
> to a discipline is stronger than loyalty to an institution, which can
> vary over an academic career. And your comment, "unless required to do
> so" chimes in with my earlier point about academic authors needing
> some motivation to submit to institutional archives.

I'm afraid that several factors are again being mixed up here:

(1) "Loyalty to a discipline" is an abstraction, and an irrelevant one, here: Disciplines do not count an author's publications, weigh their impact, and employ and fund him accordingly. His institution does (and to a certain extent his research funders do too). If an author elects to self-archive so as to maximize his research's visibility, access, usage and impact, this is primarily for the sake of his research itself, and his own career, for which all the carrots and sticks are in the hands of his institution (and funder), not his discipline. (So much for self-archiving out of loyalty to one's discipline!)

(2) The author motivation in question is not specific to self-archiving in his institutional archive: It is the motivation to self-archive at all. (That motivation is to maximize his research's visibility, access, usage and impact.)

(3) The author's institution's motivation likewise needs to be taken into account; and that too is to maximize the visibility, access, usage and impact of its own employees' research output. For authors and their institutions, as noted, share in the benefits of enhanced research impact, as well as in the costs of lost research impact. Hence it is authors' institutions that wield the carrot/stick for maximizing impact through self-archiving, and have both the means and the interest to monitor compliance (both for themselves, and for their employees' research funders, who likewise have a stake in the impact of the research they fund, and often subsidise their fundees' institutions with substantial overheads).
Again, "discipline loyalty" has nothing to do with any of this.

(4) Logically and practically, if there existed a central, OAI-compliant archive for each discipline (and some central entity to foot the costs and maintain the entire disciplinary archive in each case, as the Physics ArXiv does today), then it would make absolutely no difference whether authors self-archived in their disciplinary OAI archive or their institutional archive. But this is not the case today: There are few disciplinary archives, and the burden and complexity of creating and maintaining them are substantially greater than those of offloading and distributing the load onto each individual university (and its departments) for its own research output alone -- at far lower cost, per university, along with a far more natural means of monitoring compliance.

(5) But because of the "laziness" problem noted earlier (let us henceforth call it, more charitably, "sluggishness"), merely creating archives, be they central or institutional, is not enough: Self-archiving needs to be mandated, and compliance needs to be monitored and rewarded, just as publishing itself needed to be mandated, and compliance needed to be monitored and rewarded. And in both cases, the natural candidate for mandating and monitoring the requisite practice (for the author's own good!) is the author's institution (backed up by his research funder). "Loyalty to a discipline" has absolutely nothing to do with it.

Two footnotes:

(6) Authors changing institutions is trivial in an OAI-compliant world. It is simple for the author's new institution to automatically harvest the metadata as well as the full-texts from the author's old OAI-compliant institutional archive. (Wanting to remove one's work from the old institution is as absurd as wanting to remove it from the shelves of one's old library -- or any library!)

(7) An article's or author's "discipline" -- in a digital, distributed, OAI-compliant world -- is not a *place* but a metadata tag (or rather several of them, as few disciplines are hermetic and autonomous).

> But is there really 'less overhead' in putting something on your own
> home page? If I (and I am talking personally, rather than generally)
> put something on my own home page it involves a degree of labour in
> converting to html - if I simply send, say, a Word document to the
> organizer of the institutional archive, all that is needed is an
> e-mail attachment. But perhaps the perception that it IS more trouble
> is part of the problem??

This is just the extra-keystroke saga again: It takes a few keystrokes to convert, a few to email, not many more to self-archive. I wonder how many people who have expressed strong opinions about what is and is not feasible/optimal have ever actually gone through the motions of self-archiving one of their papers in an OAI-compliant archive? And even those few keystrokes are an over-estimate, as all subsequent papers can "clone" most of the repeated metadata and enter only what is new.

And the proof that the problem is not the *number* of keystrokes but the sluggishness, simpliciter, is that even in institutions such as St. Andrews, which have established a proxy self-archiving service that will do all the keystrokes *for* the author

    "Let us Archive it for you!"
    http://eprints.st-andrews.ac.uk/proxy_archive.html

the cupboards are nearly bare, waiting for a self-archiving mandate (just as there would be few publications at all, if not for the "publish or perish" mandate).

As the Swan & Brown (2004) survey reports: the majority of authors state that they will willingly self-archive if it is mandated (but not otherwise).

    http://www.ingentaselect.com/rpsv/cw/alpsp/09531513/v17n3/s7/

(I invite those authors who would like to actually see what they are talking about when they say self-archiving calls for too many keystrokes to self-archive one paper in http://demoprints.eprints.org/ and sample self-archiving for themselves.)

>jh> My own strong preference is for discipline-based archives, rather than for
>jh> institutional archives. The arXiv is extremely successful because, for
>jh> large areas of physics, that's *the* place to have your paper appear if
>jh> you want people to be aware of it.

Yes, but how long are we willing to keep waiting for other discipline-based archives to be created and filled? The Physics ArXiv has been around for nearly 15 years now, and no other has sprouted since. The only two other comparable-sized disciplinary collections are *harvested* ones: Citeseer (Computer Science) and RePEc (Economics). Harvesting, as noted earlier, is for the already-converted (who have self-archived in any which way already). The problem, however, is the sluggish, who are still the vast majority -- in *every* discipline. They're the ones for whom the self-archiving mandate is needed. (Even Physics, at ArXiv's present linear growth rate, unchanged since 1991, will not be 100% OA for at least another 10 years).

    http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0043.gif

But employers cannot mandate the creation of central disciplinary archives: They can only mandate the creation and filling of their own archives, for each of their own disciplines (departments).
Research funders *can* create their own archives (not exactly disciplinary, but dedicated to the research they fund), and NIH is on the verge of doing just that; but even there, the mandate is far more likely to propagate to non-NIH-funded research and to other disciplines if it is implemented institutionally:

    "A Simple Way to Optimize the NIH Public Access Policy"
    http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/4091.html

(Besides: The likely causality is that ArXiv is today preferred by its users because of the many papers that are in it -- not that the many papers are in it because it was preferred by users: ArXiv was just a natural online adaptation of a "self-archiving" practice among certain physicists that preceded the Internet. If OAI had come before ArXiv, the same physicist practice would naturally have been implemented institutionally. It was pre-OAI functionality that dictated central archiving back in 1991.)

>jh> It is also easier to tailor a
>jh> discipline-based archive to the needs of a discipline; I can well
>jh> imagine that different features might be appropriate for an archive of
>jh> computer science papers than for an archive of papers in art history.
>jh> Of course, if all archives eventually hook together, this point may become
>jh> moot, but I think it holds for now.

(1) Of course all OAI-compliant archives -- including ArXiv -- will eventually hook together. (They already do, via OAIster, but since ArXiv is the only one of them with critical mass, there's no advantage for ArXiv-users motivating them to use it through OAIster, nor any advantage in upgrading OAIster's functionality to match and exceed ArXiv's -- though that can and will easily be done when there are more archives with critical mass in OAIster.)
    http://oaister.umdl.umich.edu/o/oaister/

(2) What functionality does Joe think an individual OAI archive can provide for users (I am not speaking about depositing authors) that an OAI harvester and service provider could not provide, and better?

Tom Wilson:

> Again, Thomas Krichel noted that the needs of disciplines would probably vary

How, specifically, (1) for the user, and (2) for the author?

> and I think that differences may also relate to such factors as the delay
> between submission and publication in a field,

How? The self-archiving of the pre-refereeing preprint is (and must remain) optional for the author. The self-archiving of the peer-reviewed final draft ("postprint") can and should be done the moment it is accepted for publication. What are the discipline differences, and the factors, and the role of the length of the delay between preprint and postprint?

> the significance of primacy of discovery in the discipline,
> and national cultures.

He who wants to ensure primacy can (optionally) self-archive the preprint (or date-stamp it in some other way). So what is the point here? Remember that the target of OA is the postprint, the target literature being the peer-reviewed journal literature.

> Perhaps, also, the various disciplinary archives may vary in what
> they accept

What they accept? It is journals that accept, and the target of OA is the postprints accepted by the journals. The preprints are another matter and not central to OA.

> and if an archive has a policy of accepting anything it may play
> against that archive being accepted as a reliable source.

The mark of reliability is the journal that accepted the postprint. Unrefereed preprints are tagged as such. Caveat emptor. Nor is this new. Scholars and scientists have always been able to distinguish between refereed publications and unrefereed drafts. Archives are merely access-providers, not certifiers of reliability: Journals are the certifiers of reliability.

And before the predictable next question is asked: "What certifies that a postprint has indeed been accepted?" -- please read the self-archiving FAQs on "Certification" and "Authentication":

    http://www.eprints.org/self-faq/#5.Certification
    http://www.eprints.org/self-faq/#2.Authentication

> There is obviously a researchable topic here and perhaps the more
> that is known about the appeal to authors in different disciplines
> and different countries of different modes of 'open access'
> availability, the easier it will be to devise policies that stand
> some chance of working in different contexts.

Open Access is already 10 years overdue. I suppose we could now turn to researching disciplinary and national differences, but it seems to me we'd be better off just going ahead and mandating self-archiving, at last. It is unlikely that the outcome of the research would be the only result that could counter-indicate doing this for any discipline, namely, that researchers in that discipline would *not* benefit from maximising the visibility, access, usage and impact of their research output. For that is all that OA does, and is intended to do.

>jh> Full disclosure: I run the computer science part of the arXiv. Despite
>jh> the predominance of citeseer in the computer science community, I
>jh> believe that the CS arXiv plays an important role both because of issues of
>jh> copyright (when they post their papers on the arXiv, authors
>jh> explicitly give the arXiv permission to post papers) and because of
>jh> stability (since the Cornell library has assumed responsibility for
>jh> the arXiv, there's some assurance it will be around for the long term).
>
> Permanence is clearly important but, as the Deputy Director of a university
> computer services said to me when we were discussing archiving and I loosely
> used the words 'in perpetuity', "Nothing is for ever!"
:-)

What needs to be forever is the journal's published version of the article (the one subscribed to by libraries). The immediate purpose of OA is immediate *access-provision* to all those would-be users whose institutions cannot afford access to the journal's published version, in order to maximize its visibility, access, usage and impact. These author-provided self-archived supplementary versions (although they can and will be preserved) do not have the primary preservation burden, and it is a great mistake to delay access and impact on the assumption that they do.

> Joseph Halpern wrote:
>
>jh> My own sense is that the more the better. What I've told the librarians
>jh> at Cornell is that I'd ultimately prefer one submission process that
>jh> would simultaneously put my papers on all relevant archives. But if I
>jh> were to choose just one archive, I'd choose a discipline-based archive.

This is all splendid, but preaching to the converted: The problem is the vast, sluggish majority who do no keystrokes, and self-archive nowhere, not those who don't do enough keystrokes, and don't self-archive in enough places!

>tw> But is there really 'less overhead' in putting something on your own
>tw> home page?
>
>jh> Of course, this depends on the university. But typically, with
>jh> university archives, there are forms to fill out (title, author,
>jh> abstract, etc.). None of that is required on one's homepage (although
>jh> some of us do put that information there).

This is just haggling again over the number of keystrokes, when the problem, again, is the vast, sluggish majority who still do no self-archiving keystrokes at all.

>tw> But perhaps the perception that it IS more trouble is part of the problem??
>
>jh> It may well be in some cases. And certainly there's not much overhead
>jh> involved in any case. But we're all busy people. There's no question
>jh> that part of the success of citeseer is due to the fact that so little
>jh> overhead is involved.

Fewer keystrokes, more self-archiving. Accepted. But now can we talk about the vast, sluggish majority that does *no* self-archiving at all? That's why the self-archiving mandate is needed.

>jh> We're hoping to see journals archived on the arXiv. For journals, some
>jh> assurance of permanence is critical.

The only journals you'll see archived in ArXiv are OA journals (by definition, unless you only mean preservation archiving for back issues). OA journals are not the problem. Non-OA journals are, and they require author self-archiving for the current impact of current articles.

That said, hosting OA journal-archiving too is a fine service -- if journals can be persuaded to want it!

Stevan Harnad