I hope the following exchange will be helpful to those universities that are currently drafting and implementing institutional/departmental self-archiving policies: http://www.ecs.soton.ac.uk/~lac/archpol.html It concerns the degree to which the metadata of deposits are checked before they appear publicly in the eprint archive.
---------- Forwarded message ----------
On Thu, 10 Apr 2003, [identity deleted] wrote:

> Entering data on a particular day [does] not result in that data
> becoming immediately available. There appears to be a serious problem
> with the data processing.
>
> According to my "user area homepage", I have three items pending - one
> from 13th March and 2 from 28th March! Why is it necessary for 4 or more
> weeks to elapse before entries that I make can be added to the database?
> This *is* a *software* problem. If the reason is that I have not
> completed the entry properly, that is still a *software* problem because
> I don't know what I haven't done correctly - there are no error messages.

This is most definitely *not* a software problem but a human factor problem! The delay in the appearance of your data is 100% a function of the fact that the vetting of the deposits is not being done promptly -- by a designated human being.

I know this for a fact. I have myself been performing that vetting function for CogPrints -- a public central archive rather than a local departmental/institutional archive -- for 6 years now, as the designated vettor. As soon as a paper is deposited, it is in the submission buffer. I, as vettor, can immediately review the metadata, and then OK the deposit, within 1 minute, if I am at the helm. With an average of 5 deposits per week, this has been no problem. (If the load ever gets bigger, I can recruit additional designated vettors, but the OAI and distributed institutional archiving have evolved since the founding of CogPrints, and that is likely to distribute the load more sensibly than central archiving, once self-archiving picks up momentum, with each researcher self-archiving in his own departmental archive.)
Any departmental self-archiving policy *must* either (1) provide the (human) resources for prompt, careful, group-based vetting of the metadata by designated vettors in each research group, or (2) dispense with vetting of the metadata and accept deposits automatically. Unless one of the two is chosen, discouraging delays and misunderstandings of the kind you describe are inevitable. But they have nothing whatsoever to do with either the software or the principle (and benefits) of departmental self-archiving of all refereed research output.

Just as the deposit of a single paper is only a matter of a few keystrokes and a few minutes of time (meaning that the self-archiving of *all* the research output [including the retrospective legacy output] of even the most prolific of departmental researchers represents no more than a few man-hours -- a tiny investment for a huge return, especially with the help of the "cloning" feature that automatically repeats all metadata that are common to all or many papers, making redundant re-entry unnecessary), so the vetting of each single paper is a matter of still fewer keystrokes and minutes of time. All that is needed is a designated vettor available to reliably vet that day's deposits -- plus a one-time, start-up corps of vettors who will process the legacy data.

The calculation of the number of man-hours required, both for any department's legacy data and for the ongoing future daily research output per group, can easily be done, and it will be found to be ludicrously small, especially for the size of the benefits it will confer on us all:
http://www.neci.nec.com/~lawrence/papers/online-nature01/

But that calculation must be done, as an essential part of any departmental self-archiving policy. And a decision has to be made as to whether the department or institution will (1) resource rigorous vetting per group or (2) have deposits appear immediately and automatically.
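As a rough illustration of the man-hour calculation described above, here is a minimal Python sketch. The function names are hypothetical, and the only figures taken from this message are the CogPrints ones (about 5 deposits per week, about 1 minute to vet each); the legacy figure is an invented placeholder that each department would replace with its own count.

```python
def vetting_hours(deposits_per_week: float,
                  minutes_per_deposit: float,
                  weeks: float = 52) -> float:
    """Ongoing vetting effort, in hours, over the given number of weeks."""
    return deposits_per_week * minutes_per_deposit * weeks / 60.0


def legacy_hours(legacy_papers: int, minutes_per_paper: float) -> float:
    """One-time effort, in hours, to vet a backlog of legacy papers."""
    return legacy_papers * minutes_per_paper / 60.0


# CogPrints figures quoted in the message: ~5 deposits/week, ~1 minute each.
yearly = vetting_hours(deposits_per_week=5, minutes_per_deposit=1)
print(f"Ongoing vetting: {yearly:.1f} hours per year")

# Hypothetical department backlog (assumed number, not from the message).
backlog = legacy_hours(legacy_papers=120, minutes_per_paper=1)
print(f"Legacy vetting: {backlog:.1f} hours, one time")
```

Under these assumptions the ongoing load comes to a little over four hours per year, which is the scale of effort a department would weigh against option (2).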
(Option (2) is not a great risk, as the Eprints software itself makes sure that certain obligatory fields are filled, the depositor himself can review his own data, and if/when later metadata errors are discovered, the depositor can correct them. The vetting capability we provided with the Eprints software was originally modelled on that of the Physics ArXiv, which receives 3500 deposits per month, from all over the world, in one central archive in which no individual or institutional interests are vested. But any local departmental archive -- once the legacy data are in there -- will have monthly deposit frequencies equal to that department's monthly output in research papers. I think one vettor per research group could easily set aside the few minutes per day that it would take to keep up with checking the metadata for his group's daily deposits [option (1)], but if that resource is not available I suggest having the deposit accepted automatically [option (2)] as a far preferable (and not very risky) alternative to having it sit for a month in a submission buffer with no designated vettor to check and accept it.)

To repeat, this is a departmental archive policy matter, not an archive software matter. It is regrettable that in this case the practice seems to have been allowed to precede thinking the policy through and choosing between (1) and (2), thereby creating needless misunderstandings about the software and the principle, but this can easily be remedied now, and all researchers alerted. Such are the advantages of implementing a research archive at departmental scale -- and of the small (indeed trivial) nature of the policy problem in question.

Stevan Harnad