Re: Central vs. Distributed Archives

2000-11-02 Thread Greg Kuperberg
I have been skimming the September98 forum on and off for a few months.
As a cursory Internet search will demonstrate, I strongly support
what I consider the Ginsparg model, especially in my own discipline,
mathematics.  I would call it the arXiv model.  But while I agree in
outline with Stevan Harnad et al., I disagree in some of the details.
(And that's where the devil is.) Here is my take on three issues in
particular.

1) I have mixed feelings about the grass-roots connotations of the Open
Archives Initiative and even more about Harnad's phrase self-archiving.
I do believe that the research literature should be electronic and free,
and it is possible that each discipline must pass through an anarchic,
do-it-yourself phase of open archival before moving on to a more
organized stage.  However, when I started archive work in mathematics,
we already had an array of separate preprint servers cum e-print archives.
The effort since then has been to reorganize much of this jumble into the
math arXiv.  Having many copies of one huge archive is superior to having
many little archives, no matter how interoperable.  Serious permanence
and stability require closer cooperation than that.

At the overall STM level the literature may have to be divided
into single-discipline or few-discipline fragments for some time.
The Los-Alamos based arXiv works well for the TeX-based e-print culture
in mathematics, physics, and parts of computer science.  But it is
not clear how to extend that particular system to the rest of science.
If you have to have disjoint archives, fragmented interoperability is
then a good goal to work towards.  But you have to realize that it is
only a partial solution.  And I have reservations about encouraging every
tenth researcher to set up yet another archive, because that can lead to
entrenched Lilliputian fiefdoms of e-prints.  By my standards the physics
part of the arXiv, with 130,000 e-prints, is large; the math arXiv,
with 13,000, is medium-sized; and an archive with 1,300 or fewer is tiny.

2) I have been accused, sometimes correctly, of being overzealous in
my support of the arXiv.  I see that Stevan Harnad has about as much
enthusiasm as I do, and I can't criticize that.  But if the September98
forum has strong advocacy in favor of open archives, it doesn't make sense
to limit criticism.  Because then you're just preaching to the choir.
If you don't want to debate whether or not open archives are a good idea,
maybe that makes sense.  But then you shouldn't dwell on how fantastic
open archives are; instead you should steer the discussion to practical
plans.

3) I also can't criticize Elsevier's Chemistry Preprint Server project.
In a way I can't even criticize commercial publishers with high journal
prices, even though I believe that the mathematical literature should
be free.  A for-profit company is entitled to maximize profit.  If it is
publicly traded, it is legally required to do so up to a point.  (By the
same token, the customer, academia, is entitled to minimize expenses.)
I'm against Napster-style copyright infringement and I have mixed
feelings about journal boycotts.  My approach is less confrontational.
My own recent papers lie permanently in the arXiv, I keep the copyright,
and I will publish in any journal that wants the papers on those terms.

From this point of view, I am not sure about the Chemistry Preprint
Server, because I don't see the business model for it.  But then, I
don't see the business model for Google either, and I think that Google
is great.  It is possible that the Chemistry Preprint Server will be
an important gift from Elsevier to the chemistry research community.
Arguably the chemists should have done it for themselves, but maybe they
lack leadership and need Elsevier to do it for them.
--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *


Re: Central vs. Distributed Archives

2000-11-02 Thread Stevan Harnad
On Thu, 2 Nov 2000, Greg Kuperberg wrote:

 1) I have mixed feelings about the grass-roots connotations of the
 Open Archives Initiative and even more in Harnad's phrase
 self-archiving.

You have to distinguish between the Open Archives Initiative (OAI) and
the (Author/Institution) Self-Archiving (Sub-)Initiative.

OAI has now evolved into an initiative for shared standards and
interoperability in the metadata tagging of the contents of online
archives -- WHETHER OR NOT the contents (i.e., apart from the metadata)
of the archives are full-text or free: http://www.openarchives.org

A commercial publisher, for example, can establish an OAI-compliant
Open Archive as readily as any other institution or individual, and
would benefit from the increased visibility provided by the
OAI-compliant interoperability for the contents of the Archive, even if
the full-texts were kept behind an S/L/P financial firewall.

A journal publisher can also establish an OAI-compliant FREE Open
Archive, if they do wish to give away their full-text contents at this
time (as around 400 biomedical publishers are currently willing to do,
as indicated in a very recent posting:
http://www.freemedicaljournals.com
-- although most of those archives are not yet OAI-compliant).

Nor is the OAI particularly committed to either centralized,
discipline-based Open Archiving (e.g. ArXiv, CogPrints) or distributed,
institution-based Open Archiving (Eprints): It is developing
interoperability standards that apply to both, with the objective of
making the difference between them less significant, eventually perhaps
even irrelevant.

The (Author/Institution) Self-Archiving (Sub-)Initiative, however, is
SPECIFICALLY concerned with freeing the refereed research literature
through author/institution self-archiving (in OAI-compliant Open
Archives): http://www.eprints.org

 I do believe that the research literature should be
 electronic and free, and it is possible that each discipline must pass
 through an anarchic, do-it-yourself phase of open archival before
 moving on to a more organized stage.

It is not at all clear why you describe open archiving as anarchic!
It was precisely in order to put order into distributed online digital
archiving resources through interoperability that the OAI was
initiated!

And the other aspect of the order is the order already provided by the
refereed journals, in the form of peer review and its certification.
That order is medium-independent, and will be preserved in a
well-tagged Open Archive: Journal-Name will be a field, etc.

The only do-it-yourself issue is self-archiving itself. And the issue
is very clear: If researchers want the refereed literature freed, now,
then they can do it themselves, by self-archiving, now. Otherwise, they
have to wait until someone else (the journal publishers?) decides to
free it for them -- and that could prove to be a very long wait
indeed.

Harnad, S. (1999) Free at Last: The Future of Peer-Reviewed
Journals.  D-Lib Magazine 5(12) December 1999
http://www.dlib.org/dlib/december99/12harnad.html

 However, when I started archive work in mathematics, we already had an
 array of separate preprint servers cum e-print archives. The effort
 since then has been to reorganize much of this jumble into the math
 arXiv. Having many copies of one huge archive is superior to having
 many little archives, no matter how interoperable. Serious permanence
 and stability requires closer cooperation than that.

Again, it is a question of how long the researcher community is willing
to wait for the optimal and inevitable: It is now within immediate
reach to eliminate all the research access/impact-barriers, now,
through self-archiving. Interoperability will integrate the results
into a global Archive of the entire refereed research literature, in
all disciplines, as searchable as the Institute for Scientific
Information's Current Contents Database -- but including the full-texts
themselves (and free). (See ARC as a prototype and fore-taste of this
capability:  http://arc.cs.odu.edu/)

But note that arXiv-style centralized, discipline-based self-archiving
in Physics, the most advanced self-archiving on the planet -- with
130,000 archived papers in 10 years -- has only freed 30-40% of the
Physics literature so far, and will take 10 more years to free it all
at the present steady linear growth rate:
http://arXiv.org/cgi-bin/show_monthly_submissions

Note that I used to cite the above graph repeatedly as evidence that
the self-archiving cup is half-full. But it is also evidence that it is
still half-empty -- and taking another 10 years to fill.
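
The extrapolation above can be checked with a back-of-the-envelope
sketch. It assumes, purely for illustration, that the freed share of the
literature has grown in a straight line from zero and keeps the same
slope; the function name and the sample shares are hypothetical and are
not read off the submission graph.

```python
def years_to_full_coverage(current_share: float, years_elapsed: float) -> float:
    """Years until the archived share reaches 100%.

    Assumption (not from the source data): the share rose linearly from 0
    to current_share over years_elapsed and continues at that same slope.
    """
    if not 0 < current_share <= 1:
        raise ValueError("current_share must be in (0, 1]")
    if years_elapsed <= 0:
        raise ValueError("years_elapsed must be positive")
    share_gained_per_year = current_share / years_elapsed
    return (1.0 - current_share) / share_gained_per_year


if __name__ == "__main__":
    # Shares quoted in the thread: 30-40% freed, or a "half-full" cup.
    for share in (0.30, 0.40, 0.50):
        remaining = years_to_full_coverage(share, 10)
        print(f"{share:.0%} after 10 years: about {remaining:.1f} more years")
```

Under that assumption a 50% share after 10 years gives 10 more years,
while a 30-40% share gives roughly 15 to 23, so the estimate is
sensitive to which starting share is used.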

So the idea is that distributed, pan-disciplinary, institution-based
self-archiving (OAI-compliant, of course) may be what is needed to get
this growth rate into the exponential range for Physics, as well as to
carry it over into all the other disciplines.

Of course multiple copies and mirroring (and harvesting and caching)
will be as important for 

Re: Central vs. Distributed Archives

2000-11-02 Thread Greg Kuperberg
On Thu, Nov 02, 2000 at 03:07:58PM +, Stevan Harnad wrote:
 It is not at all clear why you describe open archiving as anarchic!
 It was precisely in order to put order into distributed online digital
 archiving resources through interoperability that the OAI was
 initiated!

I certainly think that a standard for interoperability could be useful,
but it is wishful thinking to suppose that it can tame an anarchy of many
tiny little e-print archives.  In my discipline, when the literature
is excessively decentralized, as it was entirely before 1998 and still
largely is, neither authors nor readers have any confidence that papers
floating around on the Net are permanent.  And they are right, because
no one could promise to keep those papers forever with any credibility.
Any given paper could be erased accidentally if it is in one tiny
archive somewhere.  Or maybe the maintainer of that particular archive
never explicitly promised permanence anyway; if so he could shut down
his archive when he gets tired.  The fact that the arXiv is so large
and so widely used and mirrored is a necessary ingredient for assuring
permanence.

 The only do-it-yourself issue is self-archiving itself. And the issue
 is very clear: If researchers want the refereed literature freed, now,
 then they can do it themselves, by self-archiving, now.

The self in self-archiving could mean individuals acting for themselves,
or it could mean the research community acting for itself by directly
supporting one or a few archives.  I have the feeling that you don't
see this as an important distinction.  I'll give you an analogy to show
you what I mean.  I use Linux, which is an open, standards-based operating
system.  It would be absurd to call my use of Linux self-programming,
even though Linux is maintained by some of its users.  I see the arXiv as
highly analogous to Linux.  This is why I am reluctant to use the phrase
self-archiving.

 Again, it is a question of how long the researcher community is willing
 to wait for the optimal and inevitable: It is now within immediate
 reach to eliminate all the research access/impact-barriers, now,
 through self-archiving.

I can't say that this ambitious goal is within immediate reach in
mathematics, because many of us have worked hard to make it happen and
we see a lot of work ahead.  We can't expect all mathematicians to change
their minds in one day.  I have no desire to believe, as I once did,
that the exponential rocket is about to blast off.

If you think that encouraging many small archives to spring up is the
magic step, then I simply disagree.  Because when we glued together
many small archives into the math arXiv, the whole was much more than
the sum of the parts.  Even though the math arXiv has only 5% of new
math papers, and even though it will take years for it to get to even
50%, it is at least growing more quickly than all of the Lilliputian
mathematical archives put together.

  The Los-Alamos based arXiv works well for the TeX-based e-print culture
  in mathematics, physics, and parts of computer science. But it is not
  clear how to extend that particular system to the rest of science.

 Why? This formula has been repeated so many times that people are
 actually believing it, without anyone ever having explained why it
 should be thought to be true!

I don't mean to say that other disciplines can't have an open archive
that's *like* the arXiv.  I certainly think that they can.  I mean that
other disciplines are sufficiently different that their open archives
might need separate administration.  And that would lead to fragmentation,
which concerns me more than it does you.
--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *


Re: Central vs. Distributed Archives

2000-11-02 Thread Stevan Harnad
On Thu, 2 Nov 2000, Greg Kuperberg wrote:

 I certainly think that a standard for interoperability could be useful,
 but it is wishful thinking to suppose that it can tame an anarchy of many
 tiny little e-print archives. In my discipline, when the literature
 is excessively decentralized, as it was entirely before 1998 and still
 largely is, neither authors nor readers have any confidence that papers
 floating around on the Net are permanent. And they are right, because
 no one could promise to keep those papers forever with any credibility.
 ... The fact that the arXiv is so large and so widely used and
 mirrored is a necessary ingredient for assuring permanence.

(1) Archives meeting the conditions to be registered OAI-compliant
data-providers http://www.openarchives.org/sfc/sfc_archives.htm are
not likely to be tiny little ones (though it is no problem if some of
them are).

(2) Most Eprints Archives are likely to be university-based archives, for
all the university's refereed research, in all its disciplines. That's
hardly tiny (or impermanent) either.

(3) The goal is to free the refereed literature, across disciplines,
now. Once the literature is thus freed the process will be irreversible.

(4) The mechanisms for preserving and navigating it will continue to
evolve and improve, with the whole world's refereed assets in this
distributed basket (suitably mirrored, harvested, cached, backed up,
etc.).

(5) The immediate issue is hence not the PERMANENCE of the
self-archived drafts but their EXISTENCE, free for all, now. The
permanence will take care of itself.

 The self in self-archiving could mean individuals acting for themselves,
 or it could mean the research community acting for itself by directly
 supporting one or a few archives. I have the feeling that you don't
 see this as an important distinction.

You are right; I think it is a red herring. Most of the individuals in
question (the authors of the refereed literature) are researchers at
universities and research institutions. In principle each of them could
set up his own Eprints Archive and register it with the OAI (and that
would be fine as a start, and would free the literature irreversibly).

But of course the likely, practical strategy is for the researchers'
universities and research institutions (or, more specifically, their
libraries) to create and administer their institutional Eprint Archives
for all their researchers' refereed output, in all disciplines. (We can
have at least as abiding a faith in the durability of the collections
on universities' airwaves, then, as we now have in the durability of
the collections on their shelves).

 I can't say that this ambitious goal is within immediate reach in
 mathematics, because many of us have worked hard to make it happen and
 we see a lot of work ahead. We can't expect all mathematicians to change
 their minds in one day.

You are now talking about something else: You are talking about what it
will take to induce the research cavalry to drink, once they have been led
to the waters of self-archiving.

There's no second-guessing human nature, but my own hunch is that the
motivational structure at the researchers' own institution -- the one
that benefits from (and rewards) the impact of its own researchers'
refereed output, and the one that is today weighed down by the serials
crisis and the limitations that that puts on its own researchers'
access to the refereed output of researchers at other institutions --
may provide just the kind of local incentive for self-archiving that a
centralized, discipline based entity so far seems unable to provide.

In any case, these two routes to the liberation of the refereed corpus
(centralized and distributed) are complementary (and interoperable!).

 If you think that encouraging many small archives to spring up is the
 magic step, then I simply disagree. Because when we glued together
 many small archives into the math arXiv, the whole was much more than
 the sum of the parts. Even though the math arXiv has only 5% of new
 math papers, and even though it will take years for it to get to even
 50%, it is at least growing more quickly than all of the Lilliputian
 mathematical archives put together.

I am not a mathematician, but this whole is greater than the sum of its
parts argument does not add up for me!

Centralized archiving in maths is at 5% and will take years to get to
50%. What possible reason would there be not to encourage complementing it
by institutional Eprint Archives immediately -- given that they will all be
co-harvested (and mirrored, and cached, etc.) in global virtual archives
anyway, thanks to interoperability?

 other disciplines are sufficiently different that their open archives
 might need separate administration. And that would lead to fragmentation,
 which concerns me more than it does you.

My concern is freeing the refereed literature online, now. There is no
reason it should stay hostage to S/L/P barriers for another 

Re: Central vs. Distributed Archives

2000-11-02 Thread Greg Kuperberg
On my other points:

On Thu, Nov 02, 2000 at 03:07:58PM +, Stevan Harnad wrote:
 I have, as moderator, terminated discussion on a few irrelevant or
 saturated topics (is there a conspiracy of university administrators to
 control researchers' intellectual property? is the library serials
 crisis simply a consequence of under-funding the libraries? how can we
 reform or abandon peer review?), but comments, whether supportive or
 critical, on the Forum's central theme -- How to free the refereed
 literature online, now? -- have never been suppressed.

You may see it as closing discussion of all sides of a topic, but to me it
has some of the character of closing down just one side of a debate.  Obviously you
are referring to Al Henderson's argument that free scholarly communication
is a stress response to penny-pinching by university administrations.
I'll grant that he has said that many times, and I'll also grant that the
argument sounds absurd to me.  (I am one of the researchers supposedly
bullied by the administration, and if anything my complaint is that
the higher-ups are biased in favor of the historical subscription-based
system.)  But even though I don't agree with him at all, he is no more
repetitive than you are or I am.  Invoking cloture strikes me as an
overreaction.

 I couldn't agree with you more! But what gives you the impression that
 this Forum is trying to prevent companies from doing whatever they
 like?

What you said originally was:

   The Elsevier policy of publicly archiving pre-refereeing preprints
   could be a good first step towards the optimal and inevitable, but it
   is also possible that it is intended as a Trojan Horse,...

I think it's divisive to speculate that someone else's e-print archive is
a Trojan Horse.  It's true that I'm not sure that the CPS is compatible
with Elsevier's mission of maximizing profit.  But let's give it the
benefit of the doubt.
--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *


Re: Central vs. Distributed Archives

2000-11-02 Thread Stevan Harnad
On Thu, 2 Nov 2000, Greg Kuperberg wrote:

  what gives you the impression that
  this Forum is trying to prevent companies from doing whatever they
  like?

 What you said originally was:

sh The Elsevier policy of publicly archiving pre-refereeing preprints
sh could be a good first step towards the optimal and inevitable, but it
sh is also possible that it is intended as a Trojan Horse,...

 I think it's divisive to speculate that someone else's e-print archive is
 a Trojan Horse.  It's true that I'm not sure that the CPS is compatible
 with Elsevier's mission of maximizing profit.  But let's give it the
 benefit of the doubt.

Good. Both sides of the question have been aired.

(Please distinguish my actions as moderator, when I invoke cloture,
from the expression of my own views on this topic -- which carry no
more weight than anyone else's ex officio.)

Stevan Harnad


Re: Central vs. Distributed Archives

2000-11-02 Thread Stuart A Yeates
Stevan Harnad wrote:

 (3) The goal is to free the refereed literature, across
 disciplines, now. Once the literature is thus freed the
 process will be irreversible.

Do you mean free as in liberty or free as in free beer ?

This particular bone of contention has effectively split what used to be
known as a free software movement, but is now known as the free software/open
source movement.


--stuart yeates s.yea...@cs.waikato.ac.nz


Re: Central vs. Distributed Archives

2000-11-02 Thread Stevan Harnad
On Fri, 3 Nov 2000, Stuart A Yeates wrote:

  (3) The goal is to free the refereed literature, across
  disciplines, now. Once the literature is thus freed the
  process will be irreversible.

 Do you mean free as in liberty or free as in free beer ?

 This particular bone of contention has effectively split what used to be
 known as a free software movement, but is now known as the free software/open
 source movement.

Free in the way advertisements are free (which I suppose is more like
free beer -- when you're giving away your own home-brew).

But this refereed brew is definitely not free in the sense of liberty
(that would be the vanity press). It is constrained by and answerable to
peer review. Hence it is not relevantly like software either.

But once it successfully passes that quality-control process, and is
certified as such, the author can and should maximize the access to,
and hence the impact of this give-away refereed research by
self-archiving it online, free for all.

http://www.arl.org/sc/subversive/


Stevan Harnad                     har...@cogsci.soton.ac.uk
Professor of Cognitive Science    har...@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
 Computer Science                 fax:   +44 23-80 592-865
University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM


Re: Central vs. Distributed Archives

2000-11-02 Thread Stevan Harnad
I like Greg Kuperberg's postings, even though we disagree. Greg too is an
advocate of freeing the literature through author self-archiving, but he
prefers centralized archives, whereas I think both centralized and
distributed archiving are welcome and should be encouraged, as both can
hasten the freeing of the refereed literature.

Centralized archiving has been with us for over 10 years, and at its
current rates it will take 10 more years to free the Physics literature
alone, where it is most advanced. In Greg's own field of mathematics,
it might be going even more slowly. It looks to me as if centralized
self-archiving can now use the help of distributed institutional
self-archiving.

By way of counterevidence, Greg cites the fact that in mathematics
institutional self-archiving predated centralized self-archiving
and was unreliable. It was centralized self-archiving that accelerated
and stabilized the process.

What Greg seems to overlook is that the institutional self-archiving he
describes PRE-DATED the Open Archives Initiative (OAI), with its
interoperability. Hence the question of whether or not distributed
self-archiving in OAI-compliant Institutional Eprint Archives will
accelerate the freeing of the literature has not yet been tested.

Greg also seems to conflate, at some junctures, the self-archiving of
unrefereed preprints with the self-archiving of refereed postprints,
as if self-archiving were in some sense a rival to or substitute for
refereed publication (which I certainly do not think it is);
self-archiving is merely a way to free the refereed literature.

On Thu, 2 Nov 2000, Greg Kuperberg wrote:

 In 1997, the year before the universal math arXiv was started, there
 were already some 10 or 20 thousand research papers freely available on
 the web. Most of them were on personal home pages, but thousands were
 in institutional and subject-based preprint series.

This is irrelevant, as noted above. These archives were not
OAI-compliant and hence could not be integrated or navigated in a
useful way.

 Nonetheless the vast majority of these papers were still eventually
 sold as published papers.

This too is irrelevant. The initiative to free the refereed literature
is a PRO-RESEARCHER and PRO-RESEARCH initiative, not an anti-publisher
initiative (nor even particularly a pro-library initiative):

The goal is to free the refereed literature for one and all online.
That is what self-archiving does.

The goal is NOT to prevent other versions of the refereed literature
from being sold, on-paper or on-line, if there is a market for them.
(Why would we want to do that?)

 So what were the publishers selling? Not peer review, because you
 can learn from Math Reviews where a paper has been published without
 subscribing to the journal. To a large extent the journal system was
 selling, and is still selling, stability and permanence.

Fine. Let it continue to do so (whether the stability/permanence is real
or merely imagined). As long as another version is online and free, the
goal is met.

 So that has been the fundamental question of open archival in
 mathematics for years. That is why some of the recalcitrant math
 publishers say that the arXiv is just a preprint server and not a
 permanent e-print archive. Of course I don't agree with them; I
 choose the arXiv over subscription journals as the future route to
 permanent archival.

I'm afraid that this is not making sense to me. What is the argument?
That the jeering of some publishers nullifies the fact that that portion
of the refereed literature that has been freed is indeed free?

The substantive question is: Are the refereed papers online and free? If
they are, who cares if some people keep calling them preprints, when in
reality they include both pre-refereeing preprints + post-refereeing
postprints (= eprints)?

But I sense another point of disagreement with Greg: Earlier he said
it's not the peer-review that makes people keep paying for the for-fee
(refereed) version despite the availability of the for-free (refereed)
version, but the stability and permanence. Perhaps. But if the
implementation of the peer-review were no longer paid for by the
continued support for the publishers' version, perhaps the true value
and causal role of peer-review in all of this would become clearer.

Moreover, for now, it is not true stability/permanence that
distinguishes the publishers' for-fee version and the archives'
for-free version, but mere PERCEIVED stability/permanence.

With time, that may change. But for now it certainly isn't any reason to
deter us from self-archiving, either centrally or institutionally. On
the contrary; as long as the publishers' for-fee version is seen as the
guarantor of the stability/permanence, there is no reason whatever NOT
to SUPPLEMENT that with the self-archived free version -- without giving
the stability/permanence issue another thought!

 As a practical matter most of the institutional preprint series in
 mathematics 

Re: Central vs. Distributed Archives

2000-11-02 Thread Stevan Harnad
On Fri, 3 Nov 2000, Stuart A Yeates wrote:

 So if I hear you correctly OAI will have no traffic with technical reports or
 technical report servers? these _are_ vanity press.

Incorrect. Eprints Archives are for both unrefereed preprints and
refereed postprints, suitably tagged as such.

Stevan Harnad


Re: Central vs. Distributed Archives

2000-11-02 Thread Greg Kuperberg
On Thu, Nov 02, 2000 at 09:29:24PM +, Stevan Harnad wrote:
 Centralized archiving has been with us for over 10 years, and at its
 current rates it will take 10 more years to free the Physics literature
 alone, where it is most advanced. In Greg's own field of mathematics,
 it might be going even more slowly. It looks to me as if centralized
 self-archiving can now use the help of distributed institutional
 self-archiving.

Actually the main difference in math is that we in effect started
later than physics did.  Part of the reason for that is that some of
the mathematicians involved, including me but not mainly me by any
means, instead devoted effort to umbrella archive projects (i.e.,
global virtual archives) that ultimately failed.  We have had much
more success by moving in the opposite direction, i.e., by strengthening
distributed open archival with a centralized foundation.

 What Greg seems to overlook is that the institutional self-archiving he
 describes PRE-DATED the Open Archives Initiative (OAI), with its
 interoperability.

This is partly untrue.  The MPRESS project (http://mathnet.preprints.org/)
has a lot in common with OAI, and it was started before the universal
math arXiv.  It has its own metadata standard, Dublin Core, and it
has a number of institutional preprint series among its data feeds.
But it hasn't yet caught on.  It doesn't seem to make much difference to
authors whether a preprint series is indexed by MPRESS or not.  Part of
the trouble with MPRESS is that not all of its sources are providing
as good metadata as they promised.  Ironically the lion's share of good
metadata in MPRESS comes from the math arXiv.

I would like to know where OAI thinks that MPRESS went wrong.  In fact
since I maintain a service provider for the math arXiv, I looked into
using OA-compliant metadata instead of the ad hoc metadata that I get from
the arXiv.  I discovered that the OA standard is an oversimplification
of the full arXiv metadata record, to the point that I can't use the
OA format.

But don't get me wrong.  I am in favor of fragmented interoperability if
you really can't hope for something better.  And as I said, the overall
STM literature might well have to be fragmented, for now, down to the
level of individual disciplines (e.g. chemistry) or small groups of
disciplines (physics+math+cs).
--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *


Re: Central vs. Distributed Archives

2000-11-02 Thread Steve Hitchcock

At 21:29 02/11/00 +, Stevan Harnad wrote:

 Obviously I'm not a conservative offering rationales for inaction.
 And my worry is not a priori. NCSTRL and MPRESS are two long-standing
 attempts at standards-based fragmented interoperability. Neither one
 has as much readership as the younger, fully integrated math arXiv.

They pre-dated OAI and Eprints. Have just a bit more patience; but be
prepared to set aside prior prejudices or you will obstruct precisely
what we both want to facilitate!


NCSTRL was effectively the model for OAi. Greg Kuperberg suggests that
NCSTRL has not been successful. It would be useful to have some meaningful
measure of whether NCSTRL has been successful or not, and to hear the views
of the NCSTRL developers (who are also involved in OAi). Maybe real
evidence will yield clues to the ultimate destiny of OAi - central or
distributed.

The Harnad-Kuperberg dialogue has been fascinating but, to my mind, hasn't
resolved the issue conclusively. It will be critical to understand what the
user wants.

Steve


Re: Central vs. Distributed Archives

2000-11-02 Thread Michael L. Nelson
(note: I'm not sure this will get through all the aliases -- I don't think
this email addr is registered with the UPS list, for example)

On Thu, 2 Nov 2000, Steve Hitchcock wrote:

 NCSTRL was effectively the model for OAi. Greg Kuperberg suggests that
 NCSTRL has not been successful. It would be useful to have some meaningful
 measure of whether NCSTRL has been successful or not, and to hear the views
 of the NCSTRL developers (who are also involved in OAi). Maybe real
 evidence will yield clues to the ultimate destiny of OAi - central or
 distributed.


just a point of clarification:  NCSTRL was not directly the model for OAI,
at least architecturally.

OAI has more in common with:

- RePEc (http://www.repec.org/)
- SODA (http://www.dlib.org/dlib/march99/maly/03maly.html)

and similar architectures.

A subset of the Dienst protocol gave us a starting ground for defining a
harvesting protocol, but even that has been relaxed to allow Dienst and
OAI to progress independently.

Most OAI service providers will probably assume a distributed storage
model, because it is certainly easier to build.  But technically OAI is
agnostic with respect to centralized vs. distributed storage of data.
OAI focuses only on metadata.
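
As an illustration of the metadata-only model described above, here is a
minimal sketch of a service provider that merges records from several
archives into one searchable index while the full texts stay wherever
each archive keeps them. It is not the OAI harvesting protocol itself:
the class names, field names, archive names, and example.org URLs are
all invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record; only metadata is harvested."""
    identifier: str          # unique within its own archive
    title: str
    author: str
    status: str              # "preprint" (unrefereed) or "postprint" (refereed)
    journal: Optional[str]   # journal name for refereed papers, else None
    full_text_url: str       # the full text remains at the source archive


class Archive:
    """Hypothetical data provider, central or institutional."""

    def __init__(self, name: str, records: Iterable[Record]) -> None:
        self.name = name
        self._records = list(records)

    def list_records(self) -> List[Record]:
        return list(self._records)


def harvest(archives: Iterable[Archive]) -> Dict[str, Record]:
    """Merge metadata from many archives into one index keyed by 'archive:id'."""
    index: Dict[str, Record] = {}
    for archive in archives:
        for record in archive.list_records():
            index[f"{archive.name}:{record.identifier}"] = record
    return index


def search(index: Dict[str, Record], word: str) -> List[str]:
    """Return the keys of records whose title contains the word."""
    needle = word.lower()
    return [key for key, rec in sorted(index.items())
            if needle in rec.title.lower()]


# Invented sample data: one discipline-based and one institution-based archive.
central = Archive("central", [
    Record("0001", "Knot invariants", "A. Author", "preprint", None,
           "http://central.example.org/0001"),
])
campus = Archive("campus", [
    Record("17", "Knot theory survey", "B. Writer", "postprint",
           "Hypothetical Journal", "http://campus.example.org/17"),
    Record("18", "Protein folding notes", "C. Scholar", "preprint", None,
           "http://campus.example.org/18"),
])

index = harvest([central, campus])
hits = search(index, "knot")
```

The harvester never needs to know whether a record came from a
centralized or a distributed archive; that difference shows up only in
the archive name and the full-text URL.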

Regarding centralized vs. distributed, I would submit CiteSeer

http://citeseer.nj.nec.com/cs

as an exemplary DL that seems to have resolved the tension between the two
models - providing both links to distributed copies and cached centralized
copies.

regards,

Michael

 The Harnad-Kuperberg dialogue has been fascinating but, to my mind, hasn't
 resolved the issue conclusively. It will be critical to understand what the
 user wants.

 Steve



---
Michael L. Nelson
207 Manning Hall, School of Information and Library Science
University of North Carolina      m...@ils.unc.edu
Chapel Hill, NC 27599             http://ils.unc.edu/~mln/
+1 919 966 5042                   +1 919 962 8071 (f)


Re: Central vs. Distributed Archives

2000-11-02 Thread Greg Kuperberg
On Thu, Nov 02, 2000 at 10:08:09PM +, Steve Hitchcock wrote:
 NCSTRL was effectively the model for OAi. Greg Kuperberg suggests that
 NCSTRL has not been successful.

I don't want to disparage a project as big and difficult as NCSTRL.
It has had some success.  It's important.  But I don't think that it's
nearly as successful as the arXiv.  I guess I said something stronger
before, that NCSTRL is not as heavily read as the math arXiv, which
is much smaller than the whole arXiv system.  Well possibly I'm wrong
on that.  But I note that the math arXiv is just as heavily read on a
per-paper basis as the larger parent arXiv system.
--
  /\  Greg Kuperberg (UC Davis)
 /  \
 \  / Visit the Math ArXiv Front at http://front.math.ucdavis.edu/
  \/  * All the math that's fit to e-print *