[GOAL] Re: G8 Science Ministers endorse open access

2013-06-17 Thread Tim Brody
On Sun, 2013-06-16 at 16:15 -0400, Stevan Harnad wrote:

[snip]
 In backing down on Gold (good), Finch/RCUK nevertheless failed to
 provide any monitoring mechanism for ensuring compliance with Green
 (bad). It only monitors how Gold money is spent.

 Finch/RCUK also backed down on monitoring OA embargoes (which is bad,
 but not as bad as not monitoring and ensuring immediate deposit.)

By Finch/RCUK do you mean the current RCUK guidance? Section 3.14 of
http://www.rcuk.ac.uk/documents/documents/RCUKOpenAccessPolicy.pdf
is all about monitoring for gold *and green* (including embargoes):

"measure the impact of Open Access across the landscape including use of
both immediate publishing (‘Gold’) and the use of repositories (‘Green’)"

and

"For articles which are not made immediately open access ... a statement
of the length of the embargo period [will be required]"

I spent last Friday at a workshop of UK EPrints users that was all about
how we're going to report open access compliance to RCUK.

-- 
All the best,
Tim


___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


[GOAL] Re: Harnad Comments on Proposed HEFCE/REF Green Open Access Mandate

2013-03-21 Thread Tim Brody
Hi Arthur,

I don't understand how a link is more useful than a copy (although
obviously having both is preferable).

Let us say that either a) an author imports a record from a publisher
with a link, or b) pastes a link into the repository. Either way, that link
tells us nothing about the state of the item on the publisher's web site
(gold or green). As you say, you could hit a jump-off page, robots
challenge or otherwise.

In order to say for certain that the link given is as the metadata
describes, and that the item is available under the correct license,
someone will have to visit the publisher's site (from a public IP) and
set a flag to say 'verified'.
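As a sketch of that 'verified' check (the names and the PDF test here are my own illustration, not part of any repository software):

```python
from dataclasses import dataclass

@dataclass
class LinkCheck:
    """What an anonymous (public-IP) request to the publisher's link returned."""
    status: int
    content_type: str

def verify_link(check: LinkCheck) -> bool:
    # A link only counts as verified open access if the anonymous request
    # got the document itself, not a jump-off page, registration form or
    # CAPTCHA (which typically come back as HTML).
    return check.status == 200 and check.content_type.startswith("application/pdf")
```

A record would only be flagged 'verified' when this returns true for a
request made from outside the institution's subscription IP range.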

By comparison, taking a copy is little extra effort and the institution
can say unambiguously that they have an open access copy. It also has
the added benefit of guaranteeing long-term access to that work. After
all, that is what libraries have been doing for hundreds of years with
paper.

--
All the best,
Tim.

On Wed, 2013-03-20 at 08:54 +1100, Arthur Sale wrote:
 Thanks Tim. No I don't think I missed the point.
 
 I agree that certification of all *repository* documents for REF (or
 in our case ERA) is the same, whether the source is from a
 subscription or an open access source. The point is that repository
 documents are divorced from the source, and are therefore suspect.
 Researchers are as human as everyone else, whether by error or fraud.
 However, Gold is slightly easier to certify (see next para), even
 leaving aside the probability that the institution may not subscribe
 to all (non-OA) journals or conference proceedings.
 
 One of the reasons I argue for the ARC policy of requiring a link to
 OA (aka Gold) journal articles (rather than taking a copy) is that one
 compliance step is removed. The link provides access to the VoR at its
 canonical source, and there can be no argument about that. Taking a
 copy inserts the necessity of verifying that the copy is in fact what
 it purports to be, and relying on the institution's certification.
 
 May I strongly urge that EPrints, if given a URL to an off-site
 journal article, at the very least *inserts* the URL (or other
 identifier) into a canonic source link piece of metadata, whether or
 not it bothers about making a copy (which function should be able to
 be suppressed by the repository administrator as a repository-wide
 option).
 
 One of the problems that the take-a-copy crowd ignores is that the
 link to a Gold article might in fact not be direct to the actual VoR,
 but to a 'guardian' cover page. This cover page might contain
 publisher advertising or licence information before the actual link,
 or it might require one to comply with free registration, maybe even
 with a CAPTCHA. It may be protected with a robots.txt file. No matter:
 the article is still open access, even though repository software may
 not be able to access it. (Drawn to my attention by private
 correspondence from Petr Knoth.)
 
 Arthur Sale
 
 -Original Message-
 From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
 Tim Brody
 Sent: Tuesday, 19 March 2013 9:19 PM
 To: Global Open Access List (Successor of AmSci)
 Subject: [GOAL] Re: Harnad Comments on Proposed HEFCE/REF Green Open Access 
 Mandate
 
 Hi Arthur,
 
 I think you missed the point I was trying to make. The statement I was 
 responding to was that gold includes everything you need to audit against 
 (UK) funder compliance and the same can not be said for Green.
 
 I have no wish to debate the merits of gold vs. green, beyond pointing out 
 that publisher-provided open access is no easier to audit than 
 institution-provided open access. Indeed, if institutions are doing the 
 reporting (as they will in the UK) an OA copy in the repository is easier to 
 report on than a copy held only at the publisher.
 
 I don't know where Graham got the idea that gold will make auditing easier. 
 Whether the publisher provides an OA copy or the author, all the points you 
 make apply equally.
 
 --
 All the best,
 Tim.
 
 On Tue, 2013-03-19 at 08:40 +1100, Arthur Sale wrote:
  Tim, you oversimplify the auditing of green. Try this instead, which is 
  more realistic.
  For green, an institution needs to:
  
  1) Require the author uploads a file. Timestamp the instant of upload.
  
  (1A) Check that the file gives a citation of a journal or conference 
  published article, and that the author is indeed listed as a co-author. You 
  might assume this, but not for auditing. EPrints can check this.
  
  (1B) Check that the refereeing policy of the journal or conference complies 
  with the funder policy. This is absolutely essential. There are 
  non-compliant examples of journals and conferences. More difficult to do 
  with EPrints, but possible for most.
  
  (1C) Check that the file is a version (AM or VoR) of the cited published 
  article. This requires as a bare minimum checking the author list and the 
  title from the website metadata

[GOAL] Re: Harnad Comments on Proposed HEFCE/REF Green Open Access Mandate

2013-03-19 Thread Tim Brody
Hi Arthur,

I think you missed the point I was trying to make. The statement I was
responding to was that gold includes everything you need to audit
against (UK) funder compliance and the same can not be said for Green.

I have no wish to debate the merits of gold vs. green, beyond pointing
out that publisher-provided open access is no easier to audit than
institution-provided open access. Indeed, if institutions are doing the
reporting (as they will in the UK) an OA copy in the repository is
easier to report on than a copy held only at the publisher.

I don't know where Graham got the idea that gold will make auditing
easier. Whether the publisher provides an OA copy or the author, all the
points you make apply equally.

--
All the best,
Tim.

On Tue, 2013-03-19 at 08:40 +1100, Arthur Sale wrote:
 Tim, you oversimplify the auditing of green. Try this instead, which is more 
 realistic.
 For green, an institution needs to:
 
 1) Require the author uploads a file. Timestamp the instant of upload.
 
 (1A) Check that the file gives a citation of a journal or conference 
 published article, and that the author is indeed listed as a co-author. You 
 might assume this, but not for auditing. EPrints can check this.
 
 (1B) Check that the refereeing policy of the journal or conference complies 
 with the funder policy. This is absolutely essential. There are non-compliant 
 examples of journals and conferences. More difficult to do with EPrints, but 
 possible for most.
 
 (1C) Check that the file is a version (AM or VoR) of the cited published 
 article. This requires as a bare minimum checking the author list and the 
 title from the website metadata, but for rigorous compliance the institution 
 needs to be able to download the VoR for comparison (ie have a subscription 
 or equivalent database access). [In Australia we do spot checks, as adequate 
 to minimize fraud. Somewhat like a police radar speed gun.] [Google Scholar 
 does similar checks on pdfs it finds.] EPrints probably can't help.
 
 2) Make it public after embargo. In other words enforce a compulsory upper 
 limit on embargos, starting from the date of upload of uncertain provenance 
 (see 3). EPrints can do this.
 
 3) Depending on the importance of dates, check that the upload date of the 
 file is no later than the publication date. The acceptance date is unknowable 
 by the institution (usually printed on publication in the VoR, but not 
 always), and then requires step 1C to determine after the event. Doubtful 
 that EPrints can do this.
 
 4) Require every potential author to certify that they have uploaded every 
 REF-relevant publication they have produced. Outside EPrints responsibility, 
 apart from producing lists on demand for certification.
 
 I just adapted this from your constraints on gold, and common Australian 
 practice in the ERA and HERDC, which have long been audited.
 
 Arthur Sale
 
 -Original Message-
 From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
 Tim Brody
 Sent: Monday, 18 March 2013 8:45 PM
 To: Global Open Access List (Successor of AmSci)
 Subject: [GOAL] Re: Harnad Comments on Proposed HEFCE/REF Green Open Access 
 Mandate
 
 On Sat, 2013-03-16 at 08:05 -0400, Stevan Harnad wrote:
  On Sat, Mar 16, 2013 at 5:14 AM, Graham Triggs 
  grahamtri...@gmail.com wrote:
  
 
  
  2) By definition, everything that you require to audit Gold is
  open, baked into the publication process, and independent of
  who is being audited.  The same can not be said for Green.
 
 RCUK and HEFCE will require institutions to report on, respectively, the APC 
 fund and REF return.
 
 For gold, an institution needs to:
 
 1) Determine whether the journal policy complies with the funder policy.
 
 2) Run an internal financial process to budget for and pay out the APC.
 
 3) Check whether the item was (i) published (ii) published under the correct 
 license.
 
 4) (For REF) take a copy of the published version.
 
 For green, an institution needs to:
 
 1) Require the author uploads a version.
 
 2) Make it public after embargo.
 
 
 So, actually I think green is easier to audit than gold. Even if it were as 
 you say, it will still be the institution that is tasked with auditing. For 
 most institutions that will be done through their repository (or 
 cris-equivalent). It therefore follows that green (Do I have a public copy?) 
 will be no more difficult than gold (Do I have a publisher CC-BY copy?).
 
 (Commercial interest - as EPrints we have built tools to make the REF return 
 and are working on systems to audit gold and green for RCUK
 compliance.)
 
 --
 All the best,
 Tim
 
 
 




[GOAL] Re: Harnad Comments on Proposed HEFCE/REF Green Open Access Mandate

2013-03-18 Thread Tim Brody
On Sat, 2013-03-16 at 08:05 -0400, Stevan Harnad wrote:
 On Sat, Mar 16, 2013 at 5:14 AM, Graham Triggs
 grahamtri...@gmail.com wrote:
 

 
 2) By definition, everything that you require to audit Gold is
 open, baked into the publication process, and independent of
 who is being audited.  The same can not be said for Green.

RCUK and HEFCE will require institutions to report on, respectively, the
APC fund and REF return.

For gold, an institution needs to:

1) Determine whether the journal policy complies with the funder policy.

2) Run an internal financial process to budget for and pay out the APC.

3) Check whether the item was (i) published (ii) published under the
correct license.

4) (For REF) take a copy of the published version.

For green, an institution needs to:

1) Require the author uploads a version.

2) Make it public after embargo.
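
A minimal sketch of step 2, assuming the embargo runs from the repository's own upload timestamp (the function names are illustrative, not the EPrints API):

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    # Clamp the day to 28 so e.g. 31 January + 1 month never overflows February.
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, min(d.day, 28))

def is_public(upload_date: date, embargo_months: int, today: date) -> bool:
    # The upload timestamp is the only date the repository can vouch for
    # itself, so the embargo clock starts there.
    return today >= add_months(upload_date, embargo_months)
```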


So, actually I think green is easier to audit than gold. Even if it were
as you say, it will still be the institution that is tasked with
auditing. For most institutions that will be done through their
repository (or CRIS-equivalent). It therefore follows that green ("Do I
have a public copy?") will be no more difficult than gold ("Do I have a
publisher CC-BY copy?").

(Commercial interest - as EPrints we have built tools to make the REF
return and are working on systems to audit gold and green for RCUK
compliance.)

-- 
All the best,
Tim




[GOAL] Re: Wikipedia founder to help in [UK] government's research scheme

2012-05-02 Thread Tim Brody
On Wed, 2012-05-02 at 19:00 +0900, Andrew A. Adams wrote:
  The [UK] government has drafted in the Wikipedia founder Jimmy
  Wales to help make all taxpayer-funded academic research in Britain
  available online to anyone who wants to read or use it.
  
 I was hoping that the new government might be less star-struck than the 
 previous one. Plus ça change, plus c'est la même chose, it would seem. We really 
 don't need Jimmy Wales advising on this. The team behind eprints has been 
 (with minimal funding) developing the technology needed for many years and 
 there are many academics in the UK much better versed in the intricacies of 
 UK academic work and life than Mr Wales. Sigh. I foresee another lost couple 
 of years wasted on this instead of getting to grips with the known problem 
 and the known solution (including providing better funding for eprints 
 development to the team that created it and still does the software 
 engineering for it).

Thanks for the kudos.

This article did take me to the UK.gov working group:
http://www.researchinfonet.org/publish/wg-expand-access/

Unfortunately they seem to have a focus on big deal licensing (!) and
author-pays economics. I haven't heard anything from their institutional
repository sub-group, although there are a lot of layers between me and
them ... hopefully IRs - a solution to access - won't get drowned out by
licensing/author-pays reform - a solution to library budget constraints
- in their report.

In terms of the UK Gateway to Research, I expect that is the political
equivalent of data.gov.uk. It doesn't make much sense to have national
gateways as a research tool, and in any case I can't see much chance of
a "one solution to rule them all" working in implementation. In all
likelihood we will continue as we are: institution-based
EPrints/DSpaces/etc. that are harvested into a central tool for tracking
mandate compliance and value for money for UK spending.
(This is already in the pipeline with the RCUK ROS system - most likely
using something like CERIF to share data within and between
institutions, funders, and the UK and EU governments.)

-- 
Tim Brody

School of Electronics and Computer Science
University of Southampton
Southampton
SO17 1BJ
United Kingdom

Email: t...@ecs.soton.ac.uk
Tel: +44 (0)23 8059 7698






[GOAL] Re: RCUK Open Access Feedback

2012-03-19 Thread Tim Brody
On Sun, 2012-03-18 at 21:28 +0900, Andrew A. Adams wrote:
 David Prosser wrote:
  Say I wanted to data mine 10,000 articles. I'm at a university, but I am
  co-funded by a pharmaceutical company and there is a possibility that the
  research that I'm doing may result in a new drug discovery, which that
  company will want to take to market. The 10,000 articles are all 'open
  access', but they are under CC-BY-NC-SA licenses. What mechanism is there
  by which I can contact all 10,000 authors and gain permission for my
  research?
 
 
 The intent of CC-NC is that one cannot take the original material, re-mix it 
 (or even use it just as-is) and sell the resulting new work. It does not mean that 
 the information it contains cannot be used in a commercial setting, but that 
 the expression it contains cannot be used in a commercial setting. A simple 
 example is that a CC-NC licensed book cannot be recorded as an audio play 
 which is then sold. If one makes an audio book it must be available for free. 
 However, copies of a CC-NC book can be distributed to students who are paying 
 for a course in English literature as one of the books studied.

I don't understand this concern about 'NC' (non-commercial). I
understood that the give-away open access literature was given away by
authors precisely because the motivation for publishing publicly funded
research is not for direct commercial gain. Instead, authors derive
impact from others reading and citing their work.

If a company were to create and sell an audio version of a research work
then that increases the author's impact. That doesn't preclude someone
else creating a for-free audio version, nor readers accessing the
original self-archived or gold-OA text version.

OA is not about anti-capitalism - if someone can take the resource (OA
research literature), add value and re-sell it (with suitable
attribution) then that can only be to the advantage of authors and
readers.

-- 
Tim Brody

School of Electronics and Computer Science
University of Southampton
Southampton
SO17 1BJ
United Kingdom

Email: tdb2 at ecs.soton.ac.uk
Tel: +44 (0)23 8059 7698
 




Re: OA Archives: Full-texts vs. metadata-only and other digital objects

2005-06-13 Thread Tim Brody

Tim Gray wrote:

Stevan

Thank you for your full and illuminating reply to my query about how much
material in OA archives is available as full text. I am surprised at how
low you estimate the figure to be and that it is not, yet, possible to
produce a definitive number.


Knowing whether a record has a full-text (and whether it's
scholarly/published/peer-reviewed) is something in the realm of Google
Scholar, CiteSeer, etc.

Without wishing to recreate one of those services, I don't know of a
method for producing a definitive number. I suspect simple approaches
(e.g. does the record have a PDF link?) will be undermined by (sorry for
picking on you!) sites like:
http://library.isibang.ac.in:8080/dspace/
No prizes for spotting why that wouldn't work :-)
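For what it's worth, that naive heuristic might look like this (the field names are hypothetical Dublin Core-ish values, and as the example site shows, it is easily fooled):

```python
def looks_full_text(record: dict) -> bool:
    # Naive "does the record have a PDF link?" test over harvested
    # metadata. A metadata-only archive that links cover sheets or
    # policy PDFs will defeat it.
    identifiers = record.get("identifier", [])
    formats = record.get("format", [])
    return (any(v.lower().endswith(".pdf") for v in identifiers)
            or "application/pdf" in formats)
```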


I am wondering if the Open DOAR (Directory of Open Access Repositories -
the 'sister project' to the Directory of Open Access Journals, DOAJ) will
set strictly 'full text only' rules for inclusion in its directory? And how
will it relate to the archives.eprints directory you are involved with? It
gets confusing to me because there are so many lists of repositories around
on the web. How does the celestial harvesting list you mention relate to
the archives.eprints list (are they the same list?) or the large list kept
by the University of Illinois at Urbana-Champaign (UIUC) at
http://gita.grainger.uiuc.edu/registry/?


Celestial is an OAI cache - it retrieves every metadata record from
those archives I've added to it. To make archives.eprints (IAR) I
stapled together the GNU EPrints listing with Celestial's record counts
(as an aside, anyone can use the records graphs from Celestial). I keep
a firmer technical control of Celestial than I do the IAR.
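The record-count side of this is just walking OAI-PMH list responses; a minimal sketch of counting one page of ListIdentifiers (the harvesting loop, retries, etc. are omitted):

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def count_page(xml_text):
    # Count <header> entries in one ListIdentifiers response and return
    # the resumptionToken, or None when the list is complete.
    root = ET.fromstring(xml_text)
    n = sum(1 for _ in root.iter(OAI + "header"))
    tok = root.find(".//" + OAI + "resumptionToken")
    return n, (tok.text if tok is not None and tok.text else None)
```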

UIUC is the point of entry to get added to OAIster, but it provides
analyses of all *OAI* repositories registered with it. The IAR includes
many archives with no or broken OAI interfaces, as well as aggregates
(e.g. a single entry with multiple OAI interfaces). We also collect
additional metadata in the IAR that isn't exposed by OAI (type,
software, etc.). (Not forgetting the registry at www.openarchives.org
and Hussein Suleman's OAI explorer.)

My hope and expectation is that OpenDOAR will include some metric of
full-textness. There was also an effort for the recent Amsterdam
SURF/JISC/CNI meeting to ascertain some figures (by survey) for the
content of IRs - I believe that report will be published in the next
month or so.


I take the archives.eprints to be the closest to a definitive list of the
OA Institutional Repositories which we are concerned with here - although I
notice that our 'DSpace@Cambridge' repository
http://www.lib.cam.ac.uk/dspace/index.htm is not included.


Here?
http://archives.eprints.org/index.php?url=http%3A%2F%2Fwww.dspace.cam.ac.uk%2F


I see the distinction between OA Archives and the Open Access Initiative.
Maybe this is not strictly relevant to this forum and a basic
misunderstanding of the purposes of archiving, but I still cannot
understand why people are archiving *just* the metadata and not the full
text. It makes OA search engines like OAIster more like a any other
standard bibliographic database with mostly subscription-only access.


I'm glad to see you're an archivangelist rather than a repologist
('sorry, the full-text isn't available here')!

It's the IR vs Open archives paradigm. The IR serves an institutional
need to *track* as well as to *expose* research output. Tracking
research output does not require making that research available for-free
on the Web. The purpose of Open archives is to make research more
efficient by maximising access to research, hence maximising research
impact.

If a high quality body of freely accessible literature is available
through IR's, then the services that build on them will be more useful.
There are a lot of records appearing out there, but the full-texts
available from ad hoc Web pages still dwarf those in IRs. There is also
no clear distinction between "prestigious research" and the "capture
all" philosophy - administrators and authors need to realise that what
they put into the IR may very well turn up on automated CVs, and they
probably don't want their high-impact peer-reviewed articles hidden
amongst thousands of PowerPoint slides!

Sincerely,
Tim Brody tdb...@ecs.soton.ac.uk
Administrator, Institutional Archives Registry
http://archives.eprints.org/


Re: BBC cites a preprint from arXiv

2005-05-24 Thread Tim Brody

Eric F. Van de Velde wrote:

http://news.bbc.co.uk/2/hi/science/nature/4564477.stm

Is this a first? I.e., a major news organization uses unrefereed
self-archived preprint as the basis of a news story. Although not a
major hard-news story, it was posted on the main page of the BBC news
web site. Does this point to the growing acceptance of Open
Archives and/or of arXiv? Does it point to a growing disregard for peer
review (at least, outside of the academy)?


There's this previous occasion (New Scientist, but citing arXiv
'publication'):
http://news.bbc.co.uk/1/hi/sci/tech/4357613.stm

The paper in question is http://arxiv.org/abs/hep-th/0504003.

I would cynically suggest the story has more to do with promoting Doctor
Who than it does a breakthrough in theoretical wormhole physics (or peer
review).

All the best,
Tim.


Re: Open Access vs. NIH Back Access and Nature's Back-Sliding

2005-02-04 Thread Tim Brody

Brian Simboli wrote:


(Worries that
people will merely use OAIster or google to bring up all the articles
for a given issue can be circumvented if the journal title is suppressed
in the metadata for the freely available article.)


This wouldn't help citation linking, which is already pretty patchy.
Anyway, I think you'll find autonomous services already get around
missing metadata through triangulation!


Interestingly, aren't the physics societies right now partially
committed to something like a de facto subscription overlay model, in
that many physics peer-reviewed postprints are being archived on
arxiv.org and are therefore freely accessible? Why shouldn't the physics
societies then just directly link to the postprints at arxiv.org,
obviating the need for authors to engage in duplicative, afterglow
self-archiving efforts?  Or is it the case that, if only a portion of
articles published by the physics societies have self-archived
counterparts on arxiv, the tipping point has not been reached yet
where it becomes not in their economic interest to allow access to a
free copy (via author self-archiving)?


I believe that some physics societies will accept *submissions* from a
pre-print server, but it's not the case that the publisher version gets
pushed back onto an e-print server (unless the author has permission and
does that himself, which I haven't noticed).

Searching for "referee" in arXiv finds only ~1000 matches, "referee" or
"corrected" only 37,000. So, perhaps:
1) Physicists don't need to make corrections (so only the pre-print is
arXived)
2) Only the post-refereed version gets archived
3) Physicists don't provide a comment when they do update to reflect
referee's comments

See also Alma Swan's presentation:
http://www.eprints.org/jan2005/ppts/swan.ppt

All the best,
Tim.


Alma Swan wrote:


In recent days there has been some discussion as to whether NIH's retreat
may in fact be due to a fear of adverse effects on the scholarly
publishing
industry if immediate self-archiving were to be mandated by NIH for its
grantholders
(http://www.earlham.edu/~peters/fos/newsletter/02-02-05.htm).
And, certainly, the Nature Publishing Group appears to be changing its
policy on self-archiving. It is not easy to follow NPG's arguments so far
because they are rather complicated, but it appears to be suggesting
that it
is aiding Open Access by moving from allowing immediate self-archiving by
authors in their institutional repositories to allowing it only after a
period of six months post-publication of an article. The logic of this is
not at all clear. It would be very helpful if NPG would clearly
explain the
causal inferences and its policy but one has to infer that NPG has
apprehensions about a possible adverse effect of self-archiving upon its
business.

Many publishers, particularly some learned societies, share these
apprehensions and that is perfectly understandable if they base their
view
of the future on imaginings rather than on actual evidence.

In the case of self-archiving, there is absolutely no need for this
sort of
self-terrorising. The experiment has been done and the results are
clear-cut. Fourteen years ago the arXiv was set up (www.arxiv.org). It
houses preprints and postprints in physics, predominantly in the areas of
high-energy physics, condensed matter physics and astrophysics. It is the
norm for researchers in these areas to post their articles either
before or
after refereeing to this repository. In 2003, the 421 physics journals
listed in ISI's SCI published a total of 116,723 articles. The arXiv
receives approximately 42,000 articles per annum, meaning that around a
third of all physics research articles appear not only in journals but
ALSO
in the arXiv.

Have physics publishers gone to the wall in the last 14 years?  No,
and not
only have they continued to survive, they have also continued to
thrive. I
have recently asked questions about this of two of the big learned
society
publishers in physics, the American Physical Society in the US and the
Institute of Physics Publishing Ltd in the UK. There are two salient
points
to note:
1. Neither can identify any loss of subscriptions to the journals that
they
publish as a result of the arXiv.
2. Subscription attrition, where it is occurring, is the same in the
areas
that match the coverage of the arXiv as it is across any other areas of
physics that these societies publish in.

Both societies, moreover, see actual benefits for their publishing
operations arising from the existence of arXiv. The APS has cooperated
closely with arXiv including establishing a mirror (jointly with
Brookhaven
National Laboratory)... We also revised our copyright statement to be
explicitly in favor of author self-archiving. These efforts strengthened
(rather than weakened) Physical Review D [an APS journal that covers
high-energy physics] ...I would say it is likely we maintained
subscriptions
to Physical Review D that we may otherwise have lost if we hadn't been so

Re: Elsevier Gives Authors Green Light for Open Access Self-Archiving

2004-06-30 Thread Tim Brody

Regarding the article in the UK's Guardian newspaper:

Open access jeopardises academic publishers, Reed chief warns
Richard Wray, Wednesday June 30, 2004
http://education.guardian.co.uk/higher/books/story/0,10595,1250591,00.html

At the end:

   Reed has, however, made some concessions towards the open access
   movement...  Alongside the rise of open access publishers, such as
   BioMed Central and PLoS, some academics are pushing for the right
   to place copies of articles they write for subscription journals on
   their own websites. Reed has changed its copyright rules to allow
   self-archiving in this way.

Tim Brody
Southampton University
http://citebase.eprints.org/


Re: Scientometric OAI Search Engines

2004-05-05 Thread Tim Brody
The likelihood is the user searched Google before they tried PubMed or
ScienceDirect: searching for "Ingelfinger Over-Ruled harnad" comes up
with an OA version as the top match.

With OAI and OpenURL the OA version could be linked in as easily as the 
aggregators currently linked to by PubMed (although perhaps not as 
reliably - but then, if you get a hit, at least you know the version is 
accessible).

While it would be nice for services to link to OA versions, it doesn't 
take more than 30 seconds to copy/paste some appropriate keywords into 
Google, which seems to do a good job of discovering an accessible version.

Tim Brody
Citebase Search: http://citebase.eprints.org/


Re: EPrints, DSpace or ESpace?

2004-04-13 Thread Tim Brody

   [2 Postings: (1) L. Waaijers; (2) T. Brody]

(1) Leo Waaijers (SURF, Netherlands)

Stevan Harnad wrote:


By the way, the real OAI google is OAIster, and it
contains over 3 million pearls from nearly 300 institutions
http://oaister.umdl.umich.edu/o/oaister/ but many are not journal articles
(and even if they all were, that still wouldn't be nearly enough yet!):
http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving_files/Slide0023.gif


And -- as of March 10 -- Yahoo searches OAIster! See
http://www.umich.edu/~urecord/0304/Mar08_04/07.shtml

Leo Waaijers



(2) Tim Brody (ECS, Southampton):


Henk Ellermann, Google Searches Repositories: So What Does Google
Search For?, http://eepi.ubib.eur.nl/iliit/archives/000479.html

But it is not only the quantity. Even when documents are available it does not
mean that they are available to everyone. And if it's available to anyone, you
still can't be sure that the system is running...

What we badly need is a continuous and authoritative review of existing
Institutional Repositories. The criteria to judge the repositories would
have to include:

   * number of documents, (with breakdown per document type)
   * percentage of freely accessible documents
   * up-time

It is great that Google becomes part of the Institutional Repositories effort, 
but
we should learn to give fair and honest [data] about what we have to offer. 
There
is actually not that much at the moment. We can only hope that what Google will
expose is more than just the message "amateurs at work".


I would agree with Henk that the current -- early -- state of
'Institutional Repositories' (aka Eprint Archives) is not yet the promised
land of open access to research material.

Institutional research archives (and hence the services built on them)
will succeed or fail depending on whether there is the drive within the
institution to enhance its visibility and impact by mandating that its
author-employees deposit all their refereed-research output. Then,
once it achieves critical mass, the archive can support itself as part
of the culture of the institution.

The archive is the public record of the best the institution
has done. So those archives that Henk refers to, with their patchy,
minimal contents, need to look at what is going into this public record
of their research output, and must decide whether it reflects the
institution's achievements.

As a technical aside, DP9 was developed for exposing OAI things to Web
crawlers some time ago: http://arc.cs.odu.edu:8080/dp9/about.jsp

I would be surprised if Google were to base any long-term service on
only an archive's contents. Without the linking structure of the Web a
search engine is left with only keyword-frequency techniques, which the
Web has shown fail to scale to very large data sets. For my money,
Google-over-Citebase/Citeseer-over-Institutional Archives is much more
interesting (the Archive gives editorial management, Citebase/Citeseer
the linking structure, and Google the search wizardry).


Stevan Harnad:

Eprints, for example, has over 120 archives worldwide of exactly the same kind,
with over 40,000 papers in them:
http://archives.eprints.org/eprints.php?action=analysis


I have revised the description on that page to say that a *record*
is not necessarily a full-text. And of course a full-text is not
necessarily a peer-reviewed postprint. It would help bean-counters like
myself if repository/archive administrators would tag in an obvious place
what their content types are (i.e. what type of material is in the
system), and how the number of metadata records corresponds to publicly
accessible full-texts.

Tim Brody
Southampton University
http://citebase.eprints.org/


Re: OAI compliant personal pages

2004-02-10 Thread Tim Brody
Jim Till wrote:

 On Tue, 10 Feb 2004, Jean-Claude Guédon wrote [in part]:

 [j-cg] the growing number of open access repositories
 [j-cg] including OAI compliant personal pages

 I noted with interest Jean-Claude's comment about OAI
 compliant personal pages. How can such pages be identified
 as OAI compliant (and, how can their number be estimated)?

I don't know what J-CG means. Individuals can of course set up an OAI
repository, which is just a collection of metadata records. If it's
OAI-compliant it could be registered with Open Archives Initiative -
Repository Explorer http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai

There isn't a 'discovery' method as such for OAI -- we have searched for GNU
EPrints sites by using a Web search for terms that are common across
installations.
http://archives.eprints.org/eprints.php

Regards,
Tim Brody


Re: Copyright: Form, Content, and Prepublication Incarnations

2003-11-18 Thread Tim Brody

On Mon, 17 Nov 2003, Troy McClure wrote:


ive been browsing through the citebase and quite a few of such messages came up:

This paper has been withdrawn by the authors due to copyright.
http://citebase.eprints.org/cgi-bin/citations?id=oai%3AarXiv%2Eorg%3Anlin%2F0301018


Stevan Harnad:

I have to admit that this is the first I've ever heard of any papers
being removed from Arxiv for copyright reasons. I will ask Tim Brody (creator
of citebase) to see whether there is a more sensitive way to do a count,
but using copyright and (remove or withdraw) I found 6 papers out
of the total quarter million since 1991.


There are only 13 deleted (flagged as such by arXiv's OAI interface)
papers in arXiv.org, or 0.005%.

One of those papers is available as an earlier version:
http://citebase.eprints.org/cgi-bin/citations?id=oai%3AarXiv%2Eorg%3Anlin%2F0301018
(go to arXiv, click v1, get full-text)

Tim Brody


Re: Berlin Declaration on Open Access

2003-10-22 Thread Tim Brody
- Original Message -
From: Stevan Harnad har...@ecs.soton.ac.uk

 The Berlin Declaration is just the beginning of a series of steps that
 the signatories will be taking to promote open access. Among these steps,
 the Max-Planck Society is Edoc, an open-access repository of all of the
 research output of the Max-Planck Institutes' many research
 laboratories. This is a truly remarkable concerted act of institutional
 self-archiving, and a superb example for the research world at large.

 http://edoc.mpg.de

I had trouble finding any full-text, open-access research articles
(literature that would otherwise be inaccessible without a subscription) in
edoc?

All the best,
Tim.


Re: Nature's vs. Science's Embargo Policy

2003-01-15 Thread Tim Brody
There are two sides to the first-world/developing-world research divide:
access to the first world by the developing world (FD), and access to the
developing world by the first world (DF).

An APC model solves the FD problem: as the author pays the publisher to
provide maximum dissemination through free access, any researcher (assuming
they have access to the Web!) can access the paper regardless of their
financial situation.

The DF problem is more to do with journal-impact and language barriers than
with the economics of the situation. In theory developing-world
researchers - given the current system - are on an equal footing with any
other world researcher. Arthur Smith (hope I'm not quoting out of context)
has said in this list that the first world is currently subsidising the
developing, as it is paying the vast majority of the costs (through
subscriptions etc.), while the developing world pays very little of this but
has the same potential to be published in the high-impact journals.

No sustainable economic model can allow the developing world to have both
free access AND be able to publish in those first-world, high-impact
journals for free - not without being subsidised by the first world.

That said, free (open) access *will* allow developing-world journals to play
on a level playing field with the first. Once the literature is free-access,
aggregating services can index both first-world and developing-world
journals - and provide impact factors for both.
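The "impact factor" meant here, in the classic ISI sense, is a simple ratio: citations received in a given year to a journal's articles from the two preceding years, divided by the number of articles published in those two years. A minimal sketch with toy numbers (the figures and field names are illustrative assumptions, not real journal data):

```python
# Classic (ISI-style) journal impact factor for year Y:
# citations in Y to articles published in Y-1 and Y-2,
# divided by the number of articles published in Y-1 and Y-2.
def impact_factor(cites_by_pubyear, articles_by_year, year):
    cites = cites_by_pubyear.get(year - 1, 0) + cites_by_pubyear.get(year - 2, 0)
    articles = articles_by_year.get(year - 1, 0) + articles_by_year.get(year - 2, 0)
    return cites / articles if articles else 0.0

# Toy numbers: citations received in 2002, keyed by the cited article's pub-year,
# and articles published per year.
cites = {2001: 150, 2000: 90}
arts = {2001: 60, 2000: 40}
print(impact_factor(cites, arts, 2002))  # (150 + 90) / (60 + 40) = 2.4
```

The point is that nothing in this calculation depends on who publishes the journal: once the literature (and its citation links) is openly harvestable, an aggregator can compute the same figure for a first-world and a developing-world journal alike.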

All the best,
Tim Brody

- Original Message -
From: ept e...@biostrat.demon.co.uk
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Sent: Tuesday, January 14, 2003 3:43 PM
Subject: Re: Nature's vs. Science's Embargo Policy


 Alan Story wrote:

   Jan:
  
   Further on the question of open access by potential authors.
  
   A few questions re: BioMed Central waivers ( of the $500
 article-processing
   charge):
 .


 EPT is watching these discussions and trying to work out the impact of
 open access on developing country science.

 My understanding is that both the BMC $500 charge and the PloS $1500
 charge are to cover the costs both of document conversion and peer
 review and I am not sure what % of these figures is for peer review. I
 do not understand why peer review costs are considered to be so high,
 since the reviewers give their professional skills for free and the
 other costs are merely mailing and record keeping. The whole process can
 now be automated, as has been done by the Canadian journal, Conservation
 Ecology (www.consecol.org). See also www.arl.org/sparc/ for other tools
 for automated peer reviewing. Once such tools are set up, peer review
 costs must be almost nil.

 For developing country scientific organisations, replacing one
 unaffordable cost (tolls) by another unaffordable cost (APC) is of
 little encouragement. Even though the APC costs are substantially less,
 and may be eliminated for developing country authors (if they can 'make
 a reasonable case', and see the query from Alan Storey), one must hope
 that these efforts are interim means of getting from 'here' to 'there'.
 To ensure the international scientific community has access to ALL
 research ouput, there must be a true level playing field. Only then can
 the 'missing' research generated in the developing world, and critical
 for international programmes (in AIDS/malaria/tuberculosis/environmental
 protection/biodiversity/taxonomy/ biosafety/biopolicy) become part of
 mainstream knowledge. Only then can the isolation of the scientific
 community in under-resourced countries be overcome and international
 partnerships be established to the benefit of all of us. Carry out a
 search for 'malaria' on the non-profit distributor of many developing
 country journals, Bioline International, to see an example of the
 missing research. Use www.bioline.org.br and search from the homepage
 across all material on the system.

 My understanding has always been that the open access movement aimed to
 provide free access to institutional archives - free of costs both to
 the author and the reader. Any costs to be met would be borne by
 institutions, which have an interest in distributing their own research
 output in ways that make the greatest impact. Again, my understanding is
 that costs for setting up an institutional eprint server would be:
 an initial modest setting-up cost, some hand-holding costs for authors
 in preparing documents for the eprints servers, followed by low
 maintainenance costs. These could surely be 'absorbed' by most
 organisations. Essential peer review costs would be readily paid for by
 savings plus automation.

 And that sounds just fine for science in the developing world.

 Barbara Kirsop
 Electronic Publishing Trust for Development - www.epublishingtrust.org


Re: Draft Policy for Self-Archiving University Research Output

2003-01-08 Thread Tim Brody
If the author's employment contract states that their employer (the
University) reserves non-commercial distribution rights then that author can
not sign away those rights to a publisher (without the agreement of the
University).

In my opinion I would rather the IPR were held by the institution - who paid
for the research, facilities  support - rather than with the publisher. If
not for any other reason than an institution will rarely hold the same kind
of monopoly as the big publishers.

All the best,
Tim.

- Original Message -
From: Fytton Rowland j.f.rowl...@lboro.ac.uk
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Sent: Wednesday, January 08, 2003 3:52 PM
Subject: Re: Draft Policy for Self-Archiving University Research Output


 Um - before you can have a postprint you must have published the paper
 somewhere.  In many (most?) cases you will have transferred the copyright
to
 the journal.  So how can the University then assert its ownership of a
 copyright that you, the individual academic, have already given away in
the
 belief that it was yours to give?

 Fytton Rowland.

 - Original Message -
 From: Picciotto, Sol s.piccio...@lancaster.ac.uk
 To: american-scientist-open-access-fo...@listserver.sigmaxi.org
 Sent: Wednesday, January 08, 2003 1:57 PM
 Subject: Re: Draft Policy for Self-Archiving University Research Output


  It seems that copyright ownership could be an important obstacle to
 archiving
  postprints. I have proposed at Lancaster that academic staff employment
  contracts be modified to make it clear that the university asserts its
 rights
  as employer to copyright in staff research publications, but only to the
 extent
  of reserving the right to authorise non-commercial publication on the
 internet,
  e.g. in an eprints archive. This would circumvent a possible restriction
  resulting from any copyright assignment the author signs. The idea has
 been met
  favourably here, both by the AUT (professional association) and
 management, but
  both have referred it for discussion at national level.
 
  I think the university should be willing to forego any claim to income
 from
  research publications, but should retain the right to authorise
 non-commercial
  publication. The decision on when to publish, which version, etc, should
 be
  left to the author(s), within a policy such as that suggested here for
  Southampton, which would greatly facilitate acceptance of eprints
 archiving as
  a standard practice.
 
  cheers
 
  Sol
  
 
  Prof. Sol Picciotto
  Head,
  Lancaster University Law School
  Lancaster University
  LANCASTER LA1 4YN,
  U.K.
  direct phone (44)(0)1524-592464
  fax (44)(0)1524-525212
  s.piccio...@lancaster.ac.uk
 
  **
 
   -Original Message-
   From: Stevan Harnad [SMTP:har...@ecs.soton.ac.uk]
   Sent: Wednesday, January 08, 2003 12:49 PM
   To:   american-scientist-open-access-fo...@listserver.sigmaxi.org
   Subject:   Draft Policy for Self-Archiving University Research
 Output
  
   Comments are invited on the following draft for a university policy on
   the self-archiving of research output:
  
   http://www.ecs.soton.ac.uk/~lac/archpol.html
  
   It is being formulated both for use at Southampton
   University, and as a possible model for wider adoption,
   particularly in connection with a recommended restructuring
   http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2373.html
   of UK's Research Assessment Exercise (RAE)
   http://www.rareview.ac.uk/
   and its emulation in other countries
   http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2356.html
  
   Stevan Harnad
 
 


Re: Online Self-Archiving: Distinguishing the Optimal from the Optional

2002-12-13 Thread Tim Brody
- Original Message -
From: Arthur P. Smith apsm...@aps.org

  The main focus of your tragic loss article was the obsolescence of
 paper, and the resulting consequences. One consequence which was perhaps
 not widely anticipated is expanded access to research journal content -
 now available from
 the desktop instead of having to go to the library. And the increased
 availability that
 consortium deals and other special arrangements are providing. So the
 library as a physical
 facility is less useful, but as a provider of information, surely the
 utility
 of every library has grown over the past 8 years? Are the other things
 you mention
 (phone, fax, email, etc.) really a substitute for traditional scholarly
 communication?

The SPARC paper (http://www.arl.org/sparc/IR/ir.html) identified four
features of scholarly publishing: registration, certification, awareness,
and preservation.

Given the growth of e-journals, consortia agreements, and aggregators (or,
in the case of the big publishers, simply a single publisher's holdings),
what role does the institutional library - and its librarians - have in the
future of scholarly publishing?

Is the future of the research library a web page of user names and
passwords, along with a form for request-a-journal?

(... if the research literature was Open Access, perhaps even this would be
supplanted by a single Google-search?)

All the best,
Tim.


Re: UK Research Assessment Exercise (RAE) review

2002-11-26 Thread Tim Brody
Chris Zielinski asks:

 how many articles have been read but not cited?

The following estimates are from Citebase's database
(http://citebase.eprints.org/) -

(but duly noting caveats on data-quality, scope, coverage, noisiness,
potential for abuse etc, http://citebase.eprints.org/help/coverage.php
http://citebase.eprints.org/help/#impactwarning )

Looking at the 91,017 arXiv.org articles that have a journal reference
(the author has said where the article was/will be published)

17,628 (19.4%) have not been cited but have been downloaded at least once
from uk.arXiv.org.

(Of the remainder, 73,265 have both been cited and downloaded, 98 have been
cited but not downloaded, and 26 were neither cited nor downloaded.)

I believe this is because physicists read all the new additions to arXiv.org,
as it forms a convenient inbox of research. Over time, however, downloads
discriminate more between low-impact and high-impact papers (the pink line
is the top quartile of papers by citation impact):
http://citebase.eprints.org/analysis/hitslatencybyquartile.png

Correlation r between hits and citation impact for the top quartile is
0.3359 with an n of 25,532.
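The figure quoted is a standard Pearson product-moment correlation. A minimal sketch of how such an r is computed (toy hit/citation counts for illustration, not the actual Citebase sample):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy download and citation counts (assumptions, not Citebase data)
hits = [120, 40, 310, 75, 200]
cites = [10, 2, 25, 5, 14]
print(round(pearson_r(hits, cites), 4))
```

With n in the tens of thousands, even a modest r such as 0.3359 is highly significant, though it still leaves most of the variance in citation impact unexplained by downloads alone.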

Citations and downloads are mutually reinforcing: if an author has read an
article they are more likely to cite it; conversely, if an author sees a
citation they are likely to read the article that has been cited.

All the best,
Tim.

- Original Message -
From: informa...@supanet.com
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Sent: Tuesday, November 26, 2002 7:39 AM

 In fact, Stevan mentions other new online scientometric measures such as
 online usage [hits], time-series analyses, co-citation analyses and
 full-text-based semantic co-analyses, all placed in a weighted multiple
 regression equation instead of just a univariate correlation. Indeed,
 impact factors are very crude quasi-scientometric and subjective measures
 compared even with such simple information (easy to obtain for online media)
 as counts of usage - for example, how many articles have been read but not
 cited?

 All these are indeed worth pursuing and, I would have thought, right on the
 agenda of the OA movement.

 Chris Zielinski
 Director, Information Waystations and Staging Posts Network
 Currently External Relations Officer, HTP/WHO
 Avenue Appia, CH-1211, Geneva, Switzerland
 Tel: 004122-7914435 Mobile: 0044797-10-45354
 e-mail: zielins...@who.int and informa...@supanet.com
 web site: http://www.iwsp.org


Re: Book on future of STM publishers

2002-07-19 Thread Tim Brody
I presume Albert Henderson's assertion that student work is of lesser
value is based on personal opinion rather than on any scientometric
study of the relative impact of different types of research.

I believe that the majority of research-group members are research
students (PhD candidates); hence the novel work that research students
undertake forms the bedrock from which research in general is developed
(not only through the students carrying their own work on into research
posts and professorships, but also as it feeds directly into the student's
research group and the research community in general).

It would seem, therefore, that research dissertations may be a potentially
valuable resource after all - one that for too long has been accessible
only from library archives.

All the best,
Tim Brody
(PhD Research Student)

- Original Message -
From: Albert Henderson chess...@compuserve.com
To: american-scientist-open-access-fo...@listserver.sigmaxi.org
Sent: Thursday, July 18, 2002 9:09 PM
Subject: Re: Book on future of STM publishers

 The fundamental flaw in Stevan's position is
 that it discounts the receipt of value --
 recognition and targeted dissemination -- exchanged
 by the journal author. If one recognizes that the
 journal publisher does provide such value, the
 journal author is on the same footing as the book
 author. No man but a blockhead ever wrote, except
 for money, as Samuel Johnson observed. Steven's
 position is out of bounds. The question is moot.

 In the case of the dissertation, the acceptance
 is of a lesser value, since it is student work.
 Most books derived from dissertations require a
 good deal of additional work before they are
 publishable in the usual sense and recognizable
 by the world beyond dissertation examiners.

 The future of STM publishing is a great topic
 for magazines that have a short shelf life.
 They can attract a curious readership and sell
 lots of advertising by puzzling over questions
 without answers.

 I for one have serious doubts whether the future
 of any industry niche would be a fit subject for
 a student dissertation. Most predictive visions
 offered decades ago by experts are today only
 meaningful as evidence of lobbying and other
 promotional efforts. Book or dissertation, I
 would expect to shelve this topic near astrology.

 Albert Henderson
 Former Editor, PUBLISHING RESEARCH QUARTERLY 1994-2000
 70244.1...@compuserve.com


 .
 .



Re: Chat: E-Archives Challenge: Results

2001-05-29 Thread Tim Brody
On Mon, 28 May 2001, Wentz, Reinhard wrote:

 Since issuing the challenge I have thought of a definite limitation of
 e-archiving: The list of references in e-archived articles will never look
 as beautiful as the ones produced by publishers' professional proof readers,
 copy editors and other valuable members of a publishing team. I can send a
 sample (in colour!) of such a list to anybody doubting that statement and
 also some pictures of what professional copy editors (what a splendid body
 of people!) are up to in their spare time.

*cough*

Do I get 10 quid if I point out you're completely wrong?

e-archiving will require authors to provide reference lists in standard
formats (or a format that can be heuristically extracted). Thus, using
e-archives, your reference lists will look as elegant as you wish because
the format will no longer be determined by author, journal or field, but
by the person viewing them!

(and the reason for this is that authors' citations won't be shown unless
they produce good-quality metadata and references that can be
automatically linked)
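The idea above - that display format is decided by the viewer, not the author - can be sketched in a few lines. Everything here is an illustrative assumption (field names, the two styles, the record itself): the same structured reference record is rendered however the reader prefers.

```python
# Hypothetical structured reference record, as an archive might store it
ref = {
    "authors": ["Harnad, S.", "Brody, T."],
    "year": 2001,
    "title": "Example paper",
    "journal": "Example Journal",
    "volume": 12,
    "pages": "34-56",
}

def format_numeric(r):
    """Render in one journal-style format."""
    return "%s (%d). %s. %s %d: %s." % (
        " & ".join(r["authors"]), r["year"], r["title"],
        r["journal"], r["volume"], r["pages"])

def format_author_date(r):
    """Render the same record in an author-date style."""
    return "%s %d, '%s', %s, vol. %d, pp. %s." % (
        "; ".join(r["authors"]), r["year"], r["title"],
        r["journal"], r["volume"], r["pages"])

# Same data, two presentations - the viewer picks the style
print(format_numeric(ref))
print(format_author_date(ref))
```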

Of course, this does not negate the old axiom "rubbish in, rubbish out",
but whatever goes in, it will look very beautiful on the way out.

All the best,
Tim Brody


Re: Digitometrics

2001-05-26 Thread Tim Brody
 On Thu, 24 May 2001, Tim Brody wrote (about my proposed 2nd criterion for
 evaluation of an eprint archive, which was: 2) its suitability for
 yielding citation data [an 'impact-ranking' criterion?]):

 [tb] One might also add the facility to export hit data, as an
 [tb] alternative criterion (or any other raw statistical data?).

 What kind of raw statistical data might be most useful, in the future, for
 'impact-ranking'?

Perhaps the beginning of the answer lies in what can be measured, then what
can be measured accurately, and lastly what is useful to users.

The first part is (in no particular order): hits, citations, authors,
institutions, countries, dates, and sizes, ...?

 At the arXiv archive, one section of the FAQ section (under Miscellaneous)
 addresses the question: Why don't you release statistics about paper
 retrieval?.  (See: http://xxx.lanl.gov/help/faq/statfaq).

 The short answer provided is: Such 'statistics' are difficult to assess
 for a variety of reasons.  The longer answer also includes the comments
 that:
 [*snip* accentuates faddishness]
 And,
 [*snip* big brother is watching]

 Thought-provoking comments?

I would say there are better reasons than the two you chose, some of which
are mentioned by arXiv. For example, no system administrator would appreciate
someone downloading a paper 1000 times just to up their hits!
Also, as pointed out by arXiv, knowing how little one's research is read or
cited could put a researcher off arXiving altogether.

(I provide an example of such statistics from cite-base, but leave it to the
user to decide whether they are useful or not)

All the best,
Tim Brody


Re: Validation of posted archives

2001-03-21 Thread Tim Brody
On Wed, 21 Mar 2001, Guillermo Julio Padron Gonzalez wrote:

 The name of a journal is part of the validation of a published paper.
 We all use the rigorousness of the peer review and the editorial
 crite-ria of the journals to judge about the validity of a published
 paper. I agree that there can be exceptions, but they are just that:
 exceptions.

 It is clear that nobody has the time or the willingness to dive into
 each paper to find out whether it is the final version of a validated
 paper or it is just electronic garbage. The fact is that a
 non-administered archiving system may cause a proliferation of
 non-validated, duplicated, misleading and even fraudulent information in
 the web and there will be no way to identify the valid information, so
 the readers will go to validating sites, v. g. the publisher site.

 Unless OAI included some kind of validation...

I hope you do not mind me adding to this discussion.

If I may clear up a possible confusion about the OAI protocol:

OAI is a protocol for the distribution of Metadata, much the same as
TCP/IP is a protocol used by the Internet to distribute information. I
would no more expect OAI to provide me with guarantees about the content
than I would TCP/IP about this email.

(As an aside, OAI does not provide any facility for the distribution of
full-text papers (it can merely distribute 'pointers' to papers).)

Therefore the validation, or otherwise, of papers and their heritage rests
with the application(s) that use OAI.
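To make the point concrete, here is a minimal sketch of what an OAI request and record look like. The base URL is a hypothetical example, and the record follows the OAI Dublin Core layout; note that the dc:identifier is only a pointer to the paper, not the paper itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical repository base URL (illustrative assumption)
BASE = "http://eprints.example.org/cgi/oai2"

def list_records_url(base, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    return base + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

# A minimal OAI record as it might appear in a response
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:arXiv.org:hep-th/9906001</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example title</dc:title>
      <dc:identifier>http://arxiv.org/abs/hep-th/9906001</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

def parse_record(xml_text):
    """Extract the identifier, title and full-text pointer from one record."""
    ns = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }
    root = ET.fromstring(xml_text)
    ident = root.find("oai:header/oai:identifier", ns).text
    title = root.find(".//dc:title", ns).text
    pointer = root.find(".//dc:identifier", ns).text  # a pointer, not the text
    return {"id": ident, "title": title, "link": pointer}

print(list_records_url(BASE))
print(parse_record(SAMPLE))
```

Nothing in the protocol exchange above says anything about whether the record describes a refereed paper, a preprint, or rubbish - which is exactly why validation belongs to the services built on top of OAI.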

As an example of an Open Archive that has had ample opportunity to be
filled with rubbish (correct me if I am quoting this wrongly): arXiv has, in
its ten years, only had to delete 2 papers out of 160,000. This would suggest
that either arXiv has a very efficient staff or this is not really a
problem (or, as I suspect, both).

Your suggestion does seem to me a rational one (and indeed something like it
currently exists between arXiv and the APS - I believe the APS will accept
submissions using arXiv papers): that there are archives of pre-print
papers which are then picked up by validating services (i.e. publishers),
which repackage those archives into validated subject/editorial content.

It would then be your choice whether to use the e-Print server or
the packaged (and paid-for) service of publishers, and naturally the effect
of the publisher service would be to improve the e-Print content (the
invisible hand of peer review).

All the best,
Tim Brody.


Re: Number of pre-prints relative to journal literature?

2000-12-07 Thread Tim Brody
On Thu, 7 Dec 2000, Stevan Harnad wrote:

[ Origin of statistics about the coverage of scientific literature by XXX ]

 Here's one way to estimate it for the physics arXiv: the percentage of
 current citations from papers within arXiv that point to papers not
 within arXiv (courtesy of Les Carr, Zhuoan Jiao, Tim Brody & Ian Hickmen):

 http://www.ecs.soton.ac.uk/~harnad/Tp/Tim/sld003.htm

There are two questions:
1) What percentage of the _current_ output of literature is being arXived?
2) When looking for cited work, what percentage could I find in the
arXiv?

1) For High Energy Physics (for which statistics covering all published
work can be obtained from SPIRES), the percentage of papers arXived is
almost 100%. I have no data to cover other areas, but it must be noted
that most areas of XXX are seeing increased depositing, whereas HEP is
almost static. I would hypothesise that this is because other areas do not
have a high percentage of all literature being archived.

2) For the whole of the archive this is around 30-40% (with the HEP areas
having a larger percentage), with the result that it will be another 10 years
before all cited work has been archived (assuming the typical lifespan of
a paper is 5-7 years). This length of time could be reduced by authors
archiving existing literature.

All the best,
Tim Brody


Re: Central vs. Distributed Archives

2000-11-09 Thread Tim Brody
  Greg:
  As a rule, it is better for web sites to share the same archive than
  to each have fragments. It is better for Oxford and Cambridge to
  each have all of Shakespeare's plays than for Oxford to have only the
  comedies and Cambridge to have only the tragedies. That is why I favor
  shared interoperability, which is in some ways centralized, to fragmented
  interoperability, which is optimistically called decentralized. Massive
  redundancy is one of the few strengths of the existing paper-based system;

 Stevan:
 I am not an expert on digital storage, coding or preservation, but I am
 not at all sure that Greg is technically right above (and I'm certain
 that the Oxford/Cambridge hard-copy analogy is fallacious). I would
 like to hear from specialists in localized vs. distributed digital
 coding, redundancy, etc. -- bearing in mind that in the case of the

If I may separate the political issues from the technical.

Political:

There is a fear that a decentralised system will result in no overall
responsibility for archive continuity. But, equally, a centralised
body can decide that a system is no longer useful or is too expensive
to be free - what happens if XXX goes pay-per-view? What rights do
mirrors have to store XXX if they are told to remove their archive?

Technical:

The fear is that there will be only one copy of a paper, stored in an
institutional department or library, and that if that archive is lost the
paper disappears into digital oblivion.

Data storage is very cheap - there is little difference between storing
1 or 100 copies. Oxford and Cambridge could farm all world physics
archives and store their contents. This is not currently done because
Open Archives include pay-per-view archives, where only the abstract
can be farmed - and hence there is no provision for farming of texts.

I may also point out that there are already archives that perform
distributed mirroring - the math arXiv is primarily made up of papers that
have been archived elsewhere (judging by the lack of associated metadata
and updates).

Tim Brody
Computer Science, University of Southampton
email: tdb...@soton.ac.uk
Web: http://www.ecs.soton.ac.uk/~tdb198/


Re: Publishing quote

2000-11-07 Thread Tim Brody
On Tue, 7 Nov 2000, Lynn C. Hattendorf Westney wrote:

 Thought I would share these words of wisdom with this listserv.

 You can publish the Journal of Left Earlobe Anatomy, and you can say it's
 free to the world, but if very few people come and look at it...then it
 doesn't make any difference.
 Robert D. Bovenschulte,
 ACS Publications, Division Director

But if those very few people are the only researchers of Left Earlobe
Anatomy then it makes all the difference in the world.

Are you improving research (and hence science) or improving your impact?

Tim Brody
Computer Science, University of Southampton
email: tdb...@soton.ac.uk
Web: http://www.ecs.soton.ac.uk/~tdb198/


Re: Why hep-th has 40% red-links

2000-09-12 Thread Tim Brody
(Note: red-links = citations made via a LANL pre-print reference number,
e.g. hep-th/9906001, which may or may not also include published-version
data)
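As a minimal sketch of how such red-links can be picked out of reference
text (assuming the old-style arXiv identifier format, archive/YYMMNNN;
the function name and the optional subject-class handling are
illustrative, not the actual analysis code used here):

```python
import re

# Old-style LANL/arXiv identifiers: an archive name, a slash, then a
# 7-digit YYMMNNN number, e.g. "hep-th/9906001". The optional dotted
# subject class (e.g. "math.AG") covers later identifier variants.
ARXIV_ID = re.compile(r'\b[a-z-]+(?:\.[A-Z]{2})?/\d{7}\b')

def find_red_links(text):
    """Return all arXiv-style pre-print identifiers cited in `text`."""
    return ARXIV_ID.findall(text)

refs = "B. Smith, hep-th/9906001; see also A. Jones, hep-ph/0001234."
print(find_red_links(refs))  # ['hep-th/9906001', 'hep-ph/0001234']
```

A citation matched this way tells you only that the author cited the
pre-print number; as the note says, the same reference may or may not
also carry the published journal details.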

  Tim:
  http://www.aip.org/pt/vol-53/iss-8/p35.html
 
Some of my colleagues in Santa Barbara--the string theorists, for
example, and several of my coworkers in condensed matter theory as
well--insist that they don't need The Physical Review. For research
purposes, they don't need refereed print journals at all. They are
producing remarkable results this way, so I take them very seriously.
 
What they are doing is using the Los Alamos e-print archive for all of
their research communications. They check it every day for new
information. They post all their papers there, cite references by archive
number, use the search engine to find other papers, and need little or no
other publication services.
 
  I don't know whether string theory is hep-th, but it would look like a
  credible explanation for why hep-th has such a high rate of red-links,
  compared to hep-ph (which is an area of similar size and lineage).
 
  Perhaps this is a rule that can be extended to all theoretical science -
  that theory does not demand the same level of invisible hand rigour
  as more practical research.

 Stevan:
 Elite string theorists are a small, specialized group. Their numbers
 and stature are about comparable with the scale of all of science in
 the 17th/18th century, where the few practitioners world-wide (Newton,
 Leibniz, etc.) at any time could communicate their research by simply
 writing letters to one another.

 This is neither representative of research as a whole today, nor will
 it scale (in my opinion).

hep-th has 6000 authors, hep-ph has 7500 authors, with 13000 and 17000
papers respectively. The proportion of red-links identified is 40% and
20% respectively.

This represents a large and active group within LANL. Although, as you
say, the quoted article relates only to a small group, it could be
representative of the wider hep-th physicists, whose overall behaviour
results in double the number of red-link citations.

What other explanation(s) could there be for such a large difference in
citation patterns?

[cue argumentative as opposed to empirical]
This behaviour does not need to scale: hep-th and hep-ph have been
virtually static in the number of deposits since 1995 (growth has come
from other areas), and their citation patterns have been relatively
static since 1998. Although these red-link citations could also point to
articles that were later published, 40% of citations going to LANL
pre-prints appears to be settled behaviour - surely this must be a shift
away from citing the peer-reviewed literature and towards citing the
e-print, pre-print world?

 Stevan:
 Remember Simon-says: We should definitely find out (but not
 necessarily believe) what people SAY they are doing, and why.
 We should also find out what they DO do, and what others do/say too.

 Then let's piece together the picture objectively.

 The string theorists are definitely a piece of the whole picture,
 but equally definitely not a representative microcosm of it!

hep-th and hep-ph are the most self-contained and long-standing areas of
LANL; the behaviour of HEP authors may not represent medics or computer
scientists, but it may show the relative effect that instantly
available, unrefereed articles could have on the research world.

 Stevan:
 [Nor is theory in general the dividing line, I think, for there are
 more and less populated, more and less elite areas of theory too --
 in my (Stevan-says) opinion...]

But (feel free to correct me), hep-th is the primary digital source for
theoretical physicists, and that is where theoretical physics research is
being done.

Tim Brody
Computer Science, University of Southampton
email: tdb...@soton.ac.uk
Web: http://www.ecs.soton.ac.uk/~tdb198/