Re: [GOAL] [SCHOLCOMM] BLOG: Is CC-BY really a problem or are we boxing shadows?

2016-03-03 Thread Lee Giles
It's important to note that, at least in the US, facts, data and ideas
are not copyrightable.
> http://www.lib.umich.edu/copyright/facts-and-data
Lee
On 3/3/16 10:40 AM, Pippa Smart wrote:
> I believe moral rights (attribution and integrity) are upheld in UK law (
> http://www.legislation.gov.uk/ukpga/1988/48/part/I/chapter/IV)
>
> My own issue with CC BY is that its simplicity results in a clumsy catchall
> - for example, few authors would object to figures from their work being
> used in another work (=derivative work), but might be unhappy about a
> translation being produced without their knowledge (=derivative work).
>
> Your point about commercial use is well made since this is the area where I
> hear most complaints from authors - the fact that their publisher can make
> money is accepted in many cases, but the idea that a third party can
> "freeload" and make money out of their work is often considered
> unacceptable.
>
> Pippa
>
> *
> Pippa Smart
> Research Communication and Publishing Consultant
> PSP Consulting
> Oxford, UK
> Tel: +44 1865 864255 or +44 7775 627688
> email: pippa.sm...@gmail.com
> Web: www.pspconsulting.org
> @LearnedPublish
> 
> Editor-in-Chief of Learned Publishing:
> http://www.alpsp.org/Learned-Publishing
> Editor of the ALPSP Alert: http://www.alpsp.org/ALPSP-Alert
> 
>
> On 3 March 2016 at 13:46, Sandy Thatcher  wrote:
>
>> Klaus Graf and I debated this question in an article in the first issue of
>> the Journal of Librarianship and Scholarly Communication back in 2012:
>> https://www.researchgate.net/publication/254667054_Point_Counterpoint_Is_CC_BY_the_Best_Open_Access_License
>>
>> I was particularly concerned about translations.  It should be noted, by
>> the way, that the CC BY license in existence at the time we wrote this
>> article contained a reference to distortion, mutilation, etc., as part of
>> the license terms. That part was dropped in later iterations, and the only
>> reference now is this: "Moral rights, such as the right of integrity, are
>> not licensed under this Public License, nor are publicity, privacy, and/or
>> other similar personality rights; however, to the extent possible, the
>> Licensor waives and/or agrees not to assert any such rights held by the
>> Licensor to the limited extent necessary to allow You to exercise the
>> Licensed Rights, but not otherwise." In other words, licensors do not give
>> up their moral rights by offering this license to users, but since moral
>> rights are not recognized under British or US law (with a very limited
>> exception under US law to works of fine art), that clause is of little
>> comfort or utility for Anglo-American authors.
>> https://creativecommons.org/licenses/by/4.0/legalcode
>>
>> I am glad to see that the Cambridge discussion continues to recognize that
>> translations may be a problem for HSS authors.
>>
>> There is one non sequitur in the Cambridge summary that needs to be
>> addressed: "Academics do not publish in journals for money, so the
>> originator of a work that is subsequently sold on is not personally losing
>> a revenue stream." Just because an academic author may not be motivated by
>> personal monetary gain does not mean that a personal revenue stream is not,
>> in fact, lost in some circumstances. As former director of Penn State
>> University Press, I can cite examples of authors who benefited to the tune
>> of thousands of dollars from the reprinting of their articles from some of
>> the journals we published.
>>
>> There is a general problem also with the definition of what is
>> "commercial." When Creative Commons itself conducted a survey several years
>> ago as to what people understand to be the meaning of this word in the
>> context of publishing, there was little consensus beyond a very small core
>> of shared understanding of what the term means.
>>
>> Sandy Thatcher
>>
>>
>>
>>
>> At 12:11 PM + 3/3/16, Danny Kingsley wrote:
>>
>> 
>>
>> Dear all,
>>
>> You might be interested in the outcomes of a roundtable discussion held at
>> Cambridge University earlier this week on the topic of Creative Commons
>> Attribution licences.
>>
>> Is CC-BY really a problem or are we boxing shadows?
>> https://unlockingresearch.blog.lib.cam.ac.uk/?p=555
>>
>> A taster:
>> ***
>>
>> Comments from researchers and colleagues have indicated some disquiet
>> about the Creative Commons (CC-BY) licence in some areas of the academic
>> community. However, in conversation with some legal people and
>> contemporaries at other institutions one of the observations was that
>> generally academics are not necessarily cognizant of what the licences
>> offer and indeed what protections are available under regular copyright.
>>
>> To try and determine whether this was an education and advocacy problem or
>> if there are real issues we had a roundtable discussion on 29 February at
>> Cambridge University attended by about 35 people who 

[GOAL] Re: Master theses as preprints

2015-05-01 Thread Lee Giles
It is fairly common in computer science, physics and economics to
preprint work in repositories such as the arXiv or in their own tech
report systems. It does not seem in any way to affect publication of
that work in vetted venues.

In computer science, it is fairly common to see MS and PhD theses that
consist of work previously published in conferences and/or journals. In
particular, an MS or PhD defense that references such previously
published work makes a much stronger case.

Lee Giles

On 4/30/15 3:29 AM, Longva Leif wrote:
 Question: How common is it that journals reject submitted manuscripts purely 
 because the paper is already available as a preprint in some repository?

 At our institution (UiT The Arctic University of Norway), master students' 
 supervisors very often advise their students not to make their thesis 
 available in our IR, because they intend to rewrite it into one or more 
 journal article(s). At the time of finished thesis, they do not know where 
 the paper version will be submitted. And they are afraid that having the 
 thesis openly available in our IR will severely limit their choice of 
 journals to submit to.

 I was attending the Emtacl15-conference in Trondheim last week, and there I 
 heard about the effort to build a preprint culture at the Erasmus 
 University in Rotterdam. And in response to my question the presenter said 
 that there is no problem with journals not accepting manuscripts already 
 freely available in their IR.

 So, to all our students and their supervisors, can we comfort them and say 
 that there is no need to hold the theses back from our IR, and that they need 
 not fear rejection? (The same fear is also common among doctoral students who 
 often submit PhD theses that include papers not yet submitted to a journal.)

 This matter, whether an available preprint is acceptable or not to the 
 journal editors, is not information you will find in Sherpa/RoMEO. In 
 Sherpa/RoMEO you find what you may do with your pre- and postprint if and 
 after it is accepted by the journal.

 Grateful for views on this.

 Yours,
 Leif Longva
 UiT The Arctic University of Norway



 ___
 GOAL mailing list
 GOAL@eprints.org
 http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal



[GOAL] Acknowledgement Search Engine - AckSeer

2012-03-25 Thread Lee Giles

AckSeer is a beta search engine that explores the automatic
identification, extraction and indexing of acknowledgements from
papers. In addition, the entities acknowledged within those passages
are extracted.

Currently, AckSeer indexes acknowledgments from more than 500,000 papers
in CiteSeerX. These acknowledgements contain more than 4 million
acknowledged entities with approximately 2 million of them unique.
Entity extraction is based on AlchemyAPI and OpenCalais. Acknowledged
entities are ranked by citation. Feedback is most welcome.

http://ackseer.ist.psu.edu



[GOAL] Re: How many researchers are there?

2012-01-03 Thread Lee Giles

I would like to ask a counting question, since all of this is based on good
counting and a great deal of faith is placed in the counters. Even the US
census knows the issues with doing this and resorts to capture/recapture
methods to get things right.
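
As a rough illustration of the capture/recapture idea, here is a minimal
sketch of the simplest such estimator (Lincoln-Petersen). It is not a
description of what any census or bibliometric service actually runs; the
function name and the sample numbers are made up for the example.

```python
def lincoln_petersen(first_sample: int, second_sample: int, seen_in_both: int) -> float:
    """Estimate a population size from two independent samples.

    first_sample  -- individuals found in the first pass
    second_sample -- individuals found in the second pass
    seen_in_both  -- individuals found in both passes
    """
    if seen_in_both <= 0:
        raise ValueError("the two samples must overlap to give an estimate")
    # If the second pass re-finds a fraction seen_in_both / second_sample of
    # its individuals, the first pass is assumed to have covered that same
    # fraction of the whole population.
    return first_sample * second_sample / seen_in_both


# Hypothetical numbers: two author lists of 1000 and 800 names sharing 200.
estimate = lincoln_petersen(1000, 800, 200)
print(estimate)
```

The estimate is only as good as the matching between the two lists, which
is exactly the author disambiguation problem.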

Counting papers should be rather straightforward since there are databases
which are primarily populated manually mostly by the authors themselves.

However, counting unique authors is not so straightforward; clustering
same author mentions is nontrivial and just counting each mention leads
to terribly inaccurate results.

How does the UN or anyone disambiguate authors, such as the John Smith
at Harvard from the John Smith at Berkeley or, more interestingly, the
H. Chens at many places? Is this a manual or an automatic method?
Manual methods do not scale (look at DBLP) and have many errors (see
PubMed). Do you know how many different you's there are in health
record or credit databases? You would be surprised.

As for automatic methods, the literature is rife with results.
Disambiguation algorithms are notoriously temperamental and,
depending on the parameters, can lead to over- or undercounting
by factors of up to an order of magnitude.
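
A toy sketch of why the count swings so much: the same mentions give very
different author totals depending on the matching key. The mention list
and both key functions are invented for illustration and are far cruder
than any real disambiguation algorithm.

```python
# Hypothetical author mentions as (name as printed, affiliation).
mentions = [
    ("John Smith", "Harvard"),
    ("John Smith", "Harvard"),
    ("J. Smith", "Harvard"),
    ("John Smith", "Berkeley"),
    ("H. Chen", "Penn State"),
    ("H. Chen", "Arizona"),
    ("Hong Chen", "Arizona"),
]


def initial_surname_key(name, affiliation):
    """Aggressive key: first initial plus surname, affiliation ignored."""
    parts = name.replace(".", "").split()
    return (parts[0][0].lower(), parts[-1].lower())


def name_affiliation_key(name, affiliation):
    """Conservative key: exact printed name plus affiliation."""
    return (name.lower(), affiliation.lower())


def count_authors(mention_list, key_fn):
    """Count distinct authors under a given matching key."""
    return len({key_fn(name, affiliation) for name, affiliation in mention_list})


print("raw mentions:        ", len(mentions))
print("initial + surname:   ", count_authors(mentions, initial_surname_key))
print("name + affiliation:  ", count_authors(mentions, name_affiliation_key))
```

The aggressive key merges everyone called J. Smith or H. Chen and so
undercounts; the conservative key splits "J. Smith" from "John Smith" at
the same place and so overcounts.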

Best regards,

Lee Giles

On 1/3/12 9:59 AM, Arif Jinha wrote:

Arthur,
Great work.  Just trying to save you some time.  Here's what I found after
working on it for about 2 years.
- # of researchers in the world is reported by UN data in the Science Report.
- That figure directly relates to the number of journal titles which relates
  directly to the number of articles, and the growth rates of articles and
  researchers are 1:1.  So, even if you're not interested in the number of
  annual articles published, it's important to note as a check on data and
  possibly a challenge to the evidence thus far.
- There are more researchers than annual articles - about 6 to 7.  Again, a
  check on data or a challenge.

In the absence of any undertaking of reasonable time and expense to count
researchers better than the UN, I've relied on that data not for great
precision but because of the logical and empirical support for the internal
consistency of the relationships (the self-organizing system of scholarly
communication).

I'm very confident in the precision of some estimates and growth rates for
articles and not others: those done by Mabe (1 million annual articles in
2000, 3.4% growth of journals over 3 centuries and variability in Little
Science, Big Science and Disillusionment periods), Tenopir and King (similar
data in the late 1990s) and Bjork.  The 2.5 million articles frequently cited
by Harnad is way off because they failed to take into account the difference
in article averages - they used the article average from ISI and the number of
titles from Ulrich's. There is no excuse for that.  The other estimates that
are way off occur before the tools were available to get the precision needed
and those are older estimates.

In addition, Bjork's work continues to cite the 3.4% average annual growth of
active journals, whereas I have noted a spike in article and journal output
since 2000 which is important to note.  The variations in article and journal
growth are what define the Little Science (before WWII), Big Science (after
WWII) and Disillusionment (1970s to 2000) periods. Since the current growth
rate is minimally 4.5%, we currently see a) a reversal of disillusionment, b)
the highest variation in history, and c) the highest annual increase in
production.  Moreover, we see a massive 10% drop in the share to the West (NA,
Europe, Australia and New Zealand), as a result of globalization.  We can also
see from the data minimally 20% of articles being OA now, and the current
growth rate (last 5 yrs) pointing towards 50% in the next 20 years. So, I have
named 2 new periods after Disillusionment - Global Science (2000 to current)
and Open Science (current to future).

Here are my frustrations with this research, it is rooted in the ancient
research paradigms of the 20th century, which I myself had to wade through.
It lacks REFLEXIVITY, and is hopelessly academic.  Academia is hopelessly
unimaginative.

You cannot determine the future of OA by the trend alone, logically if the
share of OA is already significant and growing rapidly, this alters the
market, and puts pressure on publishers to react.  What will happen is that as
the OA share increases, more journals will convert to OA, and more new
journals will start OA. A quick look into Ulrich's tells me that the increase
in new OA journals is much higher than the current growth rate of Gold OA
articles.  Secondly, the growth of mandates is spectacular, but the effect
takes 2 years to manifest so we are only going to start to see that in the
next decade.  That means an acceleration of the trends that I've pointed out
is likely, begging the question as to who in their right mind would publish a
Toll Access journal in the year 2030, to a global

RefSeer, Citation Recommendation System

2011-07-01 Thread Lee Giles
Apologies for any cross-postings.

SYSTEM BETA RELEASE

===

REFSEER: Citation Recommendation based on Topic Modeling
(http://refseer.ist.psu.edu)

DESCRIPTION: RefSeer (http://refseer.ist.psu.edu), a beta release Citation
Recommendation System, is hosted at the Pennsylvania State University.
Given either text from an abstract/description or a pdf document of a paper
or part of a paper, RefSeer recommends documents from the CiteSeerX repository
as citations. RefSeer internally computes from the text a topical composition
based on topic modeling. Based upon this composition, recommended citations for
majority topics are ranked and shown in a faceted interface. RefSeer uses
documents in CiteSeerX for both its offline training and online
recommendation. It is currently trained with over 1.5 million documents and
new documents are added monthly in its citation database.
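
A minimal sketch of ranking candidate citations by topical composition
follows. It assumes each document has already been reduced to a vector of
topic proportions and uses cosine similarity as the score; this is a
generic illustration, not RefSeer's actual model, and all names and
numbers are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_topics, corpus, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(query_topics, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hypothetical topic proportions over three topics.
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.1],
    "doc_c": [0.5, 0.3, 0.2],
}
query = [0.8, 0.2, 0.0]
print(recommend(query, corpus))
```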

CONTACT: Saurabh Kataria (skata...@ist.psu.edu); Prasenjit Mitra
(pmi...@ist.psu.edu); C. Lee Giles (gi...@ist.psu.edu)

Feedback most welcomed.

===
___


Disciplinary Repository workshop at JCDL2011 - Call for presentations

2011-04-19 Thread Lee Giles
ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2011: Workshop on 
Disciplinary Repositories and Field-Specific Digital Libraries

*** Call for Presentations ***

Disciplinary repositories (DR) are a very particular instance of digital 
libraries, focused on collections of documents (and increasingly additional 
material) pertinent to a particular subject area or discipline. Several 
disciplinary repositories have grown to be cornerstones of the scientific 
workflows of scholars in the areas they serve, more successfully than broadband 
tools such as the freely accessible Google Scholar or subscription based 
services such as Web of Knowledge, SCOPUS, and INSPEC.

The large user bases of disciplinary repositories (sometimes all the scholars 
of a discipline) and their large corpuses (sometimes all scientific articles in 
a field) make them unique computer science, information science and social 
laboratories.

This workshop will be held at JCDL 2011 (http://www.jcdl2011.org) from 1pm 
Thursday 16 June through 12 noon Friday 17 June. The workshop will be of 
interest to anyone running or planning a DR, and anyone interested in data 
mining DR corpora. It will share secrets for success; allow discussions of 
technology, services, interoperability, and the engagement of users; and foster 
communication within the DR community. We call for proposals for short or 
lightning presentations on all aspects of disciplinary repositories and 
field-specific digital libraries. Some slots for longer talks may be made 
available for talks of particular interest and relevance for the audience. 
Topics may include:

* DR architecture, infrastructure and maintenance
* Social aspects: populating and growing DRs
* Sustainability through open access, proprietary access and hybrid models
* User interaction, interface design and usability
* Value-added and innovative services
* Interaction and integration with IRs, other DRs and proprietary systems
* DR as research corpus and platform for experiments

Please submit one page proposals in PDF to jcdl2011.dr.works...@gmail.com by 1 
May 2011. Notification of acceptance will follow by 9 May 2011 with indication 
of talk length (lightning or longer contribution). At that time a timetable 
will be posted on the workshop website 
(https://indico.cern.ch/event/JCDL2011-DR). All accepted proposals will be 
collected with outcomes from the workshop in a summary article outlining the 
status of digital repositories. The workshop will have no proceedings.

Workshop chairs:
C. Lee Giles (Pennsylvania State University),
Salvatore Mele (CERN),
Simeon Warner (Cornell University)



CiteSeerX indexes tables

2009-06-11 Thread Lee Giles

CiteSeerX now provides indexing and ranking of tables in documents. This
new feature will soon be released in open source as part of the CiteSeerX
open source project. Currently, nearly a million tables are indexed.

http://citeseerx.ist.psu.edu

In addition, a demo of the data extraction from tables in pdf files can
be found at:

http://chemxseer.ist.psu.edu/ChemXSeerTableExtractor/TableExtractorServlet

Of course, feedback is always welcome.

Best

Lee Giles


SeerSuite

2008-11-26 Thread Lee Giles
The CiteSeerX and ChemXSeer team has put the tool kit
SeerSuite online and made it open source. It can be found at:
 http://citeseerx.sourceforge.net/
The SeerSuite tool kit allows others to build a CiteSeerX-like
system. A brief description of SeerSuite can be found here:
 http://en.wikipedia.org/wiki/SeerSuite
This is the first version, beta-0.1, and it is intended that many
more will follow with corrections, updates and improvements.
Comments and feedback are most welcome.

Best

Lee Giles


Re: Repositories using some form of automatically generated metadata

2008-08-11 Thread Lee Giles
CiteSeerX uses nothing but automated metadata extraction. You can try it
out at

http://citeseerx.ist.psu.edu

Best

Lee Giles

Mahendra Mahey wrote:
 I am trying to find the extent to which repositories are using some form
 of automatically generated metadata.
 
 This could be in the form of automatically inserting the depositor's
 details into the author field as a suggestion (if they are indeed the
 author - as sometimes they are not),  a pick list appearing on a deposit
 form from an internal database, to the use of automatic classification
 systems that populate fields such as keywords, subject, title etc after
 an analysis of the item deposited.
 
 *Questions*
 
 If your repository is using auto metadata...
 
 What kind of auto metadata is being used and how? Has this been formally
 documented? Is this available? If not, could you provide me with a
 screenshot?
 
 If you are not using it, I am assuming that you would like to use some
 form of it, as long as it is reliable?  If any of you have objections or
 bad experiences with using auto generated metadata, please let me know why.
 
 Could you please *reply to me off list*?  I will of course provide the
 list with a summary of my findings.
 
 Thank you
 
 --
 ---
 Mr Mahendra Mahey
 
 Repositories Research Officer
 
 UKOLN,
 University of Bath,
 Bath,
 BA2 7AY
 Tel: ++44 (0) 1225 384594
 Fax: ++44 (0) 1225 386256
 email: m.ma...@ukoln.ac.uk
 skypeID: mr_mahendra_mahey
 Mobile: ++44 (0) 7896300820
 ---


Re: Proposed update of BOAI definition of OA: Immediate and Permanent

2005-03-17 Thread Lee Giles

I strongly agree with these sentiments. If you don't include us, we will
go elsewhere and create our own open access policies and movement.
What a waste.

Best

Lee Giles
Computer and Information Scientist and Scholar

Laurent Romary wrote:


I was not planning to answer this thread, but any statement that does not reflect
the practices in communities such as computer science is not likely to be
endorsed by multidisciplinary bodies such as CNRS.
Laurent Romary

Quoting J.F.B. Rowland j.f.rowl...@lboro.ac.uk:




Having spent all morning at a meeting discussing various academics'
outputs and whether they are acceptable to the University's management
for Research Assessment Exercise purposes, I heard this very argument from a
computer scientist. The Pro Vice Chancellor for Research (an engineer, by
the way) would have none of it.  Journal articles only, please!

Fytton Rowland, Loughborough University




As a Computer Scientist, I automatically read "peer reviewed journal"
as "peer reviewed (journal/conference/workshop/symposium)", because
that's the convention of my discipline, where a
conference/workshop/symposium is a peer review service provider.




Re: Google Scholar

2005-02-16 Thread Lee Giles

Google Scholar is outstanding, but I still feel there is a place
for specialized search in topical domains, such as CiteSeer, which
I maintain. Our community still very much likes CiteSeer but
also uses Google Scholar.

Best

Lee Giles

Thomas Walker wrote:


As T.S.Mahadevan recently pointed out on the BOAI Forum, what those who
are searching for open archive and other scholarly literature really want
is a single website where they can search the entire set of such literature.

Google is already accounting for a significant portion of the hits on the
OA journal articles I monitor.  Might Google Scholar be that website?

===

Google Scholar (beta version online at http://scholar.google.com)
restricts Google searches to scholarly literature, including peer-reviewed
papers, theses, books, preprints, abstracts and technical reports from all
fields of research, and finds articles from a wide variety of academic
publishers, professional societies, preprint repositories and universities,
as well as scholarly articles available across the web.  Google Scholar
ranks search results by their relevance to the query, so the most useful
references should appear at the top of the page. The relevance ranking takes
into account the full text of each article as well as the article's author,
the publication in which the article appeared and how often it has been
cited in scholarly literature. Google Scholar also automatically analyzes
and extracts citations and presents them as separate results, even if the
documents they refer to are not online. This means that search results may
include citations of older works and seminal articles that appear only in
books or other offline publications. [Parts of this description taken
directly from http://scholar.google.com/scholar/about.html#about.]

===

Tom Walker




Thomas J. Walker
Department of Entomology & Nematology
PO Box 110620 (or Natural Area Drive)
University of Florida, Gainesville, FL 32611-0620
E-mail: t...@ufl.edu  (or tjwal...@ifas.ufl.edu)
FAX: (352)392-0190
Web: http://tjwalker.ifas.ufl.edu



CiteSeer OAI compliant

2004-11-15 Thread Lee Giles

We would like to announce that CiteSeer, a public digital library
and search engine in computer and information science, is now OAI
compliant. Currently, CiteSeer has over 700,000 documents, all from web
crawling or author submission and is being hosted at Penn State's School
of Information Sciences and Technology.

CiteSeer OAI metadata is automatically generated and does have errors
which we hope to correct in the future using new automated correcting
and checking algorithms.

For more details on harvesting the OAI metadata please see:

http://citeseer.ist.psu.edu/oai.html
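
For readers new to OAI harvesting, here is a minimal sketch of how
OAI-PMH request URLs are put together. The verb and argument names
(ListRecords, metadataPrefix, from, set, resumptionToken) come from the
OAI-PMH protocol; the base URL is a placeholder and not CiteSeer's actual
endpoint, and the helper names are made up for this example.

```python
from urllib.parse import urlencode

# Placeholder endpoint, an assumption for illustration only.
BASE_URL = "http://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build the first ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec  # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, resumption_token):
    """Build the follow-up request when the repository returns a token."""
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, from_date="2004-11-01"))
```

A harvester would fetch the first URL, parse the XML response, and keep
requesting resume_url(...) until no resumption token is returned.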

Best regards,

Lee Giles
David Reese Professor
School of Information Sciences and Technology
The Pennsylvania State University,
University Park, PA
http://clgiles.ist.psu.edu/


Re: Open-access consciousness articles -- and how to find them on the Web

2002-11-04 Thread Lee Giles

How come you don't mention CiteSeer?

www.citeseer.org which has over 500,000 articles indexed?

Lee Giles

Thomas Zoega Ramsoy wrote:


Re: How many papers are there in the OAI-compliant archives?

2002-10-31 Thread Lee Giles

We hope to bring all of CiteSeer in compliance soon; maybe by the end of
the year.

Lee

Stevan Harnad wrote:


On Wed, 30 Oct 2002, Imre Simon wrote:




I would like to interest some people (and some Institutions too) in
Brazil to start self-archiving their work. It would be helpful to have
some statistics about the dimensions of the OAI-compliant archives and
at what rate are they growing? What is the proportion of the papers
whose full text is also available?

Do these statistics exist, and where are they? So far I couldn't find
them.

In case they do not exist, what would be the most interesting numbers
to measure? Don't you think that it would be wise to document the
evolution of the dimensions of the OAI-compliant library while
everybody is sweating to climb the mountain?




Dear Imre,

The statistics exist, though they have not been collected
systematically. You are encouraged to do the study, and to gather and
report the data (especially growth across time, which shows
some signs of picking up at last).

For the total annual number of peer reviewed articles (about 2,000,000,
in about 20,000 peer-reviewed journals worldwide) see, for example:
http://www.ulrichsweb.com/ulrichsweb/

For the proportion of them that institutional libraries can afford to
pay for toll-access to, see: http://www.arl.org/stats/index.html

For lists (not exhaustive) of OAI Archives, see:
http://oaisrv.nsdl.cornell.edu/Register/BrowseSites.pl
and
http://software.eprints.org/#sites

Among the OAI harvesters:
http://www.openarchives.org/service/listproviders.html
you will find, for example,
http://oaister.umdl.umich.edu/viewcolls.html
which indexes 858,709 records from 110 institutions (updated 3 October
2002), including 200,000 from the Physics arXiv, which has tracked
its own growth statistics since 1991:
http://arxiv.org/show_monthly_submissions

But there are also huge non-OAI (or not-yet-OAI) open-access archives,
such as citeseer, with 600,000 harvested computer science papers:
http://citeseer.nj.nec.com/statistics.html

But some time-series data on the growth of all of these archives would
certainly be very welcome, the objective being to get all of the annual
2,000,000 in all disciplines worldwide self-archived and open-access as
soon as possible.

Regards,

Stevan Harnad

NOTE: A complete archive of the ongoing discussion of providing open
access to the peer-reviewed research literature online is available at
the American Scientist September Forum (98  99  00  01  02):

   
http://amsci-forum.amsci.org/archives/American-Scientist-Open-Access-Forum.html
   or
   http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html

Discussion can be posted to: american-scientist-open-access-fo...@amsci.org

See also the Budapest Open Access Initiative:
   http://www.soros.org/openaccess

the Free Online Scholarship Movement:
   http://www.earlham.edu/~peters/fos/timeline.htm

the OAI site:
   http://www.openarchives.org

and the free OAI institutional archiving software site:
   http://www.eprints.org/




Entire Editorial Board Resigns En Masse

2001-10-09 Thread C. Lee Giles
 alternatives.  One salient example is the case of the
 journal Logic Programming.  In 1999, the editors and editorial
 advisors of this journal resigned to join Theory and Practice of
 Logic Programming, a Cambridge University Press journal that encourages
 electronic dissemination of papers.

 In summary, our resignation from the editorial board of MLJ reflects
 our belief that journals should principally serve the needs of the
 intellectual community, in particular by providing the immediate and
 universal access to journal articles that modern technology supports,
 and doing so at a cost that excludes no one.  We are excited about JMLR,
 which provides this access and does so unconditionally.  We feel that
 JMLR provides an ideal vehicle to support the near-term and long-term
 evolution of the field of machine learning and to serve as the flagship
 journal for the field.  We invite all of the members of the community
 to submit their articles to the journal and to contribute actively to
 its growth.

 Sincerely yours,

   Chris Atkeson
   Peter Bartlett
   Andrew Barto
   Jonathan Baxter
   Yoshua Bengio
   Kristin Bennett
   Chris Bishop
   Justin Boyan
   Carla Brodley
   Claire Cardie
   William Cohen
   Peter Dayan
   Tom Dietterich
   Jerome Friedman
   Nir Friedman
   Zoubin Ghahramani
   David Heckerman
   Geoffrey Hinton
   Haym Hirsh
   Tommi Jaakkola
   Michael Jordan
   Leslie Kaelbling
   Daphne Koller
   John Lafferty
   Sridhar Mahadevan
   Marina Meila
   Andrew McCallum
   Tom Mitchell
   Stuart Russell
   Lawrence Saul
   Bernhard Schoelkopf
   John Shawe-Taylor
   Yoram Singer
   Satinder Singh
   Padhraic Smyth
   Richard Sutton
   Sebastian Thrun
   Manfred Warmuth
   Chris Williams
   Robert Williamson

--
Dr. C. Lee Giles, David Reese Professor
School of Information Sciences and Technology
and Computer Science and Engineering
The Pennsylvania State University
504 Rider Building, 120 S Burrowes St
University Park, PA, 16801, USA
gi...@ist.psu.edu - 814 865 7884
http://ist.psu.edu/giles


Re: PostGutenberg Copyrights and Wrongs for Give-Away Research

2001-06-22 Thread Lee Giles
Standards are great and often make the difference between the success and
failure of an endeavor. But in some cases other standards can be used that
do not put additional burdens on authors and users. It's possible to set up
an open archive that's useful and requires no additional work from authors
except putting their papers on their web site in some eformat. This works
because there are already a few widely used and accepted standards for
publishing documents: pdf, doc, postscript, html, etc. (It would be easy to
include new ones such as xml.)

The archive works by being active instead of passive. A smart crawler
spiders the web searching for manuscripts. After finding the edocuments,
an indexer converts the documents to text, indexes them and provides a
query engine that allows search based on key words, phrases and citations.
Other features such as cocitation, active bibliographies, collaborative
filtering, etc. can be installed. Links to the original papers can be
maintained.
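
The indexing and query step can be sketched with a tiny inverted index.
This is only an illustration of the general idea, not the researchindex
code: the crawl and the conversion to text are assumed to have happened
already, and the URLs and texts below are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical crawled papers, already converted to plain text.
documents = {
    "http://example.org/a.pdf": "Citation indexing for scientific literature",
    "http://example.org/b.ps": "Neural networks and grammatical inference",
    "http://example.org/c.pdf": "Autonomous citation indexing on the web",
}
index = build_index(documents)
print(sorted(search(index, "citation indexing")))
```

Because each index entry keeps the URL where the paper was found, the link
back to the original is preserved.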

This entire process is automated except for the requirement that the authors
place their papers in some standard eformat on an accessible web site.
Because this is automated, some errors do occur. Subsequently, authors and
others can ask for corrections.

As an example, see researchindex.org and cora.whizbang.com which have
archives for computer science papers. These two archives
have over 300,000 papers, 500,000 unique authors and 3 million citations. In
addition, they receive about 100,000 page views a day. The researchindex
software is free for noncommercial use and cora has established a new
archive for statistics papers.

Best regards,

Lee Giles

Stevan Harnad wrote:

 On Fri, 22 Jun 2001, Thomas J. Walker wrote:

  sh [I might add only that the distinction between personal web home page
  sh and e-print servers is silly, incoherent, and hence untenable, but it
  sh makes no difference, if it makes some people happy to put it that
  sh way...]
 
  There is a distinction that to many authors may be important:
 
  E-print servers that are well stocked are a somewhat more convenient place
  to look for particular articles compared to hunting down the authors' home
  pages and looking there.  Of greater consequence, researchers who are not
  looking for articles by the authors in question may find articles by them
  on that well-stocked e-print server, like them, and use them.

 Quite right, and this is one of the principal rationales for the Open
 Archives Initiative (OAI) http://www.openarchives.org and Eprints
 archive-creating software http://www.eprints.org

 OAI provides a tagging standard that makes all registered OAI-compliant
 Archives interoperable, hence harvestable across archives
 http://oaisrv.nsdl.cornell.edu/Register/BrowseSites.pl
 so you need not know the URL of the paper or the
 author.

 You just search them like one big virtual archive in a centralized
 index: See http://cite-base.ecs.soton.ac.uk/cgi-bin/search
 and http://arc.cs.odu.edu/

 But the home-page/public distinction is moot, since authors can run
 their own eprints servers too, and register them as OAI-compliant!
 http://rocky.dlib.vt.edu/~oai/cgi-bin/Explorer/oai1.0/testoai

 
 Stevan Harnad                     har...@cogsci.soton.ac.uk
 Professor of Cognitive Science    har...@princeton.edu
 Department of Electronics and     phone: +44 23-80 592-582
  Computer Science                 fax:   +44 23-80 592-865
 University of Southampton         http://www.ecs.soton.ac.uk/~harnad/
 Highfield, Southampton            http://www.princeton.edu/~harnad/
 SO17 1BJ UNITED KINGDOM

--
Dr. C. Lee Giles, David Reese Professor
School of Information Sciences and Technology
and Computer Science and Engineering
The Pennsylvania State University
504 Rider Building, 120 S Burrowes St
University Park, PA, 16801, USA
gi...@ist.psu.edu - 814 865 7884
http://ist.psu.edu/giles


Re: List of journals currently allowing self-archiving?

2001-04-24 Thread Lee Giles
To my knowledge, all IEEE and ACM transactions and journals will accept
submissions that have been on preprint servers.

Lee Giles

hb...@tours.inra.fr wrote:

 Another list on Netprints  with this remark :* Despite our best efforts
 this list is not comprehensive nor necessarily up to date. Authors should
 check with editors of individual journals for the current status.

 http://intl-clinmed.netprints.org/misc/policies.shtml

 At 18:33 23/04/01 -0700, you wrote:
  From: Declan Butler, Nature d.but...@nature-france.com
  To: ch...@cprince.com
  Sent: Friday, April 20, 2001 9:38 AM
  Subject: TR: List of journals currently allowing self-archiving?
 
 
   Dear Chris
   Did you find such a list (see below)?; I'm looking for one also.
   Many thanks
   Declan
 
 Dear Declan,
 
 I can't say as I've had much success on this. I have posted what I have
 found so far (see next link). I would greatly appreciate additions.
 
 http://www.cprince.com/projects/Eprints/self-archive.html
 
 Chris.
 
 -
 Christopher G. Prince, Ph.D.
 University of Minnesota Duluth
 Department of Computer Science
 320 Heller Hall, 10 University Drive
 Duluth, MN  55812 USA  (218) 726-6514
 ch...@cprince.com   http://www.cprince.com

 Helene Bosc
 Bibliotheque
 Unite Physiologie de la Reproduction
 et des Comportements
 UMR 6073 INRA-CNRS-Universite F. Rabelais
 37380 Nouzilly
  France

 http://www.tours.inra.fr/
 TEL : 02 47 42 78 00
 FAX : 02 47 42 77 43
 e-mail: hb...@tours.inra.fr

--
Dr. C. Lee Giles, David Reese Professor
School of Information Sciences and Technology
and Computer Science and Engineering
The Pennsylvania State University
504 Rider Building, 120 S Burrowes St
University Park, PA, 16801, USA
gi...@ist.psu.edu - 814 865 7884
http://ist.psu.edu/giles