[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-09 Thread Bosman, J.M. (Jeroen)
Marcin,

This is a great initiative. I had been hoping BASEsearch would take on this 
task, but it is good to see others are stepping in.

Congrats on the initiative. Still, a long way to go.

Could you elaborate on how your technology is able to recognize “true peer 
reviewed papers” and what you consider to be “true peer reviewed papers”?

Best,
Jeroen Bosman
@jeroenbosman
Utrecht University Library
From: goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] On Behalf Of 
Marcin Wojnarski
Sent: Thursday, 9 October 2014 14:51
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of OA 
journals & papers

(press release, apologies for cross-posting)

With the beginning of the new academic year, Paperity, the 
first multidisciplinary aggregator of Open Access journals and papers, has been 
launched. Paperity will connect authors with readers, boost dissemination of 
new discoveries and consolidate academia around open literature.

Right now, Paperity (http://paperity.org/) includes over 
160,000 open articles, "gold" and "hybrid", from 2,000 scholarly journals, and 
growing. The goal of the team is to cover - with the support of journal editors 
and publishers - 100% of Open Access literature in 3 years from now. In order 
to achieve this, Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist of the 
International Mathematical Olympiad. This technology indexes only true 
peer-reviewed scholarly papers and filters out irrelevant entries, which easily 
make it into other aggregators and search engines.

The amount of scholarly literature has grown enormously in recent decades, and 
successful dissemination has become a big issue. New tools are needed to help 
readers access vast amounts of literature dispersed all over the web and to 
help authors reach their target audience. Moreover, research is 
interdisciplinary now and scholars need broad access to literature from many 
fields, also from outside of their core research area. This is the reason why 
Paperity covers all subjects, from Sciences, Technology, Medicine, through 
Social Sciences, to Humanities and Arts.

"There are lots of great articles out there which report significant new 
findings, yet attract no attention, only because they are hard to find. No more 
than the top 10% of research institutions have good access to communication 
channels and can share their findings efficiently. The remaining 90%, 
especially authors from developing countries and early-career researchers, 
start from a much weaker position and often go unnoticed despite the high 
quality of their work," says Wojnarski. He adds that it is not by accident that Paperity 
partners right now with the EU Contest for Young Scientists, the biggest 
science fair in Europe. With the help of Paperity, the Contest wants to improve 
dissemination of discoveries authored by its participants – top young talents 
from all over the continent.

Paperity is the first service of this kind. The most similar existing website, 
PubMed Central, aggregates open journals, too, but is limited to life sciences 
alone. Another related service, the Directory of Open Access Journals, does 
index articles from multiple periodicals and different disciplines, but does 
not provide aggregation, only pure indexing: it shows metadata of articles, but 
for fulltext access redirects to external sites. Moreover, both PMC and DOAJ 
impose strict technical requirements on participating journals, which limits 
the scope of aggregation. Paperity adapts to whatever technology a given 
periodical employs.

Paperity website: http://paperity.org/




--

Marcin Wojnarski, Founder of Paperity, www.paperity.org

www.linkedin.com/in/marcinwojnarski

www.facebook.com/Paperity

www.twitter.com/Paperity



Paperity. Open science aggregated.
___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-09 Thread Marcin Wojnarski

Jeroen,

Thanks, it's great to hear that you like Paperity!

"True peer-reviewed" means published in a peer-reviewed journal, in 
contrast to a pdf just posted somewhere on the web (think Google 
Scholar), which can be anything: a peer-reviewed paper or not, published 
or not, even randomly generated to resemble a scholarly article, for 
example to pump up G Scholar citations (http://arxiv.org/abs/1212.0638).


The new technology is called REgular Document EXpressions (redex). It is 
a computer language for analyzing long and complex documents, 
particularly those written in a markup language such as HTML or XML. It 
facilitates analysis of the web context in which a paper occurred, which is 
critical for maintaining the link between the paper and its journal. Redex 
builds on the fundamental technology of regular expressions (regex), 
but redefines the language entirely to make it suitable for large 
structured texts.


Best,
Marcin


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-11 Thread BAUIN Serge
Marcin,

May I ask "what is the economic model of Paperity?"
I didn't find any information about that on your web site.

Cheers

Serge

Sent from a mobile phone, sorry for the inelegant style...


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-11 Thread Dana Roth
It would be nice if 'Paperity' would maintain a listing of the publishers of 
the journals they index.
T-R does this for Web of Science Journal Citation Reports, and it is very 
helpful.

Dana L. Roth
Millikan Library / Caltech 1-32
1200 E. California Blvd. Pasadena, CA 91125
626-395-6423 fax 626-792-7540
dzr...@library.caltech.edu
http://library.caltech.edu/collections/chemistry.htm


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Peter Murray-Rust
On Sun, Oct 12, 2014 at 2:08 AM, Dana Roth wrote:

>  It would be nice if 'Paperity' would maintain a listing of the
> publishers of the journals they index.
> T-R does this for Web of Science Journal Citation Reports, and it is very
> helpful.
>
>
Is this listing
(a) publicly visible - or only available to WoS subscribers?
(b) re-usable without further permission from T-R? (CC-BY or weaker?)

If it's not re-usable then we need a fully Open equivalent for indexable
journals.



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Stevan Harnad
Harvesting Gold OA journal articles is a piece of cake. How will Paperity/redex
harvest Green OA articles published in non-OA journals but made OA somewhere on
the Web? Via Google Scholar?

Sounds like a splendid idea if it can be done… But not if it is just
Gold-biased, because most refereed research is not Gold, and the fastest
growing form of OA is Green (because of mandates, and absence of extra cost).

SH


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Jan Velterop

On 12 Oct 2014, at 12:51, Stevan Harnad  wrote:

> Harvesting Gold OA journal articles is a piece of cake.

Indeed. Not just for Paperity, but for anybody else. It's one of the 
attractions and benefits of open access via the 'gold' route. Another is that 
most articles can be harvested in XML-format, which enables sophisticated and 
worthwhile services to be added to aggregations. And aggregations enable 
researchers to conveniently make large-scale pattern- and meta-analyses without 
first having to gather all the material from different and disparate sources. 
Few 'green' repositories that I'm aware of have XML-versions (correct me if I'm 
wrong – and should I be wrong, is there a list of such repositories?). 
Aggregations, by the way, cannot be made without clarity about rights and 
licences, since they are a form of re-use. Those rights are clear, and properly 
included in metadata, for proper 'gold', but often not for 'green' versions of 
paywalled articles in repositories.

> How will Paperity/redex harvest
> Green OA articles published in non-OA journals but made OA somewhere on the
> Web — via Google Scholar?

Indeed, how will they? Or anybody else?

JV


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Peter Murray-Rust
On Sun, Oct 12, 2014 at 1:44 PM, Jan Velterop  wrote:

>
> On 12 Oct 2014, at 12:51, Stevan Harnad wrote:
>
>> Harvesting Gold OA journal articles is a piece of cake.
>
>
> Indeed. Not just for Paperity, but for anybody else. It's one of the
> attractions and benefits of open access via the 'gold' route.
>

Yes,

It's noteworthy that almost all modern text and data mining exercises are
carried out on the Open Access subset of the literature. In some cases this
is an attempt to get the whole Open literature - in others it's a sub-subset
such as EuropePubMedCentral. (The alternatives to this are (a) to ignore
rights and mine anyway - something we are legally allowed to do in the UK
but almost nowhere else or (b) to do it in private, hoping you won't be found
out and scared of publishing your sources as a good scholar should).

> Another is that most articles can be harvested in XML-format, which enables
> sophisticated and worthwhile services to be added to aggregations.
>

This is true for born-Open publishers such as BioMedCentral, PLOS*, eLife,
PeerJ, Ubiquity ... This is a straightforward sale - author payment =>
freedom for re-use. It works very well for text miners. (And please don't
tell us that mining is a minority sport which has to tread water for
another 5-10 years).

I have not systematically surveyed whether XML is offered in the "Gold"
Open Access journals of other major publishers nor whether the licence is
always permissive. Those people who argue that CC-NC-ND protects authors
(it doesn't) should realise that it has a massive negative impact on useful
re-use including mining.

Hybrid journals almost certainly do not offer XML. It's hard enough for
them to offer CC-BY for "Open Access".

It works less well for born-Closed publishers (such as Elsevier, NPG, ACS,
etc.). Rather than having the simple

> And aggregations enable researchers to conveniently make large-scale
> pattern- and meta-analyses without first having to gather all the material
> from different and disparate sources.
>

Yes - we have built the apparatus to do this in contentmine.org


> Few 'green' repositories that I'm aware of have XML-versions (correct me
> if I'm wrong – and should I be wrong, is there a list of such
> repositories?). Aggregations, by the way, cannot be made without clarity
> about rights and licences, since they are a form of re-use. Those rights
> are clear, and properly included in metadata, for proper 'gold', but often
> not for 'green' versions of paywalled articles in repositories.
>

Exactly. Most "Green" repositories make it very hard to re-use material.
This is primarily due to copyright - the default library approach is to say
"this may be copyright and you cannot use it unless you write to the author
and get permission in writing with real ink". Then there is the technology.
University repositories are constructed on the basis that each document is
a priceless artefact that scholars will spend hours discovering and
reading. The reality of science is that most of these documents will
probably only be read by machines. Some countries (NL, FR for example) at
least aggregate some documents - such as theses - and the UK has CORE to
try to remedy the situation, but even so it's extremely difficult to index
and search repositories.

I wrote to Bernard Rentier offering to index his repository for scientific
terms but was told - sadly - that there was a new phase of investment
required before this would be possible.

Another problem with most repositories is that they insist on transforming
DOCX or LaTeX into PDF. Even for their own theses. This is an act of
barbarism. PDF has no semantics and it destroys about 50-75% of the science
in the document.

Anyway we expect to announce our own Open indexing of the literature RSN.


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
___
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Marcin Wojnarski

Hi Serge,

We're working on this. Paperity started as a non-profit academic 
project, but yes, we need to develop a business model to make it 
sustainable and to achieve the goal of 100% OA aggregated. Most likely 
we'll expect participating journals to support our services, which we 
think is a fair solution when many of them charge APCs and we actually 
help them do their job (dissemination). We're aware however that there 
are also many small non-profit journals which don't charge APC at all, 
and we definitely want to aggregate them all, too. So the details are 
still to be sorted out, but I'm confident that over time we'll come up 
with a good solution: one that's fair, efficient and acceptable for 
everybody. Of course, there are also more traditional solutions that 
we'll investigate, like adverts.


Cheers
Marcin


On 10/11/2014 09:07 PM, BAUIN Serge wrote:

Marcin,

May I ask "what is the economic model of Paperity?"
I didn't find any information about that on your web site.

Cheers

Serge

Sent from a mobile phone, sorry for the inelegant style...

On 10 Oct 2014, at 08:22, "Marcin Wojnarski" wrote:



Jeroen,

Thanks, it's great to hear that you like Paperity!

"True peer-reviewed" means published in a peer-reviewed journal, in 
contrast to a pdf just posted somewhere on the web (think Google 
Scholar), which can be anything: a peer-reviewed paper or not, 
published or not, even randomly generated to resemble a scholarly 
article, for example to pump up G Scholar citations 
(http://arxiv.org/abs/1212.0638).


The new technology is called REgular Document EXpressions (redex). It 
is a computer language for analyzing long and complex documents, 
particularly written in a markup, like HTML or XML. It facilitates 
analysis of web context where the paper occurred, which is critical 
for maintaining the link between the paper and its journal. Redex 
builds on top of the very fundamental technology of regular 
expressions (regex), but redefines the language entirely to make it 
suitable for large structured texts.
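
To make the idea of "web context" concrete, here is a minimal sketch using plain regular expressions in Python. It is not redex, whose syntax is not shown in this thread; the sample markup, class names and the `extract_papers` helper are all invented for illustration. It only shows the general point: a paper link is kept together with the journal named on the same page, rather than harvested as an isolated PDF.

```python
import re

# Hypothetical journal page; the markup and class names are assumptions.
SAMPLE_HTML = """
<div class="journal"><h1>Journal of Examples</h1>
  <div class="article"><a href="/papers/1.pdf">First paper</a></div>
  <div class="article"><a href="/papers/2.pdf">Second paper</a></div>
</div>
"""

# Pattern for the journal name that provides the context of the page.
JOURNAL = re.compile(r'<h1>(?P<journal>[^<]+)</h1>')
# Pattern for each article entry listed under that journal.
ARTICLE = re.compile(
    r'<div class="article"><a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a></div>'
)


def extract_papers(html):
    """Return (journal, title, url) tuples, tying each paper to its journal."""
    match = JOURNAL.search(html)
    journal = match.group("journal").strip() if match else None
    return [
        (journal, a.group("title").strip(), a.group("url"))
        for a in ARTICLE.finditer(html)
    ]


papers = extract_papers(SAMPLE_HTML)
for journal, title, url in papers:
    print(journal, "|", title, "|", url)
```

Plain regex like this becomes brittle on long, nested documents, which is presumably the gap a dedicated language for large structured texts is meant to fill.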


Best,
Marcin


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Marcin Wojnarski

Thanks Dana. On our to-do list. :)
Marcin

On 10/12/2014 03:08 AM, Dana Roth wrote:
It would be nice if 'Paperity' would maintain a listing of the 
publishers of the journals they index.
T-R does this for Web of Science Journal Citation Reports, and it is 
very helpful.


Dana L. Roth
Millikan Library / Caltech 1-32
1200 E. California Blvd. Pasadena, CA 91125
626-395-6423 fax 626-792-7540
dzr...@library.caltech.edu <mailto:dzr...@library.caltech.edu>
http://library.caltech.edu/collections/chemistry.htm


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Heather Morrison
Thank you for providing the information, Marcin. Since there is a subset of the 
open access community that demands blanket permissions for commercial rights 
downstream (a position I strongly disagree with), it is important to discuss 
what the potential commercial uses might be to determine whether these actually 
advance open access or scholarly knowledge or not.

Some comments on these options for Paperity:

In the subscriptions model, aggregators (such as EBSCO and ProQuest), typically 
pay journals to include their content, or in the case of open access journals, 
at least do not charge the journals. Charging journals to include them in an 
aggregated service changes a revenue stream to an expense stream for the 
journals. This makes it harder to find the revenue to produce journals; a 
barrier to publishing journals in the first place is not in the interests of 
advancing scholarly knowledge. 

Advertising is one of the potential revenue streams for open access journals 
(and one that some journals are currently using). If Paperity is using journal 
content to sell advertising, then Paperity could easily be competing with the 
journals for this revenue. 

It is lovely to hear of Paperity's good intentions starting out to be fair, 
efficient and acceptable for everyone. But what can happen with services like 
this down the road when there are bills to be paid, journals are less than keen 
to pay for this service and advertisers continue to prefer Google?

The following is addressed to my fellow open access advocates as this is a good 
discussion about open access downstream, and these comments are not intended to 
apply to Paperity:

If the purpose of insisting on re-use and commercial rights downstream is 
designed to facilitate the design of services such as Paperity, let's discuss 
these possibilities downstream that I argue are facilitated by CC-BY and/or 
CC-BY-SA licenses:

-   aggregator takes CC-BY content and develops a toll-access value-added 
service 

By way of illustration of this: Elsevier's Scopus claims to include 2,800 gold 
open access journals. Scopus is a subscription-based service. 

-   aggregator takes CC-BY content, initially develops an open access 
value-added search service, then sells the service to a for-profit company that 
changes the business model to toll access

By way of illustration of the sales aspect, consider that Elsevier bought 
Mendeley and Springer bought BioMedCentral. Both are still free services, but 
offered by largely subscription-based companies; why would we assume that they 
would never change the business model? 

-   aggregator follows the Paperity suggestion of charging journals, but 
with a twist: does not include journals that do not pay and/or returns results 
based on payments by journals (i.e. pay-to-play)

Are these models seen as desirable by advocates of requiring CC-BY and/or 
CC-BY-SA licenses? Are any of these scenarios aligned with the Budapest vision? 
If you agree that they are not, can you explain why you think these are 
unlikely or how the licenses would prevent this from happening? For example, 
perhaps someone can explain how it is that Elsevier is able to charge to direct 
people to OA journals through Scopus? 

A comment on SA: although Sharealike is the most copyleft of the CC license 
elements, it does not come with an obligation to share in the same way, rather 
an obligation to use the same license when including re-used content. One can 
take a work that is licensed SA and is freely available on the web and include 
it in a work that is limited in any of a variety of fashions (part of a 
presentation to an audience limited to those who are willing and able to pay to 
attend; a toll access work, etc.) - as long as the work downstream uses the 
same license. In other words, CC-BY-SA does not do as much to protect OA downstream 
as one might think.

best,

Heather Morrison



[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Marcin Wojnarski
Heather,
Thank you for this deep analysis. I don't feel like an expert on 
licensing issues so I will let others comment, but every new idea on how 
in general to fund academic services like Paperity is more than welcome. 
The individual who finally discovers a satisfactory solution should get 
a Nobel Prize at the very least.

Best
Marcin



[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-12 Thread Marcin Wojnarski

Dear Stevan,

We started with Gold, because we believe that journals play a 
fundamental role in the system of scholarly communication and every 
service that tries to facilitate access to literature must start with 
journals, not only with a flat collection of papers like the one found 
in repositories. For 400 years, journals have been the backbone of the 
system, the main structural element. They provide a brand name for 
papers, create consistent editorial policy and take responsibility for 
the quality and relevance of articles they publish - these features are 
of topmost importance for readers, without them navigating through 
millions of articles becomes infeasible.


That said, we're fully aware how much great unique content there is in 
repositories and we'd like very much to merge these two streams - Gold 
and Green - in Paperity at some point. Although there are some tensions 
inside OA community between the Gold and Green camps, I think they are 
unjustified, because these routes are complementary, not competitive. As 
to indexing, it is actually much easier to be done for repositories than 
for journals, because most repos expose standardized interfaces. So we 
don't need Google Scholar for this purpose, only as I said, we believe 
that the right order is journals first.
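
As an illustration of what harvesting through a standardized repository interface can look like, here is a minimal sketch that assumes the interface meant is OAI-PMH with Dublin Core metadata (the message does not name the protocol). The sample response, the identifiers and the `parse_records` helper are hypothetical, and reading the journal name from `dc:source` is a common convention rather than something every repository follows.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, heavily shortened ListRecords response from a repository.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:source>Journal of Examples</dc:source>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A deposit without journal information</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract identifier, title and (if present) journal from each record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            # Assumption: the journal, when given, is stored in dc:source.
            "journal": rec.findtext(".//dc:source", namespaces=NS),
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
for r in records:
    print(r["id"], "|", r["title"], "|", r["journal"])
```

The second sample record has no `dc:source`, so its journal comes back as `None`; a real harvester would have to decide how to treat such records.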


Best
Marcin


On 10/12/2014 01:51 PM, Stevan Harnad wrote:
Harvesting Gold OA journal articles is a piece of cake. How will
Paperity/redex harvest Green OA articles published in non-OA journals
but made OA somewhere on the Web — via Google Scholar?

Sounds like a splendid idea if it can be done… But not if it is just
Gold-biassed, because most refereed research is not Gold, and the
fastest growing form of OA is Green (because of mandates, and absence
of extra cost).

SH




--
Marcin Wojnarski, Founder of Paperity, www.paperity.org
www.linkedin.com/in/marcinwojnarski
www.facebook.com/Paperity
www.twitter.com/Paperity

Paperity. Open science aggregated.



[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Stevan Harnad
On Oct 12, 2014, at 4:50 PM, Marcin Wojnarski wrote:

> Dear Stevan,
> We started with Gold, because we believe that journals play a fundamental 
> role in the system
> of scholarly communication and every service that tries to facilitate access 
> to literature must
> start with journals, not only with a flat collection of papers like the one 
> found in repositories.

Dear Marcin,

I think there may be a fundamental misunderstanding here.

Green OA consists of self-archived journal articles and their bibliographic 
metadata — including
journal name.

And institutional repositories consist of an institution’s journal article 
output.

Nothing “flat” about those!

Were you perhaps thinking that repositories just contain unpublished preprints 
and gray
literature?

> For 400 years, journals have been the backbone of the system, the main 
> structural element.

I don’t understand why you are pointing this out: From the very outset the Open 
Access movement 
has been very specifically about opening access to journal articles. Please see 
the original BOAI statement:
http://www.budapestopenaccessinitiative.org/read

"The literature that should be freely accessible online is that which scholars 
give to the world without expectation of payment. Primarily, this category 
encompasses their peer-reviewed journal articles…"

> They provide a brand name for papers, create consistent editorial policy and 
> take responsibility
> for the quality and relevance of articles they publish - these features are 
> of topmost importance
> for readers, without them navigating through millions of articles becomes 
> infeasible.

Marcin, it remains unclear why you are telling us this. We all know it. What I 
asked you was:

>> Harvesting Gold OA journal articles is a piece of cake. How will 
>> Paperity/redex harvest

>> Green OA articles published in non-OA journals but made OA somewhere on the

>> Web 

> That said, we're fully aware how much great unique content there is in 
> repositories and we’d
> like very much to merge these two streams - Gold and Green - in Paperity at 
> some point.

The great unique content in repositories is the very same great unique content 
that there is in journals.
Gold OA and Green OA both consist of journal articles. There are many more 
non-Gold journals
and non-Gold journal-articles than Gold ones. 

Why is Paperity focusing on Gold?

Why is all the rest only to be merged "at some point”?

And how, exactly?

> Although there are some tensions inside OA community between the Gold and 
> Green camps,
> I think they are unjustified, because these routes are complementary, not 
> competitive.

You are quite right, the two roads to OA are complementary, not competitive.

But in order to complement one another they must both be clearly understood, 
and much
of the tension is about misunderstandings, for example, that OA = Gold OA while 
Green OA
is about something else (preprints, gray literature).

And another point of tension is about priorities: Which needs to come first, 
Gold or Green?

(My own reply is that, for many important reasons, it is Green that must come
first: (1) because Green does not cost the author money, (2) because Green can
be mandated by institutions and funders, and (3) because by coming first Green
will make subscriptions unsustainable, force journals to cut obsolete costs,
downsize to providing peer review alone, and convert to affordable,
sustainable, Fair Gold instead of today’s over-priced, double-paid pre-Green
Fools Gold: http://j.mp/fairgoldOA)

> As to indexing, it is actually much easier to be done for repositories than 
> for journals,
> because most repos expose standardized interfaces.

Then why is Paperity starting with Gold OA journal articles instead of Green OA 
journal
articles in repositories?

> So we don't need Google Scholar for this purpose, only as I said, we believe 
> that the
> right order is journals first.

What you have said is that you believe the right order is Gold OA first, but 
you have
certainly not explained why — apart from the fact that Gold OA is certainly much
easier to access and aggregate:

Gold OA journal article bibliographic data can be harvested from the journals’
websites using DOAJ to identify all the journals.

But how are you going to find all the Green OA journal articles, if not with
Google Scholar? (WoS or SCOPUS can find you all journal articles, but
won’t tell you which ones are Green OA.)

(BASE provides some of these data; ROAR 2.0 will soon provide it all.)

Best wishes,
Stevan


[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread BAUIN Serge
Many thanks, indeed

Your answer is clear, and I wish you success

Cheers
Serge

De : goal-boun...@eprints.org [mailto:goal-boun...@eprints.org] De la part de 
Marcin Wojnarski
Envoyé : dimanche 12 octobre 2014 21:20
À : Global Open Access List (Successor of AmSci)
Objet : [GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of 
OA journals & papers

Hi Serge,

We're working on this. Paperity started as a non-profit academic project, but 
yes, we need to develop a business model to make it sustainable and to achieve 
the goal of 100% OA aggregated. Most likely we'll expect participating journals 
to support our services, which we think is a fair solution when many of them 
charge APCs and we actually help them do their job (dissemination). We're aware 
however that there are also many small non-profit journals which don't charge 
APC at all, and we definitely want to aggregate them all, too. So the details 
are still to be sorted out, but I'm confident that over time we'll come up with 
a good solution: one that's fair, efficient and acceptable for everybody. Of 
course, there are also more traditional solutions that we'll investigate, like 
adverts.

Cheers
Marcin

On 10/11/2014 09:07 PM, BAUIN Serge wrote:
Marcin,

May I ask "what is the economic model of Paperity?"
I didn't find any information about that on your web site.

Cheers

Serge

Envoyé d'un téléphone portable, désolé pour le caractère inélégant...

Le 10 oct. 2014 à 08:22, "Marcin Wojnarski" 
mailto:mwojn...@ns.onet.pl>> a écrit :
Jeroen,

Thanks, it's great to hear that you like Paperity!

"True peer-reviewed" means published in a peer-reviewed journal, in contrast to 
a pdf just posted somewhere on the web (think Google Scholar), which can be 
anything: a peer-reviewed paper or not, published or not, even randomly 
generated to resemble a scholarly article, for example to pump up G Scholar 
citations (http://arxiv.org/abs/1212.0638).

The new technology is called REgular Document EXpressions (redex). It is a 
computer language for analyzing long and complex documents, particularly 
written in a markup, like HTML or XML. It facilitates analysis of web context 
where the paper occured, which is critical for maintaining the link between the 
paper and its journal. Redex builds on top of the very fundamental technology 
of regular expressions (regex), but redefines the language entirely to make it 
suitable for large structured texts.

Best,
Marcin
On 10/09/2014 05:02 PM, Bosman, J.M. (Jeroen) wrote:
Marcin,

This is a great initiative. I had been hoping BASEsearch would take on this 
task, but it is good to see others are stepping in.

Congrats on the initiative. Still, a long way to go

Could you elaborate on how your technology is able to recognize "true peer 
reviewed papers" and what you consider to be " true peer reviewed papers"?

Best,
Jeroen Bosman
@jeroenbosman
Utrecht University Library
From: goal-boun...@eprints.org<mailto:goal-boun...@eprints.org> 
[mailto:goal-boun...@eprints.org] On Behalf Of Marcin Wojnarski
Sent: donderdag 9 oktober 2014 14:51
To: Global Open Access List (Successor of AmSci)
Subject: [GOAL] Paperity launched. The 1st multidisciplinary aggregator of OA 
journals & papers

(press release, apologies for cross-posting)

With the beginning of the new academic year, Paperity (http://paperity.org), the 
first multidisciplinary aggregator of Open Access journals and papers, has been 
launched. Paperity will connect authors with readers, boost dissemination of 
new discoveries and consolidate academia around open literature.

Right now, Paperity (http://paperity.org/) includes over 
160,000 open articles, "gold" and "hybrid", from 2,000 scholarly journals, and 
growing. The goal of the team is to cover - with the support of journal editors 
and publishers - 100% of Open Access literature in 3 years from now. In order 
to achieve this, Paperity utilizes an original technology for article indexing, 
designed by Marcin Wojnarski, a data geek from Poland and a medalist of the 
International Mathematical Olympiad. This technology indexes only true 
peer-reviewed scholarly papers and filters out irrelevant entries, which easily 
make it into other aggregators and search engines.

The amount of scholarly literature has grown enormously in the last decades. 
Successful dissemination became a big issue. New tools are needed to help 
readers access vast amounts of literature dispersed all over the web and to 
help authors reach their target audience. Moreover, research is 
interdisciplinary now and scholars need broad access to literature from many 
fields, also from outside of their core research area. This is the reason why 
Paperity covers all subjects, from Sciences, Technology, Medicine, through 
Social Sciences, to Humanities and Arts

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Marcin Wojnarski

Stevan,

Repositories are not an authoritative source of metadata about 
paper-journal relation. Metadata is put there by authors themselves and 
it can be missing, incomplete or erroneous, in extreme cases even fake. 
Thus in practice repository collections are flat even if metadata is 
present.


If you think that finding Green articles is impossible, then you shall 
not be surprised that we focus on Gold first, right?


Best
Marcin


On 10/13/2014 02:14 PM, Stevan Harnad wrote:
On Oct 12, 2014, at 4:50 PM, Marcin Wojnarski wrote:

> Dear Stevan,
> We started with Gold, because we believe that journals play a fundamental
> role in the system of scholarly communication and every service that tries
> to facilitate access to literature must start with journals, not only with
> a flat collection of papers like the one found in repositories.

Dear Marcin,

I think there may be a fundamental misunderstanding here.

Green OA consists of self-archived *journal articles* and their bibliographic
metadata, including journal name.

And institutional repositories consist of an institution’s *journal article*
output.

Nothing “flat” about those!

Were you perhaps thinking that repositories just contain unpublished preprints
and gray literature?

> For 400 years, journals have been the backbone of the system, the main
> structural element.

I don’t understand why you are pointing this out: From the very outset the
Open Access movement has been very specifically about opening access to
*journal articles*. Please see the original BOAI statement:

http://www.budapestopenaccessinitiative.org/read

    "The literature that should be freely accessible online is that which
    scholars give to the world without expectation of payment. Primarily,
    this category encompasses their *peer-reviewed journal articles*…"


> They provide a brand name for papers, create consistent editorial policy
> and take responsibility for the quality and relevance of articles they
> publish - these features are of topmost importance for readers; without
> them, navigating through millions of articles becomes infeasible.

Marcin, it remains unclear why you are telling us this. We all know it.
What I asked you was:

    Harvesting Gold OA journal articles is a piece of cake. How will
    Paperity/redex harvest *Green OA articles published in non-OA journals*
    but made OA somewhere on the Web?


> That said, we're fully aware how much great unique content there is in
> repositories and we’d like very much to merge these two streams - Gold
> and Green - in Paperity at some point.

The great unique content in repositories is the very same great unique
content that there is in journals. Gold OA and Green OA both consist of
*journal articles*. There are many more non-Gold journals and non-Gold
journal-articles than Gold ones.

Why is Paperity focusing on Gold?

Why is all the rest only to be merged “at some point”?

And how, exactly?

> Although there are some tensions inside the OA community between the Gold
> and Green camps, I think they are unjustified, because these routes are
> complementary, not competitive.

You are quite right, the two roads to OA are complementary, not competitive.

But in order to complement one another they must both be clearly understood,
and much of the tension is about misunderstandings, for example, that
OA = Gold OA while Green OA is about something else (preprints, gray
literature).

And another point of tension is about priorities: Which needs to come first,
Gold or Green?

(My own reply is that it is for many important reasons Green that must come
first: (1) because Green does not cost the author money, (2) because Green
can be mandated by institutions and funders, and (3) because by coming first
Green will make subscriptions unsustainable, force journals to cut obsolete
costs, downsize to providing peer review alone, and convert to affordable,
sustainable, Fair Gold instead of today’s over-priced, double-paid pre-Green
Fool’s Gold.)

http://j.mp/fairgoldOA

> As to indexing, it is actually much easier to do for repositories than for
> journals, because most repos expose standardized interfaces.

Then why is Paperity starting with Gold OA journal articles instead of Green
OA journal articles in repositories?

> So we don't need Google Scholar for this purpose, only, as I said, we
> believe that the right order is journals first.

What you have said is that you believe the right order is Gold OA first, but
you have certainly not explained why - apart from the fact that Gold OA is
certainly much /easier/ to access and aggregate:

Gold OA journal article bibliographic data can be harvested from the
journals’ websites using DOAJ to identify all the journals.

But how are you going to find all the Green OA journal articles, if not with
Google Scholar? (WoS or SCOPUS can find you all journal articles, but won’t
tell you which ones are Green

[GOAL] Re: Paperity launched. The 1st multidisciplinary aggregator of OA journals & papers

2014-10-13 Thread Stevan Harnad
On Oct 13, 2014, at 1:06 PM, Marcin Wojnarski wrote:

> Repositories are not an authoritative source of metadata about paper-journal
> relation. Metadata is put there by authors themselves and it can be missing,
> incomplete or erroneous, in extreme cases even fake. Thus in practice
> repository collections are flat even if metadata is present.

Are you looking for “authoritative metadata” or metadata of OA journal articles?

The majority of OA journal articles are Green, not Gold. Focussing on the Gold
because it is more “authoritative” calls to mind the joke about the drunkard
who prefers to keep looking for his keys by the lamp-post because it is
brighter there.

> If you think that finding Green articles is impossible, then you shall not
> be surprised that we focus on Gold first, right?

I certainly did not say it was impossible! (We do it all the time! So does
Google Scholar.) I only said it was not as easy as it is to just go to DOAJ
journal websites (the lamp-post) for only the Gold.

And I think the preoccupation with “authoritative” sources of metadata is
monumentally misplaced. (In fact, the notion of “aggregation” is probably
obsolescent too.) We have journal articles all over the web, and all that’s
needed is a way to find them. Google Scholar’s pretty good, and can
potentially be made even better. But what’s missing now is not a better
harvester or more “authoritative” metadata, but more OA articles (whether
Gold or Green). Only about 30% of journal articles published today are OA
(the majority of it Green). The fastest and surest (and cheapest) way to
provide the remaining 70% is to mandate and provide Green.

Stevan Harnad

> On 10/13/2014 02:14 PM, Stevan Harnad wrote:
>> On Oct 12, 2014, at 4:50 PM, Marcin Wojnarski wrote:
>>
>>> Dear Stevan,
>>> We started with Gold, because we believe that journals play a fundamental
>>> role in the system of scholarly communication and every service that
>>> tries to facilitate access to literature must start with journals, not
>>> only with a flat collection of papers like the one found in repositories.
>>
>> Dear Marcin,
>>
>> I think there may be a fundamental misunderstanding here.
>>
>> Green OA consists of self-archived journal articles and their bibliographic
>> metadata, including journal name.
>>
>> And institutional repositories consist of an institution’s journal article
>> output.
>>
>> Nothing “flat” about those!
>>
>> Were you perhaps thinking that repositories just contain unpublished
>> preprints and gray literature?
>>
>>> For 400 years, journals have been the backbone of the system, the main
>>> structural element.
>>
>> I don’t understand why you are pointing this out: From the very outset the
>> Open Access movement has been very specifically about opening access to
>> journal articles. Please see the original BOAI statement:
>> http://www.budapestopenaccessinitiative.org/read
>>
>> "The literature that should be freely accessible online is that which
>> scholars give to the world without expectation of payment. Primarily, this
>> category encompasses their peer-reviewed journal articles…"
>>
>>> They provide a brand name for papers, create consistent editorial policy
>>> and take responsibility for the quality and relevance of articles they
>>> publish - these features are of topmost importance for readers; without
>>> them, navigating through millions of articles becomes infeasible.
>>
>> Marcin, it remains unclear why you are telling us this. We all know it.
>> What I asked you was:
>>
>>>> Harvesting Gold OA journal articles is a piece of cake. How will
>>>> Paperity/redex harvest Green OA articles published in non-OA journals
>>>> but made OA somewhere on the Web?
>>
>>> That said, we're fully aware how much great unique content there is in
>>> repositories and we’d like very much to merge these two streams - Gold
>>> and Green - in Paperity at some point.
>>
>> The great unique content in repositories is the very same great unique
>> content that there is in journals. Gold OA and Green OA both consist of
>> journal articles. There are many more non-Gold journals and non-Gold
>> journal-articles than Gold ones.
>>
>> Why is Paperity focusing on Gold?
>>
>> Why is all the rest only to be merged “at some point”?
>>
>> And how, exactly?
>>
>>> Although there are some tensions inside the OA community between the Gold
>>> and Green camps, I think they are unjustified, because these routes are
>>> complementary, not competitive.
>>
>> You are quite right, the two roads to OA are complementary, not competitive.
>>
>> But in order to complement one another they must both be clearly
>> understood, and much of the tension is about misunderstandings, for
>> example, that OA = Gold OA while Green OA is about something else
>> (preprints, gray literature).
>> 
>> And another point of tension is about priorities: Whi