Hi All,

Anurag from Google Scholar is tracking this ongoing discussion. He has 
asked me to forward his response.

-----------------------------------------------------------------------------------------------------------------------------------------------

Hi: I had reached out to Tim about DSpace support for PDF cover pages. I 
would like to thank him for initiating this discussion.

As I mentioned in my talk, cover pages impact the effectiveness of 
all downstream automated systems that analyze PDF articles. This 
includes search services as well as widely used personal collection 
managers such as Zotero, Mendeley, and Papers.

Given the diversity of layouts and structure, automated analysis & 
metadata extraction from PDF articles is always a challenge. This is 
even more of a challenge if you consider that indexing systems need to 
run their algorithms over not just documents in specific repositories 
but all documents on the web.

Handling PDF articles with cover pages is far more error-prone than 
handling the original PDFs. What happens to work today may stop 
working as algorithms are updated to handle the expanding diversity of 
layouts. We have seen this happen for several repositories :(

The concern most frequently mentioned as a reason for keeping 
cover pages is that the original PDF may not have sufficient/suitable 
information about where/how it was published, which would keep it from 
being cited/referred to. However, pretty much no one tracks or cites 
articles or references by looking at the article by hand. For new 
articles, researchers use referencing tools like EndNote, BibTeX etc. and 
save structured bibliographic info from the article source. Publisher 
sites, repositories, A&Is, search services, all provide structured 
references in multiple formats. For older collections, the common 
approach, by far, is to use collection managers like Zotero, Mendeley, 
Papers. All of these depend on being able to analyze the PDF to automate 
the management. Making the original PDF available as-is, with no changes, 
helps collection managers in the same way as it helps search services.

Based on my experience, I would recommend disabling automated cover 
pages - either front or back. The potential upside is no longer 
significant for most researchers. The potential downside, however, is 
pretty large -- impacting all downstream automated systems that make the 
research process easier.

I would like to thank the community for your consideration.

cheers,
anurag

------------------------------------------------------------------------------
_______________________________________________
Dspace-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-general
