Hi All,

Anurag from Google Scholar is following this ongoing discussion. He has asked me to forward his response.
------------------------------------------------------------------------------
Hi:

I had reached out to Tim about DSpace support for PDF cover pages. I would like to thank him for initiating this discussion.

As I had mentioned in my talk, cover pages impact the effectiveness of all downstream automated systems that analyze PDF articles. This includes search services as well as widely used personal collection managers such as Zotero, Mendeley, and Papers.

Given the diversity of layouts and structure, automated analysis and metadata extraction from PDF articles is always a challenge. It is even more of a challenge when you consider that indexing systems need to run their algorithms not just over documents in specific repositories but over all documents on the web. Handling PDF articles with cover pages is far more error-prone than handling the original PDF. What happens to work today may stop working as algorithms are updated to handle the expanding diversity of layouts. We have seen this happen for several repositories :(

The concern most frequently mentioned as a reason for keeping cover pages is that the original PDF may not have sufficient or suitable information about where and how it was published, which would keep it from being cited or referred to.

However, pretty much no one tracks or cites articles or references by looking at the article by hand. For new articles, researchers use referencing tools like EndNote, BibTeX, etc. and save structured bibliographic info from the article source. Publisher sites, repositories, A&Is, and search services all provide structured references in multiple formats. For older collections, the common approach, by far, is to use collection managers like Zotero, Mendeley, and Papers. All of these depend on being able to analyze the PDF to automate the management.
Making the original PDF available as-is, with no changes, helps collection managers in the same way as it helps search services.

Based on my experience, I would recommend disabling automated cover pages, whether front or back. The potential upside is no longer significant for most researchers. The potential downside, however, is pretty large: it impacts all downstream automated systems that make the research process easier.

I would like to thank the community for your consideration.

cheers,
anurag
------------------------------------------------------------------------------
_______________________________________________
Dspace-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-general
