Hi Andrea:
The case you cite is not as obvious to me: how can we assume that the single
PDF is the primary artifact (i.e. the one that the rest of the GS tags
describe)?
We have cases where (in an Item) the article is in Word, or LaTeX, and a
supplementary file is a PDF. In those cases the rule you propose would ask GS
to index the wrong bitstream.
Because of cases like these, we deliberately enshrined the most conservative
rule possible (if there is only one bitstream *and* it's a PDF), since Google
Scholar asked us to value accuracy over completeness.
But it is absolutely right that the rule can be too restrictive in many ways.
We kicked around (but didn't have time to implement for the 1st release) the
notion of a site-specific, user-configurable 'map' function or functions, that
would yield 0 or 1 bitstreams per item. The idea is that if there *is* a
consistent 'pattern' (like the one you mention), the page could dynamically
determine the value of the citation_pdf_url by calling the function. Design
questions include:
* should there be a site-wide mapping rule, or one per collection (per format
type, etc)?
* there should probably be a default (maybe just the current hard-coded one),
so that we don't force additional configuration
* how should the rule be expressed?
* how to limit runtime penalties
etc.
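To make the idea a bit more concrete, here is a rough sketch of what such a
'map' function could look like. Everything in it is hypothetical: the names
PdfUrlRules, Bitstream, Rule and the two rule constants are made up for
illustration and are not existing DSpace classes or configuration. It only
shows the shape of the idea (a list of bitstreams in, 0 or 1 bitstreams out),
with the current hard-coded rule as the default.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch only: none of these types exist in DSpace.
final class PdfUrlRules {

    /** Minimal stand-in for a bitstream: just a name and a MIME type. */
    record Bitstream(String name, String mimeType) {
        boolean isPdf() {
            return "application/pdf".equals(mimeType);
        }
    }

    /** A 'map' function: the bitstreams of one item in, 0 or 1 bitstreams out. */
    interface Rule extends Function<List<Bitstream>, Optional<Bitstream>> {}

    /** Default: the current conservative rule (exactly one bitstream, and it is a PDF). */
    static final Rule ONLY_BITSTREAM_IS_PDF = bitstreams -> {
        if (bitstreams.size() == 1 && bitstreams.get(0).isPdf()) {
            return Optional.of(bitstreams.get(0));
        }
        return Optional.empty();
    };

    /** Site-specific alternative: exactly one PDF among any number of bitstreams. */
    static final Rule SINGLE_PDF_AMONG_MANY = bitstreams -> {
        List<Bitstream> pdfs = bitstreams.stream().filter(Bitstream::isPdf).toList();
        if (pdfs.size() == 1) {
            return Optional.of(pdfs.get(0));
        }
        return Optional.empty();
    };

    private PdfUrlRules() {}
}
```

Note that for an item holding a Word article plus a supplementary PDF, the
second rule returns the supplementary PDF, which is exactly the wrong-bitstream
case described above. So a rule like that would only be safe on a site (or
collection) where the pattern really is consistent.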
I can probably dig up some notes on this if there is interest in that approach.
My 2 cents,
Richard
On Jan 13, 2013, at 11:38 PM, Andrea Schweer wrote:
Hi all,
I just discovered that DSpace (XMLUI, 1.8.2 but 3.0 has the same behaviour)
generates the citation_pdf_url header for Google Scholar on an item page if and
only if
* the item has exactly one bitstream in the ORIGINAL bundle (or the first
such bundle, to be precise); and
* this bitstream is of type application/pdf
Code in master here:
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/app/util/GoogleMetadata.java#L1007
I found old discussion around this in Jira here:
https://jira.duraspace.org/browse/DS-396?focusedCommentId=17461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17461
that explains what I assume is still the reasoning behind the current
behaviour:
How does one choose, for instance, a.) which PDF in an item with multiple PDF
bitstreams, b.) what is specified for a URL when there is no PDF for an item,
c.) whether or not to specify a PDF if the only PDF available is not the main
representative bitstream of the item. Google Scholar has said they are not
interested in having citation tags for an item if this field is not provided
for.
I find this a bit counter-intuitive, especially in the case of items with one
PDF file plus one or more files in a different format -- surely there it
should be fine to use the single PDF file in the citation_pdf_url? Are there
any other opinions around this?
cheers,
Andrea
--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
------------------------------------------------------------------------------
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. SALE $99.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122412
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette