Extracting a DOI from a PDF is never foolproof; moreover, some PDFs
(scanned ones) contain only images and no text.
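
To illustrate why this can fail, here is a minimal Python sketch of DOI
detection in text that has already been extracted from a PDF. It is not
BibDesk's code: the pattern and the name find_doi are assumptions made for
this example, and the regular expression is a heuristic rather than a
complete DOI grammar.

```python
import re

# Heuristic pattern (an assumption of this sketch): "10.", a 4-9 digit
# registrant prefix, "/", then a suffix running to the next space or quote.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        # Typical for scanned PDFs: no text layer, so nothing to match.
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;:)]")
```

A scanned PDF yields empty text, so find_doi returns None, and a DOI that
is broken across lines or typeset unusually may be missed or truncated.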

BibDesk gets the bibliographic information from NCBI using their
documented query methods, but this apparently drops diacritics. It
seems that they also provide an XML format that retains the
diacritics, but we don't yet know what syntax it uses. The
documentation for it is extremely sparse and hard to read. So perhaps
in the future we will be able to use that format instead and import
diacritics.
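
As a rough sketch of what reading such an XML record could look like, the
Python below builds a request URL for NCBI's E-utilities efetch endpoint
with retmode=xml and pulls author names out of a PubMed-style XML string.
Several things here are assumptions: that efetch is the XML format meant
above, that author names sit in Author/LastName and Author/ForeName
elements, and the helper names efetch_url and author_names. The sample
record is hand-written for illustration, not real data, and none of this
is BibDesk's code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """Build an efetch URL asking for one PubMed record as XML."""
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return EFETCH_URL + "?" + query


def author_names(xml_text):
    """Collect 'ForeName LastName' strings from a PubMed-style XML record."""
    root = ET.fromstring(xml_text)
    names = []
    for author in root.iter("Author"):
        last = author.findtext("LastName")
        fore = author.findtext("ForeName")
        if last:  # skip entries without a personal name
            names.append(f"{fore} {last}" if fore else last)
    return names


# Hand-written fragment for illustration only (assumed layout, invented name).
SAMPLE = """
<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>
  <AuthorList>
    <Author><LastName>Lefèvre</LastName><ForeName>Émilie</ForeName></Author>
  </AuthorList>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>
"""

print(author_names(SAMPLE))
```

Because the XML is parsed as Unicode text, accented characters come through
unchanged; mapping them to LaTeX equivalents would be a separate step.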

Christiaan

On 30 Apr 2009, at 9:44 AM, Grant Jacobs wrote:

> Synopsis: I am looking for a way of obtaining the title, name, etc.
> from PDFs that retains the original diacritics in the names, titles,
> etc.
>
>
> I hope there is a simple solution to this that I have overlooked.
>
>
> Background:
>
> You can create new bibliographic entries in BibDesk by dragging PDF
> files of articles (scientific papers in my case) to the main window.
> I presume what happens is that BibDesk extracts the DOI from the file
> and uses this to obtain the information (authors, title, abstract,
> etc.) from the internet. This is an excellent feature, even though it
> isn't foolproof: it sometimes seems to simply fail despite there
> being a DOI in the article.
>
>
> Problem:
>
> However there is a catch! Despite BibDesk being able to handle
> diacritics (the accents or cedilla added to letters in some languages
> to indicate pronunciation differences), these are "dropped" somewhere
> along the way and the resulting bibliographic entries lack them.
>
>
> A little testing:
>
> This seems to apply to all articles. I've tried different journals,
> and it's always the same, no diacritics.
>
> The articles at PubMed or the original sources the DOIs point to have
> the diacritics in the authors' names, etc., even though the
> downloaded information obtained from the DOI has had them stripped out.
>
>
> Queries:
>
> Is it that once the DOI information is obtained, the characters are
> "reduced" to their "plain" ASCII equivalents?
>
> Is there some option that I need to set to stop this happening, so
> that I receive the names with their diacritics? (Or, rather, the
> internally corrected form; I understand that internally they are
> mapped into LaTeX equivalents.)
>
>
> Grant
>
> -- 
>
> ------------------------------------------------------------------------------
> Register Now & Save for Velocity, the Web Performance & Operations
> Conference from O'Reilly Media. Velocity features a full day of
> expert-led, hands-on workshops and two days of sessions from industry
> leaders in dedicated Performance & Operations tracks. Use code  
> vel09scf
> and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf
> _______________________________________________
> Bibdesk-users mailing list
> Bibdesk-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bibdesk-users


