Hello Poppler project,

I have been working towards a solution for extracting text from PDF files that 
contain embedded Unicode values that do not match the rendered glyphs. This idea 
was mentioned on the Poppler mailing list back in 2012 
(https://lists.freedesktop.org/archives/poppler/2012-April/009035.html), but I 
couldn’t find any information suggesting that it had been implemented and tested.

I have posted an experimental version of Poppler (“Poppler-science”; 
https://github.com/lanl/poppler-science) that has been modified to include a 
multilayer perceptron to decode the font glyphs of symbols that are commonly 
used in the scientific literature. I would appreciate any feedback from the 
Poppler community and any suggestions for improvements!

Regards,

Jason Gans

Bioscience Division
Los Alamos National Laboratory
