Hello Poppler project,

I have been working on a way to extract text from PDF files whose embedded Unicode values do not match the rendered glyphs. This idea was mentioned on the Poppler mailing list back in 2012 (https://lists.freedesktop.org/archives/poppler/2012-April/009035.html), but I couldn’t find any indication that it was ever implemented and tested.

I have posted an experimental version of Poppler (“Poppler-science”; https://github.com/lanl/poppler-science) that has been modified to include a multilayer perceptron to decode font glyph symbols that are commonly used in the scientific literature. I would appreciate any feedback from the Poppler community and any suggestions for improvements!

Regards,
Jason Gans
Bioscience Division
Los Alamos National Laboratory
