Hi Jason,

Thank you for this code. Can you please post it as a merge request at

  https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests

? That way it can get a proper review.

Best,
Oliver

On 1/29/26 20:29, Gans, Jason David wrote:
Hello Poppler project,

I have been working towards a solution for extracting text from PDF files whose embedded Unicode values do not match the rendered glyphs. This idea was mentioned on the Poppler mailing list back in 2012 (https://lists.freedesktop.org/archives/poppler/2012-April/009035.html), but I couldn't find any information suggesting that it had been implemented and tested.

I have posted an experimental version of Poppler ("Poppler-science"; https://github.com/lanl/poppler-science). It has been modified to include a multilayer perceptron that decodes font glyph symbols commonly used in the scientific literature. I would appreciate any feedback from the Poppler community and any suggestions for improvements!

Regards,

Jason Gans

Bioscience Division
Los Alamos National Laboratory