Re: Adding per-character OCR to Poppler

Albert Astals Cid Mon, 02 Feb 2026 15:42:51 -0800

On Thursday, 29 January 2026, at 20:29:45 (Central European Standard Time),
Gans, Jason David wrote:
> Hello Poppler project,
> 
> I have been working towards a solution for extracting text from PDF files
> that contain embedded Unicode values that do not match rendered glyphs.
> This idea was mentioned in the Poppler mailing lists back in 2012
> (https://lists.freedesktop.org/archives/poppler/2012-April/009035.html),
> but I couldn’t find any information suggesting that it was implemented and
> tested.
> 
> I have posted an experimental version of Poppler (“Poppler-science”;
> https://github.com/lanl/poppler-science) that has been modified to include
> a multilayer perceptron to decode font glyph symbols that are commonly used
> in the scientific literature. I would appreciate any feedback from the
> Poppler community and any suggestions for improvements!


Let's follow up in
https://gitlab.freedesktop.org/poppler/poppler/-/merge_requests/2111

Cheers,
  Albert

> 
> Regards,
> 
> Jason Gans
> 
> Bioscience Division
> Los Alamos National Laboratory
