[poppler] tweaking pdfto[html|xml] to avoid spaces within words + which spellcheck to use ...

Albretch Mueller Tue, 02 Jun 2020 04:54:07 -0700

 Which option should be used to avoid output like this?

 <a href="...#183">Per cep tual  Re sponse .</a></text>


 Or, which spellcheckers do you use in tandem with pdftohtml to
correct such spaces within words (and, ideally, spellcheck those
lines)?
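If no spellchecker does this out of the box, one workaround is a small dictionary-driven post-processing pass over the extracted text. The sketch below is hypothetical (the function name `rejoin_fragments`, the `max_parts` limit and the word list are illustrative assumptions, not part of poppler). It tries to glue neighbouring space-separated fragments back together, longest run first, whenever the result is a known word.

```python
import string


def rejoin_fragments(line: str, dictionary: set[str], max_parts: int = 4) -> str:
    """Greedily merge runs of space-separated fragments whose concatenation is a known word.

    Hypothetical helper: `dictionary` holds lowercase words, and punctuation
    is ignored when a candidate is looked up.
    """
    tokens = line.split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        end = i + 1  # default: keep this fragment on its own
        # Try the longest run first (at most max_parts fragments), down to pairs.
        for j in range(min(len(tokens), i + max_parts), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if candidate.strip(string.punctuation).lower() in dictionary:
                end = j
                break
        out.append("".join(tokens[i:end]))
        i = end
    return " ".join(out)


words = {"perceptual", "response"}
print(rejoin_fragments("Per cep tual  Re sponse .", words))  # Perceptual Response.
```

In practice the dictionary would be loaded from a system word list (for example `/usr/share/dict/words`, if your distribution ships one). Beware that a purely dictionary-based merge will also join genuine word pairs that happen to form a longer word ("in to" becomes "into"), so the output still needs review.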

 It appears to be caused either by something in the PDF file itself or
by the text extraction algorithm (based on phonemes?), because the
starting and ending characters of the words (meaningful sequences of
characters) are never split off.
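A plausible explanation, still to be confirmed against the actual file: text extractors generally decide where a word ends from the horizontal gap between consecutive glyphs, not from phonemes. If the PDF generator positions each syllable as a separate text run with a small extra offset, those offsets can exceed the extractor's threshold and show up as spaces. The toy model below is not poppler's actual logic; `Glyph`, `layout`, `assemble` and all the numbers are hypothetical. It shows how the same glyph positions yield "Per cep tual" or "Perceptual" depending on the threshold. Recent pdftohtml builds may list a word-break threshold option in `pdftohtml -h` that is worth experimenting with.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph (hypothetical user-space units)
    width: float  # advance width of the glyph
    size: float   # font size


def layout(runs: list[tuple[str, float]], size: float = 10.0, advance: float = 5.0) -> list[Glyph]:
    """Hypothetical helper: place each run's characters contiguously,
    adding the given extra gap before the run starts."""
    glyphs: list[Glyph] = []
    x = 0.0
    for text, gap_before in runs:
        x += gap_before
        for ch in text:
            glyphs.append(Glyph(ch, x, advance, size))
            x += advance
    return glyphs


def assemble(glyphs: list[Glyph], threshold: float) -> str:
    """Insert a space wherever the gap between consecutive glyphs
    exceeds `threshold` times the font size."""
    if not glyphs:
        return ""
    parts = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold * prev.size:
            parts.append(" ")
        parts.append(cur.char)
    return "".join(parts)


# Syllables offset by 1.5 units, real word boundary offset by 4.0 units.
glyphs = layout([("Per", 0.0), ("cep", 1.5), ("tual", 1.5), ("Re", 4.0), ("sponse", 1.5)])
print(assemble(glyphs, 0.1))  # Per cep tual Re sponse
print(assemble(glyphs, 0.2))  # Perceptual Response
```

With a 10-unit font, a threshold of 0.1 treats every 1.5-unit syllable offset as a word break, while 0.2 only breaks at the 4.0-unit gap. That would match the observation that words are split inside but never at their first or last character.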

 LibreOffice's spellchecker does not "correct all" of these
word-splitting spaces, which also appear if you go Okular > Export As >
Text.

 lbrtchx
_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/poppler
