On Tue, 2022-04-12 at 14:16 +0200, Francesco Pretto wrote:
> It's a complex task and PoDoFo doesn't expose a high level API to
> perform such text extraction. Also the handling of the different
> predefined/custom encodings that the PDF standard allows to use or
> define is incomplete and sometimes buggy.

        Hi,
while I cannot speak to the accuracy or completeness of the code, there
is a text extraction tool [1] that is meant to, well, extract text from
PDF files. It can at least give an idea of how to do this with PoDoFo's
low-level API.
        Bye,
        zyx

[1] https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/branches/PODOFO_0_9_7_BRANCH/tools/podofotxtextract/


_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
