On Tue, 2022-04-12 at 14:16 +0200, Francesco Pretto wrote:
> It's a complex task and PoDoFo doesn't expose a high level API to
> perform such text extraction. Also the handling of the different
> predefined/custom encodings that the PDF standard allows to use or
> define is incomplete and sometimes buggy.

Hi,

while I cannot speak to the accuracy or completeness of the code, there
is a text extraction tool [1] which is supposed to, well, extract text
from PDF files. It can give at least an idea of what to do using the
low-level API of PoDoFo.

Bye,
zyx

[1] https://sourceforge.net/p/podofo/code/HEAD/tree/podofo/branches/PODOFO_0_9_7_BRANCH/tools/podofotxtextract/

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users