Hi,
Sharing a link to your sample PDF would certainly help.
If you would like to understand what is happening in your case, I encourage you
to invest time in understanding how the “text” is stored in PDF files.
Extracting text from a PDF file involves many different cases, and in some of
them it is not possible at all.
In summary:
- A PDF does not contain text as such: it contains encoded text (character
codes).
- Text always has a font associated with it. The character codes are normally
used to indicate which graphical representation from the font (which glyph) to
use in the rendering process. Separately, text decoding is the process of
converting the character codes to “readable text” (UTF-8 or UTF-16, for
example).
- The text encoding may be a known encoding (WinAnsiEncoding,
MacRomanEncoding or even ASCII), but you cannot assume that this is always the
case. When it is, I think PoDoFo handles it correctly.
- The PDF file may contain a table named the ToUnicode CMap, which maps
character codes to Unicode values. PoDoFo implements the use of this table,
but there may be limitations, cases where it does not work correctly. You
might be in such a case.
- The encoding might be specified as a known encoding plus differences from it.
The code to handle this case is present in PoDoFo, but I’m not sure it works in
all cases. You might be in such a case.
- The PDF file may not contain any direct information for text decoding. There
are some tricks that can be used, but I don’t think PoDoFo supports this
approach. So this is a third possible explanation.
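To make the lookup order in the list above concrete, here is a minimal,
self-contained C++ sketch. It is not PoDoFo code and uses no PoDoFo API: the
names CodeMap, appendUtf8 and decodeCodes are hypothetical, it assumes
single-byte character codes, and it simplifies the "differences" case (a real
Differences array maps codes to glyph names, which then still have to be mapped
to Unicode). The base encoding is approximated as plain ASCII.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-font table: single-byte character code -> Unicode code point.
using CodeMap = std::map<std::uint8_t, char32_t>;

// Append one Unicode code point to a UTF-8 encoded string.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decode character codes to UTF-8. Lookup order (an assumption of this sketch):
//   1. the ToUnicode table, if it has an entry for the code,
//   2. the differences table,
//   3. the base encoding, approximated here as ASCII for codes below 0x80.
// Codes with no mapping become U+FFFD (REPLACEMENT CHARACTER).
std::string decodeCodes(const std::string& codes,
                        const CodeMap& toUnicode,
                        const CodeMap& differences) {
    std::string out;
    for (unsigned char code : codes) {
        char32_t cp = 0xFFFD;
        auto it = toUnicode.find(code);
        if (it != toUnicode.end()) {
            cp = it->second;
        } else if ((it = differences.find(code)) != differences.end()) {
            cp = it->second;
        } else if (code < 0x80) {
            cp = code;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

For example, with a ToUnicode table of {0x01: U+017C, 0x02: U+00F3,
0x03: U+0142}, the codes 01 02 03 followed by 'w' decode to the Polish word
“żółw”. Without that table the first three codes cannot be decoded, which is
the kind of failure described in the last bullet.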
In the TextExtractor sample, there is a line
PdfString unicode = … ->ConvertToUnicode(…);
That’s a starting point...
Best regards,
Etienne
On 31 Jan 2017, at 11:25, frydery...@gmail.com wrote:
Hi,
I tried both: running the tool itself and including ‘TextExtractor’ in my
project. Maybe I built my podofo.dll incorrectly, or freetype, or zlib, or
something else?
I can share my PDF file here if needed.
Best regards.
From: zyx <z...@litepdf.cz>
Sent: Tuesday, 31 January 2017 08:02
To: podofo-users@lists.sourceforge.net
Subject: Re: [Podofo-users] reading polish characters using PoDoFo
On Mon, 2017-01-30 at 19:38 +0100, frydery...@gmail.com wrote:
> It uses GetString() function, like I do.
> But it also uses this:
> pCurFont = pDocument->GetFont( pFont );
> and I get an "Access violation reading location" error when it runs.
Hi,
did you run the tool on your PDF file, or did you copy and paste portions
of the tool into your own code? If you haven't tried the tool itself, I
suggest you do. Otherwise, it would be interesting to have the PDF
that causes the tool to hit that access violation.
Bye,
zyx
--
http://www.litePDF.cz
i...@litepdf.cz
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users