I have created a branch in SVN for working on this (textextraction), and both versions of your sample file are in the /test/text_extraction_PDFs folder.
 
This issue is going to have to sleep for a little while until Paulo and/or Bruno have a chance to take a look at my character ordering code to see what I might be doing wrong.
 
- K
 
 
----------------------- Original Message -----------------------
  
From: "Hoppe, Michael" <[email protected]>
To: "Post all your questions about iText here" <[email protected]>
Cc: 
Date: Fri, 16 Jan 2009 08:38:51 +0100
Subject: Re: [iText-questions] extracting text from pdfs with japanese data
  
Hi Kevin,

I removed everything from the PDF document except the first string on the first page. Maybe this helps?
See the attached PDF.

Greetings

Michael
> -----Original Message-----
> From: Kevin Day [mailto:[email protected]]
> Sent: Thursday, 15 January 2009 18:24
> To: IText Questions
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> OK - that means that either my code is bad, or the strategy of using
> UniJIS-UTF16-H when no ToUnicode map is provided is flawed.
>
> At this stage, one of the iText developers is going to need to take a look at things.
> I'll send an email and see how best to proceed on this.  It would be much easier if
> we had a significantly simpler source PDF to work from that uses that font.
>
> - K
>
> ----------------------- Original Message -----------------------
>
> From: "Hoppe, Michael" <[email protected]>
> To: "Post all your questions about iText here" <itext-
> [email protected]>
> Cc:
> Date: Mon, 12 Jan 2009 12:46:55 +0100
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> Hi Kevin,
>
> Sorry for the delay on my side as well. I was on vacation until today.
> The txt file you attached to your last mail does not show any Japanese characters,
> only gibberish (I am using a Unicode editor, so it should display correctly).
> The output should look like the txt file I attached to this mail.
> Or did I misunderstand you?
>
> Thanks + greetings
>
> Michael
>
> Dr. Michael Hoppe
> ePublishing & eScience
> Development & Applied Research
> Phone +49 7247 808-251
> Fax +49 7247 808-133
> [email protected]
>
>
> FIZ Karlsruhe
> Hermann-von-Helmholtz-Platz 1
> 76344 Eggenstein-Leopoldshafen, Germany
>
> www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourceForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php


-------------------------------------------------------

Fachinformationszentrum Karlsruhe, Gesellschaft für wissenschaftlich-technische Information mbH.
Registered office: Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892.
Managing Director: Sabine Brünger-Weilandt.
Chairman of the Supervisory Board: MinR Hermann Riehl.

