Kevin,
Unfortunately I cannot send you a shorter PDF. I got the PDF from people in Japan who use our software and complained that iText was not working. They said their PDFs are generated with some software, so I cannot recreate a shorter PDF. But I have attached the Unicode file for the tic_dogu2 PDF (extracted with PDFlib, a commercial software).

Thanks and greetings,
Michael

Dr. Michael Hoppe
ePublishing & eScience
Development & Applied Research
Phone +49 7247 808-251
Fax +49 7247 808-133
[email protected]

FIZ Karlsruhe
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen, Germany
www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>

From: Kevin Day [mailto:[email protected]]
Sent: Friday, 19 December 2008 01:43
To: IText Questions
Subject: Re: [iText-questions] extracting text from pdfs with japanese data

Michael-

Can you please send a PDF that uses the font in question, but is *simple* - maybe containing 2 lines with 3 or 4 words in each? Also, please send a Unicode file that has the text for those files. I can't look at the fonts themselves and figure out whether the decoding I'm doing is actually working, but I can compare the results to a Unicode file that has what the results should be.
- K

> ----------------------- Original Message -----------------------
>
> From: "Hoppe, Michael" <[email protected]>
> To: "Post all your questions about iText here" <[email protected]>
> Cc:
> Date: Wed, 17 Dec 2008 17:12:58 +0100
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> Hi all,
>
> Attached are the PDFs I had the problems with (I sent them once before):
> content1.pdf gives: java.io.IOException: '>' not expected at file pointer 39040
> tic_dogu2.pdf gives: java.lang.NullPointerException, because the font is not embedded in the PDF
>
> Text from content1.pdf can be extracted with the Adobe viewer bean (another open source library that we don't want to use for our project for various reasons), so I don't think there is anything wrong with the file itself.
>
> Greetings
>
> Michael
>
> Dr. Michael Hoppe
> ePublishing & eScience
> Development & Applied Research
> Phone +49 7247 808-251
> Fax +49 7247 808-133
> [email protected]
>
> FIZ Karlsruhe
> Hermann-von-Helmholtz-Platz 1
> 76344 Eggenstein-Leopoldshafen, Germany
>
> www.fiz-karlsruhe.de <http://www.fiz-karlsruhe.de/>
>
> From: Kevin Day [mailto:[email protected]]
> Sent: Wednesday, 17 December 2008 15:31
> To: IText Questions
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> CMapAwareDocumentFont has this parsing via the CMap class - this encapsulates the parsing behind an object, and makes it a lot easier to deal with.
>
> I think that the biggest thing here is actually finding the appropriate CMap data byte stream (either from embedded data in the PDF, or from the file system) - right now, locating the CMap information is a weak point in the content parser.
>
> If the cmap data is included in a jar on the classpath, then the CMap could absolutely be read from the jar.
>
> Can the OP please send a PDF that demonstrates the issue? I'll take a look at the font information and see how tough it would be to add this type of lookup if TOUNICODE isn't available.
>
> - K
>
> ----------------------- Original Message -----------------------
>
> From: "Paulo Soares" <[email protected]>
> To: "Post all your questions about iText here" <[email protected]>
> Cc:
> Date: Tue, 16 Dec 2008 09:55:36 -0000
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> There's code in PdfEncodings to parse the cmaps and convert to/from Unicode.
> The font contains the cmap name.
>
> Paulo
>
> ----- Original Message -----
> From: "1T3XT info" <[email protected]>
> To: "Post all your questions about iText here" <[email protected]>
> Sent: Tuesday, December 16, 2008 9:19 AM
> Subject: Re: [iText-questions] extracting text from pdfs with japanese data
>
> Hoppe, Michael wrote:
> > The CMap-files are included in the iTextAsianCmaps.jar. So couldn't they
> > be read from that jar in case there is no font information in the pdf?
>
> I'm just thinking out loud here, I didn't dive into the problem yet,
> but: do you think it's possible for iText to find which CMap-file is to
> be inspected based on the font information available in the PDF?
>
> As Kevin already said: this part of iText is pretty new. We're all
> excited about it, but for the moment it's all highly experimental.
> --
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info

Disclaimer: This message is destined exclusively to the intended receiver. It may contain confidential or legally protected information. The incorrect transmission of this message does not mean the loss of its confidentiality. If this message is received by mistake, please send it back to the sender and delete it from your system immediately. It is forbidden to any person who is not the intended receiver to use, distribute or copy any part of this message.

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php

-------------------------------------------------------
Fachinformationszentrum Karlsruhe, Gesellschaft für wissenschaftlich-technische Information mbH.
Registered office: Eggenstein-Leopoldshafen, Mannheim District Court (Amtsgericht Mannheim) HRB 101892.
Managing Director: Sabine Brünger-Weilandt.
Chairman of the Supervisory Board: MinR Hermann Riehl.
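The idea raised in the thread, reading the CMap from a jar on the classpath when the PDF carries no usable font information, might look roughly like the sketch below. It is plain Java and not iText code: the class name, method names, and the resource directory are assumptions for illustration, and the directory should be checked against the actual layout of iTextAsianCmaps.jar. The CMap name (UniJIS-UCS2-H is used here only as an example of a predefined Japanese CMap) would come from the font dictionary, as Paulo notes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CMapResourceLookup {

    // ASSUMPTION: directory of the CMap files inside the jar; verify against the real jar.
    static final String CMAP_RESOURCE_DIR = "com/lowagie/text/pdf/fonts/cmaps/";

    /** Builds the classpath resource path for a CMap name such as "/UniJIS-UCS2-H". */
    static String resourcePathFor(String cmapName) {
        if (cmapName == null || cmapName.isEmpty()) {
            throw new IllegalArgumentException("CMap name must not be empty");
        }
        // PDF names are written with a leading slash; resource paths are not.
        String name = cmapName.startsWith("/") ? cmapName.substring(1) : cmapName;
        return CMAP_RESOURCE_DIR + name;
    }

    /** Returns the raw CMap bytes, or null if the resource is not on the classpath. */
    static byte[] loadCMapBytes(String cmapName) {
        String path = resourcePathFor(cmapName);
        ClassLoader loader = CMapResourceLookup.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(path)) {
            if (in == null) {
                return null; // jar missing or name unknown: caller decides the fallback
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = loadCMapBytes("/UniJIS-UCS2-H");
        System.out.println(data == null
                ? "CMap not found on classpath"
                : "Loaded " + data.length + " bytes");
    }
}
```

Returning null rather than throwing when the resource is absent leaves room for the other lookups mentioned in the thread (data embedded in the PDF, or the file system) to be tried first or afterwards.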
㯠ããã âã¨ãã»ã¤ æè¡è ã®éå ·ç®± (2): Linux ã¨ã®å·¡ãåãã㯠èé ã®å¦ã è½ çå¸ ç©è³ªã»ææç ç©¶æ©æ§ å ææã»ã³ã¿ã¼ â Shin-ichi TODOROKI 大é¨ã®æãã¨ãããã¨ããããéã®éä¸ã§ã«ãã é¨ã«ããã¨ãæ¿¡ãã¦ã¯å°ãã¨ã°ããã«ãæ¥ãã§è»ä¸ ãªã©ã«èµ°ã£ããããããæ¿¡ãããã¨ã«ã¯å¤ãããªãã ã¯ããããæ¿¡ãã¦ããã¾ããªãã¨æã£ã¦ããã°ã㪠ãã®è¦ã«ãªããã¨ããããããããã¯ãã¹ã¦ã®ã㨠ã«å ±éããå¿å¾ã§ããã ãèé ãå±±æ¬å¸¸æãèæ¸ç¬¬ä¸ ä¸ä¹ ããã°ãMicrosoft Windows (ç¾ä»£èªè¨³ æ¬ç°ææ [1]) 1999 å¹´ 6 æãä»ããã®æ©ã ã¨ç¢ºä¿¡ãããå¦çæä»£ ããæç¨ãã¦ããçµçå¦çã·ã¹ãã ãL ATEX ãããã ããå¾ã«ä½¿ãå§ãã Windows ä¸ã§å©ç¨ãã¦ããã®ã ããã©ããã¦ãæä½ä¸ã®éåæãæãå»ããªãã§ã ãããããããããä¸åãåé¡ã«ç´é¢ããè¦æãæ±º ããã æ°å¹´åãããããå¦ä¼ã®æ å ±ãããã¯ã¼ã¯ã管ç ããå°å§å¡ä¼ã«åå ãã¦ããã夿 ªã®å 輩éã¯ãä¼ å¡åããã½ã³ã³éä¿¡ãµã¼ãã¹ã廿¢ãã代ããã«ã¤ ã³ã¿ã¼ãããã®ãã¼ã ãã¼ã¸ã«ãã£ã¦æ å ±çºä¿¡ãã ä½å¶ãçµã¿ããã¦ãå¼éãã¦ãã£ãã æãããããããã¯ã¼ã¯ã«æ¥ç¶ãããã³ã³ãã¥ã¼ ã¿ã®ã»ãã¥ãªãã£ç®¡çã«æ³¨ç®ãéã¾ãæ§ã«ãªãã管 çã®çããæ·±å»ãªåé¡ãå¼ãèµ·ãããã¨ãèªèãã å§ãã¦ãããå¼ãç¶ãã ãµã¼ãã¼ã¯ãå°å ¥å½åãã ç»æçãªåå¨ã ã£ããã®ã®ããããã¯ã¼ã¯ãç©é¨ã« ãªã£ã¦ãã¾ã£ããã®æç¹ã§ã¯ãã¨ã¦ãæºè¶³ã«ç®¡çã ãã¦ããã¨ã¯æããªããã¨ã夿ããã ç¾ç¶ã®æ©è½ãç¶æãã¦ç®¡çã夿³¨ããã¨ãªãã¨ã å°ãªãã¨ãå¹´é 300 ä¸åã¯æããã¨ããããã®å¦ä¼ ã«ãããªä½è£ã¯ç¡ããã¨ãªãã°ãèªåã管çæè¡ã â ã 305-0044 è¨åçã¤ãã°å¸ä¸¦æ¨ 1-1 fax 029-854-9060 URL: http://www.geocities.jp/tokyo 1406/ 身ã«ä»ãããããªããããããWindows ã¨ã¯åæã ç°ãªã OS(ãªãã¬ã¼ãã£ã³ã°ã·ã¹ãã ) ãåãã¦ãã ãµã¼ãã¼ãæãªãããããã ããã?ä¸å®ãæ±ãã ã¾ã¾è²¬ä»»ããã¶ãã®ã¯ç²¾ç¥è¡çä¸ãããããªããã ããªããã£ããé²ãã§é¨ã«æ¿¡ãã¦ãã¾ãããæ¥å¸¸ã® æ¸ãç©ãå®é¨ã§ä½¿ããã½ã³ã³ãåã OS ã«ãã¦ãã¾ ãã°ãè¦ããã¹ãæè¡ã¯ä¸ç¨®é¡ã§æ¸ã¿ãæ´»ç¨ã§ãã å±é¢ã¯åã«ãªãã OS 㯠Debian GNU/Linux ã«æ±ºãã¦ãããååã¨ã ã¦ããªã¼ã½ããã¦ã¨ã¢ã ãã§ã¾ã¨ãããã OS ã§ã ãããã®ç®¡çä½å¶ããã©ã³ãã£ã¢ãã¼ã¹ã§ã¯ããã ç¸å½ãã£ãããã¦ãããå ãã¦ãæ¥æ¬èªã®è§£èª¬æ¬ [2] ãåºçããå§ããé ã ã£ããæ©éãè·å ´ã¨èªå® ã®ã ã½ã³ã³ã«ã¤ã³ã¹ãã¼ã«ããç°å¢ãæ´ãã¦ãã£ãã Linux ã¨ã¯ä½ã? 
ãã㯠UNIX â ã¨äºææ§ãæãã OS ã§ããã1991 å¹´ã«ãã®æåã®ãã¼ã¸ã§ã³ãå ¬éããããéçºããã® ã¯ã彿ãã£ã³ã©ã³ãã®å¤§å¦çã ã£ã Linus Torvalds æ°ã§ããã å½¼ã¯ãªããæ¢åã® OS ã®ã¯ãã¼ã³ãããããä½ã£ ãã®ã?ããã¯ãUNIX ã®ã·ã³ãã«ãªç¾ããã«é ã ããã䏿¹ã§ãå½¼ã大æãã¯ããã¦è³¼å ¥ãã PC ã«ã å¯ä¸ã¤ã³ã¹ãã¼ã«ã§ãããã UNIX äºæ OS ã«æºè¶³ ã§ããªãã£ãããã ããã [3]ã ããªã¼ã½ããã¦ã¨ã¢ã¨ãã¦å ¬éãã Linux ã¯ä¸é¨ ã®ç±ççãªæ¯æãåããä¸çä¸ã«æ£ãã°ãæå¿ã«ã ãèã®æ ¹çå ±åéçºã®å½¢ã§æ¥éã«é²æ©ãã¦ãã£ãã ä»ã§ã¯ããã½ã³ã³ãè²·ãã®ã«ãLinux ãããªã¤ã³ã¹ ãã¼ã«ãããã¢ãã«ãé¸ã¶ãã¨ãã§ããã¾ã§ã«è³ã£ ã¦ããã â ååã®ã¨ãã»ã¤ (2008 å¹´ 7 æå·) ã§ãçè 㯠UNIX ã大å¦é¢ å¨ç±ä¸ã«ä½¿ã£ã¦ãããã¨ãè¿°ã¹ãã Materials Integration Vol.21 No.08 (2008) 65âã¨ãã»ã¤ å¦ä¼ Web ãµã¼ãã¼ã®äº¤æã¨æ¹è¯ å¦ä¼ã§ã¯æ°ãã PC ãè³¼å ¥ãã¦ããããDebian GNU/Linux ãã¤ã³ã¹ãã¼ã«ã㦠Web ãµã¼ãã¼ãç« ã¡ä¸ãããå¤ããµã¼ãã¼ã®ä¸èº«ãç§»ãæ¿ãã使¥ç ãéãã¦ããµã¼ã管çæè¡ã身ã«ä»ãã¦ãã£ãã念 é¡ã®ããã©ãã¯ããã¯ã¹ããã®è±å´ããæãããæã ãã¢ãã¦ã³ã¹ãããã»ãã¥ãªãã£åä¸ã®ããã®ã¢ã ããã¼ãã«ããããã«è¿½éã§ããä½å¶ãæ´ããã æ¬¡ã«åãçµãã ã®ããä¼å¡éå®ã§å ¬éã§ãããã¼ ã ãã¼ã¸ã®ä»çµã¿ã¥ããã ã£ããç¹å®ã®ãã¼ã¸ãé ããã¨ããã¨ãã¹ã¯ã¼ãã®å ¥åãæ±ããããæ§ã«ã ã¦ãéä¼å¡ã¨ã®å·®å¥åãéãã¦ãµã¼ãã¹ã®å å®ãå³ ãã®ã§ãããä¼å¡ ID ã¨ãã¹ã¯ã¼ãã¯æ¢ã«ä¼å¡ã«äº¤ ä»ãã¦ãããå¦ä¼ã®å¹´ä¼åå ç»é²ãæ¥è ãã¤ã³ã¿ã¼ ãããä¸ã§ã¨ãã¾ã¨ããã®ã«å©ç¨ããã¦ããããã¡ ãå´ã§è¿½å ãã¹ãæ©è½ã¯ãID æ å ±ãæ¥è ããèªåç ã«åãåã£ã¦ãã¹ã¯ã¼ãèªè¨¼ã«å©ç¨ãããã¨ã ã£ãã Linux ä¸ã§ä½¿ããç§é¸ãªããªã¼ã½ããã¦ã¨ã¢éã®ã èã§ãé£ãªãå®è£ ãããã¨ãã§ããã ãµã¼ã管çæè¡ç¿å¾ã®ãå©ç èªè¨¼æ©è½ä»ã Web ãµã¼ãã¼ãç«ã¡ä¸ããã¹ãã«ã 身ã«ä»ããã®ã§ããããèªåã®ä»äºã«æ´»ç¨ãã¹ããå® é¨ãã¼ãã®é»ååã«åãçµãã ãç ç©¶æã® LAN å ã«ããã° â¡ ãµã¼ãã¼ãè¨ç½®ããèªåããã¢ã¯ã»ã¹åºæ¥ ãªãããã«ãã¹ã¯ã¼ãèªè¨¼æ©è½ã追å ãããèªå® ã® PC ã¨ãåæããä»çµã¿ãæ´ãããã¤ã§ãã©ãã§ãå® é¨ãã¼ããè¨å ¥ã»åç §ã§ããããã«ãªã£ãã ããã¾ã§ã¯ãå®é¨ãã¼ããã©ããã«ç½®ãå¿ãããã æã®è¨è¿°ãæãåºãã®ã«æéãæãã£ããã¨ãä¸ä¾¿ ãæãã¦ãããé»ååå®é¨ãã¼ãã§ã¯å ¨ææ¤ç´¢æ©è½ ã使ããã®ã§ãå¿ãã¦å°ããã¨ã¯ä½ã§ãæ¸ãçã㦠ããç¿æ £ãä»ããã ãã®ãã¦ãã¦ãã¾ã¨ãã¦ãææé¢ä¿ã®å½éã¯ã¼ã¯ ã·ã§ããã§çºè¡¨ãããã¡ããã©ä¸ççã«ããã°ãæ³¨ç® ãããææã¨éãªãããªã³ã©ã¤ã³å ¬éãããè±æäº ç¨¿ã«ãèªç¶ã¨é¢å¿ãéã¾ã£ã (å訳ç㯠[4])ãçµæã¨ ãã¦ããã®éèªã®ãã¦ã³ãã¼ãã©ã³ãã³ã° (2006 å¹´ 第 1 ååæ) ã® 11 ä½ã«é£ãè¾¼ãã«è³ãããããçºç«¯ 
â¡ 2000 年彿ãããã°ã¯ã¾ã æ¥æ¬ã«ä¸é¸ãã¦ããããæ¥æ¬ã§ç¬ èªã«çºå±ãã¦ããå ¬éæ¥è¨ã·ã¹ãã ãã¤ã³ã¹ãã¼ã«ããã å³ 1: çè ã管çãã¦ããç¾å¨ã®å¦ä¼ Web ãµã¼ãã¼ã ã¨ãªã£ã¦ãããªã¼ç¾ç§äºå ¸ãã¦ã£ãããã£ã¢ãã®å® é¨ãã¼ãã®é ç®ã§ç´¹ä»ãããäºæ ã«ã¾ã§çºå±ããã å æ¥ãè·å ´ã®è¥æç ç©¶è ããã¼ãããèããã ãããéç¨ãåãã¨ãã«ã°ã£ããéã£ã¦ãããã§ã ããã¼ãã 説æãã¿ã¦ãã¾ãã®ã§å£ã«ã¯ããªãã£ãããçè ã®æè¦ã¯ã²ã¨ã¤ãè °ãæ®ãã¦é¨ã«æ¿¡ãã¦ã¿ãã®ãã¾ ãä¸èã [åèæç®] [1] æ¬ç° ææ:âãã¿ãªäººçè«ããèé â, æ²³åºæ¸æ¿æ° 社 (2004). [2] è³å°¾ æ¡:â仿¥ãã Debian GNU/Linuxâ, ãªã¼ã 社 (1999). [3] ãªã¼ãã¹ ãã¼ãã«ãº, ãã¤ããã ãã¤ã¢ã¢ã³ã: âãããã¼ãã«ã¯æ¥½ããã£ãããâ, å°å¦é¤¨ããã ã¯ã·ã§ã³ (2001). (é¢¨è¦ æ½¤ 訳ãä¸å³¶ æ´ ç£è¨³). [4] è½ çå¸, å°è¥¿ æºä¹, äºä¸ æ:âããã°ãåºã«ãã å®é¨ãã¼ã: å人ã®ç ç©¶æ´»åãå¹çåããæ å ±ç° å¢â. http://www.geocities.jp/tokyo 1406/04WCMST J.pdf 66 ãããªã¢ã«ã¤ã³ãã°ã¬ã¼ã·ã§ã³ Vol.21 No.08 (2008)
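Kevin's request for a Unicode reference file implies a simple check: compare the text an extractor produces against the known-good text and report where they first diverge. A minimal sketch of that comparison follows, in plain Java with no iText calls. The class and method names are hypothetical, and the normalization (Unicode NFC plus collapsing whitespace, including the ideographic space U+3000) is an assumption about which differences are harmless, not something stated in the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.Normalizer;

public class ExtractionCheck {

    /** Normalizes to NFC and collapses whitespace runs so layout differences are ignored. */
    static String canonical(String text) {
        String nfc = Normalizer.normalize(text, Normalizer.Form.NFC);
        return nfc.replaceAll("[\\s\\u3000]+", " ").trim();
    }

    /**
     * Returns the index (in the canonical form) of the first differing character,
     * or -1 if the two texts are equal after canonicalization.
     */
    static int firstMismatch(String expected, String actual) {
        String e = canonical(expected);
        String a = canonical(actual);
        int common = Math.min(e.length(), a.length());
        for (int i = 0; i < common; i++) {
            if (e.charAt(i) != a.charAt(i)) {
                return i;
            }
        }
        // Same prefix: equal only if the lengths match too.
        return e.length() == a.length() ? -1 : common;
    }

    /** Usage: java ExtractionCheck reference.txt extracted.txt (both files UTF-8). */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ExtractionCheck <reference.txt> <extracted.txt>");
            return;
        }
        String expected = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String actual = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        int index = firstMismatch(expected, actual);
        System.out.println(index < 0 ? "texts match" : "first mismatch at index " + index);
    }
}
```

Reporting the first mismatch index rather than a plain yes or no makes it easier to see which glyph run a decoding problem starts in.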
