Hi,

I am trying to extract text content from a PDF file. With a PDF file that contains Western content, everything works perfectly. With a PDF file that contains Asian characters, I get an exception (see below).

As far as I can see, the cmap "UniJIS-UCS2-H" is in the "Resources/cmap" folder. Do I have to load the cmap myself, or is it loaded automatically? Does PDFBox support Asian languages? What do I have to do to support such languages?

Any hint is welcome.

Thanks
Regards
Bernd

28.09.2009 13:45:55 org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNUNG: java.io.IOException: Unknown encoding for 'UniJIS-UCS2-H'
java.io.IOException: Unknown encoding for 'UniJIS-UCS2-H'
        at org.apache.pdfbox.encoding.EncodingManager.getEncoding(EncodingManager.java:68)
        at org.apache.pdfbox.pdmodel.font.PDFont.getEncoding(PDFont.java:566)
        at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:439)
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:343)
        at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:50)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:516)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:229)
        at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:70)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:516)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:229)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:188)
        at de.softvision.job.Job.getContentFromPDF(Job.java:264)
        at de.softvision.job.Job.loadPDF(Job.java:184)
        at invoiceclearing.Main.main(Main.java:30)
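
To help narrow down whether the cmap file is visible at runtime at all, a check along the following lines might be useful. This is only a sketch under assumptions: the class name CmapCheck and the helper isOnClasspath are made up for illustration, and it assumes the cmap would be looked up as a classpath resource under the path "Resources/cmap/UniJIS-UCS2-H". Whether PDFBox actually resolves cmaps this way is not confirmed here. It uses only the standard Java class loader API.

```java
import java.net.URL;

// Hypothetical helper, not part of PDFBox.
class CmapCheck {

    // Returns true if the named resource is visible to the class loader
    // that loaded this class (falls back to the system class loader).
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = CmapCheck.class.getClassLoader();
        URL url = (loader != null)
                ? loader.getResource(resourcePath)
                : ClassLoader.getSystemResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Assumed resource path, taken from the folder name mentioned above.
        String path = "Resources/cmap/UniJIS-UCS2-H";
        System.out.println(path + " on classpath: " + isOnClasspath(path));
    }
}
```

If this reports false when run with the same classpath as the application, the file is probably not reachable as a resource. If it reports true, the file is present and the failure would have to lie elsewhere, for example in how the encoding name is resolved.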
