Hi, I need to index PDF files, and I am using PDFBox for that. When I try to extract text from a PDF file with PDFBox, I get the following error:

java.io.IOException: Error: No 'ToUnicode' and no 'Encoding' for Font
    at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:347)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:169)
    at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:461)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:692)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:128)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:268)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:200)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:172)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:120)
    at org.pdfbox.ExtractText.main(ExtractText.java:213)
    at test.LuceneExampleIndexer.indexFile(LuceneExampleIndexer.java:67)
    at test.LuceneExampleIndexer.indexDirectory(LuceneExampleIndexer.java:47)
    at test.LuceneExampleIndexer.index(LuceneExampleIndexer.java:30)
    at test.LuceneExampleIndexer.main(LuceneExampleIndexer.java:118)

Please tell me how to go about it.

Thanks,
Ankur