Hi, 

I need to index PDF files, and I am using PDFBox for that. When I try to
extract text from a PDF file with PDFBox, I get the following error:

java.io.IOException: Error: No 'ToUnicode' and no 'Encoding' for Font
        at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:347)
        at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:169)
        at org.pdfbox.util.PDFTextStripper.showString(PDFTextStripper.java:461)
        at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:692)
        at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:128)
        at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:268)
        at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:200)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:172)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:120)
        at org.pdfbox.ExtractText.main(ExtractText.java:213)
        at test.LuceneExampleIndexer.indexFile(LuceneExampleIndexer.java:67)
        at test.LuceneExampleIndexer.indexDirectory(LuceneExampleIndexer.java:47)
        at test.LuceneExampleIndexer.index(LuceneExampleIndexer.java:30)
        at test.LuceneExampleIndexer.main(LuceneExampleIndexer.java:118)


Please tell me how to get around this error.
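One possible stopgap, until the font problem itself is solved, might be to catch the IOException per file so that a single PDF with an unsupported font does not abort the whole indexing run. The sketch below is only an illustration under stated assumptions: `TextExtractor` and `DocumentSink` are hypothetical interfaces standing in for the PDFBox extraction call and the Lucene "add document" step; they are not part of either library's API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SafeIndexer {

    /** Hypothetical stand-in for the PDFBox text-extraction call (not a PDFBox type). */
    public interface TextExtractor {
        String extract(File pdf) throws IOException;
    }

    /** Hypothetical stand-in for adding extracted text to the Lucene index. */
    public interface DocumentSink {
        void add(File source, String text);
    }

    /**
     * Extracts and indexes each file in turn. A file whose extraction throws
     * an IOException is reported and skipped instead of stopping the run.
     * Returns the skipped files so they can be examined later.
     */
    public static List<File> indexAll(List<File> files, TextExtractor extractor, DocumentSink sink) {
        List<File> skipped = new ArrayList<>();
        for (File f : files) {
            try {
                String text = extractor.extract(f);
                sink.add(f, text);
            } catch (IOException e) {
                // e.g. "Error: No 'ToUnicode' and no 'Encoding' for Font":
                // report it on stderr and move on to the next file
                System.err.println("Skipping " + f + ": " + e.getMessage());
                skipped.add(f);
            }
        }
        return skipped;
    }
}
```

Returning the skipped files, rather than only logging them, keeps a record of which PDFs still need a real fix once the font handling is sorted out.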

Thanks,
Ankur 

