Hi,

On Wed, Jan 21, 2009 at 10:56 AM, Natraj Kadur <[email protected]> wrote:
> I am using the PDFBox for one of the application. What I am doing is I
> am extracting the PDF text from the PDF and generating the TOC entries. But
> I am facing one problem, that is, if the PDF contains these two
> characters "✠"(✠) and "Ⓔ"(Ⓔ) then the processpage(PDPage,
> COSStream) gives an IOException "Unknown encoding for 'UniJIS-UCS2-H' ". Can
> you let us know is there any way as to overcome this problem?

Unfortunately not. Unless someone else has a good answer, you'll
probably need to look at the relevant source code in PDFBox to figure
out what to do with this. If you do that, we'd be happy to apply any
fix you may come up with.

BR,

Jukka Zitting
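
As a possible starting point for such a fix: UniJIS-UCS2-H is one of the
predefined CMap names in the PDF Reference, and with it the character
codes in a text string are two-byte UCS-2 values in big-endian order.
Under that assumption, a fallback could decode those bytes as UTF-16BE
instead of failing. The sketch below is standalone Java; the class and
method names are hypothetical and are not part of PDFBox, and where this
would hook into the PDFBox source is not shown.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT a PDFBox class: illustrates one way to handle
// the Unicode-based predefined CMaps such as UniJIS-UCS2-H.
final class Ucs2CMapFallback {

    private Ucs2CMapFallback() {
    }

    // Assumption: names of the form Uni<collection>-UCS2-H / -V denote
    // CMaps whose character codes are two-byte big-endian UCS-2 values.
    static boolean isUcs2CMap(String cmapName) {
        return cmapName != null
                && cmapName.startsWith("Uni")
                && (cmapName.endsWith("-UCS2-H") || cmapName.endsWith("-UCS2-V"));
    }

    // Decodes the raw bytes of a PDF text string shown with a UCS-2 CMap.
    // Any other encoding name is rejected so the caller can keep its
    // existing error handling.
    static String decode(String cmapName, byte[] codes) {
        if (!isUcs2CMap(cmapName)) {
            throw new IllegalArgumentException(
                    "Unknown encoding for '" + cmapName + "'");
        }
        if (codes.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected an even number of bytes for two-byte codes");
        }
        // UCS-2 code units are read here as UTF-16BE code units.
        return new String(codes, StandardCharsets.UTF_16BE);
    }
}
```

Calling decode("UniJIS-UCS2-H", bytes) with the bytes 0x27 0x20 0x24 0xBA
would yield the two characters from the report, assuming the PDF stores
them as their Unicode values.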
