2011/9/12 ahmad ajiloo <[email protected]>:
> Hello
> I used Tika (through Nutch) to parse some Persian PDF files. Some of
> the files were converted to plain text cleanly, but for others the
> output was corrupted. I used the ICU4J v4 library and the text changed to
> right-to-left mode, but that did not resolve the problem: for those files
> Tika cannot recognize any character of the input Persian PDF!
Maybe you can upload one of your PDF files to a Tika or PDFBox JIRA issue so they can investigate the problem?

--
lucidimagination.com
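
Before attaching a file to a JIRA issue, it may help to confirm which PDFs actually produce corrupted output. The sketch below is one possible way to do that using only the Java standard library: it measures how many letters in an already-extracted string belong to the Arabic script (which Persian uses) and flags text where that share is low. The class name `PersianTextCheck`, the method names, and the 0.5 threshold are all hypothetical choices for illustration. This is not part of Tika, Nutch, or PDFBox, and it assumes the documents are predominantly Persian.

```java
import java.lang.Character.UnicodeScript;

public class PersianTextCheck {

    /** Fraction of letters in the text that belong to the Arabic script (0.0 if there are no letters). */
    public static double arabicScriptRatio(String text) {
        int letters = 0;
        int arabic = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // ignore digits, whitespace, punctuation, combining marks
            }
            letters++;
            if (UnicodeScript.of(cp) == UnicodeScript.ARABIC) {
                arabic++;
            }
        }
        return letters == 0 ? 0.0 : (double) arabic / letters;
    }

    /** Hypothetical heuristic: text expected to be Persian is suspicious if few letters are Arabic-script. */
    public static boolean looksCorrupted(String text, double threshold) {
        return arabicScriptRatio(text) < threshold;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String readable = "\u0633\u0644\u0627\u0645 \u062f\u0646\u06cc\u0627"; // Persian for "hello world"
        String garbled = "\u00d3\u00e1\u00c7\u00e3 \u00cf\u00e4\u00ed\u00c7";  // Latin-1 style mojibake
        System.out.println("readable looks corrupted: " + looksCorrupted(readable, 0.5));
        System.out.println("garbled looks corrupted:  " + looksCorrupted(garbled, 0.5));
    }
}
```

Running a check like this over the text extracted from each PDF would give a short list of failing files, and any one of those would be a good candidate to attach to the issue.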
