Re: Tika can not parse all of the persian pdf files

Robert Muir Sun, 11 Sep 2011 23:13:37 -0700

2011/9/12 ahmad ajiloo <[email protected]>:
> Hello,
> I used Tika (through Nutch) to parse some Persian PDF files. Some of
> the files were converted to plain text cleanly, but for others the
> output was corrupted. I used the ICU4J v4 library and the text changed to
> right-to-left mode, but that did not resolve the problem: for those files,
> Tika cannot recognize any character of the input Persian PDF!


Maybe you can upload one of your PDF files to a Tika or PDFBox JIRA
issue so they can investigate the problem?
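
When filing the issue, it may help to say which files come out corrupted. The following is only a rough sketch, not part of Tika or PDFBox: it assumes you have saved the extracted text of each PDF to a UTF-8 file, and the helper name arabic_script_ratio is made up for illustration. It reports the fraction of letters that are Arabic-script characters (the script Persian is written in), so a value near zero for a Persian document suggests the extraction failed.

```python
import sys
import unicodedata


def arabic_script_ratio(text):
    """Return the fraction of letters in `text` that are Arabic-script characters.

    Hypothetical helper for illustration only. Persian letters such as
    PEH, GAF and FARSI YEH all carry "ARABIC" in their Unicode names.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "ARABIC" in unicodedata.name(ch, ""))
    return arabic / len(letters)


if __name__ == "__main__":
    # Each argument is a text file holding the extracted output of one PDF
    # (assumed to be UTF-8; undecodable bytes are replaced, not fatal).
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            ratio = arabic_script_ratio(handle.read())
        print(f"{path}: {ratio:.2f}")
```

Files that score low could then be the ones to attach to the JIRA issue.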

-- 
lucidimagination.com

