Could you create a new JIRA issue at https://issues.apache.org/jira/browse/TIKA and upload the file there, checking the box for "Grant license to ASF for inclusion in ASF works"?
Checking this box is really important. If there is a bug in Tika/PDFBox with your Persian document, the license grant lets those projects add the PDF file to their regression tests.

On Mon, Sep 12, 2011 at 3:47 AM, ahmad ajiloo <ahmad.aji...@gmail.com> wrote:
> Yes, of course!
> Please find the attachment.
>
> On Mon, Sep 12, 2011 at 9:42 AM, Robert Muir <rcm...@gmail.com> wrote:
>>
>> 2011/9/12 ahmad ajiloo <ahmad.aji...@gmail.com>:
>> > Hello,
>> > I used Tika (within Nutch) to parse some Persian PDF files. Some of
>> > the files were converted cleanly to plain text, but for others the
>> > output was corrupted. I used the ICU4J v4 library and changed the text
>> > to right-to-left mode, but that did not resolve the problem: Tika
>> > cannot understand any character of the input Persian PDF file!
>>
>> Maybe you can upload one of your PDF files to a Tika or PDFBox JIRA
>> issue so they can investigate the problem?
>>
>> --
>> lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org