Yes, there is a difference. In Nutch we have the ICU4J library in the lib directory, but there is no ICU4J library or class file in the single Tika jar file. For example, the pdfbox jar file contains the path com.ibm.icu, but there is no com.ibm path in the Tika jar file. How can I add the ICU4J library to the Tika jar file?
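One way to confirm the difference between the two setups is to check whether an ICU4J class can be loaded at runtime. The following is only a minimal sketch: the class name IcuClasspathCheck and the helper isClassAvailable are made up for illustration, and com.ibm.icu.text.Bidi is used simply as a representative class that ships with ICU4J.

```java
public class IcuClasspathCheck {

    // Hypothetical helper: returns true if the named class can be found
    // on the classpath of this JVM, without initializing the class.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, IcuClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // com.ibm.icu.text.Bidi is part of ICU4J; if this reports false,
        // no ICU4J jar is visible to the JVM that runs this check.
        String icuClass = "com.ibm.icu.text.Bidi";
        System.out.println(icuClass + " available: " + isClassAvailable(icuClass));
    }
}
```

Running this once with only the Tika jar on the classpath and once with the ICU4J jar added should show whether ICU4J is visible in each case. Rather than repacking the Tika jar, it is usually enough to list both jars on the classpath, for example java -cp .:tika.jar:icu4j.jar IcuClasspathCheck on Linux (the jar file names here are placeholders, not the real file names).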

On Mon, Oct 31, 2011 at 10:49 PM, Robert Muir <[email protected]> wrote:
> Do you have ICU4J jar in your classpath in both situations?
>
> On Mon, Oct 31, 2011 at 1:35 PM, ahmad ajiloo <[email protected]> wrote:
> > Hello
> > When I use Tika for extracting my persian pdf files, all the characters will
> > be extracted vice versa. I mean that the characters showed from beginning of
> > the line to the end, but from left to right. However when I use Tika gui via
> > Nutch there is no mistake and the output text is right-to-left !!
> >
> > Following text is the first line of attached file in first mode (running
> > Tika independently):
> > ﻲﻠﻋ ﺎﻳ ﻮﺗ ﻝﻼﺟ ﺯﺍ ﻢﻧﺯ ﻡﺩ ﻪﻜﻧﺁ ﺕﺭﺪﻗ ﺖﺳﺍﺮﻣ ﻪﻧ ﻲﻣﺮﻜﻣ ﺩﻮﺟ ﺩﻮﺟﻭ ﻪﺑ ﺖﻤﻳﻮﮔ ﻪﻛ ﺖﺳﺍ
> > ﺲﺑ ﻦﻴﻤﻫ ﻪﻧ ﻱﺪﺑﻮﻣ ﺖﺨﺗ ﻪﺑ ﻱﺍ ﻩﺩﺯ ﺖﻨﻄﻠﺳ ﻪﻴﻜﺗ ﻪﻜﻧﺁ ﻲﺋﻮﺗ
> >
> > and this is in second mode (running Tika gui via Nutch) and this is a clear
> > persian text:
> > نه مراست قدرت آنكه دم زنم از جلال تو يا علي نه همين بس است كه گويمت به
> > وجود جود مكرمي توئي آنكه تكيه سلطنت زده اي به تخت موبدي
> >
> > Thanks for your attention
>
> --
> lucidimagination.com
