A problem in the right-to-left languages

ahmad ajiloo Mon, 31 Oct 2011 10:35:35 -0700

Hello,
When I use Tika to extract text from my Persian PDF files, the characters
of every line come out in reverse order. That is, each line is emitted from
its beginning to its end, but laid out left to right instead of right to
left. However, when I use the Tika GUI via Nutch there is no such problem
and the output text is correctly right-to-left.


The following text is the first line of the attached file in the first mode
(running Tika independently):
   ﻲﻠﻋ ﺎﻳ ﻮﺗ ﻝﻼﺟ ﺯﺍ ﻢﻧﺯ ﻡﺩ ﻪﻜﻧﺁ ﺕﺭﺪﻗ ﺖﺳﺍﺮﻣ ﻪﻧ ﻲﻣﺮﻜﻣ ﺩﻮﺟ ﺩﻮﺟﻭ ﻪﺑ ﺖﻤﻳﻮﮔ ﻪﻛ
ﺖﺳﺍ ﺲﺑ ﻦﻴﻤﻫ ﻪﻧ ﻱﺪﺑﻮﻣ ﺖﺨﺗ ﻪﺑ ﻱﺍ ﻩﺩﺯ ﺖﻨﻄﻠﺳ ﻪﻴﻜﺗ ﻪﻜﻧﺁ ﻲﺋﻮﺗ

And this is the output in the second mode (running the Tika GUI via Nutch),
which is clear, readable Persian text:
نه مراست قدرت آنكه دم زنم از جلال تو يا علي      نه همين بس است كه گويمت به
وجود جود مكرمي توئي آنكه تكيه سلطنت زده اي به تخت موبدي
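
To make the difference between the two outputs concrete, here is a minimal
Python sketch of a workaround. It is not part of Tika or Nutch. It assumes
that the broken output is in visual order and uses Arabic presentation-form
characters, which is how the first sample looks. The function names
visual_to_logical and repair_text are made up for this illustration.

```python
import unicodedata


def visual_to_logical(line: str) -> str:
    """Best-effort repair of one line of visually ordered Arabic-script text."""
    # Step 1: reverse the characters so the line is back in logical order.
    # Assumption: the line is purely right-to-left. Embedded digits or Latin
    # words would be reversed as well and would need real bidi handling.
    reversed_line = line[::-1]
    # Step 2: NFKC normalization maps presentation forms (initial, medial,
    # final, isolated glyph variants) to the base letters. Doing this after
    # the reversal keeps ligatures such as lam-alef in the right order.
    return unicodedata.normalize("NFKC", reversed_line)


def repair_text(text: str) -> str:
    """Apply the repair line by line so the order of the lines is preserved."""
    return "\n".join(visual_to_logical(line) for line in text.split("\n"))


if __name__ == "__main__":
    # The first two words of the broken sample, written as escapes.
    broken = "\ufef2\ufee0\ufecb \ufe8e\ufef3"
    print(repair_text(broken))
```

Note that NFKC also rewrites other compatibility characters, so this is only
meant for inspecting the output, not as a real fix for the extraction.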

Thanks for your attention
