I think the problem is the lack of cube files for Persian. Does anyone know 
how to add cube files so that Tesseract can use them? There is a 'fas' folder in 
'langdata' that contains some cube-related data, but I don't know how to 
use it with Tesseract.
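
As a first check, a small script along these lines could list which files in a folder look like cube data for a given language. This is only a sketch: the helper name find_cube_files is made up for illustration, and the '<lang>.cube.*' naming pattern is an assumption about how the cube files are named, not something confirmed by the Tesseract documentation.

```python
import sys
from pathlib import Path


def find_cube_files(data_dir, lang):
    """Return the sorted names of files in data_dir that look like cube data.

    Assumption: cube components are named '<lang>.cube.<something>'.
    A missing or non-directory data_dir yields an empty list.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    prefix = f"{lang}.cube."
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


if __name__ == "__main__":
    # Both arguments are optional; the defaults mirror the folder
    # mentioned above and are only illustrative.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "langdata/fas"
    lang = sys.argv[2] if len(sys.argv) > 2 else "fas"
    found = find_cube_files(data_dir, lang)
    if found:
        print(f"Cube-like files for '{lang}' in {data_dir}:")
        for name in found:
            print("  " + name)
    else:
        print(f"No '{lang}.cube.*' files found in {data_dir}")
```

Running it as `python find_cube.py langdata/fas fas` (the script name is hypothetical) would print whatever cube-like files the folder holds. It only inspects file names; it does not show how to pack them into a traineddata file.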

On Monday, August 17, 2015 at 4:25:23 PM UTC+4:30, shree wrote:
>
>
>> On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar <shree...@gmail.com> wrote:
>>
>>> Ray was looking for comparative feedback regarding the new traineddata 
>>> for RTL languages, so this will be useful.
>>>
>>
> Ray - 
> https://groups.google.com/forum/#!msg/tesseract-dev/qcFtWCAAlT8/SZ4xBS5DHwwJ
>
> Another caveat worth noting is that I only tested a small fraction of 
> these languages - maybe 25?
> I suspect, for instance, that all the Arabic-based languages except ara 
> don't work very well.
> I would be interested in any more feedback on how bad it is in any of them, 
> and will take suggestions into account for the next version after 3.04.
>
>
>>> As far as I know, Google Docs does not use the Tesseract OCR engine for 
>>> recognizing text. 
>>>
>>
>> Interesting. Can you please clarify the source of your knowledge? 
>>
>  
>>
>>> Its OCR accuracy is also better than Tesseract's for some Indian 
>>> languages. However, it doesn't seem to handle TIFFs, and it processes only 
>>> the first 10 pages of a PDF.
>>>
>>
> https://support.google.com/drive/answer/176692?hl=en
>
>>
>>>
>>> On Sun, Aug 16, 2015 at 7:14 PM, Hossein Razizadeh <sm.h...@gmail.com> wrote:
>>>
>>>> It seems 'fas' is for Persian, but there are no cube files, resulting 
>>>> in poor results. Arabic language files work much better for Persian 
>>>> images. 
>>>> There is another 'per' folder for Persian, but there isn't even a 
>>>> '.traineddata' file for it. Does anyone know if Google Docs uses 
>>>> Tesseract as its OCR engine? Google Docs performs OCR on Persian 
>>>> images with good accuracy!
>>>>
>>>> On Saturday, July 18, 2015 at 8:14:07 AM UTC+4:30, Jeff Breidenbach 
>>>> wrote:
>>>>>
>>>>> I think 'fas' is the language code for Persian.
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com <javascript:>.
>>>> To post to this group, send email to tesser...@googlegroups.com 
>>>> <javascript:>.
>>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/edd64e28-9e52-4b44-80cc-0aaa442caa85%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/edd64e28-9e52-4b44-80cc-0aaa442caa85%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>>
>
>
