I think the problem is the lack of cube files for Persian. Does anyone know how to add cube files so that Tesseract can use them? There is a 'fas' folder in 'langdata' that contains some cube-related data, but I don't know how to use it with Tesseract.

On Monday, August 17, 2015 at 4:25:23 PM UTC+4:30, shree wrote:
>
>> On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar <shree...@gmail.com> wrote:
>>
>>> Ray was looking for comparative feedback regarding the new traineddata
>>> for RTL languages, so this will be useful.
>
>>>> Ray -
> https://groups.google.com/forum/#!msg/tesseract-dev/qcFtWCAAlT8/SZ4xBS5DHwwJ
>
> Another caveat worth noting is that I only tested a small fraction of
> these languages - maybe 25?
> I suspect, for instance, that all the Arabic-based languages except ara
> don't work very well.
> I would be interested in more feedback on how bad it is in any of them,
> and will take suggestions into account for the next version after 3.04.
>
>>> As far as I know, Google Docs does not use the Tesseract OCR engine for
>>> recognizing the text.
>>
>> Interesting. Can you please clarify the source of your knowledge?
>>
>>> Its OCR accuracy is better than Tesseract's for some Indian languages
>>> also. However, it doesn't seem to handle TIFs, and it processes only the
>>> first 10 pages of a PDF.
>
> https://support.google.com/drive/answer/176692?hl=en
>
>>> On Sun, Aug 16, 2015 at 7:14 PM, Hossein Razizadeh <sm.h...@gmail.com> wrote:
>>>
>>>> It seems 'fas' is for Persian, but there are no cube files, resulting
>>>> in poor results. Arabic language files work much better for Persian
>>>> images.
>>>> There is another 'per' folder for Persian, but there isn't even a
>>>> '.traineddata' file for it. Does anyone know if Google Docs uses
>>>> Tesseract for its OCR engine? Google Docs performs OCR for Persian
>>>> images with good accuracy!
>>>>
>>>> On Saturday, July 18, 2015 at 8:14:07 AM UTC+4:30, Jeff Breidenbach wrote:
>>>>>
>>>>> I think 'fas' is the language code for Persian.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b78921f6-5f81-47ea-bef3-a8a72e5a641e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.