Hi,

The most critical part is this:
https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html, but I need to
stress: Tesseract is an OCR *engine*, not an OCR *suite*.
Unless your input is a simple book-page scan without a difficult layout,
you need to do your part: image preprocessing and document segmentation
(detection of text blocks).
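
As a rough illustration of the image-processing part, here is a toy, stdlib-only Python sketch of two steps described on the ImproveQuality page: global Otsu binarization and adding a white border around the text. It assumes the image has already been loaded as a 2-D list of 8-bit grayscale values. The function names are hypothetical, and a real pipeline would use an imaging library (e.g. OpenCV or Leptonica) instead. Tesseract already binarizes internally, and a single global threshold is only the simplest case. For an unevenly lit screenshot, an adaptive (local) threshold is often a better choice.

```python
"""Toy preprocessing sketch: global Otsu binarization plus a white border.

Assumption: the image is already loaded as a 2-D list of 8-bit grayscale
values (0 = black, 255 = white). Loading and saving files is left to an
imaging library; the function names here are hypothetical.
"""
from typing import List, Optional

Gray = List[List[int]]


def otsu_threshold(img: Gray) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels <= t
    sum_dark = 0  # sum of intensities <= t
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance up to a constant factor (1 / total**2).
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(img: Gray, t: Optional[int] = None) -> Gray:
    """Map pixels to 0 (ink) or 255 (background); Otsu threshold by default."""
    if t is None:
        t = otsu_threshold(img)
    return [[0 if v <= t else 255 for v in row] for row in img]


def add_border(img: Gray, size: int = 10, value: int = 255) -> Gray:
    """Surround the image with `size` pixels of `value` (white by default)."""
    width = len(img[0]) if img else 0
    blank_row = [value] * (width + 2 * size)
    middle = [[value] * size + list(row) + [value] * size for row in img]
    top = [list(blank_row) for _ in range(size)]
    bottom = [list(blank_row) for _ in range(size)]
    return top + middle + bottom
```

A typical call would be `add_border(binarize(gray))`, after which the result is saved with an imaging library and passed to tesseract.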

This is why you get "unsatisfactory" results when you send complicated
images with non-uniform text, graphics, etc.
However, if you pass only the text part of the image to recognition, you can
get very good results.
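
To make "use only the text part" concrete, here is a toy, stdlib-only sketch of one simple segmentation technique, projection profiles. Rows containing ink are grouped into bands, and each band is split into blocks at wide blank column gaps (a simplified, non-recursive take on X-Y cut). It assumes an already binarized image as a 2-D list (0 = ink, 255 = background). The function names and default gap sizes are hypothetical and would need tuning. This is not what Tesseract does internally, and it will treat graphics as text. Each crop still has to be saved to a file (e.g. with Pillow) before it is passed to the tesseract CLI. Using `--psm 6` ("assume a single uniform block of text") for a cropped block is an assumption, not a rule.

```python
"""Toy text-block detection with projection profiles, then cropping.

Assumption: a binarized image as a 2-D list where 0 is ink and 255 is
background. Boxes are (left, top, right, bottom) with right/bottom
exclusive, the same convention Pillow's Image.crop() uses. Function names
and default gap sizes are hypothetical and need tuning per image.
"""
import subprocess
from typing import List, Optional, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]


def _runs(flags: List[bool], min_gap: int) -> List[Tuple[int, int]]:
    """Group True positions into [start, end) runs.

    Runs separated by fewer than `min_gap` False entries are merged.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last = -1
    for i, has_ink in enumerate(flags):
        if not has_ink:
            continue
        if start is None:
            start = i
        elif i - last - 1 >= min_gap:
            runs.append((start, last + 1))
            start = i
        last = i
    if start is not None:
        runs.append((start, last + 1))
    return runs


def text_blocks(img: Grid, row_gap: int = 3, col_gap: int = 15,
                pad: int = 5) -> List[Box]:
    """Split ink rows into bands, then split each band at wide blank columns."""
    if not img or not img[0]:
        return []
    height, width = len(img), len(img[0])
    row_has_ink = [any(v == 0 for v in row) for row in img]
    boxes: List[Box] = []
    for top, bottom in _runs(row_has_ink, row_gap):
        band = img[top:bottom]
        col_has_ink = [any(row[x] == 0 for row in band) for x in range(width)]
        for left, right in _runs(col_has_ink, col_gap):
            # Keep a small margin around the text, clamped to the image.
            boxes.append((max(left - pad, 0), max(top - pad, 0),
                          min(right + pad, width), min(bottom + pad, height)))
    return boxes


def crop(img: Grid, box: Box) -> Grid:
    """Return the sub-image inside `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def ocr_file(image_path: str, lang: str = "eng", psm: int = 6) -> str:
    """Run the tesseract CLI on one crop saved to disk; "-" prints to stdout."""
    cmd = ["tesseract", image_path, "-", "-l", lang, "--psm", str(psm)]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout
```

A possible flow is: binarize, call `text_blocks(...)`, save each `crop(...)` with an imaging library, then call `ocr_file(...)` on each saved file and join the results.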

Best regards,

Zdenko


On Mon, 22 Jan 2024 at 19:42, L ht <lhtao0...@gmail.com> wrote:

> Hi Zdenko,
>
> Thanks for your response.
> I read the Tesseract User Manual (https://tesseract-ocr.github.io/tessdoc/),
> but I have not read the code.
>
> I tried both tessdata_best and tessdata, and different --psm values, but I
> still cannot get more text detected.
>
> To provide some context, when I applied Tesseract to the entire image, it
> managed to identify only a few words, such as "Log in," "Username,"
> "Password," and "Cancel," primarily within the central, well-lit portion.
> However, when I cropped the image to retain either the upper or left
> portions, Tesseract exhibited improved performance, successfully detecting
> numerous words in those respective areas.
>
> Best,
> Haitao
>
> On Sun, Jan 21, 2024 at 3:02 AM Zdenko Podobny <zde...@gmail.com> wrote:
>
>> Did you read the documentation or did you just set your expectations?
>>
>>
>> Zdenko
>>
>>
>> On Sun, 21 Jan 2024 at 12:00, L ht <lhtao0...@gmail.com> wrote:
>>
>>> I am new to Tesseract, and I found that it does not work as expected.
>>> I have attached one example.
>>>
>>> Version: tesseract 5.3.2
>>> Command: tesseract 272525030292764523137280353496213864766.png - -l eng --psm 3 quiet
>>> It only detects these words:
>>> "Log in
>>> Username
>>> Password
>>> Cancel"
>>>
>>> I submitted this picture to several online image-to-text converters. They
>>> work well, detecting most of the text in the picture.
>>> For example, https://www.imagetotext.info/ claims that it uses Tesseract.
>>>
>>> I am not sure whether I am using Tesseract correctly.
>>> Could someone else test this picture and share your detection result?
>>> Thanks
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/e95fa7c6-7afb-4a08-8b11-a63a024c3c9bn%40googlegroups.com.
>>>

