Can you please elaborate on:

"Nevertheless, user patterns are not working in the way described above."



Zdenko


On Sat, 2 Mar 2024 at 10:45, Roman Seidel <roman.seide...@gmail.com> wrote:

> Yes, sure, the input file is a snippet with a capital letter followed by 9
> digits. The correct user pattern, corresponding to [1] is:
>
> ``\A\d\d\d\d\d\d\d\d\d``
>
> The result of Tesseract (psm 8) is fully correct. Nevertheless, user
> patterns are not working in the way described above.
>
> For instance, I have tried to extract only the capital letter with a user
> pattern (not with a whitelist), namely:
>
> \A
>
> In this case, Tesseract still returns the capital letter and all the
> digits.
>
> I've attached my input file and the corresponding Python snippet for
> reading and processing the image with tesserocr from [2].
>
>
> [1]
> https://github.com/tesseract-ocr/tesseract/blob/main/src/dict/trie.h#L197
> [2] https://github.com/sirfz/tesserocr
>
>
>
> On Fri, 1 Mar 2024 at 18:59, René JM Clais <
> reneclai...@gmail.com> wrote:
>
>> Can you send an example of an input document, the output of Tesseract,
>> and what you expect to get when using the pattern file?
>>
>> On Thu, 29 Feb 2024 at 21:40, Roman Seidel <roman.seide...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am currently trying to use user patterns with the PyTessBaseAPI from
>>> tesserocr [1].
>>>
>>> What I've done is to initialize the API with:
>>>
>>> with PyTessBaseAPI(path='/usr/share/tesseract-ocr/4.00/tessdata',
>>>                    lang=LANGUAGE, psm=int(psm), oem=int(TOEM)) as api:
>>>
>>> setting the user patterns file with:
>>>
>>> api.SetVariable('user_patterns_file',
>>> '/home/roman/Dev_d/playground/user_patterns/deu.patterns')
>>>
>>> Where the user patterns file contains a pattern, e.g.:
>>>
>>> \A\A\A
>>>
>>> (which means three capital letters).
>>>
>>>
>>> The results are the same regardless of whether I set user_patterns_file
>>> or not. This brings me to the question of whether tesserocr supports
>>> user (and word) patterns.
>>>
>>> My versions:
>>>
>>> tesserocr 2.6.2
>>> tesseract 5.3.3
>>>  leptonica-1.83.1
>>>   libpng 1.6.34 : zlib 1.2.11
>>>
>>> Thanks a lot for your help and best wishes,
>>> Roman
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/767cc60f-5325-43d7-a6ef-9cf879f82950n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/767cc60f-5325-43d7-a6ef-9cf879f82950n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>
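
For readers following the pattern discussion above: the results reported in this thread suggest that a user pattern does not cut the output down to the matching part. If the goal is to check or filter the recognized text, one option is to do that after OCR. The sketch below translates the pattern syntax documented in the trie.h comment [1] into a Python regular expression. The names CLASS_MAP, pattern_to_regex and matches are hypothetical helpers, not part of Tesseract or tesserocr. The character classes are ASCII approximations (Tesseract uses the language's unicharset), and reading \* as "one or more" is an assumption.

```python
import re
import string

# Rough ASCII equivalents of the classes listed in the trie.h comment.
# Assumption: Tesseract decides class membership from the unicharset,
# so these ranges are only an approximation.
CLASS_MAP = {
    "c": "[A-Za-z]",                                  # letter
    "d": "[0-9]",                                     # digit
    "n": "[A-Za-z0-9]",                               # letter or digit
    "p": "[" + re.escape(string.punctuation) + "]",   # punctuation
    "a": "[a-z]",                                     # lowercase letter
    "A": "[A-Z]",                                     # uppercase letter
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one user-pattern line into a compiled Python regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "*":
                # Assumption: "\*" repeats the previous item one or more times.
                if not parts:
                    raise ValueError("\\* needs a preceding character or class")
                parts[-1] += "+"
            elif nxt in CLASS_MAP:
                parts.append(CLASS_MAP[nxt])
            else:
                # Any other escaped character (e.g. "\\") is taken literally.
                parts.append(re.escape(nxt))
            i += 2
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text fits the user pattern."""
    return pattern_to_regex(pattern).fullmatch(text) is not None


if __name__ == "__main__":
    print(matches(r"\A\d\d\d\d\d\d\d\d\d", "A123456789"))  # True
    print(matches(r"\A", "A123456789"))                     # False
```

With this, the text returned by the API can be validated against the first pattern from the thread, and the leading capital letter can be taken from the validated string with ordinary slicing (text[0]). This only post-processes the OCR result; it does not change what Tesseract recognizes.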

