To be more precise with my questions:

- Is the user-patterns functionality implemented in the tesserocr
Python API of tesseract?
- What exactly is the syntax for specifying user patterns with the tesserocr
Python API? Is SetVariable() correct, and how should the (Linux) path and the
attribute be specified?
- Is there a default path where tesseract looks for the *.patterns /
*.user-patterns file?

With the attached code from my last message, I've tested different
combinations, with and without a whitelist, and with different
attributes and path notations, none of which was successful.

If I use the following notation for user patterns, it has no effect on the
results, regardless of the entries in the *.patterns file:

api.SetVariable('user_patterns_file',
'/home/roman/Dev_d/playground/user_patterns/deu.patterns')
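
One variant that might be worth trying (a minimal sketch, not a confirmed fix) is to pass the variable when the API is initialized rather than afterwards. This rests on the assumption that tesseract loads its dictionaries, including user patterns, during initialization, so that a later SetVariable() call would come too late. The variables keyword argument of PyTessBaseAPI is used here as I understand it from the tesserocr documentation; the helper names write_patterns_file, init_variables and ocr_with_patterns are hypothetical and only serve the illustration.

```python
from pathlib import Path


def write_patterns_file(path, patterns):
    """Write one user pattern per line (UTF-8, newline-terminated)."""
    path = Path(path)
    path.write_text("".join(p + "\n" for p in patterns), encoding="utf-8")
    return path


def init_variables(patterns_path):
    """Variables to hand over at init time (assumption: they are read then)."""
    return {"user_patterns_file": str(patterns_path)}


def ocr_with_patterns(image_path, patterns_path, tessdata, lang="deu", psm=8):
    """Run OCR with the patterns file supplied during initialization."""
    # Imported here so the helpers above also work without tesserocr installed.
    from tesserocr import PyTessBaseAPI

    # Assumption: the 'variables' dict is applied while the engine initializes.
    with PyTessBaseAPI(path=tessdata, lang=lang, psm=psm,
                       variables=init_variables(patterns_path)) as api:
        api.SetImageFile(str(image_path))
        return api.GetUTF8Text().strip()
```

Whether the recognition result actually changes with this variant is something I cannot confirm; it only rules out the timing of SetVariable() as a cause.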

Has anyone (successfully) used user patterns with the tesserocr Python
API of tesseract?
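
Until that is clarified, a possible workaround is to check the OCR result against the pattern after recognition. The sketch below translates the user-pattern classes described in the comments of trie.h (\c, \d, \n, \p, \a, \A and \* for repetition) into a Python regular expression. It is not how tesseract applies patterns internally: the ASCII-only character classes are my own simplification, and the function names pattern_to_regex and matches are hypothetical.

```python
import re
import string

# User-pattern classes as described in the trie.h comments; the regex
# equivalents are ASCII-only approximations chosen for this sketch.
_CLASSES = {
    "c": "[A-Za-z]",                                # alphabetic character
    "d": "[0-9]",                                   # digit
    "n": "[A-Za-z0-9]",                             # alphabetic or digit
    "p": "[" + re.escape(string.punctuation) + "]", # punctuation
    "a": "[a-z]",                                   # lower-case letter
    "A": "[A-Z]",                                   # upper-case letter
}


def pattern_to_regex(pattern):
    """Translate a tesseract-style user pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "*" and parts:
                # \* repeats the preceding element zero or more times.
                parts[-1] = "(?:" + parts[-1] + ")*"
            elif nxt in _CLASSES:
                parts.append(_CLASSES[nxt])
            else:
                # Unknown escape: treat the escaped character literally.
                parts.append(re.escape(nxt))
            i += 2
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def matches(pattern, text):
    """True if the whole text conforms to the user pattern."""
    return pattern_to_regex(pattern).fullmatch(text) is not None
```

With a made-up string such as "X123456789" (one capital letter followed by nine digits), matches(r"\A\d\d\d\d\d\d\d\d\d", "X123456789") accepts the text, while matches(r"\A", "X123456789") rejects it, which is the behaviour I had expected from the patterns file.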

Best wishes and thanks, Roman


On Sat, 2 Mar 2024 at 13:08, Zdenko Podobny <zde...@gmail.com> wrote:

> Can you please elaborate on:
>
> Nevertheless, user patterns are not working in the way described above.
>
>
>
> Zdenko
>
>
> On Sat, 2 Mar 2024 at 10:45, Roman Seidel <roman.seide...@gmail.com> wrote:
>
>> Yes, sure. The input file is a snippet with a capital letter followed by
>> 9 digits. The correct user pattern, according to [1], is:
>>
>> ``\A\d\d\d\d\d\d\d\d\d``
>>
>> The result of Tesseract (psm 8) is fully correct. Nevertheless, user
>> patterns are not working in the way described above.
>>
>> For instance, I have tried to extract only the capital character with
>> user patterns (not with whitelist), which is:
>>
>> \A
>>
>> In this case, tesseract still returns the capital letter and all the
>> digits.
>>
>> I've attached my input file and the corresponding Python snippet for
>> reading and processing the image with tesserocr from [2].
>>
>>
>> [1]
>> https://github.com/tesseract-ocr/tesseract/blob/main/src/dict/trie.h#L197
>> [2] https://github.com/sirfz/tesserocr
>>
>>
>>
>> On Fri, 1 Mar 2024 at 18:59, René JM Clais <
>> reneclai...@gmail.com> wrote:
>>
>>> Can you send an example of an input document and the output of tesseract,
>>> as well as what you would expect when using the pattern file?
>>>
>>> On Thu, 29 Feb 2024 at 21:40, Roman Seidel <roman.seide...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently trying to use user patterns with the PyTessBaseAPI from
>>>> tesserocr [1].
>>>>
>>>> What I've done is initialize the API with:
>>>>
>>>> with PyTessBaseAPI(path='/usr/share/tesseract-ocr/4.00/tessdata', lang=
>>>> LANGUAGE, psm=int(psm), oem=int(TOEM)) as api:
>>>>
>>>> and then set the user patterns file with:
>>>>
>>>> api.SetVariable('user_patterns_file',
>>>> '/home/roman/Dev_d/playground/user_patterns/deu.patterns')
>>>>
>>>> Where the user patterns file contains a pattern, e.g.:
>>>>
>>>> \A\A\A
>>>>
>>>> (which means three capital letters).
>>>>
>>>>
>>>> The results are the same regardless of whether I use the
>>>> user_patterns_file argument or not. This raises the question of whether
>>>> tesserocr supports user (and word) patterns.
>>>>
>>>> My versions:
>>>>
>>>> tesserocr 2.6.2
>>>> tesseract 5.3.3
>>>>  leptonica-1.83.1
>>>>   libpng 1.6.34 : zlib 1.2.11
>>>>
>>>> Thanks a lot for your help and best wishes,
>>>> Roman
>>>>
>>>
