Can you please elaborate on this sentence: "Nevertheless, user patterns is not working in the way described above."
Zdenko

On Sat, 2 Mar 2024 at 10:45, Roman Seidel <roman.seide...@gmail.com> wrote:

> Yes, sure, the input file is a snippet with a capital letter followed by 9
> digits. The correct user pattern, corresponding to [1], is:
>
> ``\A\d\d\d\d\d\d\d\d\d``
>
> The result of Tesseract (psm 8) is fully correct. Nevertheless, user
> patterns is not working in the way described above.
>
> For instance, I have tried to extract only the capital character with user
> patterns (not with a whitelist), using the pattern:
>
> \A
>
> In this case, tesseract returns the capital letter and all digits.
>
> I've attached my input file and the corresponding Python snippet for
> reading and processing the image with tesserocr [2].
>
> [1] https://github.com/tesseract-ocr/tesseract/blob/main/src/dict/trie.h#L197
> [2] https://github.com/sirfz/tesserocr
>
> On Fri, 1 Mar 2024 at 18:59, René JM Clais <reneclai...@gmail.com> wrote:
>
>> Can you send an example of an input document and the output of tesseract,
>> as well as what you expect when using the pattern file?
>>
>> On Thu, 29 Feb 2024 at 21:40, Roman Seidel <roman.seide...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am currently trying to use user patterns with the PyTessBaseAPI from
>>> tesserocr [1].
>>>
>>> What I've done is initialize the API with:
>>>
>>> with PyTessBaseAPI(path='/usr/share/tesseract-ocr/4.00/tessdata', lang=LANGUAGE, psm=int(psm), oem=int(TOEM)) as api:
>>>
>>> and set the user patterns file with:
>>>
>>> api.SetVariable('user_patterns_file', '/home/roman/Dev_d/playground/user_patterns/deu.patterns')
>>>
>>> where the user patterns file contains a pattern, e.g.:
>>>
>>> \A\A\A
>>>
>>> (which means three capital letters).
>>>
>>> The results are the same whether or not I set user_patterns_file.
>>> This brings me to the question: does tesserocr support
>>> user (and word) patterns?
>>>
>>> My versions:
>>>
>>> tesserocr 2.6.2
>>> tesseract 5.3.3
>>> leptonica-1.83.1
>>> libpng 1.6.34 : zlib 1.2.11
>>>
>>> Thanks a lot for your help and best wishes,
>>> Roman
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-ocr+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/767cc60f-5325-43d7-a6ef-9cf879f82950n%40googlegroups.com.
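
For anyone following the thread: Roman's second experiment (pattern \A, yet the letter and all nine digits come back) can at least be detected after recognition. The sketch below translates the symbols from the legend in trie.h (reference [1] in Roman's message) into a Python regex and checks whether an OCR result conforms to a pattern. It is only a rough approximation under several assumptions: the helper names pattern_to_regex and conforms are made up for illustration, the character classes are ASCII-only whereas Tesseract consults its unicharset, and this is a post-processing check, not a description of how Tesseract applies user patterns internally.

```python
import re
import string

# Symbols as I read them in the trie.h legend, mapped to ASCII-only regex
# classes. This is an approximation: Tesseract decides class membership via
# its unicharset, not via ASCII ranges.
_CLASSES = {
    "c": "[A-Za-z]",                             # \c: alphabetic character
    "d": "[0-9]",                                # \d: digit
    "n": "[A-Za-z0-9]",                          # \n: alphabetic or digit
    "p": "[" + re.escape(string.punctuation) + "]",  # \p: punctuation
    "a": "[a-z]",                                # \a: lower-case letter
    "A": "[A-Z]",                                # \A: upper-case letter
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one user-pattern line into a compiled regex (hypothetical helper)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "*":
                # \* repeats the preceding element zero or more times.
                if not parts:
                    raise ValueError("\\* needs a preceding pattern element")
                parts[-1] += "*"
            elif nxt in _CLASSES:
                parts.append(_CLASSES[nxt])
            else:
                # Unknown escape: treat the escaped character as a literal.
                parts.append(re.escape(nxt))
            i += 2
        else:
            # Any other character stands for itself.
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def conforms(pattern: str, text: str) -> bool:
    """Return True if the whole OCR result matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(text) is not None
```

With Roman's example, conforms(r"\A\d\d\d\d\d\d\d\d\d", "A123456789") is true, while conforms(r"\A", "A123456789") is false, so a result that contains the letter plus all digits can be flagged when only a single capital letter was expected.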