Re: [tesseract-ocr] tesseract is reading passport mrz text from image incorrectly, its identifying <<<<<<<< as kkkk or cccc

Zdenko Podobny Sat, 27 Jan 2024 02:26:35 -0800

What about reading the docs and doing a little bit of googling?

tesseract two-page-passport-mrz-detected.jpeg - --psm 6 -l mrz


IDAUT10000999<6<<<<<<<<<<<<<<<
7109094F1112315AUT<<<<<<<<<<<6
MUSTERFRAU<<ISOLDE<<<<<<<<<<<<
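
Since the question uses Scala, here is a minimal sketch of calling the same command from Scala through `scala.sys.process`. It assumes the `tesseract` binary is on the PATH and that an `mrz.traineddata` model is installed where Tesseract looks for language data. The `MrzOcr` object and its method names are made up for illustration.

```scala
import scala.sys.process._

object MrzOcr {
  // Same arguments as the command line above; "-" writes the text to stdout.
  def command(imagePath: String): Seq[String] =
    Seq("tesseract", imagePath, "-", "--psm", "6", "-l", "mrz")

  // Splits raw OCR output into non-empty lines with all whitespace removed.
  def mrzLines(raw: String): List[String] =
    raw.split("\\r?\\n").toList.map(_.replaceAll("\\s", "")).filter(_.nonEmpty)

  // Runs the tesseract binary; `!!` throws if the exit code is non-zero.
  def run(imagePath: String): List[String] =
    mrzLines(command(imagePath).!!)
}
```

`MrzOcr.run("two-page-passport-mrz-detected.jpeg")` would then return the MRZ rows as a list, ready to be joined with newlines and handed to a parser.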


Zdenko


On Sat, 27 Jan 2024 at 11:19, sara waheed <sarawaheed3...@gmail.com> wrote:

> I am trying to read the passport MRZ string from an image. I am using
> Tesseract and OpenCV for image processing. I have tried three different
> ways; none of them worked.
>
> **Attempt 1**
> I have this image. When I run OCR on it, Tesseract reads it as
>
>     IDAUT10000999<6<<<<<<<<<<<<<<<
>     7109094F1112315AUT<<<<<<xcc<<6
>     MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
>
> which is incorrect: it reads some `<` characters as x, c or k. When I use
> the `mrz-java` library to parse the string, it gives the following error:
>
>     [error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1 IDAUT10000999<6<<<<<<<<<<<<<<<
>     [error] 7109094F1112315AUT<<<<<<xcc<<6
>     [error] MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
>     [error]  at 24-25,1: Invalid character in MRZ record: x
>
> **Attempt 2**
>
> Then I converted the image to grayscale and binarized it using `OpenCV`.
> Here is the code:
>
>     import java.io.File
>     import net.sourceforge.tess4j.Tesseract
>     import org.opencv.core.Mat
>     import org.opencv.imgcodecs.Imgcodecs
>     import org.opencv.imgproc.Imgproc
>
>     val roiImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected.jpeg"
>
>     // Convert the MRZ region to grayscale and save it
>     val grayScaleROI = new Mat()
>     val roiImage = Imgcodecs.imread(roiImagePath)
>     Imgproc.cvtColor(roiImage, grayScaleROI, Imgproc.COLOR_BGR2GRAY)
>     val roiGrayImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-gray.jpeg"
>     Imgcodecs.imwrite(roiGrayImagePath, grayScaleROI)
>
>     // Binarize with a mean adaptive threshold (block size 15, constant 25)
>     val binary = new Mat()
>     Imgproc.adaptiveThreshold(grayScaleROI, binary, 255,
>       Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 15, 25)
>     val roiBinaryImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary.jpeg"
>     Imgcodecs.imwrite(roiBinaryImagePath, binary)
>
>     // Run Tesseract on the binarized image
>     val tesseract = new Tesseract()
>     tesseract.setDatapath("/usr/share/tesseract-ocr/4.00/tessdata")
>     tesseract.setVariable("user_defined_dpi", "600")
>     val result = tesseract.doOCR(new File(roiBinaryImagePath))
>     val mrzStr = result.replace(" ", "")
>     println(s"two page passport mrz string is: $mrzStr")
>
> It created the following binary image.
>
> Tesseract reads the MRZ string from the binary image as
>
>     IDAUT1DODD999<E<KK<KKKKEKEKEK
>     7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
>     MUSTERFRAUSKISOLDEKKKKKKKKKKK
>
> and `mrz-java` fails on this string with the following error:
>
>     [error] Error parsing MRZ string: Failed to parse MRZ null IDAUT1DODD999<E<KK<KKKKEKEKEK
>     [error] 7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
>     [error] MUSTERFRAUSKISOLDEKKKKKKKKKKK
>     [error]  at 0-0,0: Different row lengths: 0: 29 and 1: 30
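
Both parser errors so far (an invalid character, then unequal row lengths) are shape problems that can be caught before the string reaches the parser. Below is a minimal sketch of such a pre-check, assuming the TD1 layout of 3 rows of 30 characters drawn from A-Z, 0-9 and `<`. The `Td1Shape` object is hypothetical and not part of `mrz-java`.

```scala
object Td1Shape {
  // The MRZ alphabet: upper-case letters, digits and the filler '<'.
  def isMrzChar(ch: Char): Boolean =
    (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9') || ch == '<'

  // Returns a list of problems; an empty list means the rows have TD1 shape.
  def problems(rows: Seq[String]): List[String] = {
    val count =
      if (rows.length == 3) Nil
      else List(s"expected 3 rows, got ${rows.length}")
    val perRow = rows.zipWithIndex.flatMap { case (row, i) =>
      val length =
        if (row.length == 30) Nil
        else List(s"row $i: length ${row.length}, expected 30")
      val chars = row.zipWithIndex.collect {
        case (ch, col) if !isMrzChar(ch) =>
          s"row $i, column $col: invalid character '$ch'"
      }
      length ++ chars
    }
    count ++ perRow.toList
  }
}
```

For the Attempt 1 output this reports the `x` at row 1, column 24, the same position the parser complains about. A check like this only detects bad OCR output; it does not repair it.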
>
> **Attempt 3**
>
> Then I resized the image:
>
>     import org.opencv.core.Size
>
>     val width = 1000 // Increase width proportionately (adjust based on your needs)
>     val height = (width * binary.rows()) / binary.cols() // Maintain aspect ratio
>
>     val resizedRoiImage = new Mat()
>     Imgproc.resize(binary, resizedRoiImage, new Size(width, height),
>       0.0, 0.0, Imgproc.INTER_NEAREST)
>
>     val resizedImageROIPath =
>       "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary-resized_image.jpg"
>     Imgcodecs.imwrite(resizedImageROIPath, resizedRoiImage)
>
> The MRZ string read by Tesseract:
>
>     TOAUTIOOOOIISKhcceccccddddddce
>     FIOPOSAFIFESSISAUTReececeececs
>     MUSTERFRAUCCKISOLDECKccccdcddd
>
> and the error is
>
>     [info] 15:54:04.200 633 [main] MrzParser INFO - Check digit verification failed for document number: expected 0 but got h
>     [error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1 TOAUTIOOOOIISKhcceccccddddddce
>     [error] FIOPOSAFIFESSISAUTReececeececs
>     [error] MUSTERFRAUCCKISOLDECKccccdcddd
>     [error]  at 15-16,0: Invalid character in MRZ record: c
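
The `[info]` line above mentions check digit verification. A minimal sketch of the ICAO 9303 check digit follows (weights 7, 3, 1 repeating; digits count as themselves, A-Z as 10-35, `<` as 0; the result is the sum modulo 10). This appears to be what the parser verifies, but that is an assumption, and the `MrzCheckDigit` object is made up for illustration.

```scala
object MrzCheckDigit {
  private val Weights = Array(7, 3, 1)

  // ICAO 9303 character values; None for anything outside the MRZ alphabet.
  def value(ch: Char): Option[Int] = ch match {
    case d if d >= '0' && d <= '9' => Some(d - '0')
    case l if l >= 'A' && l <= 'Z' => Some(l - 'A' + 10)
    case '<'                       => Some(0)
    case _                         => None
  }

  // Weighted sum modulo 10; None if the field holds a non-MRZ character.
  def compute(field: String): Option[Int] =
    field.zipWithIndex.foldLeft(Option(0)) { case (acc, (ch, i)) =>
      for (sum <- acc; v <- value(ch)) yield sum + v * Weights(i % 3)
    }.map(_ % 10)

  // True when `check` is the digit that `compute` produces for `field`.
  def matches(field: String, check: Char): Boolean =
    compute(field).exists(c => check == ('0' + c).toChar)
}
```

With the correctly read rows, `compute("10000999<")` gives 6, `compute("710909")` gives 4 and `compute("111231")` gives 5, matching the digits printed in the MRZ. For the garbled Attempt 3 document number `IOOOOIISK` it gives 0, consistent with the "expected 0 but got h" message, so the check digits are a useful signal that the OCR output cannot be trusted.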
>
>
> Can anyone please help me read the text properly? I have also tried a
> regex to convert c or k back to `<`, but it did not work either. If anyone
> can suggest a workaround or an improvement to the code, please help.
> Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/440788ab-1d76-4612-a4b5-a1a4c2cd09a5n%40googlegroups.com
>
