What about reading the docs and a little bit of googling?

    tesseract two-page-passport-mrz-detected.jpeg - --psm 6 -l mrz

    IDAUT10000999<6<<<<<<<<<<<<<<<
    7109094F1112315AUT<<<<<<<<<<<6
    MUSTERFRAU<<ISOLDE<<<<<<<<<<<<

Zdenko

On Sat, 27 Jan 2024 at 11:19, sara waheed <sarawaheed3...@gmail.com> wrote:

> I am trying to read the passport MRZ string from an image. I am using
> Tesseract and OpenCV for image processing. I have tried three different
> ways; none of them worked.
>
> **Attempt 1**
>
> I have this image. When I run OCR on it, Tesseract reads it as:
>
>     IDAUT10000999<6<<<<<<<<<<<<<<<
>     7109094F1112315AUT<<<<<<xcc<<6
>     MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
>
> which is incorrect: it reads `<<<` as x, c, or k. When I use the `mrz-java`
> library to read the details from the string, it gives the following error:
>
>     [error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1
>     IDAUT10000999<6<<<<<<<<<<<<<<<
>     [error] 7109094F1112315AUT<<<<<<xcc<<6
>     [error] MUSTERFRAU<<ISOLDE<<<<<<<<cc<<
>     [error] at 24-25,1: Invalid character in MRZ record: x
>
> **Attempt 2**
>
> Then I converted the image to grayscale and binarized it using `OpenCV`.
> Here is the code:
>
>     val roiImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected.jpeg"
>
>     val grayScaleROI = new Mat()
>     val roiImage = Imgcodecs.imread(roiImagePath)
>     Imgproc.cvtColor(roiImage, grayScaleROI, Imgproc.COLOR_BGR2GRAY)
>     val roiGaryImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-gray.jpeg"
>
>     Imgcodecs.imwrite(roiGaryImagePath, grayScaleROI)
>     val binary = new Mat()
>     Imgproc.adaptiveThreshold(grayScaleROI, binary, 255,
>       Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 15, 25)
>     val roiBinaryImagePath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary.jpeg"
>     Imgcodecs.imwrite(roiBinaryImagePath, binary)
>
>     val tesseract = new Tesseract()
>     tesseract.setDatapath("/usr/share/tesseract-ocr/4.00/tessdata")
>     tesseract.setVariable("user_defined_dpi", "600")
>     val result = tesseract.doOCR(new File(roiBinaryImagePath))
>     val mrzStr = result.replace(" ", "")
>     println(s"two page passport mrz string is: $mrzStr")
>
> It created the following binary image, and Tesseract reads the MRZ string
> from the binary image as:
>
>     IDAUT1DODD999<E<KK<KKKKEKEKEK
>     7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
>     MUSTERFRAUSKISOLDEKKKKKKKKKKK
>
> `mrz-java` reads the string and generates the following error:
>
>     [error] Error parsing MRZ string: Failed to parse MRZ null
>     IDAUT1DODD999<E<KK<KKKKEKEKEK
>     [error] 7AD9D9GF1TEZSISAUTKKKKKKKKKEKG
>     [error] MUSTERFRAUSKISOLDEKKKKKKKKKKK
>     [error] at 0-0,0: Different row lengths: 0: 29 and 1: 30
>
> **Attempt 3**
>
> Then I resized the image:
>
>     val width = 1000 // Increase width proportionately (adjust based on your needs)
>     val height = (width * binary.rows()) / binary.cols() // Maintain aspect ratio
>
>     val resizedRoiImage = new Mat()
>     Imgproc.resize(binary, resizedRoiImage, new Size(width, height),
>       0.0, 0.0, Imgproc.INTER_NEAREST)
>
>     val resizedImageROIPath = "src/main/resources/ocr/passport/two-page-passport-mrz-detected-binary-resized_image.jpg"
>     Imgcodecs.imwrite(resizedImageROIPath, resizedRoiImage)
>
> MRZ string read by Tesseract:
>
>     TOAUTIOOOOIISKhcceccccddddddce
>     FIOPOSAFIFESSISAUTReececeececs
>     MUSTERFRAUCCKISOLDECKccccdcddd
>
> and the error is:
>
>     [info] 15:54:04.200 633 [main] MrzParser INFO - Check digit verification failed for document number: expected 0 but got h
>     [error] Error parsing MRZ string: Failed to parse MRZ MRTD_TD1
>     TOAUTIOOOOIISKhcceccccddddddce
>     [error] FIOPOSAFIFESSISAUTReececeececs
>     [error] MUSTERFRAUCCKISOLDECKccccdcddd
>     [error] at 15-16,0: Invalid character in MRZ record: c
>
> Can anyone please help me read the text properly? I have also tried a
> regex to convert c or k back to `<<<`, but it did not work either. If
> anyone can suggest a workaround or an improvement to the code, please help
> me with that. Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/440788ab-1d76-4612-a4b5-a1a4c2cd09a5n%40googlegroups.com.
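
The parser errors quoted in this thread (invalid character, different row lengths, failed check digit) can be caught with a small sanity check on the OCR text before it is handed to `mrz-java`. Below is a minimal Scala sketch of such a check for a three-row TD1 record. The object name `MrzCheck` and its methods are hypothetical, and the field positions are an assumption based on my reading of the TD1 layout in ICAO Doc 9303. It only verifies the document number, date of birth, and date of expiry check digits; the composite check digit is deliberately not checked.

```scala
// Hypothetical helper, not part of Tesseract, Tess4J, or mrz-java.
object MrzCheck {
  // Check digit weights repeat as 7, 3, 1 across the field.
  private val Weights = Array(7, 3, 1)

  // The MRZ alphabet: digits, upper-case letters, and the filler '<'.
  private def isMrzChar(c: Char): Boolean =
    (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || c == '<'

  // Digits keep their value, A-Z map to 10-35, the filler '<' counts as 0.
  private def charValue(c: Char): Int =
    if (c >= '0' && c <= '9') c - '0'
    else if (c >= 'A' && c <= 'Z') c - 'A' + 10
    else 0

  // Weighted sum of the field's character values, modulo 10.
  def checkDigit(field: String): Int =
    field.zipWithIndex.map { case (c, i) => charValue(c) * Weights(i % 3) }.sum % 10

  // Compares the field row[from, until) with the digit stored at index `until`.
  private def fieldOk(row: String, from: Int, until: Int): Boolean = {
    val expected = row.charAt(until)
    expected >= '0' && expected <= '9' &&
      checkDigit(row.substring(from, until)) == expected - '0'
  }

  // Assumed TD1 layout: 3 rows of 30 characters each.
  def looksLikeValidTd1(rows: Seq[String]): Boolean =
    rows.length == 3 &&
      rows.forall(r => r.length == 30 && r.forall(isMrzChar)) &&
      fieldOk(rows(0), 5, 14) && // document number, check digit at index 14
      fieldOk(rows(1), 0, 6) &&  // date of birth, check digit at index 6
      fieldOk(rows(1), 8, 14)    // date of expiry, check digit at index 14
}
```

With the output of the `-l mrz` run above, `MrzCheck.checkDigit("10000999<")` gives 6, which matches the digit after the document number in the first row. The rows from Attempt 1 fail the check because of the `x` and `c` characters, and the rows from Attempt 2 fail because the first row has only 29 characters.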