Roman created TIKA-4270:
---------------------------

Summary: wrong skew angle in tika-parser-ocr-module
Key: TIKA-4270
URL: https://issues.apache.org/jira/browse/TIKA-4270
Project: Tika
Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Roman
Attachments: for_issue
We use Tika to extract text from various sources, including images that contain text rotated at an angle. When extracting text from an image with OCR, Tika first deskews the image, but the skew angle is not calculated correctly. In the attached example [^for_issue], the text is rotated by about 40 degrees, yet the skew angle function (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle of about 15 degrees. The skew angle calculation flag is enabled.

The documentation (https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation) does not contain enough information for this version of Tika. There is a TODO box, plus some relevant information for Tika 1, which requires Python and its libraries. In the version of Tika we use, the angle calculation is implemented only in Java.
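A Hough transform is a common way to estimate document skew. The Java sketch below illustrates how such an estimator behaves. It is not Tika's ImageDeskew, and it does not confirm that ImageDeskew works this way. The class HoughSkewSketch, its methods, and the search-window parameters (startDeg, stepDeg, steps) are all hypothetical. The sketch has each dark pixel vote for every candidate angle, and it returns the angle with the strongest single line. One assumption worth checking against the real code: if the estimator only searches a bounded window of angles, a skew outside that window (such as ~40 degrees) can never be reported.

```java
import java.util.Arrays;

/**
 * Illustrative Hough-transform skew estimator (hypothetical; not Tika's ImageDeskew).
 * Candidate angles are startDeg, startDeg + stepDeg, ..., startDeg + (steps - 1) * stepDeg.
 */
class HoughSkewSketch {

    /** Returns the candidate angle (degrees) whose accumulator holds the strongest line. */
    static double estimateSkew(boolean[][] dark, double startDeg, double stepDeg, int steps) {
        int height = dark.length;
        int width = dark[0].length;
        int maxD = width + height;               // |d| is always below width + height
        double bestAngle = startDeg;
        int bestVotes = -1;
        for (int s = 0; s < steps; s++) {
            double angleDeg = startDeg + s * stepDeg;
            double a = Math.toRadians(angleDeg);
            double sin = Math.sin(a);
            double cos = Math.cos(a);
            int[] acc = new int[2 * maxD + 1];   // vote bins for this candidate angle
            for (int y = 0; y < height; y++) {
                for (int x = 0; x < width; x++) {
                    if (!dark[y][x]) continue;
                    // Pixels on a line y = c + x*tan(a) share d = y*cos(a) - x*sin(a).
                    int d = (int) Math.round(y * cos - x * sin);
                    acc[d + maxD]++;
                }
            }
            int peak = Arrays.stream(acc).max().getAsInt();
            if (peak > bestVotes) {
                bestVotes = peak;
                bestAngle = angleDeg;
            }
        }
        return bestAngle;
    }

    /** Draws three parallel lines with slope tan(angleDeg) in pixel coordinates (y grows downward). */
    static boolean[][] syntheticLines(int width, int height, double angleDeg) {
        boolean[][] img = new boolean[height][width];
        double slope = Math.tan(Math.toRadians(angleDeg));
        for (int y0 = 20; y0 <= 80; y0 += 30) {
            for (int x = 0; x < width; x++) {
                int y = y0 + (int) Math.round(x * slope);
                if (y >= 0 && y < height) img[y][x] = true;
            }
        }
        return img;
    }

    public static void main(String[] args) {
        boolean[][] img = syntheticLines(200, 300, 40.0);
        // Wide window: -45 to +45 degrees in 0.5-degree steps.
        System.out.println("wide window:   " + estimateSkew(img, -45.0, 0.5, 181));
        // Narrow window: -20 to +20 degrees; a 40-degree skew lies outside it.
        System.out.println("narrow window: " + estimateSkew(img, -20.0, 0.5, 81));
    }
}
```

With the wide window, the sketch reports an angle close to 40 degrees for the synthetic 40-degree lines. With the narrow window, it can only return a value between -20 and +20 degrees. Whether Tika's ImageDeskew limits its search range in a similar way (and how it combines several detected lines into one angle) would need to be checked in its source.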