Roman created TIKA-4270:
---------------------------

             Summary: wrong skew angle in tika-parser-ocr-module
                 Key: TIKA-4270
                 URL: https://issues.apache.org/jira/browse/TIKA-4270
             Project: Tika
          Issue Type: Bug
    Affects Versions: 2.9.1
            Reporter: Roman
         Attachments: for_issue

We use Tika to extract text from various sources, including images in which 
the text is rotated by some angle. Before extracting text from an image with 
OCR, Tika first deskews the image, but the skew angle is not calculated 
correctly. In the attached example [^for_issue], the text is rotated by about 
40 degrees, yet the skew angle function 
(org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle 
of about 15. The slope angle calculation flag is enabled.
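
One possible explanation for a result like this, not confirmed for Tika's 
implementation, is that the skew search only tries candidate angles inside a 
fixed window. The sketch below is a standalone illustration and does not call 
Tika: the class SkewSketch, its methods, the projection-profile scoring, and 
the +/-20 degree window are all assumptions made for the example. It builds 
synthetic "text lines" rotated by 40 degrees and shows that a search limited 
to +/-20 degrees cannot report the true angle, while a wider search can.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Tika's ImageDeskew implementation.
public class SkewSketch {

    /** Samples points along parallel "text lines" rotated by angleDeg. */
    static List<double[]> syntheticLines(double angleDeg, int lineCount,
                                         int length, double spacing) {
        double a = Math.toRadians(angleDeg);
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < lineCount; i++) {
            // Shift each line perpendicular to the text direction.
            double x0 = -i * spacing * Math.sin(a);
            double y0 = i * spacing * Math.cos(a);
            for (int t = 0; t < length; t++) {
                points.add(new double[] {x0 + t * Math.cos(a),
                                         y0 + t * Math.sin(a)});
            }
        }
        return points;
    }

    /**
     * Returns the candidate angle in [minDeg, maxDeg] with the sharpest
     * projection profile (largest sum of squared bin counts).
     */
    static double estimateSkew(List<double[]> points, double minDeg,
                               double maxDeg, double stepDeg) {
        double bestAngle = minDeg;
        long bestScore = -1;
        int steps = (int) Math.round((maxDeg - minDeg) / stepDeg);
        for (int s = 0; s <= steps; s++) {
            double deg = minDeg + s * stepDeg;
            double theta = Math.toRadians(deg);
            Map<Long, Integer> bins = new HashMap<>();
            for (double[] p : points) {
                // Signed distance of the point from a line through the
                // origin with direction theta.
                long d = Math.round(p[1] * Math.cos(theta)
                                  - p[0] * Math.sin(theta));
                bins.merge(d, 1, Integer::sum);
            }
            long score = 0;
            for (int count : bins.values()) {
                score += (long) count * count;
            }
            if (score > bestScore) {
                bestScore = score;
                bestAngle = deg;
            }
        }
        return bestAngle;
    }

    public static void main(String[] args) {
        List<double[]> points = syntheticLines(40.0, 5, 200, 10.0);
        // Assumed narrow window: the true angle lies outside it.
        System.out.println("narrow: " + estimateSkew(points, -20, 20, 0.5));
        // Wider window that contains the true angle.
        System.out.println("wide:   " + estimateSkew(points, -45, 45, 0.5));
    }
}
```

If the real implementation does bound its search in a similar way, any image 
rotated beyond that bound would get a wrong angle regardless of image 
quality, which would be worth checking against the attached example.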

The documentation 
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
 does not have sufficient information for this version of Tika: there is a 
TODO box and some information that applies to Tika 1, which requires Python 
and its libraries, whereas in the version of Tika we use the angle 
calculation is implemented in Java only.



