[
https://issues.apache.org/jira/browse/TIKA-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4270:
--
Description:
We use Tika to extract text from different sources, including images in which
the text is rotated at some angle. Before extracting text from an image with
OCR, Tika first deskews the image. The skew angle is not calculated correctly.
In the example [^for_issue] (PNG file), the text is rotated by about 40
degrees, but the skew angle function
(org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle
of about 15 degrees. The slope angle calculation flag is enabled.
The documentation
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
does not have sufficient information for this version of Tika. It contains a
TODO box and some information that applies to Tika 1, which requires Python
and its libraries, whereas in the version of Tika we use the angle calculation
is implemented in Java only.
was:
We use Tika to extract text from different sources, including images in which
the text is rotated at some angle. Before extracting text from an image with
OCR, Tika first deskews the image. The skew angle is not calculated correctly.
In the example [^for_issue], the text is rotated by about 40 degrees, but the
skew angle function
(org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle
of about 15 degrees. The slope angle calculation flag is enabled.
The documentation
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
does not have sufficient information for this version of Tika. It contains a
TODO box and some information that applies to Tika 1, which requires Python
and its libraries, whereas in the version of Tika we use the angle calculation
is implemented in Java only.
> wrong skew angle in tika-parser-ocr-module
> --
>
> Key: TIKA-4270
> URL: https://issues.apache.org/jira/browse/TIKA-4270
> Project: Tika
> Issue Type: Bug
>Affects Versions: 2.9.1
>Reporter: Roman
>Priority: Major
> Attachments: for_issue
>
>
> We use Tika to extract text from different sources, including images in
> which the text is rotated at some angle. Before extracting text from an image
> with OCR, Tika first deskews the image. The skew angle is not calculated
> correctly. In the example [^for_issue] (PNG file), the text is rotated by
> about 40 degrees, but the skew angle function
> (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle
> of about 15 degrees. The slope angle calculation flag is enabled.
> The documentation
> (https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
> does not have sufficient information for this version of Tika. It contains a
> TODO box and some information that applies to Tika 1, which requires Python
> and its libraries, whereas in the version of Tika we use the angle
> calculation is implemented in Java only.
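
To cross-check the angle reported for an image like the attachment, a small
independent estimator can help. The sketch below is not Tika's algorithm and
does not call ImageDeskew. It is a hypothetical projection-profile estimator
that uses only the JDK, and the names SkewCheck, stripes and estimateSkew are
invented for illustration. It draws synthetic "text lines" at a known angle
and scans candidate angles, keeping the one at which dark pixels collapse into
the fewest projection bins. The sign convention follows Graphics2D.rotate, so
positive angles are clockwise on screen.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper for cross-checking a skew angle; not part of Tika.
class SkewCheck {

    /** Draws black stripes on white, rotated by angleDeg about the image centre. */
    static BufferedImage stripes(int size, double angleDeg) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, size, size);
        g.rotate(Math.toRadians(angleDeg), size / 2.0, size / 2.0);
        g.setColor(Color.BLACK);
        for (int y = size / 4; y < 3 * size / 4; y += 20) {
            g.fillRect(size / 4, y, size / 2, 4); // one synthetic "text line"
        }
        g.dispose();
        return img;
    }

    /** Returns the candidate angle in [minDeg, maxDeg] with the sharpest projection profile. */
    static double estimateSkew(BufferedImage img, double minDeg, double maxDeg, double stepDeg) {
        int w = img.getWidth(), h = img.getHeight();
        int offset = 2 * Math.max(w, h); // keeps every bin index non-negative
        int steps = (int) Math.round((maxDeg - minDeg) / stepDeg);
        double best = minDeg;
        long bestScore = -1;
        for (int i = 0; i <= steps; i++) {
            double a = minDeg + i * stepDeg;
            double sin = Math.sin(Math.toRadians(a));
            double cos = Math.cos(Math.toRadians(a));
            int[] bins = new int[2 * offset];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    int rgb = img.getRGB(x, y);
                    int sum = ((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF);
                    if (sum < 384) { // treat as a dark pixel
                        // Signed distance from the origin along the normal to direction a.
                        bins[(int) Math.round(y * cos - x * sin) + offset]++;
                    }
                }
            }
            long score = 0;
            for (int b : bins) {
                score += (long) b * b; // large when pixels concentrate in few bins
            }
            if (score > bestScore) {
                bestScore = score;
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BufferedImage img = stripes(200, 40);
        System.out.println("search -45..45: " + estimateSkew(img, -45, 45, 0.5));
        System.out.println("search -20..20: " + estimateSkew(img, -20, 20, 0.5));
    }
}
```

To try it on a real file, replace the stripes(...) call with
javax.imageio.ImageIO.read(new java.io.File(path)) and compare the result with
the value returned by getSkewAngle. The second call in main shows that a
detector which scans only a limited range can never report an angle outside
it. Whether ImageDeskew restricts its search in this way is not established by
this report, so that is only something to rule out.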