[jira] [Updated] (TIKA-4270) wrong skew angle in tika-parser-ocr-module

2024-06-20 Thread Tilman Hausherr (Jira)


 [ https://issues.apache.org/jira/browse/TIKA-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated TIKA-4270:
--
Description: 
We use Tika to extract text from different sources, including images whose 
text is rotated at an angle. Before running OCR on an image, Tika first 
deskews it, but the skew angle is not calculated correctly. In the attached 
example [^for_issue] (PNG file), the text is rotated by about 40 degrees, but 
the skew angle function 
(org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle 
of about 15 degrees. The slope angle calculation flag is enabled.

The documentation 
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
 does not have sufficient information for this version of Tika: it contains a 
TODO box and some information relevant to Tika 1, which requires Python and 
its libraries, whereas in the version of Tika we use the angle calculation is 
implemented in Java only.

  was:
We use Tika to extract text from different sources, including images whose 
text is rotated at an angle. Before running OCR on an image, Tika first 
deskews it, but the skew angle is not calculated correctly. In the attached 
example [^for_issue], the text is rotated by about 40 degrees, but the skew 
angle function (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) 
returns an angle of about 15 degrees. The slope angle calculation flag is 
enabled.

The documentation 
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
 does not have sufficient information for this version of Tika: it contains a 
TODO box and some information relevant to Tika 1, which requires Python and 
its libraries, whereas in the version of Tika we use the angle calculation is 
implemented in Java only.


> wrong skew angle in tika-parser-ocr-module
> --
>
> Key: TIKA-4270
> URL: https://issues.apache.org/jira/browse/TIKA-4270
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.9.1
> Reporter: Roman
> Priority: Major
> Attachments: for_issue
>
>
> We use Tika to extract text from different sources, including images whose 
> text is rotated at an angle. Before running OCR on an image, Tika first 
> deskews it, but the skew angle is not calculated correctly. In the attached 
> example [^for_issue] (PNG file), the text is rotated by about 40 degrees, 
> but the skew angle function 
> (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) returns an angle 
> of about 15 degrees. The slope angle calculation flag is enabled.
> The documentation 
> (https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
>  does not have sufficient information for this version of Tika: it contains 
> a TODO box and some information relevant to Tika 1, which requires Python 
> and its libraries, whereas in the version of Tika we use the angle 
> calculation is implemented in Java only.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (TIKA-4270) wrong skew angle in tika-parser-ocr-module

2024-06-20 Thread Roman (Jira)
Roman created TIKA-4270:
---

Summary: wrong skew angle in tika-parser-ocr-module
Key: TIKA-4270
URL: https://issues.apache.org/jira/browse/TIKA-4270
Project: Tika
Issue Type: Bug
Affects Versions: 2.9.1
Reporter: Roman
Attachments: for_issue

We use Tika to extract text from different sources, including images whose 
text is rotated at an angle. Before running OCR on an image, Tika first 
deskews it, but the skew angle is not calculated correctly. In the attached 
example [^for_issue], the text is rotated by about 40 degrees, but the skew 
angle function (org.apache.tika.parser.ocr.tess4j.ImageDeskew#getSkewAngle) 
returns an angle of about 15 degrees. The slope angle calculation flag is 
enabled.

The documentation 
(https://cwiki.apache.org/confluence/display/TIKA/TikaOCR#:~:text=To%20identify%20rotation)
 does not have sufficient information for this version of Tika: it contains a 
TODO box and some information relevant to Tika 1, which requires Python and 
its libraries, whereas in the version of Tika we use the angle calculation is 
implemented in Java only.
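
To reproduce the behaviour without the attachment, one option is to generate a 
synthetic image with a known rotation and pass it to the skew angle function. 
The following is a minimal sketch and not Tika code: SkewFixture and 
rotatedBars are hypothetical names, and plain black bars stand in for lines of 
text, which is an assumption about what the deskew step responds to.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Tika: builds a synthetic test image whose
// "text lines" (plain black bars) are rotated by a known angle.
class SkewFixture {

    // Returns a size x size white image with black bars rotated by `degrees`
    // around the image center (positive angles turn clockwise on screen).
    static BufferedImage rotatedBars(int size, double degrees) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            g.rotate(Math.toRadians(degrees), size / 2.0, size / 2.0);
            g.setColor(Color.BLACK);
            // Bars are 6 px thick and 40 px apart; one bar passes through the
            // center. They extend past the canvas so that the rotated pattern
            // still reaches every edge of the image.
            int center = size / 2;
            int reach = 40 * (size / 40 + 1);
            for (int y = center - reach; y <= center + reach; y += 40) {
                g.fillRect(-size, y - 3, 3 * size, 6);
            }
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        // Write a 400 x 400 image rotated by 40 degrees, as in the report.
        File out = File.createTempFile("skew40-", ".png");
        ImageIO.write(rotatedBars(400, 40.0), "png", out);
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```

The resulting BufferedImage could then be handed to 
org.apache.tika.parser.ocr.tess4j.ImageDeskew and the value returned by 
getSkewAngle compared with the 40 degrees used here. This assumes the class 
accepts a BufferedImage in its constructor, which the report does not state, 
and the sign convention of the returned angle may differ from the one used in 
this sketch.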


