[ https://issues.apache.org/jira/browse/TIKA-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028805#comment-18028805 ]
Hudson commented on TIKA-4511:
------------------------------
SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk17 #950 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk17/950/])
TIKA-4511 -- detected compressed bmp (#2361) (github:
[https://github.com/apache/tika/commit/616c35fdbcb0303852237c5f2dbf493d0f34f287])
* (edit) CHANGES.txt
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
> Detect compressed bmp
> ---------------------
>
> Key: TIKA-4511
> URL: https://issues.apache.org/jira/browse/TIKA-4511
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Trivial
> Fix For: 4.0.0, 3.3.0
>
>
> Tesseract, at least as recently as 5.4.1 with leptonica-1.82.0, cannot process
> compressed BMP files. See: [https://github.com/tesseract-ocr/tesseract/issues/2558]
> For OCR to work on these images, we'd have to use ImageMagick or something
> similar to convert them to uncompressed BMP. As a first step, we need to
> distinguish compressed BMP from uncompressed BMP.
> This ticket focuses solely on detection.
> It looks like if the byte at offset 30 (the start of the compression field
> in the DIB header) is non-zero, then we have a compressed BMP:
> [https://en.wikipedia.org/wiki/BMP_file_format]
>
> I propose: {{image/bmp;format=compressed}}
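
For illustration only, here is a minimal Python sketch of the check described above. It is not the change made in tika-mimetypes.xml, and the names bmp_compression and is_compressed_bmp are hypothetical. It assumes a "BM" file header followed by a DIB header of at least 40 bytes (BITMAPINFOHEADER or later), where the compression field is a 4-byte little-endian integer at offset 30. Note that a value of 3 (BI_BITFIELDS) is non-zero but describes bit masks rather than run-length encoding, so treating every non-zero value as "compressed" is the ticket's heuristic, not a strict definition.

```python
import struct

BI_RGB = 0  # compression value meaning "no compression"


def bmp_compression(header: bytes):
    """Return the BMP compression field, or None if it cannot be read.

    Assumes a 14-byte "BM" file header followed by a DIB header of at
    least 40 bytes; the older 12-byte core header has no such field.
    """
    if len(header) < 34 or header[:2] != b"BM":
        return None
    # DIB header size: 4-byte little-endian integer at offset 14.
    (dib_size,) = struct.unpack_from("<I", header, 14)
    if dib_size < 40:
        return None
    # Compression method: 4-byte little-endian integer at offset 30.
    (compression,) = struct.unpack_from("<I", header, 30)
    return compression


def is_compressed_bmp(header: bytes) -> bool:
    """Apply the ticket's heuristic: any non-zero compression value."""
    compression = bmp_compression(header)
    return compression is not None and compression != BI_RGB
```

Passing the first 34 bytes of a file is enough, for example is_compressed_bmp(open(path, "rb").read(34)), where path is whatever file is being inspected.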