[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

Hudson (JIRA) Fri, 03 May 2019 09:39:56 -0700


    [ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832646#comment-16832646 ]


Hudson commented on TIKA-2749:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #409 (See 
[https://builds.apache.org/job/tika-2.x-windows/409/])
TIKA-2749 -- add initial, optional "AUTO" mode for OCR'ing of PDF pages 
(tallison: rev f72841353c30ba0ece3bdd40570ccdb03c3f8994)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
* (edit) CHANGES.txt
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java


> OCR on PDFs should "just work" out of the box
> ---------------------------------------------
>
>                 Key: TIKA-2749
>                 URL: https://issues.apache.org/jira/browse/TIKA-2749
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: 06.Qui peut réduire vos amendes TVA bis (1).pdf
>
>
> There are now two different ways (with various parameters) to trigger OCR on
> inline images within PDFs. The user has to 1) understand that these are
> available and then 2) elect to turn one of them on.
> I think we should make OCR'ing on PDFs "just work", perhaps with a hybrid
> strategy between the two options. Users should still be allowed to configure
> it as they wish, of course.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
