[jira] [Commented] (TIKA-3571) Add an interface for rendering engines

Tim Allison (Jira) Thu, 28 Apr 2022 07:30:05 -0700


    [ https://issues.apache.org/jira/browse/TIKA-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529465#comment-17529465 ]


Tim Allison commented on TIKA-3571:
-----------------------------------

The other thing we need to account for is multiple renderings per page.  I'd 
rather not add this complexity from the beginning, but the API should be able 
to handle this.
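
A minimal sketch of one way an API could leave room for this, assuming
hypothetical names (RenderingEngine, RenderResults, PageRendering) that are
not part of Tika: the engine returns a collection of renderings keyed by page
number, so a first implementation can emit exactly one rendering per page
while the contract already permits several.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// All names below are hypothetical illustrations, not Tika's actual API.

/** One rendered image of a page; a page may have several of these. */
final class PageRendering {
    final int pageNumber;    // 1-based page index
    final String label;      // illustrative tag, e.g. "full-page" or "thumbnail"
    final byte[] imageBytes; // encoded image data (empty in this sketch)

    PageRendering(int pageNumber, String label, byte[] imageBytes) {
        this.pageNumber = pageNumber;
        this.label = label;
        this.imageBytes = imageBytes;
    }
}

/** Collects every rendering produced for a document. */
final class RenderResults {
    private final List<PageRendering> renderings = new ArrayList<>();

    void add(PageRendering rendering) {
        renderings.add(rendering);
    }

    /** All renderings in insertion order, as a read-only view. */
    List<PageRendering> all() {
        return Collections.unmodifiableList(renderings);
    }

    /** Zero, one, or many renderings for the given page. */
    List<PageRendering> forPage(int pageNumber) {
        List<PageRendering> matches = new ArrayList<>();
        for (PageRendering r : renderings) {
            if (r.pageNumber == pageNumber) {
                matches.add(r);
            }
        }
        return matches;
    }
}

/** The contract returns a collection, so one-per-page is just the simple case. */
interface RenderingEngine {
    RenderResults render(byte[] document);
}

/** Stub engine that emits a single placeholder rendering per page. */
final class OnePerPageEngine implements RenderingEngine {
    private final int pageCount; // assumed known up front for this sketch

    OnePerPageEngine(int pageCount) {
        this.pageCount = pageCount;
    }

    @Override
    public RenderResults render(byte[] document) {
        RenderResults results = new RenderResults();
        for (int page = 1; page <= pageCount; page++) {
            results.add(new PageRendering(page, "full-page", new byte[0]));
        }
        return results;
    }
}
```

Callers that only care about the simple case can take the first entry of
forPage(n); an engine that later produces extra renderings (say, a thumbnail
alongside the full page) would add more entries without changing the interface.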

> Add an interface for rendering engines
> --------------------------------------
>
>                 Key: TIKA-3571
>                 URL: https://issues.apache.org/jira/browse/TIKA-3571
>             Project: Tika
>          Issue Type: Wish
>            Reporter: Tim Allison
>            Priority: Major
>
> We've now seen a few requests for extracting text _and_ rendering PDFs, and 
> certainly it might be useful to have alternatives for rendering files (e.g. 
> this [Alfresco 
> study|https://hub.alfresco.com/t5/alfresco-content-services-blog/pdf-rendering-engine-performance-and-fidelity-comparison/ba-p/287618]),
>  including MSOffice or at least PPTx...
> And there are cases where users don't want the rendered images, but they do 
> want OCR to be run against the rendered images.
> I doubt I'll have a chance to work on this for a while, but I wanted to open 
> an issue for discussion.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
