Tim Allison created TIKA-3571:
---------------------------------

             Summary: Add an interface for a rendering engine
                 Key: TIKA-3571
                 URL: https://issues.apache.org/jira/browse/TIKA-3571
             Project: Tika
          Issue Type: Wish
            Reporter: Tim Allison


We've now seen a few requests for extracting text _and_ rendering PDFs. It 
would also be useful to support alternative engines for rendering files (see, 
e.g., this [Alfresco study|https://hub.alfresco.com/t5/alfresco-content-services-blog/pdf-rendering-engine-performance-and-fidelity-comparison/ba-p/287618]), 
including MS Office formats, or at least PPTX.

There are also cases where users don't want the rendered images themselves, 
but do want OCR to be run against them.

I doubt I'll have a chance to work on this for a while, but I wanted to open an 
issue for discussion.
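
As a starting point for that discussion, here is a rough sketch of what such an interface might look like. Every name in it (RenderingEngine, RenderedPage, StubEngine, RenderPipeline.ocrOnly) is hypothetical and is not an existing Tika API. The stub engine is only a stand-in: it treats form feeds as page breaks and does no real rendering. The OCR step is passed in as a plain function, so the "OCR without keeping the images" case can be shown without naming any particular OCR library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical result type: one rendered page as encoded image bytes. */
final class RenderedPage {
    final int pageNumber;     // 1-based page index
    final byte[] imageBytes;  // would be e.g. PNG bytes in a real engine

    RenderedPage(int pageNumber, byte[] imageBytes) {
        this.pageNumber = pageNumber;
        this.imageBytes = imageBytes;
    }
}

/** Hypothetical interface; not an existing Tika API. */
interface RenderingEngine {
    /** Returns true if this engine can render the given media type. */
    boolean supports(String mediaType);

    /** Renders the stream to one RenderedPage per page, in page order. */
    List<RenderedPage> render(InputStream input, String mediaType) throws IOException;
}

/** Stand-in engine: splits on form feeds and "renders" each page as its raw bytes. */
final class StubEngine implements RenderingEngine {
    @Override
    public boolean supports(String mediaType) {
        return "text/plain".equals(mediaType);
    }

    @Override
    public List<RenderedPage> render(InputStream input, String mediaType) throws IOException {
        if (!supports(mediaType)) {
            throw new IOException("Unsupported media type: " + mediaType);
        }
        String text = new String(input.readAllBytes(), StandardCharsets.UTF_8);
        List<RenderedPage> pages = new ArrayList<>();
        int pageNumber = 1;
        for (String page : text.split("\f")) {
            pages.add(new RenderedPage(pageNumber++, page.getBytes(StandardCharsets.UTF_8)));
        }
        return pages;
    }
}

final class RenderPipeline {
    /** Renders the input, applies the OCR function to each page, and keeps only the text. */
    static List<String> ocrOnly(RenderingEngine engine, InputStream input, String mediaType,
                                Function<RenderedPage, String> ocr) throws IOException {
        List<String> text = new ArrayList<>();
        for (RenderedPage page : engine.render(input, mediaType)) {
            text.add(ocr.apply(page)); // the rendered page is not retained after OCR
        }
        return text;
    }
}
```

Keeping the engine behind a small interface like this would let callers swap renderers per media type, and keeping OCR as a separate step means the same engine could serve both the "give me images" and the "give me OCR text only" requests.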



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
