Andrew Skiba created TIKA-1344:
----------------------------------

             Summary: Ability to generate self-contained HTML with images
                 Key: TIKA-1344
                 URL: https://issues.apache.org/jira/browse/TIKA-1344
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Andrew Skiba

In the current code, images from Word documents are referenced by "embedded:xxx" links in the generated HTML. This causes browsers to display an "x" icon instead of the image.

The proposed patch encodes the images as Data URIs when the -Dtika.parsers.urlimages system property is set.

http://en.wikipedia.org/wiki/Data_URI_scheme

So the default behavior is unchanged, but users of the library can optionally generate self-contained HTML with correct images.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
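
For readers unfamiliar with the Data URI scheme mentioned in the description, the following is a minimal sketch of the encoding step, not the attached patch itself. The class DataUriSketch and its methods are hypothetical names chosen for illustration. The use of java.util.Base64 (Java 8 and later) and the presence check on the tika.parsers.urlimages property are assumptions about how such a feature could be written.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only; this class is not part of Tika.
final class DataUriSketch {

    // Assumption: the feature is enabled whenever the property is present,
    // with or without a value (-Dtika.parsers.urlimages yields "").
    static boolean inlineImagesEnabled() {
        return System.getProperty("tika.parsers.urlimages") != null;
    }

    // Builds "data:<mime type>;base64,<payload>" from raw image bytes.
    static String toDataUri(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }

    // Picks the src attribute value: a Data URI when the property is set,
    // otherwise the existing "embedded:" reference.
    static String imageSrc(String embeddedName, String mimeType, byte[] data) {
        return inlineImagesEnabled()
                ? toDataUri(mimeType, data)
                : "embedded:" + embeddedName;
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for real image content.
        byte[] bytes = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(imageSrc("image1.png", "image/png", bytes));
    }
}
```

Running this with -Dtika.parsers.urlimages prints a data: URI that a browser can render without fetching anything else. Running it without the property prints the embedded:image1.png reference, which matches the unchanged default described above.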