[ https://issues.apache.org/jira/browse/TIKA-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reassigned TIKA-3157:
---------------------------------
Assignee: Tim Allison
> Missing content from .docx file with hyperlinked shape
> ------------------------------------------------------
>
> Key: TIKA-3157
> URL: https://issues.apache.org/jira/browse/TIKA-3157
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Reporter: Robert Kaulbach
> Assignee: Tim Allison
> Priority: Minor
>
> The attached .docx file was created in MS Office by drawing a rectangle and
> adding a hyperlink to it. Although the hyperlink doesn't show inside
> LibreOffice, it is still there and clickable when the file is opened with
> MS Office. When the file is parsed with Tika, the hyperlink attached to the
> shape is nowhere to be found in the output. Enabling all Office/OOXML parse
> options in the context has not helped.
>
> When debugging, I can see that the "a:hlinkClick" tag containing the link is
> skipped in
> org/apache/tika/parser/microsoft/ooxml/OOXMLWordAndPowerPointTextHandler.java,
> in the startElement method, because "inACChoiceDepth" is greater than 0.
> The fallback tag, which separately carries the link inside a "v:rect"
> tag, doesn't seem to get processed either, so the link content is never
> saved.
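
The behaviour described above can be reproduced in miniature with a small SAX handler. This is only a sketch under stated assumptions, not Tika's code: the XML fragment is hand-written (the "rId4" relationship id, the "Requires" value and the example.com URL are made up), the "href" attribute on "v:rect" is an assumption about how the fallback carries the link, and the "LinkHandler" class, its "choice_depth" counter and its "read_fallback_href" switch are hypothetical names. The counter only mimics the role the report attributes to "inACChoiceDepth".

```python
import xml.sax

# Hand-written fragment modelled on the report; NOT the attached file.
# The rId4 id, the Requires value and the example.com URL are assumptions.
SAMPLE = b"""<w:p
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:v="urn:schemas-microsoft-com:vml">
  <mc:AlternateContent>
    <mc:Choice Requires="wps">
      <a:hlinkClick r:id="rId4"/>
    </mc:Choice>
    <mc:Fallback>
      <v:rect href="https://example.com/"/>
    </mc:Fallback>
  </mc:AlternateContent>
</w:p>"""


class LinkHandler(xml.sax.ContentHandler):
    """Hypothetical handler that ignores everything inside mc:Choice."""

    def __init__(self, read_fallback_href=False):
        super().__init__()
        self.read_fallback_href = read_fallback_href
        self.choice_depth = 0   # plays the role the report gives inACChoiceDepth
        self.links = []         # link targets that reach the output
        self.skipped = []       # element names dropped while inside mc:Choice

    def startElement(self, name, attrs):
        if name == "mc:Choice":
            self.choice_depth += 1
        if self.choice_depth > 0:
            # Everything under mc:Choice is skipped, including a:hlinkClick.
            self.skipped.append(name)
            return
        if name == "a:hlinkClick":
            self.links.append(attrs.get("r:id"))
        elif name == "v:rect" and self.read_fallback_href and "href" in attrs:
            # One possible remedy: read the link from the fallback shape.
            self.links.append(attrs["href"])

    def endElement(self, name):
        if name == "mc:Choice":
            self.choice_depth -= 1


def collect_links(read_fallback_href):
    """Parse SAMPLE and return the handler so its lists can be inspected."""
    handler = LinkHandler(read_fallback_href)
    xml.sax.parseString(SAMPLE, handler)
    return handler


if __name__ == "__main__":
    print(collect_links(False).links)  # link is lost
    print(collect_links(True).links)   # link recovered from v:rect
```

With "read_fallback_href" left off, the handler visits "v:rect" but records nothing, which matches the symptom in the report. Turning it on is just one way the link could be recovered; whether the real fix belongs in the choice branch or the fallback branch is a question for the parser itself.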