Tim Allison created TIKA-1376:
---------------------------------
Summary: Improve embedded file name extraction in PDFParser
Key: TIKA-1376
URL: https://issues.apache.org/jira/browse/TIKA-1376
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Tim Allison
Assignee: Tim Allison
Priority: Trivial
Fix For: 1.6
When we extract embedded files from PDFs, we are currently using the key in the
PDEmbeddedFilesNameTreeNode as the file name that we store as the value of
Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.
I think we should try to get the file name from PDComplexFileSpecification's
getFilename() first. If that is null, then we should fall back to the key
value.
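The proposed lookup order can be sketched as a small pure-Java helper. This is a minimal sketch, not the actual PDFParser change. The class `EmbeddedFileNames` and its method `resolveName` are hypothetical names made up for illustration. It assumes the caller has already obtained two strings: the value returned by PDComplexFileSpecification's getFilename() (possibly null) and the key from the PDEmbeddedFilesNameTreeNode. The PDFBox wiring is left out so the example compiles and runs without any dependencies. In Tika, the returned name would then be stored under Metadata.RESOURCE_NAME_KEY in the embedded document's metadata.

```java
/**
 * Hypothetical helper (not part of Tika or PDFBox) illustrating the proposed
 * ordering: prefer the file name from the file specification, and fall back
 * to the embedded-files name tree key only when that name is null.
 */
final class EmbeddedFileNames {

    private EmbeddedFileNames() {
        // static utility; not instantiable
    }

    /**
     * @param specFileName value of the file specification's getFilename(); may be null
     * @param nameTreeKey  key under which the file is listed in the name tree
     * @return the name to record as the embedded document's resource name
     */
    static String resolveName(String specFileName, String nameTreeKey) {
        // Proposed: use the file specification's name first; if it is null,
        // fall back to the name tree key (the current behaviour).
        return specFileName != null ? specFileName : nameTreeKey;
    }

    public static void main(String[] args) {
        System.out.println(resolveName("report.docx", "Attachment1")); // prints report.docx
        System.out.println(resolveName(null, "Attachment1"));          // prints Attachment1
    }
}
```

Keeping the key as the fallback preserves the current behaviour for files whose specification carries no file name. The same applies to files that were already being named correctly from the key.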
--
This message was sent by Atlassian JIRA
(v6.2#6252)