[ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519802#comment-17519802 ]
Hudson commented on TIKA-3711:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #512 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/512/])
TIKA-3711 -- allow configuration of EmbeddedDocumentExtractors via tika-config.xml (tallison: [https://github.com/apache/tika/commit/ccc7bd841e097c3aa6d0c7c8494ddc5fa7596619])
* (edit) tika-core/src/main/java/org/apache/tika/extractor/ParsingEmbeddedDocumentExtractor.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParserConfig.java
* (add) tika-core/src/main/java/org/apache/tika/extractor/ParsingEmbeddedDocumentExtractorFactory.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
* (edit) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentUtil.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-with-names.xml
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-no-names.xml
* (add) tika-core/src/main/java/org/apache/tika/extractor/EmbeddedDocumentExtractorFactory.java
TIKA-3711 -- allow configuration of EmbeddedDocumentExtractors via tika-config.xml -- review and correct places where outputHtml should be false. (tallison: [https://github.com/apache/tika/commit/6552b076f0b4987423710b72b8917150422ea112])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/pkg/ZipParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNoteTreeWalker.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/AbstractPOIFSExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/JackcessExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/XML2003ParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/ImageGraphicsEngine.java
* (edit) CHANGES.txt
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java


> Image file names included in parsed Word Document text
> ------------------------------------------------------
>
>                 Key: TIKA-3711
>                 URL: https://issues.apache.org/jira/browse/TIKA-3711
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3.0
>            Reporter: Sam Stephens
>            Priority: Major
>             Fix For: 2.4.0
>
>         Attachments: word-doc-with-image-from-word-365.docx, word-doc-with-image.docx
>
>
> The attached Word document includes nothing but a single image. Running it through the Tika 2.2.0 AutoDetectParser correctly returns null. Running it through the Tika 2.3.0 AutoDetectParser returns the text:
> {{image1.png}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)