[ https://issues.apache.org/jira/browse/TIKA-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690573#comment-17690573 ]
Hudson commented on TIKA-3976:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #1028 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/1028/])
TIKA-3976 (#972) (github: [https://github.com/apache/tika/commit/e48b10fe917b47cb7660227e558c8be4e15a84dd])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/AutoDetectParserConfigTest.java
* (edit) CHANGES.txt
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParser.java
* (edit) tika-core/src/main/java/org/apache/tika/parser/AutoDetectParserConfig.java
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/resources/configs/tika-config-digests.xml

> Allow users to configure behavior for zero-byte files
> -----------------------------------------------------
>
>                 Key: TIKA-3976
>                 URL: https://issues.apache.org/jira/browse/TIKA-3976
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.7.1
>
>
> We currently throw a ZeroByteFileException whenever the stream is empty in
> AutoDetectParser.
> I _think_ the reason we did this was for use cases in search systems, where
> it would be exceptional to send in a zero-byte file.
> For other use cases, though, especially for embedded files, it is kind of
> normal to have zero-byte contents but have meaningful metadata.
> So, embedded files generally are one place (as in .ppt, etc.), but WARC
> redirects and HTTPResponse files would be other types of containers that may
> include meaningful metadata in the embedded file, but the embedded file has a
> zero-byte stream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)