[ https://issues.apache.org/jira/browse/TIKA-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842471#comment-17842471 ]

Hudson commented on TIKA-4248:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1617 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1617/])
TIKA-4248 -- improve handling of attachments in PST (#1738) (github: [https://github.com/apache/tika/commit/de282d2861009895eecdb07784dceb5d777f372a])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html/JSoupParser.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/Office.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParser.java
* (add) tika-core/src/main/java/org/apache/tika/metadata/PST.java
* (edit) CHANGES.txt
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/PSTMailItemParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/resources/META-INF/services/org.apache.tika.parser.Parser


> Improve PST handling of attachments
> -----------------------------------
>
>                 Key: TIKA-4248
>                 URL: https://issues.apache.org/jira/browse/TIKA-4248
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> The PST parser doesn't handle attachments in quite the same way as other
> parsers which hinders analysis of attachments.
> The problem is that the PST parser handles the text content of an email and
> the embedded attachments. And, the PST parser processes attachments before
> the main body. These two features make the normal patterns for embedded
> attachments break down in the RecursiveParserWrapper. For example, when the
> attachments are being processed, the RecursiveParserWrapper can't figure out
> what the path will be through the "body" because that hasn't been parsed yet.
> We should probably create a PSTMailItemParser that handles the content and
> the attachments like other parsers so that embedded paths can be maintained.
> This will be a breaking change, and I'm targeting it only to the 3.x branch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
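
The ordering problem described in the issue can be illustrated with a small, self-contained sketch in plain Java. This is not Tika code: `PathTracker`, `attachmentsFirst`, `mailItemAsContainer`, and the names `message-1` and `report.pdf` are hypothetical stand-ins, and the real RecursiveParserWrapper may track paths differently. The sketch only assumes that an embedded path is built from whichever containers have already been entered when a document is handed off for parsing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class EmbeddedPathSketch {

    /** Hypothetical tracker: records one path per embedded document as it is entered. */
    static class PathTracker {
        private final Deque<String> stack = new ArrayDeque<>();
        final List<String> paths = new ArrayList<>();

        void enter(String name) {
            stack.addLast(name);
            // The path is whatever containers are open right now, outermost first.
            paths.add("/" + String.join("/", stack));
        }

        void exit() {
            stack.removeLast();
        }
    }

    /** Assumed old flow: the attachment is handed off before the mail body is entered. */
    static List<String> attachmentsFirst() {
        PathTracker tracker = new PathTracker();
        tracker.enter("report.pdf");   // attachment seen while no mail item is open
        tracker.exit();
        tracker.enter("message-1");    // body parsed afterwards, as a sibling
        tracker.exit();
        return tracker.paths;
    }

    /** Assumed new flow: a mail-item parser is entered first and emits its own attachments. */
    static List<String> mailItemAsContainer() {
        PathTracker tracker = new PathTracker();
        tracker.enter("message-1");    // mail item becomes the container
        tracker.enter("report.pdf");   // attachment is nested under it
        tracker.exit();
        tracker.exit();
        return tracker.paths;
    }

    public static void main(String[] args) {
        System.out.println(attachmentsFirst());      // [/report.pdf, /message-1]
        System.out.println(mailItemAsContainer());   // [/message-1, /message-1/report.pdf]
    }
}
```

In the first flow the attachment's path carries no trace of the message it belongs to. In the second, the message is on the stack when the attachment is entered, so the path runs through it. The issue proposes a dedicated PSTMailItemParser to achieve this nesting.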