[ 
https://issues.apache.org/jira/browse/TIKA-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842471#comment-17842471
 ] 

Hudson commented on TIKA-4248:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1617 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1617/])
TIKA-4248 -- improve handling of attachments in PST (#1738) (github: 
[https://github.com/apache/tika/commit/de282d2861009895eecdb07784dceb5d777f372a])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html/JSoupParser.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/Office.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/OutlookPSTParser.java
* (add) tika-core/src/main/java/org/apache/tika/metadata/PST.java
* (edit) CHANGES.txt
* (add) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/pst/PSTMailItemParser.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/resources/META-INF/services/org.apache.tika.parser.Parser


> Improve PST handling of attachments
> -----------------------------------
>
>                 Key: TIKA-4248
>                 URL: https://issues.apache.org/jira/browse/TIKA-4248
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> The PST parser doesn't handle attachments in quite the same way as other 
> parsers, which hinders analysis of attachments.
> The problem is twofold: the PST parser itself handles both the text content 
> of an email and its embedded attachments, and it processes the attachments 
> before the main body. Together, these break the normal patterns for embedded 
> attachments in the RecursiveParserWrapper. For example, while the 
> attachments are being processed, the RecursiveParserWrapper can't determine 
> what the path through the "body" will be, because the body hasn't been 
> parsed yet.
> We should probably create a PSTMailItemParser that handles the content and 
> the attachments the way other parsers do, so that embedded paths can be 
> maintained.
> This will be a breaking change, and I'm targeting only the 3.x branch.
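
To make the path problem concrete, here is a minimal, self-contained Java sketch of parent-first path construction. It is a simplified model under stated assumptions, not Tika's RecursiveParserWrapper or the PSTMailItemParser added in the commit: the class EmbeddedPathDemo, its methods nestedPaths and attachmentsFirst, and the file names are all hypothetical, and the path strings only loosely resemble Tika's embedded resource paths. It contrasts emitting a mail item before its attachments (so each attachment path can extend the message's path) with emitting the attachments first (so they can only be placed directly under the container).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of embedded-path construction.
// This is NOT Tika's RecursiveParserWrapper or PSTMailItemParser.
public class EmbeddedPathDemo {

    // A node in the container tree: a mail item or an attachment.
    static final class Item {
        final String name;
        final List<Item> children = new ArrayList<>();

        Item(String name) {
            this.name = name;
        }

        // Adds a child and returns this item, for chained construction.
        Item add(Item child) {
            children.add(child);
            return this;
        }
    }

    // Parent-first, depth-first walk: each item's path is fixed before its
    // children are visited, so a child's path extends its parent's path.
    static void walk(Item item, String parentPath, List<String> out) {
        String path = parentPath + "/" + item.name;
        out.add(path);
        for (Item child : item.children) {
            walk(child, path, out);
        }
    }

    // Paths for every item below the container (the container is the root).
    static List<String> nestedPaths(Item container) {
        List<String> out = new ArrayList<>();
        for (Item child : container.children) {
            walk(child, "", out);
        }
        return out;
    }

    // Attachments emitted before the mail item that owns them: the mail
    // item's path does not exist yet, so each attachment can only be placed
    // directly under the container, as a sibling of its own message.
    static List<String> attachmentsFirst(Item mail) {
        List<String> out = new ArrayList<>();
        for (Item attachment : mail.children) {
            out.add("/" + attachment.name);
        }
        out.add("/" + mail.name);
        return out;
    }

    public static void main(String[] args) {
        Item mail = new Item("message1.msg")
                .add(new Item("report.pdf"))
                .add(new Item("photo.jpg"));
        Item pst = new Item("archive.pst").add(mail);

        System.out.println("mail item first: " + nestedPaths(pst));
        System.out.println("attachments first: " + attachmentsFirst(mail));
    }
}
```

Running main prints `mail item first: [/message1.msg, /message1.msg/report.pdf, /message1.msg/photo.jpg]` followed by `attachments first: [/report.pdf, /photo.jpg, /message1.msg]`. In the second ordering the attachment paths no longer record which message they came from, which is the kind of loss a dedicated mail-item parser is meant to avoid.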



--
This message was sent by Atlassian Jira
(v8.20.10#820010)