[ 
https://issues.apache.org/jira/browse/TIKA-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842432#comment-17842432
 ] 

ASF GitHub Bot commented on TIKA-4248:
--------------------------------------

tballison merged PR #1738:
URL: https://github.com/apache/tika/pull/1738




> Improve PST handling of attachments
> -----------------------------------
>
>                 Key: TIKA-4248
>                 URL: https://issues.apache.org/jira/browse/TIKA-4248
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> The PST parser doesn't handle attachments in the same way as other
> parsers, which hinders analysis of attachments.
> The problem is that the PST parser handles both the text content of an email
> and the embedded attachments, and it processes attachments before
> the main body. Together, these two behaviors break the normal patterns for
> embedded attachments in the RecursiveParserWrapper. For example, when the
> attachments are being processed, the RecursiveParserWrapper can't determine
> what the path will be through the "body" because the body hasn't been parsed yet.
> We should probably create a PSTMailItemParser that handles the content and 
> the attachments like other parsers so that embedded paths can be maintained.
> This will be a breaking change, and I'm targeting it only to the 3.x branch.
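
To illustrate why the ordering matters for embedded paths, here is a minimal toy sketch. It is not Tika code: the class and method names (EmbeddedPathSketch, Item, walk) are hypothetical, and it only models the general idea that a container has to be registered before its children so that each child can inherit the container's path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT part of Tika.
public class EmbeddedPathSketch {

    /** A container item (for example an email) with a name and child attachments. */
    public static final class Item {
        final String name;
        final List<Item> attachments = new ArrayList<>();

        public Item(String name) {
            this.name = name;
        }

        public Item attach(Item child) {
            attachments.add(child);
            return this;
        }
    }

    /** Visits each item before its attachments and records the path of every item. */
    public static List<String> walk(Item root) {
        List<String> paths = new ArrayList<>();
        walk(root, new ArrayDeque<>(), paths);
        return paths;
    }

    private static void walk(Item item, Deque<String> stack, List<String> paths) {
        stack.addLast(item.name);                  // register the parent first
        paths.add("/" + String.join("/", stack));  // path is known at this point
        for (Item child : item.attachments) {
            walk(child, stack, paths);             // children inherit the parent's path
        }
        stack.removeLast();
    }

    public static void main(String[] args) {
        Item email = new Item("message-1")
                .attach(new Item("report.docx"))
                .attach(new Item("archive.zip").attach(new Item("inner.txt")));
        walk(email).forEach(System.out::println);
    }
}
```

If the attachments were visited before the parent item had been pushed onto the stack, the sketch would have no parent segment to prepend, which is roughly the situation described in the issue.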



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
