[ 
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843798#comment-17843798
 ] 

Tim Allison edited comment on TIKA-4250 at 5/6/24 5:02 PM:
-----------------------------------------------------------

So, I caught an example of libpst not exporting an attachment when it writes 
an msg file, via our unit test file (testPST.pst). The attached 8.msg should 
contain an embedded msg that includes a docx. Via a hex editor, I can see that 
there is no embedded msg in 8.msg, whereas the structure is correctly 
maintained in 8.eml.


was (Author: talli...@mitre.org):
So, I caught an example of libpst not reading an attachment in our unit test 
file (testPST.pst). The attached msg should contain an embedded msg that 
includes a docx. Via a hex editor, I can see that there is no embedded msg in 
8.msg, whereas the structure is correctly maintained in 8.eml.

> Add a libpst-based parser
> -------------------------
>
>                 Key: TIKA-4250
>                 URL: https://issues.apache.org/jira/browse/TIKA-4250
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: 8.eml, 8.msg
>
>
> We currently use the com.pff Java-based PST parser for PST files. It would be 
> useful to add a wrapper for libpst as an optional parser. 
> One of the benefits of libpst is that it creates .eml or .msg files from the 
> PST records. This is critical for those who want the original bytes from 
> embedded files. Obviously, PST doesn't store eml or msg, but some users want 
> the "original" emails even if they are constructed from PST records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)