[ 
https://issues.apache.org/jira/browse/TIKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558431#comment-16558431
 ] 

Ross Johnson edited comment on TIKA-2694 at 7/26/18 3:31 PM:
-------------------------------------------------------------

Just adding some extra info. I checked the attached .msg file, and indeed the 
sender MAPI properties only contain the X.500 sender address:
{code:java}
PidTagSenderName (0x0C1A) String (0x001F)
"Berger, Eric"

PidTagSenderAddressType (0x0C1E) String (0x001F)
"EX"

PidTagSenderEmailAddress (0x0C1F) String (0x001F)
"/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP 
(FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"
{code}
 

However, the normal SMTP email addresses are present in the 
PidTagTransportMessageHeaders property:
{code:java}
From: "Berger, Eric" <eric_ber...@spe.sony.com>
{code}
 

It may be possible to use the information from PidTagTransportMessageHeaders as 
a backup or alternative, but in my experience, reconciling the header 
information with the MAPI properties is a bit of a rabbit hole. Care must be 
taken to match up the "From:" and "Sender:" headers with the PidTagSender and 
PidTagSentRepresenting properties, which do not map 1:1. Furthermore, the 
headers may contain multiple "From:" addresses, whereas the MAPI properties 
store only one of them. I've also seen MSG files where the stored headers seem 
totally unrelated to the stored MAPI properties, although this is (hopefully) a 
very rare occurrence.
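
To make the backup idea concrete, here is a minimal sketch of what such a 
fallback could look like. It is not taken from Tika or POI: the class 
SenderFallback and both methods are hypothetical names, and it assumes the 
caller has already read the sender address type, the sender address and the 
transport headers from the message as plain strings. It only looks at the first 
angle-bracketed address of the first "From:" header, so it does not address the 
multiple-address or mismatched-header cases mentioned above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or POI.
public class SenderFallback {

    // Matches a "From:" header line (after unfolding) and captures its value.
    private static final Pattern FROM_LINE =
            Pattern.compile("(?im)^From:[ \\t]*(.*)$");

    // Captures the first address written in angle brackets, e.g. <user@example.com>.
    private static final Pattern ANGLE_ADDR =
            Pattern.compile("<([^<>\\s]+@[^<>\\s]+)>");

    // Returns the first angle-bracketed address of the first "From:" header, if any.
    static Optional<String> fromHeaderAddress(String transportHeaders) {
        if (transportHeaders == null) {
            return Optional.empty();
        }
        // Unfold continuation lines: a line break followed by whitespace
        // continues the previous header line.
        String unfolded = transportHeaders.replaceAll("\\r?\\n[ \\t]+", " ");
        Matcher line = FROM_LINE.matcher(unfolded);
        if (!line.find()) {
            return Optional.empty();
        }
        Matcher addr = ANGLE_ADDR.matcher(line.group(1));
        return addr.find() ? Optional.of(addr.group(1)) : Optional.empty();
    }

    // Keeps the MAPI address when its type is SMTP; otherwise tries the
    // transport headers and falls back to the MAPI value if nothing is found.
    static String resolveSender(String addressType, String mapiAddress,
                                String transportHeaders) {
        if ("SMTP".equalsIgnoreCase(addressType)) {
            return mapiAddress;
        }
        return fromHeaderAddress(transportHeaders).orElse(mapiAddress);
    }
}
```

With an address type of "EX", resolveSender would return the header address 
when one can be parsed and the original X.500 value otherwise, so the existing 
behaviour is preserved for messages without stored headers.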


> "From" headers is not always extracted correctly on msg mails
> -------------------------------------------------------------
>
>                 Key: TIKA-2694
>                 URL: https://issues.apache.org/jira/browse/TIKA-2694
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: CentOS 7
> Windows 10
>            Reporter: Celpan Valeria
>            Priority: Major
>         Attachments: Fw Anime User Analysis.msg
>
>
> For some emails, the "From" field contains, instead of the email address, a 
> value which looks like `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP 
> (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`.
>  The issue seems to be connected to the library 
> `org.apache.poi:poi-scratchpad:3.17`, since when running 
> `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode,
>  ParserContext)` we get `this.msg.mainChunks.allChunks.SenderEmailAddress = 
> "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP 
> (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.
>  See the attachment to reproduce this defect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
