[ https://issues.apache.org/jira/browse/TIKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558431#comment-16558431 ]
Ross Johnson edited comment on TIKA-2694 at 7/26/18 3:31 PM:
-------------------------------------------------------------

Just adding some extra info. I checked the attached .msg file, and indeed the sender MAPI properties only contain the x500 sender address:
{code:java}
PidTagSenderName         (0x0C1A) String (0x001F) "Berger, Eric"
PidTagSenderAddressType  (0x0C1E) String (0x001F) "EX"
PidTagSenderEmailAddress (0x0C1F) String (0x001F) "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"
{code}
However, the normal email addresses are present within the PidTagTransportMessageHeaders property.
{code:java}
From: "Berger, Eric" <eric_ber...@spe.sony.com>
{code}
It may be possible to use the information from PidTagTransportMessageHeaders as a backup or alternative, but in my experience, resolving the header information with the MAPI properties is a bit of a rabbit hole. Care must be taken to match up the "From:" and "Sender:" headers with PidTagSender and PidTagSentRepresenting properties which aren't 1:1, and furthermore there may be multiple "From:" addresses whereas the MAPI properties will just store one of them.

I've also seen MSG files where the stored headers seem totally unrelated to the stored MAPI properties, although this is (hopefully) a very rare occurrence.

> "From" headers is not always extracted correctly on msg mails
> -------------------------------------------------------------
>
>                 Key: TIKA-2694
>                 URL: https://issues.apache.org/jira/browse/TIKA-2694
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>         Environment: CentOS 7
>                      Windows 10
>            Reporter: Celpan Valeria
>            Priority: Major
>         Attachments: Fw Anime User Analysis.msg
>
>
> For some emails we get instead of the email address for "From" field a value which looks like `/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER`.
> The issue seems to be connected to the library `org.apache.poi:poi-scratchpad:3.17` as when running `org.apache.tika.parser.microsoft.OutlookExtractor::OutlookExtractor(DirectoryNode, ParserContext)` we get `this.msg.mainChunks.allChunks.SenderEmailAddress = "/O=SONY/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=EBERGER"`.
> Check attachment to reproduce this defect.
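
For illustration, the header fallback discussed in the comment could look roughly like the sketch below. It is not Tika or POI code: the class `SenderFallback` and its methods are hypothetical, and the sketch assumes the caller has already read the values of PidTagSenderAddressType, PidTagSenderEmailAddress and PidTagTransportMessageHeaders into plain strings. It also sidesteps the caveats raised above by taking only the first address of the first "From:" header and ignoring "Sender:" entirely.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Tika or POI. */
final class SenderFallback {

    // First "From:" header at the start of a line, plus any folded
    // continuation lines (lines that begin with a space or tab).
    private static final Pattern FROM_HEADER = Pattern.compile(
            "^From:[ \\t]*(.*(?:\\r?\\n[ \\t]+.*)*)",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // An address inside angle brackets, e.g. <user@example.com>.
    private static final Pattern ANGLE_ADDR =
            Pattern.compile("<([^<>\\s]+@[^<>\\s]+)>");

    // A bare address without angle brackets, e.g. user@example.com.
    private static final Pattern BARE_ADDR =
            Pattern.compile("([^\\s<>\",;]+@[^\\s<>\",;]+)");

    /** True when the MAPI sender looks like an Exchange (x500) address. */
    static boolean looksLikeX500(String addressType, String address) {
        if ("EX".equalsIgnoreCase(addressType)) {
            return true;
        }
        return address != null
                && address.toUpperCase(Locale.ROOT).startsWith("/O=");
    }

    /** First address of the first "From:" header, if one can be found. */
    static Optional<String> fromHeaderAddress(String transportHeaders) {
        if (transportHeaders == null) {
            return Optional.empty();
        }
        Matcher header = FROM_HEADER.matcher(transportHeaders);
        if (!header.find()) {
            return Optional.empty();
        }
        String value = header.group(1);
        Matcher angle = ANGLE_ADDR.matcher(value);
        if (angle.find()) {
            return Optional.of(angle.group(1));
        }
        Matcher bare = BARE_ADDR.matcher(value);
        return bare.find() ? Optional.of(bare.group(1)) : Optional.empty();
    }

    /**
     * Keeps the MAPI address unless it is an x500 address and the
     * transport headers offer an SMTP-style replacement.
     */
    static String resolve(String addressType, String address,
                          String transportHeaders) {
        if (!looksLikeX500(addressType, address)) {
            return address;
        }
        return fromHeaderAddress(transportHeaders).orElse(address);
    }
}
```

A caller would pass the three property values to `SenderFallback.resolve(...)` and use the result as the "From" metadata. Because the stored headers can disagree with the MAPI properties, a real fix would probably need extra checks before trusting the header value.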