[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168945#comment-15168945 ]
Tim Allison edited comment on TIKA-1865 at 2/26/16 1:17 PM:
------------------------------------------------------------

With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg emailFromChunk:olt...@microsoft.com header_from:null
testMSG.msg emailFromChunk:jukka.zitt...@gmail.com header_from:From: Jukka Zitting <jukka.zitt...@gmail.com>
testMSG_att_doc.msg emailFromChunk:nicolas1.23...@free.fr header_from:null
testMSG_att_msg.msg emailFromChunk:/O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH header_from:From: Nick Booth <nick.bo...@pof.com.au>
testMSG_chinese.msg emailFromChunk:/O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG header_from:null
testMSG_forwarded.msg emailFromChunk:/O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE header_from:From: Paul Allan Hill <p...@metajure.com>
{noformat}

Perhaps a strategy of trying emailFromChunk first, and then backing off to a regex on the header {{From}} if that's there? That would get a "regular" email address for every file above except {{testMSG_chinese.msg}}, which has only the Exchange path in the chunk and no {{From}} header.

Or is the Exchange info useful to you as well, if that's all we can get?
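
The back-off idea could be sketched roughly as below. This is only an illustration, not the Tika parser code: the class name {{SenderAddressSketch}}, the method {{resolve}}, and the deliberately loose address pattern are all hypothetical, and the two string arguments stand in for the {{emailFromChunk}} and {{header_from}} values shown in the listing. Returning the raw Exchange path as a last resort is an assumption about what would be useful, which is exactly the open question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: picks a sender address using the
// "chunk first, then From header" back-off.
final class SenderAddressSketch {

    // Deliberately loose "local@domain" pattern; NOT a full RFC 5322 parser.
    private static final Pattern ADDRESS =
            Pattern.compile("[^\\s<>@\"]+@[^\\s<>@\"]+");

    static Optional<String> resolve(String emailFromChunk, String headerFrom) {
        // 1. Prefer the chunk value when it already looks like an SMTP address.
        Optional<String> fromChunk = firstAddress(emailFromChunk);
        if (fromChunk.isPresent()) {
            return fromChunk;
        }
        // 2. Otherwise back off to a regex over the "From:" header, if present.
        Optional<String> fromHeader = firstAddress(headerFrom);
        if (fromHeader.isPresent()) {
            return fromHeader;
        }
        // 3. Assumption: keep the raw chunk value (e.g. an Exchange path)
        //    rather than dropping it when nothing better is available.
        return Optional.ofNullable(emailFromChunk);
    }

    // Returns the first substring that looks like an address, if any.
    private static Optional<String> firstAddress(String text) {
        if (text == null) {
            return Optional.empty();
        }
        Matcher m = ADDRESS.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

With made-up inputs, {{resolve("/O=EXAMPLE/CN=RECIPIENTS/CN=JDOE", "From: Jane Doe <jane.doe@example.com>")}} would yield {{jane.doe@example.com}}, while the same Exchange path with a null header would come back unchanged.
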
was (Author: talli...@mitre.org):
With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg : olt...@microsoft.com
testMSG.msg : jukka.zitt...@gmail.com
testMSG_att_doc.msg : nicolas1.23...@free.fr
testMSG_att_msg.msg : /O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
testMSG_chinese.msg : /O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
testMSG_forwarded.msg : /O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
{noformat}

> Save sender email address in Outlook MSG metadata
> -------------------------------------------------
>
> Key: TIKA-1865
> URL: https://issues.apache.org/jira/browse/TIKA-1865
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.11
> Environment: Windows 7 x64, jre 1.8.0_60 x64
> Reporter: Luis Filipe Nassif
>
> The sender email address is lost when extracting metadata from Outlook MSG files.
> Currently only the sender name is extracted. The address is important information
> to extract for search engines.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)