[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890989#comment-15890989 ]

Tim Allison commented on TIKA-1865:

With the most recent commit, I think I made the equivalent changes in PSTParser, mbox parser and the RFCParser. If anyone has recommendations for better clarity among the metadata keys or any other improvements, please reopen.

> Save sender email address in Outlook MSG metadata
>
>                 Key: TIKA-1865
>                 URL: https://issues.apache.org/jira/browse/TIKA-1865
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.11
>         Environment: Windows 7 x64, jre 1.8.0_60 x64
>            Reporter: Luis Filipe Nassif
>         Attachments: report.xlsx
>
> Sender email address is lost when extracting metadata from Outlook msg files.
> Currently only sender name is extracted. That is important information to
> extract for search engines.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890800#comment-15890800 ]

Matthew Caruana Galizia commented on TIKA-1865:

Thank you, this is a big improvement.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890789#comment-15890789 ]

Tim Allison commented on TIKA-1865:

[~lfcnassif] and [~jeremybmerrill], if you have a chance, please take a look at the modifications and see what you think.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890786#comment-15890786 ]

Tim Allison commented on TIKA-1865:

[~mcaruanagalizia], I've added quite a few more Metadata keys under Office and Message for the sender, and I've updated the MSG parser. I still need to update the other message parsers.

I'm not thrilled with putting the MAPI-specific metadata items in the Office object (perhaps a separate class to handle them?), and I don't like the divide between MAPI and Message, but there really are some things that are specific to MAPI and don't apply to RFC.

I added individual keys for the components of exchange addresses {{"/o=blah/ou=blah/cn=recipients/cn=actual name"}}.

Let me know what you think.

As a side note, we just switched from Apache's git to GitHub. We haven't re-calibrated Jenkins, so there isn't a nightly build yet. You'll have to grab from GitHub and build yourself for now.
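
For readers unfamiliar with the exchange address form quoted above, here is a minimal sketch of splitting such a string into its components in plain Java. The class `ExchangeDnSketch` and its `parse` method are hypothetical names chosen for illustration; this is not the code from the commit and it does not use the actual Tika metadata keys. It also assumes that component values never contain a `/`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika or POI.
class ExchangeDnSketch {

    // Splits "/o=A/ou=B/cn=C/cn=D" into {o=[A], ou=[B], cn=[C, D]}.
    // Assumption: values contain no '/' characters.
    static Map<String, List<String>> parse(String dn) {
        Map<String, List<String>> parts = new LinkedHashMap<>();
        for (String segment : dn.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                continue; // skips the empty leading segment and malformed pieces
            }
            // Attribute types appear in both upper and lower case, so normalize them.
            String type = segment.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = segment.substring(eq + 1).trim();
            parts.computeIfAbsent(type, k -> new ArrayList<>()).add(value);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("/o=blah/ou=blah/cn=recipients/cn=actual name"));
    }
}
```

Because `cn` can occur more than once, each attribute type maps to a list rather than a single value; a caller that wants one key per component would have to decide how to name the repeated entries.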
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172685#comment-15172685 ]

Tim Allison commented on TIKA-1865:

Y, that's my guess exactly. If anyone has actual knowledge or has found something else, that'd be great information. The specs I read were very helpful on some things, not as helpful on this.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172614#comment-15172614 ]

Luis Filipe Nassif commented on TIKA-1865:

Great research and testing [~talli...@apache.org]! Based on the results, I think your proposal is the best that can be done. Maybe the sender's email is stored only with Exchange and pulled on demand by clients using the Exchange ID...
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170601#comment-15170601 ]

Tim Allison commented on TIKA-1865:

http://download.microsoft.com/download/5/D/D/5DD33FDF-91F5-496D-9884-0A0B0EE698BB/[MS-OXMSG].pdf

If anyone has time and the inclination...
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170599#comment-15170599 ]

Tim Allison commented on TIKA-1865:

Outlook shows part of a name, but no address. I couldn't see the address with a hex editor. POI has a really useful msg dumper to display chunks...next step...
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170086#comment-15170086 ]

Luis Filipe Nassif commented on TIKA-1865:

I do not know if including the email in MESSAGE_TO will break backwards compatibility, because currently, when there is no nickname, the email already goes there. The docs say nothing about the expected value, and at least the RFC822Parser and MboxParser already put both name and email into that key. So I think putting the email info into MESSAGE_(TO/CC/BCC) for MSG files will make things more consistent across parsers.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170035#comment-15170035 ]

Luis Filipe Nassif commented on TIKA-1865:

Does Outlook display the sender's name or email for testMSG_chinese.msg? I think all msg files should keep the sender's email somewhere, not necessarily in header_from. It looks like POI must be patched for a complete solution, as Nick said. And I do not know anything about POI source code, unfortunately...
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169968#comment-15169968 ]

Jeremy B. Merrill commented on TIKA-1865:

My heart wants to say yes, but my calendar says no. :) Or at least not any time super soon. You're right that this is a ticket that's interesting to me, though. I did just get my own dump of real-life .msg files (not shareable, unfortunately) and I've noticed how senders' email addresses seem to get lost, which is a pain...

Is this just a feature that is not yet implemented? Or is there an underlying reason why? (Funnily enough, it matches the behavior of Outlook printouts, which give you only the sender's alias, not their address -- including, most annoyingly for me, in the dumps of Hillary Clinton's emails that the State Dept. has been releasing.)

Do we know if all the various email formats include the sender's email address, so it'd be theoretically accessible to Tika somehow? What even are all the formats for emails that Tika handles? Outlook (PST/MSG), .eml/rfc822, mbox, anything else?
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169009#comment-15169009 ]

Tim Allison commented on TIKA-1865:

Completely agree on all counts. Did not mean to suggest breaking backwards compat! And, y, this will require mods to mbox, etc. Thank you!

bq. find a suitable metadata scheme

Any recommendations?

bq. add additional keys that hold the email addresses and the names in a way that they can be helpfully associated together?

Until TIKA-1607 is solved, perhaps parallel arrays for something like these metadata keys: "MESSAGE_TO_EMAIL", "MESSAGE_TO_NAME"?
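
The parallel-array idea can be sketched roughly as follows: the i-th value stored under one key belongs with the i-th value stored under the other. The two key names are the ones floated in the comment above; everything else is an assumption made for illustration. In particular, the `Map`-based store is a stand-in for a multi-valued metadata object, not Tika's `Metadata` class, and `ParallelRecipientKeys` and `addTo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata store; not Tika's Metadata class.
class ParallelRecipientKeys {
    static final String TO_NAME = "MESSAGE_TO_NAME";   // key names as proposed in the comment
    static final String TO_EMAIL = "MESSAGE_TO_EMAIL";

    private final Map<String, List<String>> values = new HashMap<>();

    // Always writes both keys so that index i in one list matches index i in the other.
    // A missing name or address is stored as "" to keep the two lists aligned.
    void addTo(String name, String email) {
        values.computeIfAbsent(TO_NAME, k -> new ArrayList<>()).add(name == null ? "" : name);
        values.computeIfAbsent(TO_EMAIL, k -> new ArrayList<>()).add(email == null ? "" : email);
    }

    List<String> get(String key) {
        return values.getOrDefault(key, new ArrayList<>());
    }

    public static void main(String[] args) {
        ParallelRecipientKeys md = new ParallelRecipientKeys();
        md.addTo("Alice Example", "alice@example.com");
        md.addTo(null, "bob@example.com"); // address only, no display name
        List<String> names = md.get(TO_NAME);
        List<String> emails = md.get(TO_EMAIL);
        for (int i = 0; i < emails.size(); i++) {
            System.out.println(names.get(i) + " <" + emails.get(i) + ">");
        }
    }
}
```

The weak point of this layout is that the association lives only in the ordering: if a parser skips an empty name instead of writing a placeholder, every later pair is misaligned, which is why the sketch always writes both lists.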
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169005#comment-15169005 ]

Nick Burch commented on TIKA-1865:

Whatever we do, matching changes should be made to the other Email file format parsers to keep things consistent.

I'm not sure we should be changing the existing keys to suddenly hold different values; that'll break backwards compatibility and likely confuse existing users.

Maybe we should find a suitable metadata scheme for this, and add additional keys that hold the email addresses and the names in a way that they can be helpfully associated together?
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168996#comment-15168996 ]

Tim Allison commented on TIKA-1865:

[~jeremybmerrill], any interest in this? Want to contribute?
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168951#comment-15168951 ]

Tim Allison commented on TIKA-1865:

And if you are interested in working on a patch for this, we now have ~3800 msg files that I pulled with [~centic]'s CommonCrawlDocumentDownload tool...in addition to what we had in our slice of CommonCrawl and govdocs1.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168946#comment-15168946 ]

Tim Allison commented on TIKA-1865:

Yes and yes...any interest in submitting a patch? If you're interested in this info, you might also be interested in TIKA-1759, a low priority for me at the time, but that could change if there was interest from the community.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168949#comment-15168949 ]

Tim Allison commented on TIKA-1865:

Thank you, Nick.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168945#comment-15168945 ]

Tim Allison commented on TIKA-1865:

With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg : olt...@microsoft.com
testMSG.msg : jukka.zitt...@gmail.com
testMSG_att_doc.msg : nicolas1.23...@free.fr
testMSG_att_msg.msg : /O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
testMSG_chinese.msg : /O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
testMSG_forwarded.msg : /O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
{noformat}
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168291#comment-15168291 ]

Luis Filipe Nassif commented on TIKA-1865:

Also, what do you think about including in the MESSAGE_TO, MESSAGE_CC and MESSAGE_BCC metadata the recipient names AND their email addresses, so users could know the recipient type (to, cc, bcc) of each email? That is not possible with the current approach, which puts all recipient addresses together in MESSAGE_RECIPIENT_ADDRESS.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167988#comment-15167988 ]

Luis Filipe Nassif commented on TIKA-1865:

Hi [~talli...@apache.org]! I think MAPIMessage.getMainChunks().emailFromChunk already has that info, or does it not in all cases? It worked with my small corpus.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167231#comment-15167231 ]

Nick Burch commented on TIKA-1865:

IIRC it needs the "fixed length properties" support to be completed to be able to get it out.
[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167211#comment-15167211 ]

Tim Allison commented on TIKA-1865:

Good to hear from you, [~lfcnassif]! I've only looked at this very briefly, but it looks like POI does not currently make the sender email address available. I think the best next step would be to figure out how to modify POI to make this info available. Any interest in looking into this?

I did see that the email address exists _sometimes_ in the header {{From:}}, and we could pull it out via regex, but several of our test MSG files clearly have the sender email in the bytes but have no headers.
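
As a rough illustration of the regex fallback mentioned above, the sketch below pulls the first address-like token out of a {{From:}} header line. It is an assumption-laden example, not what Tika or POI does: `FromHeaderSketch` and `senderAddress` are hypothetical names, the pattern is deliberately loose rather than a full RFC 5322 parser, and folded (multi-line) headers are not handled.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; a loose pattern, not a full RFC 5322 parser.
class FromHeaderSketch {

    // Finds a line starting with "From:" and captures the first token on that
    // line that looks like user@host.tld. Case-insensitive, multi-line mode.
    private static final Pattern FROM_ADDRESS = Pattern.compile(
            "(?im)^From:.*?([A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,})");

    static Optional<String> senderAddress(String headers) {
        if (headers == null) {
            return Optional.empty(); // some MSG files carry no transport headers at all
        }
        Matcher m = FROM_ADDRESS.matcher(headers);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String headers = "Received: by mx.example.org\r\n"
                + "From: \"Jane Doe\" <jane.doe@example.org>\r\n"
                + "To: someone@example.org\r\n";
        System.out.println(senderAddress(headers).orElse("(no address in headers)"));
    }
}
```

Returning an `Optional` keeps the "no headers" case explicit, which matters here because, as noted above, several test files have the sender address in the bytes but no headers to match against.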