[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169968#comment-15169968 ]

Jeremy B. Merrill commented on TIKA-1865:
-----------------------------------------

My heart wants to say yes, but my calendar says no. :) Or at least not any
time super soon.

You're right that this is a ticket that's interesting to me, though. I did just 
get my own dump of real-life .msg files (not shareable, unfortunately) and I've 
noticed how senders' email addresses seem to get lost, which is a pain... Is 
this just a feature that is not yet implemented? Or is there an underlying 
reason why?

(Funnily enough, it matches the behavior of Outlook printouts, which give you
only the sender's alias, not their address -- including, most annoyingly for
me, in the dumps of Hillary Clinton's emails that the State Dept. has been
releasing.)

Do we know if all the various email formats include the sender's email address, 
so it'd be theoretically accessible to Tika somehow? What even are all the 
formats for emails that Tika handles? Outlook (PST/MSG), .eml/rfc822, mbox, 
anything else?

> Save sender email address in Outlook MSG metadata
> -------------------------------------------------
>
>                 Key: TIKA-1865
>                 URL: https://issues.apache.org/jira/browse/TIKA-1865
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.11
>         Environment: Windows 7 x64, jre 1.8.0_60 x64
>            Reporter: Luis Filipe Nassif
>
> The sender's email address is lost when extracting metadata from Outlook MSG
> files. Currently only the sender's name is extracted. The address is important
> information for search engines to extract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
