[jira] [Commented] (TIKA-1860) Tika 2.0 - Create Module OSGi implementations to replace tika-bundle

2016-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170399#comment-15170399
 ] 

Hudson commented on TIKA-1860:
--

UNSTABLE: Integrated in tika-2.x #34 (See 
[https://builds.apache.org/job/tika-2.x/34/])
TIKA-1860 - Added Bundle config to advanced, cad, code, crypto (bob: rev 
8a5923dd6a42f4b4c09ec186ef357f446e9ae599)
* 
tika-parser-modules/tika-parser-crypto-module/src/test/java/org/apache/tika/module/BundleIT.java
* 
tika-parser-modules/tika-parser-advanced-module/src/test/java/org/apache/tika/module/BundleIT.java
* 
tika-parser-modules/tika-parser-cad-module/src/test/java/org/apache/tika/module/BundleIT.java
* tika-parser-modules/tika-parser-crypto-module/pom.xml
* tika-parser-modules/tika-parser-multimedia-module/pom.xml
* 
tika-parser-modules/tika-parser-advanced-module/src/main/java/org/apache/tika/module/advanced/internal/Activator.java
* tika-parser-modules/tika-parser-code-module/pom.xml
* 
tika-parser-modules/tika-parser-cad-module/src/main/java/org/apache/tika/module/cad/internal/Activator.java
* tika-parser-modules/tika-parser-advanced-module/pom.xml
* 
tika-parser-modules/tika-parser-code-module/src/test/java/org/apache/tika/module/BundleIT.java
* 
tika-parser-modules/tika-parser-crypto-module/src/main/java/org/apache/tika/module/crypto/internal/Activator.java
* tika-parser-modules/tika-parser-cad-module/pom.xml
* tika-parser-modules/pom.xml
* 
tika-parser-modules/tika-parser-code-module/src/main/java/org/apache/tika/module/code/internal/Activator.java


> Tika 2.0 - Create Module OSGi implementations to replace tika-bundle
> 
>
> Key: TIKA-1860
> URL: https://issues.apache.org/jira/browse/TIKA-1860
> Project: Tika
>  Issue Type: Sub-task
>Reporter: Bob Paulin
>Assignee: Bob Paulin
>
> Create a replacement for the OSGi tika-bundle project out of the new 
> tika-parser-* modules



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170086#comment-15170086
 ] 

Luis Filipe Nassif edited comment on TIKA-1865 at 2/26/16 11:31 PM:


I do not know if including the email into MESSAGE_TO will break backwards 
compatibility, because currently when there is no nickname, the email already 
goes there. The docs say nothing about the expected value and at least the 
RFC822Parser and MboxParser already put both name and email into that key. So, 
I think putting the email info into MESSAGE_(TO/CC/BCC) of MSG files will make 
things more consistent across parsers, that is why I suggested putting both 
values into those keys.


was (Author: lfcnassif):
I do not know if including the email into MESSAGE_TO will break backwards 
compatibility, because currently when there is no nickname, the email already 
goes there. The docs say nothing about the expected value and at least the 
RFC822Parser and MboxParser already put both name and email into that key. So, 
I think putting the email info into MESSAGE_(TO/CC/BCC) of MSG files will make 
things more consistent across parsers.

> Save sender email address in Outlook MSG metadata
> -
>
> Key: TIKA-1865
> URL: https://issues.apache.org/jira/browse/TIKA-1865
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.11
> Environment: Windows 7 x64, jre 1.8.0_60 x64
>Reporter: Luis Filipe Nassif
>
> Sender email address is lost when extracting metadata from Outlook msg files. 
> Currently only the sender name is extracted. The email address is important 
> information for search engines to extract.





[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170086#comment-15170086
 ] 

Luis Filipe Nassif commented on TIKA-1865:
--

I do not know if including the email into MESSAGE_TO will break backwards 
compatibility, because currently when there is no nickname, the email already 
goes there. The docs say nothing about the expected value and at least the 
RFC822Parser and MboxParser already put both name and email into that key. So, 
I think putting the email info into MESSAGE_(TO/CC/BCC) of MSG files will make 
things more consistent across parsers.



[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Luis Filipe Nassif (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170035#comment-15170035
 ] 

Luis Filipe Nassif commented on TIKA-1865:
--

Does Outlook display the sender's name or email for testMSG_chinese.msg? I 
think all msg files should keep the sender's email somewhere, not necessarily 
in header_from. It looks like POI must be patched for a complete solution, as 
Nick said. And I do not know anything about POI source code, unfortunately...



[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Jeremy B. Merrill (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169968#comment-15169968
 ] 

Jeremy B. Merrill commented on TIKA-1865:
-

My heart wants to say yes, but my calendar says no. :) Or at least not any 
time super soon.

You're right that this is a ticket that's interesting to me, though. I did just 
get my own dump of real-life .msg files (not shareable, unfortunately) and I've 
noticed how senders' email addresses seem to get lost, which is a pain... Is 
this just a feature that is not yet implemented? Or is there an underlying 
reason why?

(Funnily enough, it matches the behavior of Outlook printouts, which give you 
only the sender's alias, not their address -- including, most annoyingly for 
me, in the dumps of Hillary Clinton's emails that the State Dept. has been 
releasing.) 

Do we know if all the various email formats include the sender's email address, 
so it'd be theoretically accessible to Tika somehow? What even are all the 
formats for emails that Tika handles? Outlook (PST/MSG), .eml/rfc822, mbox, 
anything else?



[jira] [Updated] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-26 Thread Prasad Nagaraj Subramanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Nagaraj Subramanya updated TIKA-1877:

Attachment: 4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984
tika-mimetypes.xml

Attached the changed tika-mimetypes.xml and .fts file

> On updating the tika-mimetypes.xml to detect .fts file format, tika detector 
> does not return anything
> -
>
> Key: TIKA-1877
> URL: https://issues.apache.org/jira/browse/TIKA-1877
> Project: Tika
>  Issue Type: Bug
>  Components: mime
>Reporter: Prasad Nagaraj Subramanya
>Priority: Minor
> Attachments: 
> 4E8D6B46E2366D7063DE3926AF0F976A0DCCD57A7E3B53B7D54768F16DD23984, 
> tika-mimetypes.xml
>
>
> The match value for .fts file format in tika-mimetypes.xml is "SIMPLE  =  
>   T".
> Tika detected a .fts file as application/octet-stream. On verifying the 
> header I found the value to be "SIMPLE  =T"(just 16 spaces 
> before = and T)
> I tried the following changes-
> Change 1) Updated the existing match value. But the build failed 
> Change 2) Added a new match value  type="string" offset="0"/> after the existing one.
> But now, tika returns empty value. It neither identifies the file as .fts nor 
> as application/octet-stream.





[jira] [Created] (TIKA-1877) On updating the tika-mimetypes.xml to detect .fts file format, tika detector does not return anything

2016-02-26 Thread Prasad Nagaraj Subramanya (JIRA)
Prasad Nagaraj Subramanya created TIKA-1877:
---

 Summary: On updating the tika-mimetypes.xml to detect .fts file 
format, tika detector does not return anything
 Key: TIKA-1877
 URL: https://issues.apache.org/jira/browse/TIKA-1877
 Project: Tika
  Issue Type: Bug
  Components: mime
Reporter: Prasad Nagaraj Subramanya
Priority: Minor


The match value for .fts file format in tika-mimetypes.xml is "SIMPLE  =
T".
Tika detected a .fts file as application/octet-stream. On verifying the header 
I found the value to be "SIMPLE  =T"(just 16 spaces before = 
and T)

I tried the following changes-
Change 1) Updated the existing match value. But the build failed 

Change 2) Added a new match value  after the existing one.
But now, tika returns empty value. It neither identifies the file as .fts nor 
as application/octet-stream.
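
The whitespace sensitivity described above can be illustrated with a small standalone check. This is only a sketch in plain Java, not Tika's magic-matching code and not a proposed tika-mimetypes.xml entry: the class name, method name, and regular expression are hypothetical, and the exact space counts from the report are not reproduced because the archive has collapsed them. It accepts any run of one or more spaces between the = and the T, which is one way to make both header variants match.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Tika.
class FitsHeaderSketch {
    // "SIMPLE", two spaces, "=", then one or more spaces, then "T".
    private static final Pattern SIMPLE_CARD = Pattern.compile("^SIMPLE {2}= +T");

    /** Returns true if the first bytes look like a "SIMPLE  = ... T" header card. */
    static boolean looksLikeFits(byte[] head) {
        // Only the first 80 bytes are inspected in this sketch.
        int length = Math.min(head.length, 80);
        String start = new String(head, 0, length, StandardCharsets.US_ASCII);
        return SIMPLE_CARD.matcher(start).find();
    }

    public static void main(String[] args) {
        byte[] sample = "SIMPLE  =        T".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeFits(sample));
    }
}
```

A fixed-string match, by contrast, only succeeds when the number of spaces is exactly the one written in the match value, which is consistent with the detection failure reported here.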





[jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169395#comment-15169395
 ] 

Tim Allison commented on TIKA-1857:
---

I implemented a first-attempt XFA scraper with StAX; this pulls the content 
from the fields that Pascal identified into the ContentHandler, and it merges 
the "values" from the data section with the fields section.
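
For readers unfamiliar with StAX, the general shape of such a scraper might look like the sketch below. It is not the code from the patch: the class and method names are hypothetical, the element name "field" is an assumption about the XFA template section, and the merge with the data section is left out. It only shows the pull-parsing loop that collects character data found inside field elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only; not the scraper from the patch.
class XfaFieldTextSketch {
    /** Collects non-blank text that appears anywhere inside <field> elements. */
    static List<String> fieldText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Defensive settings: no DTDs, no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Deliver adjacent character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        List<String> texts = new ArrayList<>();
        int fieldDepth = 0; // greater than zero while inside a <field>
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    fieldDepth++;
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    fieldDepth--;
                } else if (event == XMLStreamConstants.CHARACTERS
                        && fieldDepth > 0 && !reader.isWhiteSpace()) {
                    texts.add(reader.getText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XFA XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<template><field name=\"n\"><caption><value><text>Name</text>"
                + "</value></caption></field></template>";
        System.out.println(fieldText(xml));
    }
}
```

In a real parser the collected strings would be written to the SAX ContentHandler rather than returned as a list.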

Currently, if XFA exists, I process that and skip the AcroForm data.  

I'm not certain what the best path is for ignoring/processing content extracted 
from the "regular" PDF if there is XFA data.

For now, I'm also processing the contents of the rest of the PDF. I'm more 
averse to losing data than to duplication because my main use case is 
search...but I realize this will be really frustrating to users who want "just 
one copy" of the content.

In looking at the PDFs with XFA data in govdocs1, it looks like there would be 
lost content in _some_ files if we processed only the XFA and did not do the 
regular text extraction.  On the other hand, for most of the files I examined, 
it looked like the content is entirely duplicative -- [~pascal.essiembre]'s 
point above.

I propose adding a parameter to the PDFParserConfig along the lines of 
{{ifXFAExistsProcessItAlone}}...this would allow the behavior of Pascal's 
patch.  I propose that the default be set to "false", erring on the side of 
extracting more content at the cost of duplication.

Is this ok?  Or, is there an easy way to determine if regular content is 
entirely duplicative of XFA content?
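
The proposed switch could be sketched roughly as follows. This is a standalone illustration and not the actual PDFParserConfig class: only the name ifXFAExistsProcessItAlone and its default of "false" come from the proposal above, while the class name, accessor names, and the shouldExtractPageText helper are hypothetical.

```java
// Hypothetical stand-in for the relevant part of a parser config.
class XfaConfigSketch {
    // Default "false": err on the side of extracting more content.
    private boolean ifXFAExistsProcessItAlone = false;

    boolean getIfXFAExistsProcessItAlone() {
        return ifXFAExistsProcessItAlone;
    }

    void setIfXFAExistsProcessItAlone(boolean value) {
        this.ifXFAExistsProcessItAlone = value;
    }

    /** Regular page text is skipped only when XFA exists and the flag is set. */
    boolean shouldExtractPageText(boolean hasXfa) {
        return !(hasXfa && ifXFAExistsProcessItAlone);
    }

    public static void main(String[] args) {
        XfaConfigSketch config = new XfaConfigSketch();
        System.out.println(config.shouldExtractPageText(true));
        config.setIfXFAExistsProcessItAlone(true);
        System.out.println(config.shouldExtractPageText(true));
    }
}
```

With the default, a PDF that has XFA data yields both the XFA text and the regular page text (possible duplication); with the flag set, only the XFA text is emitted, which matches the behavior attributed to Pascal's patch.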



> Enhance PDFParser to extract text from XFA forms
> 
>
> Key: TIKA-1857
> URL: https://issues.apache.org/jira/browse/TIKA-1857
> Project: Tika
>  Issue Type: Improvement
>  Components: parser
>Reporter: Pascal Essiembre
>Priority: Trivial
>  Labels: patch
> Fix For: 1.13
>
> Attachments: 041617_filled_out.pdf, govdocs1_xfas.zip, 
> xfa_in_govdocs1.txt
>
>
> Extract text from PDF Forms (XFA).  Information about XFA: 
> https://en.wikipedia.org/wiki/XFA





Re: [jira] [Commented] (TIKA-1875) Updating tika-mimetypes.xml to detect .NC files

2016-02-26 Thread Prasad N S
Hi Nick,

I have opened a pull request for the issue -
https://github.com/apache/tika/pull/78


Thanks,
Prasad

On Fri, Feb 26, 2016 at 2:47 AM, Nick Burch (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/TIKA-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168810#comment-15168810
> ]
>
> Nick Burch commented on TIKA-1875:
> --
>
> No patch or pull request was attached. You can attach a patch using "More"
> then "Attach Files"; otherwise, if you use GitHub, you can share the patch
> by opening a pull request.
>
> > Updating tika-mimetypes.xml to detect .NC files
> > 
> >
> > Key: TIKA-1875
> > URL: https://issues.apache.org/jira/browse/TIKA-1875
> > Project: Tika
> >  Issue Type: Improvement
> >  Components: mime
> >Affects Versions: 1.12
> >Reporter: Prasad Nagaraj Subramanya
> >Priority: Minor
> >  Labels: patch
> > Fix For: 1.11
> >
> >
> > Adding magic number to detect .NC files
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169009#comment-15169009
 ] 

Tim Allison commented on TIKA-1865:
---

Completely agree on all counts.  Did not mean to suggest breaking backwards 
compat!

And, yes, this will require mods to mbox, etc.  Thank you!

bq. find a suitable metadata scheme

Any recommendations?

bq.  add additional keys that hold the email addresses and the names in a way 
that they can be helpfully associated together?

Until TIKA-1607 is solved, perhaps parallel arrays for something like these 
metadata keys: "MESSAGE_TO_EMAIL", "MESSAGE_TO_NAME"?
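
A rough sketch of the parallel-array idea, using a plain map of lists as a stand-in for a multi-valued metadata object. The two key names are the ones floated above; the class, the addRecipient helper, and the choice of an empty string as placeholder are assumptions made for illustration. The point is that both lists must receive an entry for every recipient, otherwise the indices stop lining up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; not Tika's Metadata class.
class RecipientMetadataSketch {
    static final String TO_NAME = "MESSAGE_TO_NAME";
    static final String TO_EMAIL = "MESSAGE_TO_EMAIL";

    /** Adds one recipient, padding missing values so that index i matches in both lists. */
    static void addRecipient(Map<String, List<String>> metadata, String name, String email) {
        metadata.computeIfAbsent(TO_NAME, k -> new ArrayList<>())
                .add(name == null ? "" : name);
        metadata.computeIfAbsent(TO_EMAIL, k -> new ArrayList<>())
                .add(email == null ? "" : email);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        addRecipient(metadata, "Alice Example", "alice@example.com");
        addRecipient(metadata, null, "bob@example.com"); // no display name available
        System.out.println(metadata);
    }
}
```

A consumer can then pair entry i of one list with entry i of the other, at the cost of having to treat the placeholder as "unknown".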




[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169005#comment-15169005
 ] 

Nick Burch commented on TIKA-1865:
--

Whatever we do, matching changes should be made to the other email file format 
parsers to keep things consistent.

I'm not sure we should be changing the existing keys to suddenly hold different 
values; that would break backwards compatibility and likely confuse existing users.

Maybe we should find a suitable metadata scheme for this, and add additional 
keys that hold the email addresses and the names in a way that they can be 
helpfully associated together?



[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168996#comment-15168996
 ] 

Tim Allison commented on TIKA-1865:
---

[~jeremybmerrill], any interest in this?  Want to contribute?



[jira] [Comment Edited] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168945#comment-15168945
 ] 

Tim Allison edited comment on TIKA-1865 at 2/26/16 1:17 PM:


With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg
emailFromChunk:olt...@microsoft.com
header_from:null

testMSG.msg
emailFromChunk:jukka.zitt...@gmail.com
header_from:From: Jukka Zitting 

testMSG_att_doc.msg
emailFromChunk:nicolas1.23...@free.fr
header_from:null

testMSG_att_msg.msg
emailFromChunk:/O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE ADMINISTRATIVE 
GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
header_from:From: Nick Booth 

testMSG_chinese.msg
emailFromChunk:/O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
header_from:null

testMSG_forwarded.msg
emailFromChunk:/O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP 
(FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
header_from:From: Paul Allan Hill 
{noformat}

Perhaps a strategy of trying emailFromChunk first and then falling back to a 
regex on the header {{From}} if that's there?  That would get a "regular" email 
address from the above except for {{testMSG_chinese.msg}}.  Or, is the exchange 
info useful to you if that's all we can get, as well?
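
The fallback strategy could be sketched like this. It is an illustration only, not code from Tika or POI: the class name, method name, and the deliberately loose address pattern are hypothetical, and the sample addresses are made up because the archive strips real ones from the header lines shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SenderEmailSketch {
    // Loose pattern: text@text with no whitespace, angle brackets, or extra '@'.
    private static final Pattern EMAIL = Pattern.compile("[^\\s<>@]+@[^\\s<>@]+");

    /** Prefers the chunk value if it is a plain address, else searches the From header. */
    static Optional<String> resolve(String emailFromChunk, String headerFrom) {
        if (emailFromChunk != null) {
            Matcher chunk = EMAIL.matcher(emailFromChunk.trim());
            if (chunk.matches()) { // whole value must be an address, not an Exchange DN
                return Optional.of(chunk.group());
            }
        }
        if (headerFrom != null) {
            Matcher header = EMAIL.matcher(headerFrom);
            if (header.find()) { // first address-like token in the header
                return Optional.of(header.group());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/O=EXAMPLE/OU=EXAMPLE/CN=RECIPIENTS/CN=SOMEONE",
                "From: Some One <some.one@example.org>"));
    }
}
```

When neither source yields an address, as with {{testMSG_chinese.msg}} above, the sketch returns an empty Optional; a caller could then decide whether to store the raw Exchange value instead.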





was (Author: talli...@mitre.org):
With the handful of MSG files in our "test-documents", I get this:

{noformat}
test-outlook2003.msg : olt...@microsoft.com
testMSG.msg : jukka.zitt...@gmail.com
testMSG_att_doc.msg : nicolas1.23...@free.fr
testMSG_att_msg.msg : /O=PHILLIPS ORMONDE AND FITZPATRICK/OU=EXCHANGE 
ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=NICK.BOOTH
testMSG_chinese.msg : /O=FT GROUP/OU=FT/CN=RECIPIENTS/CN=LYDIACHANG
testMSG_forwarded.msg : /O=OEXCH018/OU=EXCHANGE ADMINISTRATIVE GROUP 
(FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=PAUL_METAJURE
{noformat}






[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168951#comment-15168951
 ] 

Tim Allison commented on TIKA-1865:
---

And if you are interested in working on a patch for this, we now have ~3800 msg 
files that I pulled with [~centic]'s CommonCrawlDocumentDownload tool...in 
addition to what we had in our slice of CommonCrawl and govdocs1.



[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168946#comment-15168946
 ] 

Tim Allison commented on TIKA-1865:
---

Yes and yes...any interest in submitting a patch?


If you're interested in this info, you might also be interested in TIKA-1759, 
which is a low priority for me at the moment, but that could change if there is 
interest from the community.



[jira] [Commented] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-26 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168949#comment-15168949
 ] 

Tim Allison commented on TIKA-1865:
---

Thank you, Nick.
