https://issues.apache.org/bugzilla/show_bug.cgi?id=53784
Priority: P2
Bug ID: 53784
Assignee: [email protected]
Summary: POI-3.8 HSMF fails to extract dates from certain Outlook 2007 messages (.msg)
Severity: normal
Classification: Unclassified
Reporter: [email protected]
Hardware: Macintosh
Status: NEW
Version: 3.8
Component: HSMF
Product: POI
Created attachment 29287
--> https://issues.apache.org/bugzilla/attachment.cgi?id=29287&action=edit
Two sample Outlook 2007 messages, one whose date POI HSMF fails to recognize,
and one whose date POI HSMF successfully recognizes
Overview
--------
POI HSMF fails to recognize the dates in some of my Outlook 2007 (.msg) files.
Steps to Reproduce
------------------
I have two reproducible tests, and I've attached a zip file containing test
.msg files that illustrate my results. When I use the first file
(test-message-poi-succeeds.msg), POI HSMF succeeds. When I use the second file
(test-message-poi-fails.msg), POI HSMF fails. Here are the two tests:
1) The first test is to run the .msg files through Tika 1.2 (which uses POI
HSMF to parse Outlook files), using the following command:
java -jar tika-app-1.2.jar -m <filename>
Tika finds the date on the first message and returns these headers, 9 of which
contain dates (the key field is "date: 2012-06-22T18:32:54Z"):
Author: Ashley, Carl E (PACE)
Content-Length: 35840
Content-Type: application/vnd.ms-outlook
Creation-Date: 2012-06-22T18:32:54Z
Last-Modified: 2012-06-22T18:32:54Z
Last-Save-Date: 2012-06-22T18:32:54Z
Message-Bcc:
Message-Cc: PA History Mailbox
Message-From: Ashley, Carl E (PACE)
Message-Recipient-Address: [email protected]
Message-To: '[email protected]'
creator: Ashley, Carl E (PACE)
date: 2012-06-22T18:32:54Z
dc:creator: Ashley, Carl E (PACE)
dc:description: HAC Annual Report
dc:title: HAC Annual Report
dcterms:created: 2012-06-22T18:32:54Z
dcterms:modified: 2012-06-22T18:32:54Z
meta:author: Ashley, Carl E (PACE)
meta:creation-date: 2012-06-22T18:32:54Z
meta:save-date: 2012-06-22T18:32:54Z
modified: 2012-06-22T18:32:54Z
resourceName: test-message-poi-succeeds.msg
subject: HAC Annual Report
title: HAC Annual Report
Tika fails on the second message, returning no date fields:
Author: PA History Mailbox
Content-Length: 29696
Content-Type: application/vnd.ms-outlook
Message-Bcc:
Message-Cc:
Message-From: PA History Mailbox
Message-Recipient-Address: [email protected]
Message-To: Garrett, Amy C (PACE)
creator: PA History Mailbox
dc:creator: PA History Mailbox
dc:description: Draft to La Tetra
dc:title: Draft to La Tetra
meta:author: PA History Mailbox
resourceName: test-message-poi-fails.msg
subject: Draft to La Tetra
title: Draft to La Tetra
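The comparison between the two outputs can also be automated. The following is a minimal sketch, not part of Tika or POI: the class TikaDateCheck and its methods are hypothetical names, and it assumes that date values appear in the ISO 8601 UTC form shown in the first output and that each line has the form "key: value".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for checking "tika -m" output; not part of Tika or POI.
final class TikaDateCheck {
    // Matches values such as 2012-06-22T18:32:54Z (assumed date format).
    private static final Pattern ISO_UTC =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z");

    // Splits "key: value" lines. Keys may themselves contain colons
    // (e.g. "dc:title"), so the split point is the first colon-plus-space.
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int sep = line.indexOf(": ");
            if (sep >= 0) {
                fields.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
            } else if (trimmed.endsWith(":")) {
                // Empty field such as "Message-Bcc:"
                fields.put(trimmed.substring(0, trimmed.length() - 1), "");
            }
        }
        return fields;
    }

    // Returns the keys whose values look like ISO 8601 UTC timestamps.
    static List<String> dateFields(Map<String, String> fields) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (ISO_UTC.matcher(e.getValue()).matches()) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Reads metadata from standard input and reports the date-bearing fields.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        StringBuilder sb = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) {
            sb.append(line).append('\n');
        }
        List<String> dates = dateFields(parse(sb.toString()));
        System.out.println(dates.isEmpty() ? "no date fields found" : "date fields: " + dates);
    }
}
```

Once compiled, it could be fed the output of the Tika command above through a pipe, for example: java -jar tika-app-1.2.jar -m <filename> | java TikaDateCheck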
2) The second test is to run POI HSMFDump directly on each file:
java -classpath poi-3.8-20120326.jar:poi-scratchpad-3.8-20120326.jar \
    org.apache.poi.hsmf.dev.HSMFDump <filename>
When I run this command on the first file, it returns ALL of the fields above
(including the 'date' field) in the following area:
125 - TransportMessageHeaders - Unicode String
When I run this command on the second file, it returns NO 'date' fields, and no
such '125' field.
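To illustrate why the presence of the transport headers matters: when that header text is available, a Date header can be read out of it. The sketch below is only an assumption-laden illustration and makes no claim about how POI HSMF itself extracts dates. The class TransportHeaderDate is a hypothetical name, and it assumes the headers are plain RFC 822 style text with an unfolded "Date:" line.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not POI's implementation.
final class TransportHeaderDate {
    // A "Date:" header at the start of a line (so "Resent-Date:" is ignored).
    private static final Pattern DATE_HEADER = Pattern.compile(
            "^Date:[ \\t]*(.+)$", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Returns the parsed Date header, or null if the header text is missing,
    // has no Date line, or the value is not in the assumed RFC 822 format.
    static Date find(String headers) {
        if (headers == null) {
            return null;
        }
        Matcher m = DATE_HEADER.matcher(headers);
        if (!m.find()) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so a new instance is made per call.
        SimpleDateFormat rfc822 =
                new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z", Locale.US);
        try {
            return rfc822.parse(m.group(1).trim());
        } catch (ParseException e) {
            return null;
        }
    }
}
```

With this sketch, a message whose transport headers are absent (as HSMFDump reports for the second file) yields null, so a caller would need some other source for the date.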
I would appreciate anyone's assistance with this issue.
My System
---------
I am using poi-bin-3.8-20120326, tika-app-1.2, and Mac OS X 10.8.1 with Java
1.6.0_33.