https://issues.apache.org/bugzilla/show_bug.cgi?id=53784

          Priority: P2
            Bug ID: 53784
          Assignee: [email protected]
           Summary: POI-3.8 HSMF fails to extract dates from certain
                    Outlook 2007 messages (.msg)
          Severity: normal
    Classification: Unclassified
          Reporter: [email protected]
          Hardware: Macintosh
            Status: NEW
           Version: 3.8
         Component: HSMF
           Product: POI

Created attachment 29287
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29287&action=edit
Two sample Outlook 2007 messages, one whose date POI HSMF fails to recognize,
and one whose date POI HSMF successfully recognizes

Overview
--------
POI HSMF fails to recognize the dates in some of my Outlook 2007 (.msg) files.  

Steps to Reproduce
------------------
I have two reproducible tests, and I've attached a zip file containing test
.msg files that illustrate my results.  When I use the first file
(test-message-poi-succeeds.msg), POI HSMF succeeds.  When I use the second file
(test-message-poi-fails.msg), POI HSMF fails.  Here are the two tests: 

1) The first test is to run the .msg files through Tika 1.2 (which uses POI
HSMF to parse Outlook files), using the following command:

  java -jar tika-app-1.2.jar -m <filename>

Tika finds the date on the first message and returns the headers below. Nine of
them contain dates; the key field is "date: 2012-06-22T18:32:54Z":

Author: Ashley, Carl E (PACE)
Content-Length: 35840
Content-Type: application/vnd.ms-outlook
Creation-Date: 2012-06-22T18:32:54Z
Last-Modified: 2012-06-22T18:32:54Z
Last-Save-Date: 2012-06-22T18:32:54Z
Message-Bcc: 
Message-Cc: PA History Mailbox
Message-From: Ashley, Carl E (PACE)
Message-Recipient-Address: [email protected]
Message-To: '[email protected]'
creator: Ashley, Carl E (PACE)
date: 2012-06-22T18:32:54Z
dc:creator: Ashley, Carl E (PACE)
dc:description: HAC Annual Report
dc:title: HAC Annual Report
dcterms:created: 2012-06-22T18:32:54Z
dcterms:modified: 2012-06-22T18:32:54Z
meta:author: Ashley, Carl E (PACE)
meta:creation-date: 2012-06-22T18:32:54Z
meta:save-date: 2012-06-22T18:32:54Z
modified: 2012-06-22T18:32:54Z
resourceName: test-message-poi-succeeds.msg
subject: HAC Annual Report
title: HAC Annual Report

Tika fails on the second message, returning no date fields:

Author: PA History Mailbox
Content-Length: 29696
Content-Type: application/vnd.ms-outlook
Message-Bcc: 
Message-Cc: 
Message-From: PA History Mailbox
Message-Recipient-Address: [email protected]
Message-To: Garrett, Amy C (PACE)
creator: PA History Mailbox
dc:creator: PA History Mailbox
dc:description: Draft to La Tetra
dc:title: Draft to La Tetra
meta:author: PA History Mailbox
resourceName: test-message-poi-fails.msg
subject: Draft to La Tetra
title: Draft to La Tetra
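
To make the comparison between the two outputs repeatable, here is a minimal
Java sketch that reads the "key: value" lines printed by "tika-app -m" and
lists the keys whose values parse as ISO-8601 timestamps. The class name
TikaDateFields and the method dateFields are hypothetical names, not part of
Tika or POI, and the sketch assumes Java 8 or later (for java.time) and the
"key: value" layout shown in the outputs above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika or POI.
class TikaDateFields {

    /** Returns the keys whose values parse as ISO-8601 instants. */
    static List<String> dateFields(String tikaOutput) {
        List<String> keys = new ArrayList<>();
        for (String line : tikaOutput.split("\\R")) {
            // Split on the first ": " so keys such as "dc:creator" stay intact.
            int sep = line.indexOf(": ");
            if (sep < 0) {
                continue;
            }
            String key = line.substring(0, sep);
            String value = line.substring(sep + 2).trim();
            try {
                Instant.parse(value);
                keys.add(key);
            } catch (DateTimeParseException e) {
                // Value is not a timestamp; ignore this field.
            }
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        // Read the Tika metadata output from standard input.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        List<String> keys = dateFields(sb.toString());
        System.out.println(keys.isEmpty() ? "no date fields found" : "date fields: " + keys);
    }
}
```

Once compiled, it could be used by piping the Tika output into it, for example:

  java -jar tika-app-1.2.jar -m test-message-poi-fails.msg | java TikaDateFields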


2) The second test is to run POI HSMFDump directly on each file:

  java -classpath poi-3.8-20120326.jar:poi-scratchpad-3.8-20120326.jar \
    org.apache.poi.hsmf.dev.HSMFDump <filename>

When I run this command on the first file, it returns ALL of the fields above
(including the 'date' field) in the following chunk:

  125 - TransportMessageHeaders - Unicode String

When I run this command on the second file, it returns NO 'date' fields, and no
such '125' field.
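
The second test can be scripted over several files. The following is a minimal
Java sketch that runs the HSMFDump command quoted above as a child process and
reports whether its output mentions the TransportMessageHeaders chunk. The
class name HsmfDumpCheck and its methods are hypothetical, the jar names are
copied from the command above and assumed to sit in the working directory, and
the check relies only on the chunk name appearing somewhere in the dump text,
since the single line quoted above is the only sample of the output format.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI.
class HsmfDumpCheck {

    // Jar names taken from the command in this report; adjust paths as needed.
    static final String CLASSPATH =
            "poi-3.8-20120326.jar" + File.pathSeparator + "poi-scratchpad-3.8-20120326.jar";

    /** Returns true if any line of the dump output contains the chunk name. */
    static boolean mentionsChunk(List<String> dumpLines, String chunkName) {
        for (String line : dumpLines) {
            if (line.contains(chunkName)) {
                return true;
            }
        }
        return false;
    }

    /** Runs HSMFDump on one .msg file and collects its output lines. */
    static List<String> runDump(String msgFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("java", "-classpath", CLASSPATH,
                "org.apache.poi.hsmf.dev.HSMFDump", msgFile)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String file : args) {
            boolean found = mentionsChunk(runDump(file), "TransportMessageHeaders");
            System.out.println(file + ": TransportMessageHeaders "
                    + (found ? "present" : "missing"));
        }
    }
}
```

Passing both attached files as arguments should print one line per file,
which makes it easy to see which messages lack the headers chunk.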

I would appreciate anyone's assistance with this issue.

My System
---------
I am using poi-bin-3.8-20120326, tika-app-1.2, and Mac OS X 10.8.1 with Java
1.6.0_33.
