[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920692#comment-13920692 ]
Hong-Thai Nguyen edited comment on TIKA-623 at 3/5/14 9:30 AM:
---
java-libpst-0.7 has been uploaded to the Sonatype OSS Nexus repository:
https://issues.sonatype.org/browse/OSSRH-8965
If there's no objection, I'll refactor the attached parser and provide output as:
{code}
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta name="Content-Length" content="271360" />
  <meta name="isValid" content="true" />
  <meta name="Content-Type" content="application/vnd.ms-outlook" />
  <title></title>
</head>
<body>
  <div class="email-folder">
    <h1>Début du fichier de données Outlook</h1>
    <div class="email-entry">
      <h1>&lt;530d9cac.5080...@gmail.com&gt;</h1>
      <meta subject="Re: Feature Generators" />
      <meta internetMessageId="&lt;530d9cac.5080...@gmail.com&gt;" />
      <meta descriptorNodeId="2097188" />
      <meta lastModificationTime="1393418263291" />
      <meta senderName="Jörn Kottmann" />
      <meta senderEmailAddress="kottm...@gmail.com" />
      <meta recipients="No recipients table!" />
      <p>mail content</p>
    </div>
    <div class="email-folder">
      <h1>Éléments supprimés</h1>
    </div>
  </div>
  <div class="email-folder">
    <h1>Racine (pour la recherche)</h1>
  </div>
  <div class="email-folder">
    <h1>SPAM Search Folder 2</h1>
  </div>
</body>
</html>
{code}
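To make the nesting of the proposed output concrete, here is a minimal sketch of how such a folder tree could be rendered. It is an illustration only and not the attached OutlookPSTParser.java: the Folder and Message classes, the PstXhtmlSketch name and the sample values are hypothetical, it uses neither java-libpst nor Tika's handler classes, and it emits only a few of the meta fields. It assumes each folder writes its messages first and its sub-folders after them, as in the sample above.

```java
import java.util.ArrayList;
import java.util.List;

public class PstXhtmlSketch {

    /** Hypothetical stand-in for one mail; not a java-libpst type. */
    static final class Message {
        final String messageId;
        final String subject;
        final String senderName;
        final String body;

        Message(String messageId, String subject, String senderName, String body) {
            this.messageId = messageId;
            this.subject = subject;
            this.senderName = senderName;
            this.body = body;
        }
    }

    /** Hypothetical stand-in for one folder with its mails and sub-folders. */
    static final class Folder {
        final String name;
        final List<Message> messages = new ArrayList<>();
        final List<Folder> subFolders = new ArrayList<>();

        Folder(String name) {
            this.name = name;
        }
    }

    /** Escapes the characters that are unsafe in XHTML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Writes one email-folder div: heading, then entries, then nested folders. */
    static void renderFolder(Folder folder, StringBuilder out) {
        out.append("<div class=\"email-folder\">");
        out.append("<h1>").append(escape(folder.name)).append("</h1>");
        for (Message m : folder.messages) {
            out.append("<div class=\"email-entry\">");
            out.append("<h1>").append(escape(m.messageId)).append("</h1>");
            out.append("<meta subject=\"").append(escape(m.subject)).append("\" />");
            out.append("<meta senderName=\"").append(escape(m.senderName)).append("\" />");
            out.append("<p>").append(escape(m.body)).append("</p>");
            out.append("</div>");
        }
        for (Folder sub : folder.subFolders) {
            renderFolder(sub, out); // recursion mirrors the folder hierarchy
        }
        out.append("</div>");
    }

    /** Wraps the top-level folders in the html/head/body skeleton. */
    static String render(List<Folder> topLevelFolders) {
        StringBuilder out = new StringBuilder();
        out.append("<html xmlns=\"http://www.w3.org/1999/xhtml\">");
        out.append("<head><title></title></head><body>");
        for (Folder f : topLevelFolders) {
            renderFolder(f, out);
        }
        out.append("</body></html>");
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for the demonstration.
        Folder top = new Folder("Top of Outlook data file");
        top.messages.add(new Message("<id@example.com>", "Re: Feature Generators",
                "Example Sender", "mail content"));
        top.subFolders.add(new Folder("Deleted Items"));

        List<Folder> folders = new ArrayList<>();
        folders.add(top);
        System.out.println(render(folders));
    }
}
```

Escaping every value before writing it keeps message IDs such as the angle-bracketed ones above from breaking the markup. In an actual Tika parser this would normally be left to the content handler rather than done by hand.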
was (Author: thaichat04):
java-libpst-0.7 has been uploaded to oss sonatype nexus. If there's no
objection, I'll refactor the attached parser and provide output as in the
{code} block above.
Add support for Outlook PST
---
Key: TIKA-623
URL: https://issues.apache.org/jira/browse/TIKA-623
Project: Tika
Issue Type: New Feature
Components: parser
Reporter: Tran Nam Quang
Assignee: Hong-Thai Nguyen
Fix For: 1.6
Attachments: OutlookPSTParser.java
Hello everyone,
As you might know, Outlook stores its mails and other stuff in a single PST
file. There's a relatively new Java library called java-libpst for reading
Outlook PST files. It is licensed under the LGPL and available over here:
http://code.google.com/p/java-libpst/
I have tested the library on Outlook 2000 and Outlook 2003, with good
results. It would be great if the library could be integrated into Tika.
Best regards
Tran Nam Quang
--
This message was sent by Atlassian JIRA
(v6.2#6252)