[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920692#comment-13920692 ]
Hong-Thai Nguyen edited comment on TIKA-623 at 3/5/14 9:30 AM:
---------------------------------------------------------------

java-libpst-0.7 has been uploaded to the OSS Sonatype Nexus repository: https://issues.sonatype.org/browse/OSSRH-8965

If there's no objection, I'll refactor the attached parser and have it produce output like this:

{code}
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="Content-Length" content="271360" />
    <meta name="isValid" content="true" />
    <meta name="Content-Type" content="application/vnd.ms-outlook" />
    <title></title>
  </head>
  <body>
    <div class="email-folder">
      <h1>Début du fichier de données Outlook</h1> <!-- "Top of Outlook data file" -->
      <div class="email-entry">
        <h1>&lt;530d9cac.5080...@gmail.com&gt;</h1>
        <meta subject="Re: Feature Generators" />
        <meta internetMessageId="&lt;530d9cac.5080...@gmail.com&gt;" />
        <meta descriptorNodeId="2097188" />
        <meta lastModificationTime="1393418263291" />
        <meta senderName="Jörn Kottmann" />
        <meta senderEmailAddress="kottm...@gmail.com" />
        <meta recipients="No recipients table!" />
        <p>mail content</p>
      </div>
      <div class="email-folder">
        <h1>Éléments supprimés</h1> <!-- "Deleted Items" -->
      </div>
    </div>
    <div class="email-folder">
      <h1>Racine (pour la recherche)</h1> <!-- "Root (for search)" -->
    </div>
    <div class="email-folder">
      <h1>SPAM Search Folder 2</h1>
    </div>
  </body>
</html>
{code}
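The proposed layout amounts to a depth-first walk of the PST folder tree: each folder becomes a `div.email-folder` headed by its display name, its messages become `div.email-entry` blocks, and subfolders are nested inside their parent. The standalone sketch below illustrates only that nesting, using the JDK's StAX `XMLStreamWriter`, which also escapes the angle brackets in message IDs. `Folder`, `Message` and `PstXhtmlSketch` are hypothetical stand-ins, not java-libpst classes, and only a subset of the `meta` fields (and none of the `head` metadata) is written. Tika parsers report content as SAX events to a `ContentHandler`, so this is not how the attached `OutlookPSTParser.java` works.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Sketch only: renders a folder tree into the nested XHTML layout proposed above.
// Folder and Message are HYPOTHETICAL in-memory types, not java-libpst classes.
class PstXhtmlSketch {
    record Message(String internetMessageId, String subject, String senderName, String body) {}

    record Folder(String displayName, List<Message> messages, List<Folder> subFolders) {}

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Renders every folder below the unnamed root as a top-level div.email-folder. */
    static String render(Folder root) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("html");
            w.writeDefaultNamespace(XHTML_NS);
            w.writeStartElement("body");
            for (Folder child : root.subFolders()) {
                writeFolder(w, child);
            }
            w.writeEndElement(); // body
            w.writeEndElement(); // html
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            // Wrap the checked StAX exception so callers need not declare it.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Depth-first: a folder's own messages are written first, then its
    // subfolders, each nested inside the parent div as in the sample output.
    private static void writeFolder(XMLStreamWriter w, Folder folder) throws XMLStreamException {
        w.writeStartElement("div");
        w.writeAttribute("class", "email-folder");
        writeText(w, "h1", folder.displayName());
        for (Message m : folder.messages()) {
            w.writeStartElement("div");
            w.writeAttribute("class", "email-entry");
            writeText(w, "h1", m.internetMessageId());
            writeMeta(w, "subject", m.subject());
            writeMeta(w, "internetMessageId", m.internetMessageId());
            writeMeta(w, "senderName", m.senderName());
            writeText(w, "p", m.body());
            w.writeEndElement(); // div.email-entry
        }
        for (Folder sub : folder.subFolders()) {
            writeFolder(w, sub);
        }
        w.writeEndElement(); // div.email-folder
    }

    // writeCharacters escapes markup characters such as '<', so a message ID
    // like "<id@host>" cannot break the XHTML.
    private static void writeText(XMLStreamWriter w, String tag, String text) throws XMLStreamException {
        w.writeStartElement(tag);
        w.writeCharacters(text);
        w.writeEndElement();
    }

    // Mirrors the sample's <meta field="value" /> style: one attribute per element.
    private static void writeMeta(XMLStreamWriter w, String name, String value) throws XMLStreamException {
        w.writeEmptyElement("meta");
        w.writeAttribute(name, value);
    }
}
```

Writing the messages before recursing into subfolders reproduces the order in the sample, where the "Re: Feature Generators" entry precedes the nested "Éléments supprimés" folder.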
> Add support for Outlook PST
> ---------------------------
>
>                 Key: TIKA-623
>                 URL: https://issues.apache.org/jira/browse/TIKA-623
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Tran Nam Quang
>            Assignee: Hong-Thai Nguyen
>             Fix For: 1.6
>
>         Attachments: OutlookPSTParser.java
>
>
> Hello everyone,
> As you might know, Outlook stores its mail and other data in a single PST
> file. There's a relatively new Java library called java-libpst for reading
> Outlook PST files. It is licensed under the LGPL and available here:
> http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good
> results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang

--
This message was sent by Atlassian JIRA
(v6.2#6252)