[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012756#comment-13012756 ] Nick Burch commented on TIKA-623: - Details on which licenses are allowed can be found at: http://www.apache.org/legal/resolved.html. From looking at their homepage, writing a Tika parser shouldn't be too hard; you'd likely want to crib off one of the other container-based parsers to see how to have each part processed for you by the appropriate Tika parsers. > Add support for Outlook PST > --- > > Key: TIKA-623 > URL: https://issues.apache.org/jira/browse/TIKA-623 > Project: Tika > Issue Type: New Feature > Components: parser >Reporter: Tran Nam Quang > > Hello everyone, > As you might know, Outlook stores its mails and other stuff in a single PST > file. There's a relatively new Java library called java-libpst for reading > Outlook PST files. It is licensed under the LGPL and available over here: > http://code.google.com/p/java-libpst/ > I have tested the library on Outlook 2000 and Outlook 2003, with good > results. It would be great if the library could be integrated into Tika. > Best regards > Tran Nam Quang -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
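The container-based approach suggested above can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not Tika's API: ZIP (via java.util.zip) stands in for PST, and ContainerWalker, its extension-to-handler map and the zipOf helper are hypothetical names. Only the shape of the loop is the point: open the container, visit each part, and hand each part to whichever handler matches its type, with a fallback for unknown types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch of the container-parser pattern: walk each part of a
 * container and hand it to the handler registered for its type. ZIP stands in
 * for PST, and the extension-to-handler map stands in for Tika's own parser
 * selection.
 */
final class ContainerWalker {
    private final Map<String, Function<byte[], String>> handlersByExtension;
    private final Function<byte[], String> fallback;

    ContainerWalker(Map<String, Function<byte[], String>> handlersByExtension,
                    Function<byte[], String> fallback) {
        this.handlersByExtension = handlersByExtension;
        this.fallback = fallback;
    }

    /** Returns one result per file entry, in the order the entries appear. */
    List<String> walk(InputStream container) throws IOException {
        List<String> results = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(container)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // ZipInputStream reports end-of-stream at the end of the
                // current entry, so this reads exactly one part.
                byte[] body = zip.readAllBytes();
                Function<byte[], String> handler =
                        handlersByExtension.getOrDefault(extensionOf(entry.getName()), fallback);
                results.add(handler.apply(body));
            }
        }
        return results;
    }

    List<String> walk(byte[] container) throws IOException {
        return walk(new ByteArrayInputStream(container));
    }

    /** Test helper: builds an in-memory ZIP, writing entries in sorted name order. */
    static byte[] zipOf(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : new TreeMap<>(entries).entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    private static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

In an actual Tika parser the handler lookup would be replaced by Tika's own handling of embedded documents, which is what the existing container parsers are worth reading for.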
[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744 ] Tran Nam Quang edited comment on TIKA-623 at 3/29/11 10:17 PM: --- What licenses would permit inclusion in Tika, other than the Apache License 2.0? I could ask the author to change the library's license or to switch to dual-licensing... The basic parser is already listed as an example on the front page of the java-libpst website, by the way. was (Author: qforce): What license is required for inclusion in Tika, other than the Apache License 2.0? I could ask the author to change the license or switch to dual-licensing... The basic parser is already listed as an example on the front page of the java-libpst website, by the way.
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012746#comment-13012746 ] Uwe Schindler commented on TIKA-623: From looking at the code of this library, it looks like it needs some improvements/fixes: - It catches all exceptions and, instead of simply wrapping and rethrowing them or declaring the checked exceptions on the methods, prints the stack trace to System.out. Messages are also printed to System.out. - The RTF compression decoder uses new String(byte[]) without a charset, so its output is locale-dependent! Other places do this, too. This is broken, as the file format should define the charset.
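Both fixes suggested above can be illustrated with a small JDK-only sketch. TextDecoding and decodeStrict are hypothetical names, not part of java-libpst. The helper takes the charset as an explicit parameter (which charset the PST/RTF data actually uses is left to the caller, as an assumption to settle from the format documentation) and reports failures by throwing instead of printing to System.out.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

/**
 * Hypothetical helper showing the two suggested fixes: decode bytes with an
 * explicit charset instead of new String(byte[]), and propagate failures to
 * the caller instead of printing stack traces.
 */
final class TextDecoding {
    private TextDecoding() {}

    /**
     * Decodes bytes with the given charset and fails loudly on malformed or
     * unmappable input. The caller must choose the charset from the file
     * format; this helper never falls back to the platform default.
     */
    static String decodeStrict(byte[] bytes, Charset charset) throws IOException {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Wrap and rethrow with context rather than calling e.printStackTrace().
            throw new IOException(
                    "Could not decode " + bytes.length + " bytes as " + charset.name(), e);
        }
    }
}
```

The bytes 0xC3 0xA9 decode to "é" as UTF-8 but to two separate characters as ISO-8859-1; new String(byte[]) silently picks one of these interpretations based on the platform default charset, which is what makes it locale-dependent.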
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744 ] Tran Nam Quang commented on TIKA-623: - What license is required for inclusion in Tika, other than the Apache License 2.0? I could ask the author to change the license or switch to dual-licensing... The basic parser is already listed as an example on the front page of the java-libpst website, by the way.
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012730#comment-13012730 ] Nick Burch commented on TIKA-623: - If it's LGPL, then we can't include it in Tika as standard. However, it is possible to have the parser loaded dynamically if a user chooses to download the parser and its dependent files (if the license works for them). If you're interested in PST support, then I'd suggest you try to knock up a basic parser using libpst. If you do get it working, please list it on the wiki: http://wiki.apache.org/tika/3rd%20party%20parser%20plugins. If you need help with developing the plugin, please ask on the dev list. You might also be interested in looking at the relatively small patch that was all it took to enable JTNEF (GPL) to be used as a Tika plugin: https://github.com/jukka/jtnef/commit/a9a51982165101c0bdda4cb5266d7f8958c271ef
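One standard way to achieve the kind of dynamic loading mentioned above is Java's service-provider mechanism, java.util.ServiceLoader. The sketch below assumes that mechanism and uses a hypothetical FormatPlugin interface and PluginRegistry class in place of Tika's real Parser contract. A separately distributed plugin jar would register its implementation class in a META-INF/services file named after the interface; when that jar is not on the classpath, the lookup simply finds nothing, so the host application can ship without the LGPL code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plugin contract; in Tika this role would be played by its Parser interface. */
interface FormatPlugin {
    String mediaType();
}

/**
 * Hypothetical registry that discovers optional plugins at runtime. Providers
 * are listed by class name in a META-INF/services file named after the
 * FormatPlugin interface, inside the plugin's own jar.
 */
final class PluginRegistry {
    private PluginRegistry() {}

    /** Returns the media types of all plugins visible to the given class loader. */
    static List<String> supportedTypes(ClassLoader loader) {
        List<String> types = new ArrayList<>();
        for (FormatPlugin plugin : ServiceLoader.load(FormatPlugin.class, loader)) {
            types.add(plugin.mediaType());
        }
        return types;
    }
}
```

With no plugin jar on the classpath, supportedTypes returns an empty list; dropping a jar that registers a provider in is enough to add support, without recompiling the host.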
[jira] [Created] (TIKA-623) Add support for Outlook PST
Add support for Outlook PST --- Key: TIKA-623 URL: https://issues.apache.org/jira/browse/TIKA-623 Project: Tika Issue Type: New Feature Components: parser Reporter: Tran Nam Quang Hello everyone, As you might know, Outlook stores its mails and other stuff in a single PST file. There's a relatively new Java library called java-libpst for reading Outlook PST files. It is licensed under the LGPL and available over here: http://code.google.com/p/java-libpst/ I have tested the library on Outlook 2000 and Outlook 2003, with good results. It would be great if the library could be integrated into Tika. Best regards Tran Nam Quang