[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012756#comment-13012756
 ] 

Nick Burch commented on TIKA-623:
-

Details on the licenses that are allowed to be used are at: 
http://www.apache.org/legal/resolved.html

From looking at their homepage, writing a Tika parser shouldn't be too hard. 
You'd likely want to crib from one of the other container-based parsers to see 
how to have each part processed for you by the appropriate Tika parsers.
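
As a rough illustration of the container idea described above, here is a minimal 
sketch in plain Java. It is not Tika's API: PartParser, Part and 
ContainerParserSketch are hypothetical names invented for this example, and a 
real PST parser would walk folders and messages via java-libpst rather than 
take a ready-made list. The point is only that the container parser routes each 
part to another parser by media type instead of extracting text itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a format-specific parser (NOT Tika's Parser interface).
interface PartParser {
    String extractText(byte[] content);
}

// One item found inside the container, e.g. a message body or an attachment.
final class Part {
    final String mediaType;
    final byte[] content;

    Part(String mediaType, byte[] content) {
        this.mediaType = mediaType;
        this.content = content;
    }
}

// Hypothetical container parser: it does no format-specific text extraction
// itself, it only routes each part to the parser registered for its type.
final class ContainerParserSketch {
    private final Map<String, PartParser> parsersByType = new LinkedHashMap<>();

    void register(String mediaType, PartParser parser) {
        parsersByType.put(mediaType, parser);
    }

    List<String> parseAll(List<Part> parts) {
        List<String> extracted = new ArrayList<>();
        for (Part part : parts) {
            PartParser delegate = parsersByType.get(part.mediaType);
            if (delegate != null) {
                extracted.add(delegate.extractText(part.content));
            }
            // Parts with no registered parser are skipped in this sketch.
        }
        return extracted;
    }
}
```

In Tika proper the hand-off to the other parsers is done by the framework, so 
an existing container-based parser remains the better template to copy from.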

> Add support for Outlook PST
> ---
>
> Key: TIKA-623
> URL: https://issues.apache.org/jira/browse/TIKA-623
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Reporter: Tran Nam Quang
>
> Hello everyone,
> As you might know, Outlook stores its mails and other stuff in a single PST 
> file. There's a relatively new Java library called java-libpst for reading 
> Outlook PST files. It is licensed under the LGPL and available over here: 
> http://code.google.com/p/java-libpst/
> I have tested the library on Outlook 2000 and Outlook 2003, with good 
> results. It would be great if the library could be integrated into Tika.
> Best regards
> Tran Nam Quang

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744
 ] 

Tran Nam Quang edited comment on TIKA-623 at 3/29/11 10:17 PM:
---

What licenses would permit inclusion in Tika, other than the Apache License 
2.0? I could ask the author to change the library's license or to switch to 
dual-licensing...

The basic parser is already listed as an example on the front page of the 
java-libpst website, by the way.

  was (Author: qforce):
What license is required for inclusion in Tika, other than the Apache 
License 2.0? I could ask the author to change the license or switch to 
dual-licensing...

The basic parser is already listed as an example on the front page of the 
java-libpst website, by the way.
  


[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012746#comment-13012746
 ] 

Uwe Schindler commented on TIKA-623:


From looking at the code of this library, it looks like it needs some 
improvements/fixes:
- It catches all exceptions and, instead of simply wrapping and rethrowing them 
or declaring the checked exceptions on the methods, it prints the stack trace 
to System.out. Other messages are printed to System.out as well.
- The RTF compression decoder uses new String(byte[]) without a charset, so the 
result depends on the platform default charset (locale dependent). Other places 
do this, too. This is broken, as the file format should define the charset.
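
To make the two points above concrete, here is a minimal sketch of what the 
fixes could look like. It is not code from java-libpst: PstReadException and 
DecodeSketch are hypothetical names, and the charset passed in stands for 
whatever the file format actually specifies for the data being decoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical checked exception for this sketch; not a java-libpst or Tika class.
class PstReadException extends Exception {
    PstReadException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class DecodeSketch {
    // Decode with the charset the file format specifies,
    // never with the JVM's platform default.
    static String decode(byte[] raw, Charset formatCharset) {
        return new String(raw, formatCharset);
    }

    // Wrap and rethrow, so the caller decides how to report the failure,
    // instead of printing a stack trace to System.out.
    static String readAll(InputStream in, Charset formatCharset) throws PstReadException {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return decode(buffer.toByteArray(), formatCharset);
        } catch (IOException e) {
            throw new PstReadException("Could not read item body", e);
        }
    }
}
```

With the charset passed explicitly, the same bytes decode to the same string on 
every machine, and the original IOException stays available as the cause.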




[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744
 ] 

Tran Nam Quang commented on TIKA-623:
-

What license is required for inclusion in Tika, other than the Apache License 
2.0? I could ask the author to change the license or switch to dual-licensing...

The basic parser is already listed as an example on the front page of the 
java-libpst website, by the way.



[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012730#comment-13012730
 ] 

Nick Burch commented on TIKA-623:
-

If it's LGPL then we can't include it in Tika as standard.

However, it is possible to have the parser dynamically loaded if a user chooses 
to download the parser + dependent files (if the license works for them).
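
For readers unfamiliar with this kind of dynamic loading, the sketch below shows 
the general mechanism using the JDK's java.util.ServiceLoader. It is an 
illustration under assumptions, not Tika's own loading code: PluginParser and 
PluginLoaderSketch are hypothetical names. The idea is that a separately 
downloaded jar names its implementation class in a META-INF/services file, and 
the host application picks it up only if that jar is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical plugin interface for this sketch (NOT Tika's Parser interface).
interface PluginParser {
    String supportedType();
}

final class PluginLoaderSketch {
    // Returns every implementation that a jar on the classpath has registered
    // in a META-INF/services file named after the fully qualified interface name.
    static List<PluginParser> discover() {
        List<PluginParser> found = new ArrayList<>();
        for (PluginParser parser : ServiceLoader.load(PluginParser.class)) {
            found.add(parser);
        }
        return found;
    }
}
```

If no plugin jar is present, discover() simply returns an empty list, so the 
core distribution carries no dependency on the separately licensed code.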

If you're interested in PST support, then I'd suggest you try to knock up a 
basic parser using java-libpst. If you do get it working, please list it on the 
wiki:
   http://wiki.apache.org/tika/3rd%20party%20parser%20plugins

If you need help with developing the plugin, please ask on the dev list. You 
might also be interested in looking at the relatively small patch that was all 
it took to enable JTNEF (GPL) to be used as a Tika plugin:
   https://github.com/jukka/jtnef/commit/a9a51982165101c0bdda4cb5266d7f8958c271ef



[jira] [Created] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Tran Nam Quang (JIRA)

Add support for Outlook PST
---

 Key: TIKA-623
 URL: https://issues.apache.org/jira/browse/TIKA-623
 Project: Tika
  Issue Type: New Feature
  Components: parser
Reporter: Tran Nam Quang


Hello everyone,

As you might know, Outlook stores its mails and other stuff in a single PST 
file. There's a relatively new Java library called java-libpst for reading 
Outlook PST files. It is licensed under the LGPL and available over here: 
http://code.google.com/p/java-libpst/

I have tested the library on Outlook 2000 and Outlook 2003, with good results. 
It would be great if the library could be integrated into Tika.

Best regards
Tran Nam Quang
