[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException

2016-09-08 Thread Tran Nam Quang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tran Nam Quang updated TIKA-2068:
-
Description: 
The RTFParser seems to crash on RTF files containing pictures. The attached 
file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

  was:
The RTFParser seems to crash on RTF files containing pictures. Stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)


> RTFParser crashes with NullPointerException
> ---
>
> Key: TIKA-2068
> URL: https://issues.apache.org/jira/browse/TIKA-2068
> Project: Tika
>  Issue Type: Bug
>  Components: parser
> Affects Versions: 1.13
> Reporter: Tran Nam Quang
> Attachments: Styrodur C - 2800_C.rtf
>
>
> The RTFParser seems to crash on RTF files containing pictures. The attached 
> file produces the following stacktrace:
> java.lang.NullPointerException
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
>     at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
>     at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException

2016-09-08 Thread Tran Nam Quang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tran Nam Quang updated TIKA-2068:
-
Description: 
The RTFParser seems to crash on RTF files containing pictures. The attached 
file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

  was:
The RTFParser crashes on RTF files containing pictures. The attached file 
produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)




[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException

2016-09-08 Thread Tran Nam Quang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tran Nam Quang updated TIKA-2068:
-
Description: 
The RTFParser crashes on RTF files containing pictures. The attached file 
produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

  was:
The RTFParser seems to crash on RTF files containing pictures. The attached 
file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)


> RTFParser crashes with NullPointerException
> ---
>
> Key: TIKA-2068
> URL: https://issues.apache.org/jira/browse/TIKA-2068
> Project: Tika
>  Issue Type: Bug
>  Components: parser
> Affects Versions: 1.13
> Reporter: Tran Nam Quang
> Attachments: Styrodur C - 2800_C.rtf
>
>
> The RTFParser crashes on RTF files containing pictures. The attached file 
> produces the following stacktrace:
> java.lang.NullPointerException
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
>     at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
>     at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)





[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException

2016-09-08 Thread Tran Nam Quang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tran Nam Quang updated TIKA-2068:
-
Attachment: Styrodur C - 2800_C.rtf

> RTFParser crashes with NullPointerException
> ---
>
> Key: TIKA-2068
> URL: https://issues.apache.org/jira/browse/TIKA-2068
> Project: Tika
>  Issue Type: Bug
>  Components: parser
> Affects Versions: 1.13
> Reporter: Tran Nam Quang
> Attachments: Styrodur C - 2800_C.rtf
>
>
> The RTFParser seems to crash on RTF files containing pictures. Stacktrace:
> java.lang.NullPointerException
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
>     at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
>     at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)





[jira] [Updated] (TIKA-623) Add support for Outlook PST

2011-04-08 Thread Tran Nam Quang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tran Nam Quang updated TIKA-623:


Attachment: OutlookPSTParser.java

First version of parser

 Add support for Outlook PST
 ---

 Key: TIKA-623
 URL: https://issues.apache.org/jira/browse/TIKA-623
 Project: Tika
  Issue Type: New Feature
  Components: parser
Reporter: Tran Nam Quang
 Attachments: OutlookPSTParser.java


 Hello everyone,
 As you might know, Outlook stores its mails and other stuff in a single PST 
 file. There's a relatively new Java library called java-libpst for reading 
 Outlook PST files. It is licensed under the LGPL and available over here: 
 http://code.google.com/p/java-libpst/
 I have tested the library on Outlook 2000 and Outlook 2003, with good 
 results. It would be great if the library could be integrated into Tika.
 Best regards
 Tran Nam Quang

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (TIKA-623) Add support for Outlook PST

2011-04-08 Thread Tran Nam Quang (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tran Nam Quang updated TIKA-623:


Comment: was deleted

(was: First version of parser)



[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-04-03 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015152#comment-13015152
 ] 

Tran Nam Quang commented on TIKA-623:
-

I started work on the Tika parser, but got stuck with the following problem: In 
order to access the Outlook PST file, I need to create a PSTFile instance. Now, 
the PSTFile constructor requires either a File or a String argument that points 
at the PST file. The constructor then takes either of these to create a 
RandomAccessFile internally. However, Tika's Parser interface gives me an 
InputStream. What do I do?
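
One common way around this kind of mismatch is to spool the stream into a temporary file and hand that file to the File-based API. The sketch below uses only the JDK; the class name StreamSpooler, the method name spoolToTempFile and the temp-file prefix are made up for illustration, and the PSTFile constructor is only referred to in a comment, as described above, not called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamSpooler {

    /**
     * Copies the whole stream into a new temporary file and returns its path.
     * The caller is responsible for deleting the file when done.
     */
    public static Path spoolToTempFile(InputStream in) throws IOException {
        // Hypothetical prefix/suffix; any names would do.
        Path tmp = Files.createTempFile("tika-pst-", ".pst");
        try {
            // createTempFile already created the file, so allow overwriting it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        Path tmp = spoolToTempFile(new ByteArrayInputStream(data));
        try {
            // A File-based API (such as the PSTFile constructor mentioned
            // above) could now be given tmp.toFile().
            System.out.println("Spooled " + Files.size(tmp) + " bytes to " + tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The cost is one full copy of the PST file to disk before parsing starts, which may matter for large mailboxes. Tika's own TikaInputStream class is meant to cover this kind of spooling; the sketch sticks to the JDK so it does not depend on a particular Tika version.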



[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST

2011-04-03 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015152#comment-13015152
 ] 

Tran Nam Quang edited comment on TIKA-623 at 4/3/11 2:25 PM:
-

I started work on the Tika parser, but got stuck with the following problem: In 
order to access the Outlook PST file, I need to create a PSTFile instance. Now, 
the PSTFile constructor requires either a File or a String argument that points 
at the PST file. The constructor then takes either of these arguments to create 
a RandomAccessFile internally. However, Tika's Parser interface gives me an 
InputStream. What do I do?

  was (Author: qforce):
I started work on the Tika parser, but got stuck with the following 
problem: In order to access the Outlook PST file, I need to create a PSTFile 
instance. Now, the PSTFile constructor requires either a File or a String 
argument that points at the PST file. The constructor then takes either of 
these to create a RandomAccessFile internally. However, Tika's Parser interface 
gives me an InputStream. What do I do?
  


[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-04-03 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015171#comment-13015171
 ] 

Tran Nam Quang commented on TIKA-623:
-

The PST file is basically a folder tree with emails and other stuff in it. Is 
there some sort of specification out there that tells me how to map this tree 
to specific XHTML elements?

More specifically, what XML tags should I use to separate the emails from one 
another? And should the output be just a linear stream of emails, or should the 
tree structure be included in the output as well?



[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-04-02 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015046#comment-13015046
 ] 

Tran Nam Quang commented on TIKA-623:
-

Cool! I'll start writing the Tika parser as soon as I can. Could take a couple 
of days though.

Richard, I have one question regarding the API: PSTMessage has two methods, 
getDescriptorNodeId() and getInternetMessageId(). Both return identifiers, 
apparently. My question is: Which one is a unique identifier that will never, 
ever change? Because I wouldn't want the Tika parser to extract identifiers that 
are internal-only and not unique.



[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST

2011-04-02 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015046#comment-13015046
 ] 

Tran Nam Quang edited comment on TIKA-623 at 4/2/11 4:30 PM:
-

Cool! I'll start writing the Tika parser as soon as I can. Could take a couple 
of days though.

Richard, I have one question regarding the API: PSTMessage has two methods, 
getDescriptorNodeId() and getInternetMessageId(). Both return identifiers, 
apparently. My question is: Which one is a unique identifier that will never, 
ever change? Because I wouldn't want the Tika parser to extract identifiers that 
are internal-only and not unique.

Btw, maybe it's a good idea to also clarify this in the Javadoc.

  was (Author: qforce):
Cool! I'll start writing the Tika parser as soon as I can. Could take a 
couple of days though.

Richard, I have one question regarding the API: PSTMessage has two methods, 
getDescriptorNodeId() and getInternetMessageId(). Both return identifiers, 
apparently. My question is: Which one is a unique identifier that will never, 
ever change? Because I wouldn't want the Tika parser to extract identifiers that 
are internal-only and not unique.
  


[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-30 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013645#comment-13013645
 ] 

Tran Nam Quang commented on TIKA-623:
-

I contacted the library author, and he agreed to dual-licensing the library as 
LGPL/Apache. I hope this clears up the licensing issues.

As for the Tika parser, I won't be able to implement that before Saturday or 
Sunday (assuming I'm still supposed to).



[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST

2011-03-30 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013645#comment-13013645
 ] 

Tran Nam Quang edited comment on TIKA-623 at 3/30/11 9:03 PM:
--

I contacted the library author, and he agreed to dual-licensing the library as 
LGPL/Apache. This means java-libpst can be included by default in Tika, right?

As for the Tika parser, I won't be able to implement that before Saturday or 
Sunday (assuming I'm still supposed to).

  was (Author: qforce):
I contacted the library author, and he agreed to dual-licensing the library as 
LGPL/Apache. I hope this clears up the licensing issues.

As for the Tika parser, I won't be able to implement that before Saturday or 
Sunday (assuming I'm still supposed to).
  


[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-30 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013673#comment-13013673
 ] 

Tran Nam Quang commented on TIKA-623:
-

I have zero experience with Maven, so I don't think I'm the right person to 
take care of the Maven upload.

I might be able to handle the Parser, although it'll probably have to wait 
until the library author makes a new relicensed release available.



[jira] [Created] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Tran Nam Quang (JIRA)
Add support for Outlook PST
---

 Key: TIKA-623
 URL: https://issues.apache.org/jira/browse/TIKA-623
 Project: Tika
  Issue Type: New Feature
  Components: parser
Reporter: Tran Nam Quang


Hello everyone,

As you might know, Outlook stores its mails and other stuff in a single PST 
file. There's a relatively new Java library called java-libpst for reading 
Outlook PST files. It is licensed under the LGPL and available over here: 
http://code.google.com/p/java-libpst/

I have tested the library on Outlook 2000 and Outlook 2003, with good results. 
It would be great if the library could be integrated into Tika.

Best regards
Tran Nam Quang



[jira] [Commented] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744
 ] 

Tran Nam Quang commented on TIKA-623:
-

What license is required for inclusion in Tika, other than the Apache License 
2.0? I could ask the author to change the license or switch to dual-licensing...

The basic parser is already listed as an example on the front page of the 
java-libpst website, by the way.



[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST

2011-03-29 Thread Tran Nam Quang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744
 ] 

Tran Nam Quang edited comment on TIKA-623 at 3/29/11 10:17 PM:
---

What licenses would permit inclusion in Tika, other than the Apache License 
2.0? I could ask the author to change the library's license or to switch to 
dual-licensing...

The basic parser is already listed as an example on the front page of the 
java-libpst website, by the way.

  was (Author: qforce):
What license is required for inclusion in Tika, other than the Apache 
License 2.0? I could ask the author to change the license or switch to 
dual-licensing...

The basic parser is already listed as an example on the front page of the 
java-libpst website, by the way.
  