[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException
[ https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tran Nam Quang updated TIKA-2068:
--------------------------------
    Description:
The RTFParser seems to crash on RTF files containing pictures. The attached file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

  was:
The RTFParser seems to crash on RTF files containing pictures. Stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

> RTFParser crashes with NullPointerException
> -------------------------------------------
>
> Key: TIKA-2068
> URL: https://issues.apache.org/jira/browse/TIKA-2068
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.13
> Reporter: Tran Nam Quang
> Attachments: Styrodur C - 2800_C.rtf
>
>
> The RTFParser seems to crash on RTF files containing pictures. The attached
> file produces the following stacktrace:
> java.lang.NullPointerException
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
>     at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
>     at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
>     at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
>     at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException
[ https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tran Nam Quang updated TIKA-2068:
--------------------------------
    Description:
The RTFParser seems to crash on RTF files containing pictures. The attached file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

  was:
The RTFParser crashes on RTF files containing pictures. The attached file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)
[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException
[ https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tran Nam Quang updated TIKA-2068:
--------------------------------
    Description:
The RTFParser crashes on RTF files containing pictures. The attached file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)

  was:
The RTFParser seems to crash on RTF files containing pictures. The attached file produces the following stacktrace:

java.lang.NullPointerException
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:174)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getConfig(RTFEmbObjHandler.java:263)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.getExtension(RTFEmbObjHandler.java:242)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.extractObj(RTFEmbObjHandler.java:219)
    at org.apache.tika.parser.rtf.RTFEmbObjHandler.handleCompletedObject(RTFEmbObjHandler.java:198)
    at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1357)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:456)
    at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
    at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)
[jira] [Updated] (TIKA-2068) RTFParser crashes with NullPointerException
[ https://issues.apache.org/jira/browse/TIKA-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tran Nam Quang updated TIKA-2068:
--------------------------------
    Attachment: Styrodur C - 2800_C.rtf
[jira] [Updated] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tran Nam Quang updated TIKA-623:
-------------------------------
    Attachment: OutlookPSTParser.java

First version of parser

Add support for Outlook PST
---------------------------

Key: TIKA-623
URL: https://issues.apache.org/jira/browse/TIKA-623
Project: Tika
Issue Type: New Feature
Components: parser
Reporter: Tran Nam Quang
Attachments: OutlookPSTParser.java

Hello everyone,

As you might know, Outlook stores its mails and other stuff in a single PST file. There's a relatively new Java library called java-libpst for reading Outlook PST files. It is licensed under the LGPL and available over here:
http://code.google.com/p/java-libpst/

I have tested the library on Outlook 2000 and Outlook 2003, with good results. It would be great if the library could be integrated into Tika.

Best regards
Tran Nam Quang

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tran Nam Quang updated TIKA-623:
-------------------------------
    Comment: was deleted

(was: First version of parser)
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015152#comment-13015152 ]

Tran Nam Quang commented on TIKA-623:
-------------------------------------

I started work on the Tika parser, but got stuck with the following problem:

In order to access the Outlook PST file, I need to create a PSTFile instance. Now, the PSTFile constructor requires either a File or a String argument that points at the PST file. The constructor then takes either of these to create a RandomAccessFile internally. However, Tika's Parser interface gives me an InputStream. What do I do?
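One common way around an InputStream-versus-File mismatch like the one described in this comment is to spool the stream into a temporary file and hand that file to the file-based library. The sketch below uses only the JDK; the class name StreamSpooler and both method names are hypothetical and belong to neither Tika nor java-libpst. Tika also ships a TikaInputStream class intended for this situation (TikaInputStream.get(stream) followed by getFile()), which is probably the better fit inside a real parser; the thread does not say which approach was eventually used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Tika or java-libpst. */
final class StreamSpooler {

    private StreamSpooler() {}

    /** Copies the whole stream into a new temporary file and returns its path. */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pst-", ".tmp");
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    /** Reads up to n leading bytes through a RandomAccessFile, as a file-based library would. */
    static byte[] readHead(Path file, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] head = new byte[(int) Math.min(n, raf.length())];
            raf.readFully(head);
            return head;
        }
    }
}
```

In a parser, the returned path would be converted with toFile() and passed to the File-accepting PSTFile constructor mentioned in the comment, and the temporary file would be deleted once parsing finishes. The cost is one full copy of the PST file to disk, which can be significant for large mailboxes.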
[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015152#comment-13015152 ]

Tran Nam Quang edited comment on TIKA-623 at 4/3/11 2:25 PM:
-------------------------------------------------------------

I started work on the Tika parser, but got stuck with the following problem:

In order to access the Outlook PST file, I need to create a PSTFile instance. Now, the PSTFile constructor requires either a File or a String argument that points at the PST file. The constructor then takes either of these arguments to create a RandomAccessFile internally. However, Tika's Parser interface gives me an InputStream. What do I do?

was (Author: qforce):
I started work on the Tika parser, but got stuck with the following problem:

In order to access the Outlook PST file, I need to create a PSTFile instance. Now, the PSTFile constructor requires either a File or a String argument that points at the PST file. The constructor then takes either of these to create a RandomAccessFile internally. However, Tika's Parser interface gives me an InputStream. What do I do?
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015171#comment-13015171 ]

Tran Nam Quang commented on TIKA-623:
-------------------------------------

The PST file is basically a folder tree with emails and other stuff in it. Is there some sort of specification out there that tells me how to map this tree to specific XHTML elements? More specifically, what XML tags should I use to separate the emails from one another? And should the output be just a linear stream of emails, or should the tree structure be included in the output as well?
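To make the "tree versus linear stream" question concrete, here is a minimal sketch of one possible mapping, in which each folder becomes a div that contains its emails and its subfolders as nested div elements. Everything in it is an assumption made for illustration: the class PstFolderSketch, the CSS class names "folder" and "email", and the use of h1 for the folder name are hypothetical and are not an established Tika convention or a java-libpst type.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a PST folder; not a java-libpst type. */
final class PstFolderSketch {
    final String name;
    final List<PstFolderSketch> subfolders = new ArrayList<>();
    final List<String> emailSubjects = new ArrayList<>();

    PstFolderSketch(String name) {
        this.name = name;
    }

    /** Renders this folder and everything below it as nested div elements. */
    String toXhtml() {
        StringBuilder out = new StringBuilder();
        render(out);
        return out.toString();
    }

    private void render(StringBuilder out) {
        out.append("<div class=\"folder\"><h1>").append(escape(name)).append("</h1>");
        // Emails of this folder come first, one div per email.
        for (String subject : emailSubjects) {
            out.append("<div class=\"email\">").append(escape(subject)).append("</div>");
        }
        // Subfolders are nested inside the parent div, preserving the tree.
        for (PstFolderSketch child : subfolders) {
            child.render(out);
        }
        out.append("</div>");
    }

    /** Escapes the characters that are unsafe in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A linear stream would correspond to dropping the nesting and emitting only the email divs in traversal order. The nested form keeps the folder structure recoverable by consumers, at the cost of deeper markup.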
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015046#comment-13015046 ]

Tran Nam Quang commented on TIKA-623:
-------------------------------------

Cool! I'll start writing the Tika parser as soon as I can. Could take a couple of days though.

Richard, I have one question regarding the API: PSTMessage has two methods, getDescriptorNodeId() and getInternetMessageId(). Both return identifiers, apparently. My question is: Which one is a unique identifier that will never, ever change? Cause I wouldn't want the Tika parser to extract identifiers that are internal-only and not unique.
[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015046#comment-13015046 ]

Tran Nam Quang edited comment on TIKA-623 at 4/2/11 4:30 PM:
-------------------------------------------------------------

Cool! I'll start writing the Tika parser as soon as I can. Could take a couple of days though.

Richard, I have one question regarding the API: PSTMessage has two methods, getDescriptorNodeId() and getInternetMessageId(). Both return identifiers, apparently. My question is: Which one is a unique identifier that will never, ever change? Cause I wouldn't want the Tika parser to extract identifiers that are internal-only and not unique. Btw, maybe it's a good idea to also clarify this in the Javadoc.

was (Author: qforce):
Cool! I'll start writing the Tika parser as soon as I can. Could take a couple of days though.

Richard, I have one question regarding the API: PSTMessage has two methods, getDescriptorNodeId() and getInternetMessageId(). Both return identifiers, apparently. My question is: Which one is a unique identifier that will never, ever change? Cause I wouldn't want the Tika parser to extract identifiers that are internal-only and not unique.
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013645#comment-13013645 ]

Tran Nam Quang commented on TIKA-623:
-------------------------------------

I contacted the library author, and he agreed to dual-license the library as LGPL/Apache. I hope this clears up the licensing issues.

As for the Tika parser, I won't be able to implement that before Saturday or Sunday (assuming I'm still supposed to).
[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013645#comment-13013645 ]

Tran Nam Quang edited comment on TIKA-623 at 3/30/11 9:03 PM:
--------------------------------------------------------------

I contacted the library author, and he agreed to dual-license the library as LGPL/Apache. This means java-libpst can be included by default in Tika, right?

As for the Tika parser, I won't be able to implement that before Saturday or Sunday (assuming I'm still supposed to).

was (Author: qforce):
I contacted the library author, and he agreed to dual-license the library as LGPL/Apache. I hope this clears up the licensing issues.

As for the Tika parser, I won't be able to implement that before Saturday or Sunday (assuming I'm still supposed to).
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013673#comment-13013673 ]

Tran Nam Quang commented on TIKA-623:
-------------------------------------

I have zero experience with Maven, so I don't think I'm the right person to take care of the Maven upload. I might be able to handle the Parser, although it'll probably have to wait until the library author makes a new relicensed release available.
[jira] [Created] (TIKA-623) Add support for Outlook PST
Add support for Outlook PST
---------------------------

Key: TIKA-623
URL: https://issues.apache.org/jira/browse/TIKA-623
Project: Tika
Issue Type: New Feature
Components: parser
Reporter: Tran Nam Quang

Hello everyone,

As you might know, Outlook stores its mails and other stuff in a single PST file. There's a relatively new Java library called java-libpst for reading Outlook PST files. It is licensed under the LGPL and available over here:
http://code.google.com/p/java-libpst/

I have tested the library on Outlook 2000 and Outlook 2003, with good results. It would be great if the library could be integrated into Tika.

Best regards
Tran Nam Quang
[jira] [Commented] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744 ]

Tran Nam Quang commented on TIKA-623:
-------------------------------------

What license is required for inclusion in Tika, other than the Apache License 2.0? I could ask the author to change the license or switch to dual-licensing...

The basic parser is already listed as an example on the front page of the java-libpst website, by the way.
[jira] [Issue Comment Edited] (TIKA-623) Add support for Outlook PST
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012744#comment-13012744 ]

Tran Nam Quang edited comment on TIKA-623 at 3/29/11 10:17 PM:
---------------------------------------------------------------

What licenses would permit inclusion in Tika, other than the Apache License 2.0? I could ask the author to change the library's license or to switch to dual-licensing...

The basic parser is already listed as an example on the front page of the java-libpst website, by the way.

was (Author: qforce):
What license is required for inclusion in Tika, other than the Apache License 2.0? I could ask the author to change the license or switch to dual-licensing...

The basic parser is already listed as an example on the front page of the java-libpst website, by the way.