[jira] [Commented] (TIKA-1600) Unable to parse ODT files because of failed to close temporary resources

2015-04-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492084#comment-14492084 ] Hong-Thai Nguyen commented on TIKA-1600: The root exception is an NPE when parsing

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-30 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386900#comment-14386900 ] Hong-Thai Nguyen commented on TIKA-1581: And great thank to [~kkrugler] with many i

[jira] [Resolved] (TIKA-1581) jhighlight license concerns

2015-03-27 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1581. Resolution: Fixed > jhighlight license concerns > --- > >

[jira] [Updated] (TIKA-1581) jhighlight license concerns

2015-03-27 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1581: --- Fix Version/s: 1.8 > jhighlight license concerns > --- > >

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-27 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383827#comment-14383827 ] Hong-Thai Nguyen commented on TIKA-1581: On r1669583, I switched to latest jhighlig

[jira] [Comment Edited] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371432#comment-14371432 ] Hong-Thai Nguyen edited comment on TIKA-1581 at 3/20/15 3:36 PM:

[jira] [Comment Edited] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371432#comment-14371432 ] Hong-Thai Nguyen edited comment on TIKA-1581 at 3/20/15 3:10 PM:

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371432#comment-14371432 ] Hong-Thai Nguyen commented on TIKA-1581: I've contacted also 'gbe...@uwyn.com', see

[jira] [Commented] (TIKA-1505) chmparser breaks down when extracting from file of CHM format v3

2015-01-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264786#comment-14264786 ] Hong-Thai Nguyen commented on TIKA-1505: Can you provide also problem files and tes

[jira] [Resolved] (TIKA-672) Proper error handling in the CHM parser

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-672. --- Resolution: Fixed Check no more System.err/System.out inside CHM parser > Proper error handling

[jira] [Updated] (TIKA-672) Proper error handling in the CHM parser

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-672: -- Fix Version/s: 1.7 > Proper error handling in the CHM parser > ---

[jira] [Updated] (TIKA-1448) CHM parser : defect in file extraction

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1448: --- Fix Version/s: 1.7 > CHM parser : defect in file extraction > -

[jira] [Updated] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1446: --- Fix Version/s: 1.7 > CHM parser : wrong decompression of aligned blocks > -

[jira] [Updated] (TIKA-1430) CHM parser gets faulty text (fix found)

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1430: --- Fix Version/s: 1.7 > CHM parser gets faulty text (fix found) >

[jira] [Resolved] (TIKA-1430) CHM parser gets faulty text (fix found)

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1430. Resolution: Fixed > CHM parser gets faulty text (fix found) > ---

[jira] [Updated] (TIKA-1447) CHM parser: wrong directory list

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1447: --- Fix Version/s: 1.7 > CHM parser: wrong directory list > > >

[jira] [Resolved] (TIKA-1448) CHM parser : defect in file extraction

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1448. Resolution: Fixed > CHM parser : defect in file extraction >

[jira] [Resolved] (TIKA-1447) CHM parser: wrong directory list

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1447. Resolution: Fixed > CHM parser: wrong directory list > > >

[jira] [Resolved] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1446. Resolution: Fixed > CHM parser : wrong decompression of aligned blocks >

[jira] [Commented] (TIKA-1447) CHM parser: wrong directory list

2014-11-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214535#comment-14214535 ] Hong-Thai Nguyen commented on TIKA-1447: [~binhawking], The work on TIKA-1446 fixed

[jira] [Comment Edited] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-12 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208079#comment-14208079 ] Hong-Thai Nguyen edited comment on TIKA-1446 at 11/12/14 2:38 PM: ---

[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-12 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208079#comment-14208079 ] Hong-Thai Nguyen commented on TIKA-1446: Hi [~binhawking], I've merge your pull req

[jira] [Commented] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196343#comment-14196343 ] Hong-Thai Nguyen commented on TIKA-1463: Thank [~lfcnassif], without .exe effective

[jira] [Closed] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen closed TIKA-1463. -- Resolution: Fixed > TesseractOCRParser does not work in Windows > ---

[jira] [Updated] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1463: --- Description: STR: * Case 1: ** Setting tesseractPath to a common installation path of Tesseract

[jira] [Updated] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1463: --- Summary: TesseractOCRParser does not work in Windows (was: TesseractOCRParser does work in Win

[jira] [Commented] (TIKA-1463) TesseractOCRParser does work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194694#comment-14194694 ] Hong-Thai Nguyen commented on TIKA-1463: Fixed in r1636382 > TesseractOCRParser do

[jira] [Created] (TIKA-1463) TesseractOCRParser does work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1463: -- Summary: TesseractOCRParser does work in Windows Key: TIKA-1463 URL: https://issues.apache.org/jira/browse/TIKA-1463 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-10-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181530#comment-14181530 ] Hong-Thai Nguyen commented on TIKA-1446: Thank alot [~binhawking], I've quick look

[jira] [Comment Edited] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178186#comment-14178186 ] Hong-Thai Nguyen edited comment on TIKA-1422 at 10/21/14 9:48 AM: ---

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178186#comment-14178186 ] Hong-Thai Nguyen commented on TIKA-1422: Applied latest fix on r1633325 with some f

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-16 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173537#comment-14173537 ] Hong-Thai Nguyen commented on TIKA-1422: I'm not using Tesseract > org.apache.tika

[jira] [Commented] (TIKA-1176) ChmDirectoryListingSet does not correctly enumerate directory entries

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169146#comment-14169146 ] Hong-Thai Nguyen commented on TIKA-1176: Hi [~mdgeek], thank for your offering code

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169130#comment-14169130 ] Hong-Thai Nguyen commented on TIKA-1422: Strange, I'm unable to build causing this

[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169098#comment-14169098 ] Hong-Thai Nguyen commented on TIKA-1446: Thank [~binhawking], Any change you can at

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169090#comment-14169090 ] Hong-Thai Nguyen commented on TIKA-1445: Interesting question ! For me, parser's se

[jira] [Commented] (TIKA-1428) Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character

2014-09-25 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147880#comment-14147880 ] Hong-Thai Nguyen commented on TIKA-1428: Thanks [~theoettheo], any chance to have a

[jira] [Commented] (TIKA-1412) NPE in OpenDocumentParser

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143043#comment-14143043 ] Hong-Thai Nguyen commented on TIKA-1412: Add a test at r1626706 > NPE in OpenDocum

[jira] [Updated] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1421: --- Priority: Blocker (was: Major) > Tika-Parsers tests fail on CentOS6 if tesseract isn't install

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143041#comment-14143041 ] Hong-Thai Nguyen commented on TIKA-1421: Not only CentOS, this test failed also on

[jira] [Resolved] (TIKA-1413) OOXML thumbnail name added to body

2014-09-09 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1413. Resolution: Fixed > OOXML thumbnail name added to body > -- >

[jira] [Commented] (TIKA-1413) OOXML thumbnail name added to body

2014-09-09 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126949#comment-14126949 ] Hong-Thai Nguyen commented on TIKA-1413: I agree. Fixed in r1623819 and _id_ is now

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-29 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077885#comment-14077885 ] Hong-Thai Nguyen commented on TIKA-1373: Normally it's on next official 1.6 releas

[jira] [Resolved] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1373. Resolution: Fixed > AutoDetectParser extracts no text when SourceCodeParser is selected > --

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073042#comment-14073042 ] Hong-Thai Nguyen commented on TIKA-1373: HtmlParser skips tags generated by JHighli

[jira] [Comment Edited] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071643#comment-14071643 ] Hong-Thai Nguyen edited comment on TIKA-1373 at 7/23/14 1:42 PM:

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071713#comment-14071713 ] Hong-Thai Nguyen commented on TIKA-1373: Yes, I saw the trouble when implementing t

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071643#comment-14071643 ] Hong-Thai Nguyen commented on TIKA-1373: Can you format your description with {code

[jira] [Updated] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1095: --- Labels: pdfbox (was: patch) > Only gibberish extracted from this PDF > --

[jira] [Updated] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1095: --- Component/s: (was: general) parser > Only gibberish extracted from this P

[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061867#comment-14061867 ] Hong-Thai Nguyen commented on TIKA-1095: Even with latest Tika can't convert this

[jira] [Commented] (TIKA-1332) Create "eval" code

2014-06-26 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044706#comment-14044706 ] Hong-Thai Nguyen commented on TIKA-1332: What you are describing is something alike

[jira] [Commented] (TIKA-1350) OutlookPSTParser: Unknown message type: IPM.Note

2014-06-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040519#comment-14040519 ] Hong-Thai Nguyen commented on TIKA-1350: Richard Johnson (author of java-pstlib) is

[jira] [Commented] (TIKA-1320) extract text from jpeg in solr tika

2014-06-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017473#comment-14017473 ] Hong-Thai Nguyen commented on TIKA-1320: OCR is a solution: TIKA-93. Unfortunately,

[jira] [Commented] (TIKA-1308) Support in memory parse mode(don't create temp file): to support run Tika in GAE

2014-05-26 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008704#comment-14008704 ] Hong-Thai Nguyen commented on TIKA-1308: A virtual FileSystem may be a solution, If

[jira] [Resolved] (TIKA-1290) Upgrade to PDFBOX 1.8.5

2014-05-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1290. Resolution: Fixed r1592780 > Upgrade to PDFBOX 1.8.5 > --- > >

[jira] [Updated] (TIKA-1290) Upgrade to PDFBOX 1.8.5

2014-05-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1290: --- Labels: trivial (was: ) > Upgrade to PDFBOX 1.8.5 > --- > >

[jira] [Created] (TIKA-1290) Upgrade to PDFBOX 1.8.5

2014-05-02 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1290: -- Summary: Upgrade to PDFBOX 1.8.5 Key: TIKA-1290 URL: https://issues.apache.org/jira/browse/TIKA-1290 Project: Tika Issue Type: Improvement Re

[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-05-02 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987521#comment-13987521 ] Hong-Thai Nguyen commented on TIKA-1287: Technically, not difficult to upload new j

[jira] [Commented] (TIKA-1283) Add "thumbnail" as possible metadata item to TikaCoreProperties

2014-04-28 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983434#comment-13983434 ] Hong-Thai Nguyen commented on TIKA-1283: +1 for me to create a thumbnail field in m

[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1279. Resolution: Fixed Thank [~rgauss] for this good catch. I fixed with more tests in r1589742 H

[jira] [Resolved] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1276. Resolution: Fixed Thank [~rwesten], added your patch at r1589717 > Missing embedded depende

[jira] [Updated] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1276: --- Fix Version/s: 1.6 > Missing embedded dependencies in tika-bundle > --

[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1279. Resolution: Fixed Fixed at r1589687 > Missing return lines at output of SourceCodeParser >

[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979614#comment-13979614 ] Hong-Thai Nguyen commented on TIKA-1224: Thank [~ben.12] for feedback. For line ret

[jira] [Created] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1279: -- Summary: Missing return lines at output of SourceCodeParser Key: TIKA-1279 URL: https://issues.apache.org/jira/browse/TIKA-1279 Project: Tika Issue Type:

[jira] [Updated] (TIKA-623) Add support for Outlook PST

2014-04-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-623: -- Assignee: (was: Hong-Thai Nguyen) > Add support for Outlook PST > ---

[jira] [Resolved] (TIKA-623) Add support for Outlook PST

2014-04-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-623. --- Resolution: Fixed Improvement: extract each mail as attachment document. Recursion down to fol

[jira] [Resolved] (TIKA-1244) Better parsing of Mbox files

2014-03-31 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1244. Resolution: Fixed Fix Version/s: 1.6 Commited on r1583305, thanks [~lfcnassif] I pres

[jira] [Assigned] (TIKA-1244) Better parsing of Mbox files

2014-03-28 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen reassigned TIKA-1244: -- Assignee: Hong-Thai Nguyen > Better parsing of Mbox files >

[jira] [Commented] (TIKA-1244) Better parsing of Mbox files

2014-03-21 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942965#comment-13942965 ] Hong-Thai Nguyen commented on TIKA-1244: +1 for me too, I was at same intention to

[jira] [Reopened] (TIKA-623) Add support for Outlook PST

2014-03-07 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen reopened TIKA-623: --- Assignee: Tim Allison (was: Hong-Thai Nguyen) > Add support for Outlook PST > --

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2014-03-07 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923703#comment-13923703 ] Hong-Thai Nguyen commented on TIKA-623: --- [~lfcnassif], binary attached is handled with

[jira] [Updated] (TIKA-1257) MS Word Filter out control characters on ouput

2014-03-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1257: --- Attachment: testControlCharacters.doc > MS Word Filter out control characters on ouput > -

[jira] [Updated] (TIKA-1257) MS Word Filter out control characters on ouput

2014-03-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1257: --- Attachment: (was: 5f01ae23-9e6e-4faa-808a-f78dbb20cc71.doc) > MS Word Filter out control c

[jira] [Comment Edited] (TIKA-1257) MS Word Filter out control characters on ouput

2014-03-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922490#comment-13922490 ] Hong-Thai Nguyen edited comment on TIKA-1257 at 3/6/14 1:50 PM: -

[jira] [Resolved] (TIKA-1257) MS Word Filter out control characters on ouput

2014-03-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1257. Resolution: Fixed Fixed on r1574874 > MS Word Filter out control characters on ouput >

[jira] [Updated] (TIKA-1257) MS Word Filter out control characters on ouput

2014-03-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1257: --- Attachment: tika-doc-control-char.png 5f01ae23-9e6e-4faa-808a-f78dbb20cc71.doc

[jira] [Created] (TIKA-1257) MS Word Filter out control characters on ouput

2014-03-06 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1257: -- Summary: MS Word Filter out control characters on ouput Key: TIKA-1257 URL: https://issues.apache.org/jira/browse/TIKA-1257 Project: Tika Issue Type: Bug

[jira] [Resolved] (TIKA-623) Add support for Outlook PST

2014-03-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-623. --- Resolution: Fixed Commit on r1574411 > Add support for Outlook PST > -

[jira] [Comment Edited] (TIKA-623) Add support for Outlook PST

2014-03-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920692#comment-13920692 ] Hong-Thai Nguyen edited comment on TIKA-623 at 3/5/14 9:30 AM: ---

[jira] [Assigned] (TIKA-623) Add support for Outlook PST

2014-03-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen reassigned TIKA-623: - Assignee: Hong-Thai Nguyen > Add support for Outlook PST > --- > >

[jira] [Commented] (TIKA-623) Add support for Outlook PST

2014-03-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920692#comment-13920692 ] Hong-Thai Nguyen commented on TIKA-623: --- java-libpst-0.7 has been uploaded to oss sona

[jira] [Updated] (TIKA-623) Add support for Outlook PST

2014-03-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-623: -- Fix Version/s: 1.6 > Add support for Outlook PST > --- > >

[jira] [Assigned] (TIKA-1223) Extract thumbnail of OOXML Office files

2014-02-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen reassigned TIKA-1223: -- Assignee: (was: Hong-Thai Nguyen) > Extract thumbnail of OOXML Office files > --

[jira] [Resolved] (TIKA-1223) Extract thumbnail of OOXML Office files

2014-02-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1223. Resolution: Fixed r1568954 > Extract thumbnail of OOXML Office files >

[jira] [Assigned] (TIKA-1223) Extract thumbnail of OOXML Office files

2014-02-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen reassigned TIKA-1223: -- Assignee: Hong-Thai Nguyen > Extract thumbnail of OOXML Office files > -

[jira] [Resolved] (TIKA-1089) Tika conversion failed on following documents

2014-02-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1089. Resolution: Invalid Fix Version/s: 1.5 Assignee: Hong-Thai Nguyen Should cre

[jira] [Resolved] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-02-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1224. Resolution: Fixed > Adding Source code (Java, Groovy, C) parser > --

[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-02-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889491#comment-13889491 ] Hong-Thai Nguyen commented on TIKA-1224: Commited on 1563902 > Adding Source code

[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-01-21 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877343#comment-13877343 ] Hong-Thai Nguyen commented on TIKA-1224: I agree that parsing deeply each language

[jira] [Created] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-01-20 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1224: -- Summary: Adding Source code (Java, Groovy, C) parser Key: TIKA-1224 URL: https://issues.apache.org/jira/browse/TIKA-1224 Project: Tika Issue Type: Improv

[jira] [Updated] (TIKA-1223) Extract thumbnail of OOXML Office files

2014-01-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1223: --- Description: From Microsoft Office 2007 file formats, thumbnail could be included in packag

[jira] [Updated] (TIKA-1223) Extract thumbnail of OOXML Office files

2014-01-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1223: --- Attachment: TIKA-1223.patch > Extract thumbnail of OOXML Office files > --

[jira] [Created] (TIKA-1223) Extract thumbnail of OOXML Office files

2014-01-17 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1223: -- Summary: Extract thumbnail of OOXML Office files Key: TIKA-1223 URL: https://issues.apache.org/jira/browse/TIKA-1223 Project: Tika Issue Type: Improvemen

[jira] [Commented] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-14 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870573#comment-13870573 ] Hong-Thai Nguyen commented on TIKA-1215: Great catch. Thank [~jukkaz] > Regression

[jira] [Commented] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869590#comment-13869590 ] Hong-Thai Nguyen commented on TIKA-1215: [~talli...@apache.org], here's XML of inpu

[jira] [Updated] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1215: --- Attachment: tika-1215-without-wildcard.patch [~gagravarr], my code style is different the one

[jira] [Commented] (TIKA-90) Allow thumbnails as document metadata

2014-01-09 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866498#comment-13866498 ] Hong-Thai Nguyen commented on TIKA-90: -- Useful for Open XML Office & OpenOffice files an

[jira] [Comment Edited] (TIKA-1216) parse method of Mp3Parser doesn't work for few mp3 files

2014-01-07 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864202#comment-13864202 ] Hong-Thai Nguyen edited comment on TIKA-1216 at 1/7/14 3:57 PM: -
