[VOTE] Apache Tika 1.8 Release Candidate #2
Hi Folks,

A candidate for the Tika 1.8 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is 5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged Maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for the next 72 hours and passes if a majority of at least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler
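For anyone checking the candidate before voting, the SHA1 comparison can be scripted. The following is a minimal sketch in Java, assuming the archive has already been downloaded to local disk. The class name `Sha1Check` and its two command-line arguments are illustrative only and are not part of Tika or its release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika: computes the SHA-1 of a stream
// and compares it with the checksum quoted in a vote email.
class Sha1Check {

    // Reads the whole stream and returns its SHA-1 digest as lowercase hex.
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Usage: java Sha1Check <path-to-archive> <expected-sha1>
    public static void main(String[] args) throws IOException {
        String expected = args[1].trim().toLowerCase();
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            System.out.println(actual.equals(expected)
                    ? "checksum matches"
                    : "checksum MISMATCH, computed " + actual);
        }
    }
}
```

The second argument would be the checksum from the vote email above; the first is wherever the downloaded archive was saved.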
[jira] [Commented] (TIKA-1600) Unable to parse ODT files because of failed to close temporary resources
[ https://issues.apache.org/jira/browse/TIKA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492084#comment-14492084 ] Hong-Thai Nguyen commented on TIKA-1600: The root exception is an NPE when parsing ODT files with elements in footnotes: {code} java.lang.NullPointerException at org.apache.tika.parser.odf.OpenDocumentContentParser$OpenDocumentElementMappingContentHandler.startSpan(OpenDocumentContentParser.java:174) at org.apache.tika.parser.odf.OpenDocumentContentParser$OpenDocumentElementMappingContentHandler.startElement(OpenDocumentContentParser.java:287) at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at org.apache.tika.parser.odf.NSNormalizerContentHandler.startElement(NSNormalizerContentHandler.java:69) at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:501) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:400) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2756) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205) at 
com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) at javax.xml.parsers.SAXParser.parse(SAXParser.java:395) at javax.xml.parsers.SAXParser.parse(SAXParser.java:198) at org.apache.tika.parser.odf.OpenDocumentContentParser.parseInternal(OpenDocumentContentParser.java:503) at org.apache.tika.parser.odf.OpenDocumentParser.handleZipEntry(OpenDocumentParser.java:187) at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:164) at org.apache.tika.parser.odf.OpenDocumentParserTest.can_parse_odt_file(OpenDocumentParserTest.java:41) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) {code} It seems that style support for ODF was added recently, in 1.8: {noformat} Revision: 107 Author: tpalsulich Date: Saturday, March 14, 2015 00:25:53
RE: [VOTE] Release Apache Tika 1.8 Candidate #1
Not yet, I'm investigating TIKA-1600 further today. Hong-Thai -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, April 13, 2015 1:07 AM To: dev@tika.apache.org Subject: RE: [VOTE] Release Apache Tika 1.8 Candidate #1 I don't think we've solved TIKA-1600, yet, or have we? -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: Sunday, April 12, 2015 12:12 AM To: dev@tika.apache.org Subject: Re: [VOTE] Release Apache Tika 1.8 Candidate #1 Are we ready for another RC? I'd like to make sure the above issues are (believed to be) settled before the next cut. Thanks, Tyler On Apr 10, 2015 4:55 PM, David Meikle loo...@gmail.com wrote: On 10 Apr 2015, at 11:38, Allison, Timothy B. talli...@mitre.org wrote: I agree that the ODT issue might require a respin. What do others think? +1 for re-spin. Unfortunately, there might be 2 odt docs (mime type: “application/vnd.oasis.opendocument.text”?) in govdocs1…so we wouldn't see that problem. I did do a comparison of 1.7 vs 1.8-rc1, and the results are here: https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_7_v_1_8-rc1.zip I encourage folks (if you haven't, and if you care :) ) to take a look and see if you see something that I don’t. Thanks for this, Tim. About to get on a flight, so will check through on that. Cheers, Dave
[jira] [Created] (TIKA-1604) Memory spike while parsing pptx file
Grega Gašperšič created TIKA-1604: - Summary: Memory spike while parsing pptx file Key: TIKA-1604 URL: https://issues.apache.org/jira/browse/TIKA-1604 Project: Tika Issue Type: Bug Affects Versions: 1.7, 1.6 Reporter: Grega Gašperšič While trying to parse a pptx file with only one slide to a string (using parseToString method) the Tika object (version 1.6 and 1.7) increases the heap size of the JVM to 8 GB (on my local computer) and up to 16 GB (on our server). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TIKA-1604) Memory spike while parsing pptx file
[ https://issues.apache.org/jira/browse/TIKA-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grega Gašperšič updated TIKA-1604: -- Attachment: emittance_measurement_proposal_20150225.pptx Example file with which the error occurred. Memory spike while parsing pptx file Key: TIKA-1604 URL: https://issues.apache.org/jira/browse/TIKA-1604 Project: Tika Issue Type: Bug Affects Versions: 1.6, 1.7 Reporter: Grega Gašperšič Attachments: emittance_measurement_proposal_20150225.pptx While trying to parse a pptx file with only one slide to a string (using parseToString method) the Tika object (version 1.6 and 1.7) increases the heap size of the JVM to 8 GB (on my local computer) and up to 16 GB (on our server). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1600) Unable to parse ODT files because of failed to close temporary resources
[ https://issues.apache.org/jira/browse/TIKA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492629#comment-14492629 ] Hudson commented on TIKA-1600: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #622 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/622/]) TIKA-1600. Reformat ODF Parser files and move OpenDocumentParserTest tests to ODFParserTest. (tpalsulich: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1673236) * /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/NSNormalizerContentHandler.java * /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentContentParser.java * /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentMetaParser.java * /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java * /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java * /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/OpenDocumentParserTest.java Unable to parse ODT files because of failed to close temporary resources Key: TIKA-1600 URL: https://issues.apache.org/jira/browse/TIKA-1600 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.8 Environment: Windows Reporter: Hong-Thai Nguyen Assignee: Hong-Thai Nguyen Priority: Blocker Attachments: Manuel_koha.odt Many ODT files fail to parse because of this exception. A sample file is attached. {code} Apache Tika was unable to parse the document at C:\Users\hong-thai.nguyen\Downloads\Manuel_koha.odt. 
The full exception stack trace is included below: org.apache.tika.exception.TikaException: Failed to close temporary resources at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:127) at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:342) at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:299) at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:256) at javax.swing.AbstractButton.fireActionPerformed(Unknown Source) at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source) at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source) at javax.swing.DefaultButtonModel.setPressed(Unknown Source) at javax.swing.AbstractButton.doClick(Unknown Source) at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source) at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source) at java.awt.Component.processMouseEvent(Unknown Source) at javax.swing.JComponent.processMouseEvent(Unknown Source) at java.awt.Component.processEvent(Unknown Source) at java.awt.Container.processEvent(Unknown Source) at java.awt.Component.dispatchEventImpl(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Window.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.EventQueue.dispatchEventImpl(Unknown Source) at java.awt.EventQueue.access$400(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at 
java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue.dispatchEvent(Unknown Source) at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.run(Unknown Source) Caused by: java.io.IOException: Could not delete temporary file
[jira] [Commented] (TIKA-1593) Doco: Broken link to Parser Quick Start Guide
[ https://issues.apache.org/jira/browse/TIKA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492638#comment-14492638 ] Tyler Palsulich commented on TIKA-1593: --- See https://svn.apache.org/repos/asf/tika/site/src/site/apt/download.apt.vm -- you need the vm extension. Then, you can use {code}${project.parent.version}{code} to get the current version of the project. Then, when we update the site for a new release, you just have to change the version number in the site's pom.xml file. I'll fix this right now. Doco: Broken link to Parser Quick Start Guide --- Key: TIKA-1593 URL: https://issues.apache.org/jira/browse/TIKA-1593 Project: Tika Issue Type: Bug Components: documentation Affects Versions: 1.7 Reporter: Dan Rollo Priority: Minor The Tika web page: https://tika.apache.org/contribute.html, under the Section: New Parsers, Detectors and Mime Types, there is a link with the text: Parser Quick Start Guide. The link URL is: https://tika.apache.org/parser_guide.apt, and does not work. The .apt extension seems odd. I don't know what the link should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TIKA-1593) Doco: Broken link to Parser Quick Start Guide
[ https://issues.apache.org/jira/browse/TIKA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1593. --- Resolution: Fixed Assignee: Tyler Palsulich Fixed in r1673240. Thank you [~bhamail]! Please let us know if you find any more. Doco: Broken link to Parser Quick Start Guide --- Key: TIKA-1593 URL: https://issues.apache.org/jira/browse/TIKA-1593 Project: Tika Issue Type: Bug Components: documentation Affects Versions: 1.7 Reporter: Dan Rollo Assignee: Tyler Palsulich Priority: Minor The Tika web page: https://tika.apache.org/contribute.html, under the Section: New Parsers, Detectors and Mime Types, there is a link with the text: Parser Quick Start Guide. The link URL is: https://tika.apache.org/parser_guide.apt, and does not work. The .apt extension seems odd. I don't know what the link should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TIKA-1593) Doco: Broken link to Parser Quick Start Guide
[ https://issues.apache.org/jira/browse/TIKA-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492662#comment-14492662 ] Tyler Palsulich edited comment on TIKA-1593 at 4/13/15 5:02 PM: Fixed in r1673240 and r1673241. Thank you [~bhamail]! Please let us know if you find any more. was (Author: tpalsulich): Fixed in r1673240. Thank you [~bhamail]! Please let us know if you find any more. Doco: Broken link to Parser Quick Start Guide --- Key: TIKA-1593 URL: https://issues.apache.org/jira/browse/TIKA-1593 Project: Tika Issue Type: Bug Components: documentation Affects Versions: 1.7 Reporter: Dan Rollo Assignee: Tyler Palsulich Priority: Minor The Tika web page: https://tika.apache.org/contribute.html, under the Section: New Parsers, Detectors and Mime Types, there is a link with the text: Parser Quick Start Guide. The link URL is: https://tika.apache.org/parser_guide.apt, and does not work. The .apt extension seems odd. I don't know what the link should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (TIKA-1600) Unable to parse ODT files because of failed to close temporary resources
[ https://issues.apache.org/jira/browse/TIKA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich resolved TIKA-1600. --- Resolution: Fixed Assignee: Hong-Thai Nguyen Thanks, [~thaichat04]! I just updated it -- reformatted the ODF parsing files (they were all a bit odd with whitespace) and moved the test into the existing test file. Marking this as fixed and will cut a new release shortly. Unable to parse ODT files because of failed to close temporary resources Key: TIKA-1600 URL: https://issues.apache.org/jira/browse/TIKA-1600 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.8 Environment: Windows Reporter: Hong-Thai Nguyen Assignee: Hong-Thai Nguyen Attachments: Manuel_koha.odt Many ODT files fail to parse because of this exception. A sample file is attached. {code} Apache Tika was unable to parse the document at C:\Users\hong-thai.nguyen\Downloads\Manuel_koha.odt. The full exception stack trace is included below: org.apache.tika.exception.TikaException: Failed to close temporary resources at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:127) at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:342) at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:299) at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:256) at javax.swing.AbstractButton.fireActionPerformed(Unknown Source) at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source) at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source) at javax.swing.DefaultButtonModel.setPressed(Unknown Source) at javax.swing.AbstractButton.doClick(Unknown Source) at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source) at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source) at java.awt.Component.processMouseEvent(Unknown Source) at javax.swing.JComponent.processMouseEvent(Unknown Source) at 
java.awt.Component.processEvent(Unknown Source) at java.awt.Container.processEvent(Unknown Source) at java.awt.Component.dispatchEventImpl(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Window.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.EventQueue.dispatchEventImpl(Unknown Source) at java.awt.EventQueue.access$400(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue.dispatchEvent(Unknown Source) at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.run(Unknown Source) Caused by: java.io.IOException: Could not delete temporary file C:\Users\HONG-T~1.NGU\AppData\Local\Temp\apache-tika-2891340188156641845.tmp at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70) at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121) at 
org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:150) ... 42 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
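The `Could not delete temporary file` cause in the trace above is consistent with a file handle still being open when cleanup runs, which Windows typically refuses. The thread does not confirm the exact mechanism, so treat that reading as an assumption. Below is a minimal Java sketch of the general pattern: close the stream before deleting, even when parsing throws. `TempFileCleanup` and `spoolAndCleanUp` are hypothetical names, and this is not Tika's `TemporaryResources` code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration, not Tika code: spool bytes to a temp file,
// "parse" it, and make sure the stream is closed before the file is deleted.
class TempFileCleanup {

    // Returns true if the temporary file is gone once processing has finished.
    static boolean spoolAndCleanUp(byte[] data, boolean failWhileParsing) {
        Path tmp;
        try {
            tmp = Files.createTempFile("sketch-", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            Files.write(tmp, data);
            // try-with-resources closes the stream before any catch or
            // finally block below runs, including when the body throws.
            try (InputStream in = new FileInputStream(tmp.toFile())) {
                in.read();
                if (failWhileParsing) {
                    // Stand-in for a parser bug such as an unexpected NPE.
                    throw new IllegalStateException("simulated parser failure");
                }
            }
        } catch (IOException | IllegalStateException e) {
            System.err.println("parse failed: " + e.getMessage());
        } finally {
            try {
                // The handle is already released here, so deletion can succeed.
                Files.deleteIfExists(tmp);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return !Files.exists(tmp);
    }
}
```

The design point is ordering: the delete sits in the outer `finally`, so it always runs after the inner try-with-resources has closed the stream, whether or not the simulated parse step failed.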
[jira] [Updated] (TIKA-1600) Unable to parse ODT files because of failed to close temporary resources
[ https://issues.apache.org/jira/browse/TIKA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich updated TIKA-1600: -- Priority: Blocker (was: Major) Unable to parse ODT files because of failed to close temporary resources Key: TIKA-1600 URL: https://issues.apache.org/jira/browse/TIKA-1600 Project: Tika Issue Type: Bug Components: parser Affects Versions: 1.8 Environment: Windows Reporter: Hong-Thai Nguyen Assignee: Hong-Thai Nguyen Priority: Blocker Attachments: Manuel_koha.odt Many ODT files fail to parse because of this exception. A sample file is attached. {code} Apache Tika was unable to parse the document at C:\Users\hong-thai.nguyen\Downloads\Manuel_koha.odt. The full exception stack trace is included below: org.apache.tika.exception.TikaException: Failed to close temporary resources at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:127) at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:342) at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:299) at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:256) at javax.swing.AbstractButton.fireActionPerformed(Unknown Source) at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source) at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source) at javax.swing.DefaultButtonModel.setPressed(Unknown Source) at javax.swing.AbstractButton.doClick(Unknown Source) at javax.swing.plaf.basic.BasicMenuItemUI.doClick(Unknown Source) at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(Unknown Source) at java.awt.Component.processMouseEvent(Unknown Source) at javax.swing.JComponent.processMouseEvent(Unknown Source) at java.awt.Component.processEvent(Unknown Source) at java.awt.Container.processEvent(Unknown Source) at java.awt.Component.dispatchEventImpl(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at 
java.awt.Component.dispatchEvent(Unknown Source) at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source) at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source) at java.awt.Container.dispatchEventImpl(Unknown Source) at java.awt.Window.dispatchEventImpl(Unknown Source) at java.awt.Component.dispatchEvent(Unknown Source) at java.awt.EventQueue.dispatchEventImpl(Unknown Source) at java.awt.EventQueue.access$400(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.awt.EventQueue$3.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.awt.EventQueue$4.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source) at java.awt.EventQueue.dispatchEvent(Unknown Source) at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source) at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.pumpEvents(Unknown Source) at java.awt.EventDispatchThread.run(Unknown Source) Caused by: java.io.IOException: Could not delete temporary file C:\Users\HONG-T~1.NGU\AppData\Local\Temp\apache-tika-2891340188156641845.tmp at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:70) at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:121) at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:150) ... 42 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Release Apache Tika 1.8 Candidate #1
Hi Folks, Marking this VOTE as failed. Now that the above issues have been addressed, I'll cut a new release. Please let me know if you find any other blockers. Thanks, Tyler On Mon, Apr 13, 2015 at 12:45 AM, Hong-Thai Nguyen hngu...@customermatrix.com wrote: Not yet, I'm investigating TIKA-1600 further today. Hong-Thai -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, April 13, 2015 1:07 AM To: dev@tika.apache.org Subject: RE: [VOTE] Release Apache Tika 1.8 Candidate #1 I don't think we've solved TIKA-1600, yet, or have we? -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: Sunday, April 12, 2015 12:12 AM To: dev@tika.apache.org Subject: Re: [VOTE] Release Apache Tika 1.8 Candidate #1 Are we ready for another RC? I'd like to make sure the above issues are (believed to be) settled before the next cut. Thanks, Tyler On Apr 10, 2015 4:55 PM, David Meikle loo...@gmail.com wrote: On 10 Apr 2015, at 11:38, Allison, Timothy B. talli...@mitre.org wrote: I agree that the ODT issue might require a respin. What do others think? +1 for re-spin. Unfortunately, there might be 2 odt docs (mime type: “application/vnd.oasis.opendocument.text”?) in govdocs1…so we wouldn't see that problem. I did do a comparison of 1.7 vs 1.8-rc1, and the results are here: https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_7_v_1_8-rc1.zip I encourage folks (if you haven't, and if you care :) ) to take a look and see if you see something that I don’t. Thanks for this, Tim. About to get on a flight, so will check through on that. Cheers, Dave
[jira] [Commented] (TIKA-1602) Detecting standards-non-compliant emails as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492540#comment-14492540 ] Jeremy B. Merrill commented on TIKA-1602: - Sounds about right, thanks for finding that for me. I'll go ahead and mark the issue a dupe or close it. Any idea when that patch'll get merged into trunk? (Or -- since I'm an svn n00b -- if there's a way for me to download that patched version.) Detecting standards-non-compliant emails as message/rfc822 -- Key: TIKA-1602 URL: https://issues.apache.org/jira/browse/TIKA-1602 Project: Tika Issue Type: New Feature Reporter: Jeremy B. Merrill Priority: Minor Original Estimate: 1h Remaining Estimate: 1h Tika does not properly detect certain emails as `message/rfc822` if they're slightly standards-non-compliant and begin with `Status: ` as the first header. I've added `Status: ` as a magic detection line in tika-mimetypes.xml. This solves my problem and does not appear to cause unit test failures. I have not yet run the tika-batch tests. As further information, the emails that are processed incorrectly come from dumps directly from various US public officials' mailservers. The dumps, I believe since they're not intended to be transmitted over the wire, sometimes are slightly non-compliant. It's important to note that Tika (and the underlying library, James Mime4J) do properly *parse* these emails, despite the non-compliant header. The problem is getting Tika to *detect* the file as an email so that Mime4J gets chosen to parse it. Pull request on Github at https://github.com/apache/tika/pull/40 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
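The distinction drawn in the description between detecting and parsing can be illustrated with a toy prefix check. This is only a sketch. Tika's real magic rules are declared in tika-mimetypes.xml, and the class `HeaderMagicSketch`, its prefix list, and the fallback type are assumptions made for illustration rather than Tika's actual rule set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Tika's detector: decide whether a file
// looks like an email by checking how its first bytes start.
class HeaderMagicSketch {

    // Illustrative prefixes only. "Status: " is the addition proposed in
    // TIKA-1602; the others are common leading mail headers.
    static final List<String> EMAIL_PREFIXES = Arrays.asList(
            "Received: ", "Return-Path: ", "Message-ID: ", "From: ", "Status: ");

    // Returns a media type guess based on the leading bytes of the file.
    static String detect(byte[] head) {
        int length = Math.min(head.length, 64);
        String start = new String(head, 0, length, StandardCharsets.US_ASCII);
        for (String prefix : EMAIL_PREFIXES) {
            if (start.startsWith(prefix)) {
                return "message/rfc822";
            }
        }
        return "application/octet-stream";
    }
}
```

Without `Status: ` in the list, a dump whose first header is `Status: ` would fall through to the generic type and never reach the mail parser, which is the symptom the issue describes.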
[jira] [Closed] (TIKA-1602) Detecting standards-non-compliant emails as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy B. Merrill closed TIKA-1602. --- Resolution: Duplicate Detecting standards-non-compliant emails as message/rfc822 -- Key: TIKA-1602 URL: https://issues.apache.org/jira/browse/TIKA-1602 Project: Tika Issue Type: New Feature Reporter: Jeremy B. Merrill Priority: Minor Original Estimate: 1h Remaining Estimate: 1h Tika does not properly detect certain emails as `message/rfc822` if they're slightly standards-non-compliant and begin with `Status: ` as the first header. I've added `Status: ` as a magic detection line in tika-mimetypes.xml. This solves my problem and does not appear to cause unit test failures. I have not yet run the tika-batch tests. As further information, the emails that are processed incorrectly come from dumps directly from various US public officials' mailservers. The dumps, I believe since they're not intended to be transmitted over the wire, sometimes are slightly non-compliant. It's important to note that Tika (and the underlying library, James Mime4J) do properly *parse* these emails, despite the non-compliant header. The problem is getting Tika to *detect* the file as an email so that Mime4J gets chosen to parse it. Pull request on Github at https://github.com/apache/tika/pull/40 -- This message was sent by Atlassian JIRA (v6.3.4#6332)