[jira] [Commented] (TIKA-1176) ChmDirectoryListingSet does not correctly enumerate directory entries
[ https://issues.apache.org/jira/browse/TIKA-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169146#comment-14169146 ]

Hong-Thai Nguyen commented on TIKA-1176:

Hi [~mdgeek], thanks for providing the code and a test file. Unfortunately, this check raises another exception on this file:

{code}
The full exception stack trace is included below:

org.apache.tika.exception.TikaException
    at org.apache.tika.parser.chm.core.ChmExtractor.extractChmEntry(ChmExtractor.java:355)
    at org.apache.tika.parser.chm.ChmParser.parse(ChmParser.java:70)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:247)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:326)
    at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:285)
    at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
    at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
    at javax.swing.TransferHandler.importData(TransferHandler.java:755)
    at javax.swing.TransferHandler$DropHandler.drop(TransferHandler.java:1478)
    at java.awt.dnd.DropTarget.drop(DropTarget.java:434)
    at javax.swing.TransferHandler$SwingDropTarget.drop(TransferHandler.java:1203)
    at sun.awt.dnd.SunDropTargetContextPeer.processDropMessage(SunDropTargetContextPeer.java:519)
    at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchDropEvent(SunDropTargetContextPeer.java:832)
    at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(SunDropTargetContextPeer.java:756)
    at sun.awt.dnd.SunDropTargetEvent.dispatch(SunDropTargetEvent.java:30)
    at java.awt.Component.dispatchEventImpl(Component.java:4517)
    at java.awt.Container.dispatchEventImpl(Container.java:2097)
    at java.awt.Component.dispatchEvent(Component.java:4488)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4575)
    at java.awt.LightweightDispatcher.processDropTargetEvent(Container.java:4310)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4161)
    at java.awt.Container.dispatchEventImpl(Container.java:2083)
    at java.awt.Window.dispatchEventImpl(Window.java:2489)
    at java.awt.Component.dispatchEvent(Component.java:4488)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:674)
    at java.awt.EventQueue.access$400(EventQueue.java:81)
    at java.awt.EventQueue$2.run(EventQueue.java:633)
    at java.awt.EventQueue$2.run(EventQueue.java:631)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
    at java.awt.EventQueue$3.run(EventQueue.java:647)
    at java.awt.EventQueue$3.run(EventQueue.java:645)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:644)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.tika.parser.chm.core.ChmCommons.copyOfRange(ChmCommons.java:342)
    at org.apache.tika.parser.chm.core.ChmCommons.getChmBlockSegment(ChmCommons.java:108)
    at org.apache.tika.parser.chm.core.ChmExtractor.extractChmEntry(ChmExtractor.java:337)
    ... 43 more
{code}

Our CHM parser is quite complex. Could you apply a full fix and add a test that checks the expected output content for your file?

Thanks,

ChmDirectoryListingSet does not correctly enumerate directory entries

Key: TIKA-1176
URL: https://issues.apache.org/jira/browse/TIKA-1176
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Doug Martin
Attachments: HelpStudioSample.chm
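The root cause in the trace above is an ArrayIndexOutOfBoundsException thrown by System.arraycopy from inside ChmCommons.copyOfRange. As a rough illustration only, the sketch below shows one way a range copy can validate its bounds first, so that a bad offset is reported with the offending values instead of a bare exception. RangeCopy and checkedCopyOfRange are hypothetical names and do not reflect the actual signature or behavior of ChmCommons.copyOfRange.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tika: validates a half-open range
// [from, to) against the source array before copying it.
final class RangeCopy {

    private RangeCopy() {}

    /** Returns a copy of original[from, to), or throws with a descriptive message. */
    static byte[] checkedCopyOfRange(byte[] original, int from, int to) {
        if (original == null) {
            throw new IllegalArgumentException("source array is null");
        }
        if (from < 0 || to < from || to > original.length) {
            throw new IllegalArgumentException(
                    "range [" + from + ", " + to + ") is outside array of length "
                            + original.length);
        }
        // The range is known to be valid here, so no zero padding occurs.
        return Arrays.copyOfRange(original, from, to);
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50};
        System.out.println(Arrays.toString(checkedCopyOfRange(data, 1, 4)));
        try {
            checkedCopyOfRange(data, 3, 9);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A check of this kind does not fix whatever produces the bad offsets; it only makes the failure easier to diagnose when a file triggers it.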
[jira] [Commented] (TIKA-1176) ChmDirectoryListingSet does not correctly enumerate directory entries
[ https://issues.apache.org/jira/browse/TIKA-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783218#comment-13783218 ]

Doug Martin commented on TIKA-1176:

The following change fixes the problem:

{code}
if (indexUserData < indexWorkData || indexWorkData == -1) {
    setPlaceHolder(indexUserData);
} else {
    setPlaceHolder(indexWorkData);
}
{code}

ChmDirectoryListingSet does not correctly enumerate directory entries

Key: TIKA-1176
URL: https://issues.apache.org/jira/browse/TIKA-1176
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Doug Martin

The ChmDirectoryListingSet.enumerateOneSegment method does not correctly enumerate directory entries when ChmCommons.indexOf returns -1 for work data or user data. Here is the offending code:

{code}
int indexWorkData = ChmCommons.indexOf(dir_chunk, "::".getBytes());
int indexUserData = ChmCommons.indexOf(dir_chunk, "/".getBytes());
if (indexUserData < indexWorkData)
    setPlaceHolder(indexUserData);
else
    setPlaceHolder(indexWorkData);
if (getPlaceHolder() > 0 ...
{code}

If either indexUserData or indexWorkData is -1, that value will be set as the placeholder index, resulting in the method returning without processing any entries.

--
This message was sent by Atlassian JIRA (v6.1#6144)
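The failure mode described in this issue, where a -1 "not found" result wins the comparison and becomes the placeholder, can be reproduced in isolation. The following is a minimal sketch, not Tika's code: PlaceholderChoice, choose, chooseOriginal and the local indexOf are hypothetical stand-ins, the sample chunk is made up, and the sketch assumes the original comparison was indexUserData < indexWorkData. The choose method picks the smallest non-negative index, which is one possible way to guard both -1 cases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: shows how a -1 ("not found")
// index can be selected by a plain comparison, and one way to guard it.
final class PlaceholderChoice {

    private PlaceholderChoice() {}

    /** Naive first-occurrence search; returns -1 when the pattern is absent. */
    static int indexOf(byte[] data, byte[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        for (int i = 0; i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** The comparison as described in the issue: -1 can be chosen. */
    static int chooseOriginal(int indexUserData, int indexWorkData) {
        return indexUserData < indexWorkData ? indexUserData : indexWorkData;
    }

    /** Smallest non-negative index of the two, or -1 if both are -1. */
    static int choose(int indexUserData, int indexWorkData) {
        if (indexUserData < 0) {
            return indexWorkData;
        }
        if (indexWorkData < 0) {
            return indexUserData;
        }
        return Math.min(indexUserData, indexWorkData);
    }

    public static void main(String[] args) {
        // Made-up chunk that contains "/" but no "::".
        byte[] chunk = "xxxx/index.html".getBytes(StandardCharsets.US_ASCII);
        int work = indexOf(chunk, "::".getBytes(StandardCharsets.US_ASCII));
        int user = indexOf(chunk, "/".getBytes(StandardCharsets.US_ASCII));
        System.out.println("original: " + chooseOriginal(user, work)); // original: -1
        System.out.println("guarded:  " + choose(user, work));         // guarded:  4
    }
}
```

With the made-up chunk, the user-data marker is found at index 4 and the work-data marker is missing, so the unguarded comparison yields -1 while the guarded version yields 4.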
[jira] [Commented] (TIKA-1176) ChmDirectoryListingSet does not correctly enumerate directory entries
[ https://issues.apache.org/jira/browse/TIKA-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783286#comment-13783286 ]

Nick Burch commented on TIKA-1176:

Any chance you could upload a small sample file that shows the problem? We could then use it in a unit test to verify the fix, and to ensure it stays fixed!