[jira] [Commented] (TIKA-1191) ForkParser / ClassLoaderProxy does not define package
[ https://issues.apache.org/jira/browse/TIKA-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319839#comment-16319839 ]

Nick Burch commented on TIKA-1191:
----------------------------------

[~talli...@mitre.org] I'm minded to apply Ben Romberg's patch from pull #215. Any thoughts/comments/objections?

> ForkParser / ClassLoaderProxy does not define package
> -----------------------------------------------------
>
> Key: TIKA-1191
> URL: https://issues.apache.org/jira/browse/TIKA-1191
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4, 1.5
> Reporter: Nicolas Belisle
> Attachments: ClassLoaderProxy.java.patch, Test.java, test.eml
>
>
> ForkParser will throw an Exception in some cases:
> org.apache.tika.exception.TikaException: Invalid embedded resource
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:189)
>     at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:135)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.tika.fork.ForkServer.call(ForkServer.java:144)
>     at org.apache.tika.fork.ForkServer.processRequests(ForkServer.java:124)
>     at org.apache.tika.fork.ForkServer.main(ForkServer.java:69)
> Caused by: java.lang.NullPointerException
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:136)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:499)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:60)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:169)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:268)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.getTikaConfig(AbstractPOIFSExtractor.java:72)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.getDetector(AbstractPOIFSExtractor.java:79)
>     at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:176)
>     ... 10 more
> A patch will follow

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
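The NullPointerException in `MimeTypesFactory.create` fits the issue title if that method builds a classpath resource prefix by calling `getPackage().getName()` on one of its own classes. That is an assumption here and has not been checked against the 1.4/1.5 source. Up to Java 8, `Class.getPackage()` returns null when the class loader that defined the class never created a `Package` for it, which a proxy loader that only calls `defineClass` would not do. The sketch contrasts that lookup with one that derives the prefix from `Class.getName()`; `PackagePrefix` and its methods are hypothetical and are not the patch attached to this issue.

```java
// Hypothetical helper, not Tika code: builds a classpath resource prefix
// such as "java/util/" for the package of a given class.
final class PackagePrefix {

    private PackagePrefix() {}

    // Lookup through the Package object. Up to Java 8, getPackage() returns
    // null if the defining class loader never created a Package, in which
    // case the getName() call throws NullPointerException.
    static String fromPackageObject(Class<?> cls) {
        return cls.getPackage().getName().replace('.', '/') + "/";
    }

    // Lookup through the binary class name, which is available no matter
    // which class loader defined the class.
    static String fromClassName(Class<?> cls) {
        String name = cls.getName();
        int lastDot = name.lastIndexOf('.');
        if (lastDot < 0) {
            return ""; // class in the unnamed (default) package
        }
        return name.substring(0, lastDot).replace('.', '/') + "/";
    }

    public static void main(String[] args) {
        System.out.println(fromClassName(java.util.ArrayList.class));     // java/util/
        System.out.println(fromClassName(java.util.Map.Entry.class));     // java/util/
        System.out.println(fromPackageObject(java.util.ArrayList.class)); // java/util/
    }
}
```

The other way round the problem is for the loader itself to call the protected `ClassLoader.definePackage` before `defineClass`, so that `getPackage()` has something to return.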
[jira] [Commented] (TIKA-2545) RereadableInputStream backing byte array not constructed properly
[ https://issues.apache.org/jira/browse/TIKA-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319773#comment-16319773 ]

Nick Burch commented on TIKA-2545:
----------------------------------

Are you able to produce a short JUnit unit test that shows up the problem that your pull request (https://github.com/apache/tika/pull/217) fixes? I'm also wondering whether we need to reset size to zero on a rewind, and a unit test seems a good way to check that too!

> RereadableInputStream backing byte array not constructed properly
> -----------------------------------------------------------------
>
> Key: TIKA-2545
> URL: https://issues.apache.org/jira/browse/TIKA-2545
> Project: Tika
> Issue Type: Bug
> Components: core
> Reporter: Eugene Hart
> Priority: Minor
>
> For original input streams smaller than the buffer size, RereadableInputStream should create a ByteArrayInputStream with bounds determined by the size of the original input, not pass in the entire buffer.
[jira] [Created] (TIKA-2545) RereadableInputStream backing byte array not constructed properly
Eugene Hart created TIKA-2545:
------------------------------

Summary: RereadableInputStream backing byte array not constructed properly
Key: TIKA-2545
URL: https://issues.apache.org/jira/browse/TIKA-2545
Project: Tika
Issue Type: Bug
Components: core
Reporter: Eugene Hart
Priority: Minor

For original input streams smaller than the buffer size, RereadableInputStream should create a ByteArrayInputStream with bounds determined by the size of the original input, not pass in the entire buffer.
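The report can be illustrated with plain JDK classes. `ByteArrayInputStream(byte[])` exposes every byte of the array, while `ByteArrayInputStream(byte[], int offset, int length)` exposes only the given range, so wrapping a partly filled buffer with the one-argument constructor replays the unused tail as extra zero bytes. `ReplayBuffer` is a hypothetical stand-in, not `RereadableInputStream`, and the buffer size of 8 is an arbitrary assumption; it is roughly the kind of check a unit test for this bug could make.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a re-readable stream's replay step; not Tika code.
final class ReplayBuffer {

    private ReplayBuffer() {}

    // Drains the stream and returns every byte it produced.
    // ByteArrayInputStream.read() declares no checked exception.
    static byte[] drain(ByteArrayInputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    // Pattern described in the report: the whole buffer is replayed,
    // including slots that were never filled from the original input.
    static byte[] replayWholeBuffer(byte[] buffer) {
        return drain(new ByteArrayInputStream(buffer));
    }

    // Suggested fix: bound the replay by the number of bytes actually read.
    static byte[] replayFilledPart(byte[] buffer, int size) {
        return drain(new ByteArrayInputStream(buffer, 0, size));
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];        // assumed buffer size of 8
        byte[] original = {1, 2, 3};        // original input shorter than the buffer
        System.arraycopy(original, 0, buffer, 0, original.length);

        System.out.println(replayWholeBuffer(buffer).length);                 // 8
        System.out.println(replayFilledPart(buffer, original.length).length); // 3
    }
}
```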
[jira] [Updated] (TIKA-2196) IllegalArgumentException on a valid Excel file
[ https://issues.apache.org/jira/browse/TIKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinay Kawade updated TIKA-2196:
-------------------------------
Attachment: 1.xls

Sample file with only one sheet and 2 cells populated for testing.

> IllegalArgumentException on a valid Excel file
> ----------------------------------------------
>
> Key: TIKA-2196
> URL: https://issues.apache.org/jira/browse/TIKA-2196
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.14
> Environment: Windows 7 x64, JVM 1.8.0_101
> Reporter: Seva Alekseyev
> Attachments: 1.xls, 2007 Experiment watch.xls
>
>
> On the attached Excel file, which opens fine in Excel, Tika throws the following error:
> java.lang.IllegalArgumentException: Cannot format given Object as a Number
>     at java.text.DecimalFormat.format:-1
>     at org.apache.poi.ss.usermodel.ExcelGeneralNumberFormat.format:67
>     at java.text.Format.format:-1
>     at org.apache.poi.ss.usermodel.DataFormatter.performDateFormatting:736
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents:804
>     at org.apache.poi.ss.usermodel.DataFormatter.formatRawCellContents:785
>     at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.formatNumberDateCell:143
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener$TikaFormatTrackingHSSFListener.formatNumberDateCell:633
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord:405
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord:336
>     at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord:92
>     at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord:109
>     at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents:179
>     at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents:136
>     at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile:312
>     at org.apache.tika.parser.microsoft.ExcelExtractor.parse:169
>     at org.apache.tika.parser.microsoft.OfficeParser.parse:177
>     at org.apache.tika.parser.microsoft.OfficeParser.parse:130
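In the trace, POI's `DataFormatter.performDateFormatting` calls `Format.format` on an `ExcelGeneralNumberFormat`, which ends in `java.text.DecimalFormat.format`. The JDK's `DecimalFormat` rejects any argument that is not a `java.lang.Number` with exactly this message. A minimal reproduction of the JDK side, assuming (from the method name only, not verified against the POI source) that the value handed over is a `java.util.Date`; `NumberFormatMismatch` and the `"0.##"` pattern are hypothetical choices:

```java
import java.text.DecimalFormat;
import java.text.Format;
import java.util.Date;

// Hypothetical helper; demonstrates JDK behaviour only, not POI or Tika code.
final class NumberFormatMismatch {

    private NumberFormatMismatch() {}

    // Formats value with the given Format and returns null on success, or the
    // IllegalArgumentException message if the Format rejects the value's type.
    static String tryFormat(Format format, Object value) {
        try {
            format.format(value);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a "General" number format; the pattern is an assumption.
        Format general = new DecimalFormat("0.##");
        System.out.println(tryFormat(general, 42.5));        // null: a Double is a Number
        System.out.println(tryFormat(general, new Date(0))); // Cannot format given Object as a Number
    }
}
```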
[jira] [Commented] (TIKA-2196) IllegalArgumentException on a valid Excel file
[ https://issues.apache.org/jira/browse/TIKA-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16319016#comment-16319016 ]

Vinay Kawade commented on TIKA-2196:
-----------------------------------

This seems to happen when a cell is set to a custom format containing double quotes, for example:
{code:java}
""ddd,mmm dd or "", dd,
{code}
As per https://bz.apache.org/bugzilla/show_bug.cgi?id=54786, the double double quotes are replaced by a single single quote.
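Vinay's comment traces the failure to custom formats containing double quotes and, via Bugzilla 54786, to POI replacing those quotes with Java single quotes. The JDK rules that such a rewritten pattern then has to obey can be checked in isolation: in `SimpleDateFormat`, text between single quotes is copied literally, a doubled single quote stands for one apostrophe, and an unmatched quote makes the constructor throw `IllegalArgumentException`. The helper `QuoteRules.literalOutput` is hypothetical, uses literal-only patterns so the output does not depend on date, locale or time zone, and does not reproduce POI's own conversion.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; shows JDK SimpleDateFormat quoting rules only.
final class QuoteRules {

    private QuoteRules() {}

    // Formats a fixed instant with a pattern made only of literal text, so the
    // result does not depend on date, locale or time zone. Returns null if
    // the SimpleDateFormat constructor rejects the pattern.
    static String literalOutput(String pattern) {
        try {
            return new SimpleDateFormat(pattern).format(new Date(0));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(literalOutput("'ddd'"));  // ddd  (quoted text is literal)
        System.out.println(literalOutput("''"));     // '    (doubled quote is one apostrophe)
        System.out.println(literalOutput("'a''b'")); // a'b
        System.out.println(literalOutput("'oops"));  // null (unterminated quote)
    }
}
```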