Yes, Tim, I saw all these reporting artifacts, and I agree they are good things.
2017-12-08 14:32 GMT-02:00 Allison, Timothy B. <talli...@mitre.org>:
> Thank you, Luís. I’ve finally had a chance to take a look. As exceptions
> go, the PPT is the most eye-opening. I don’t know how I didn’t catch
> those…ugh.
>
> There are a bunch more exceptions for zero-byte files in attachments,
> but this is a good thing, because now we can figure out if those are
> corrupt files, missing dependencies or something else…just a
> reporting artifact.
>
> There are a bunch more exceptions for emf/wmf caused by “safelyAllocate”,
> which, I think, is a good thing. After the release, I’ll want to look at
> those to see if we need improvements in emf/wmf parsing, or if we need to
> bump the maximum expected byte lengths in the calls to safelyAllocate, or
> if the files are just plain corrupt.
>
> After I fix TIKA-2483, I think I’ll be good to roll rc1 for 1.17.
>
> Anything else holding us back?
>
> *From:* Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
> *Sent:* Thursday, December 7, 2017 1:18 PM
> *To:* dev@tika.apache.org; Allison, Timothy B. <talli...@mitre.org>
> *Subject:* Fwd: Tika 1.17?
>
> Oh sorry, I thought I had sent this to the dev list, forwarding...
>
> Luis
>
> ---------- Forwarded message ----------
> From: *Allison, Timothy B.* <talli...@mitre.org>
> Date: 2017-12-07 14:10 GMT-02:00
> Subject: RE: Tika 1.17?
> To: "lfcnas...@gmail.com" <lfcnas...@gmail.com>
>
> Agreed. Thank you! Do you mind sharing this with the list?
>
> *From:* Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
> *Sent:* Thursday, December 7, 2017 10:26 AM
> *To:* Allison, Timothy B. <talli...@mitre.org>
> *Subject:* RE: Tika 1.17?
>
> Hi Tim,
>
> I don't think it is a blocker, maybe a minor regression, given we are much
> better with 20x more fixed exceptions. I sent it just to let us be aware.
> There are a few (~40) new exceptions with pdf, and 20x more fixed ones, so
> my vote is to go for 1.17!
>
> Luis
>
> On Dec 7, 2017 at 11:47 AM, "Allison, Timothy B." <talli...@mitre.org>
> wrote:
>
> Thank you, Luís! Given where POI is in its dev cycle, should we go for a
> release of 1.17 now and then push for a 1.17.1 as soon as POI fixes this?
> Should we revert to 3.17-beta1? (wait, we can't do this because of a bug
> that prevents parsing of pptx in Solr)
>
> Or is this grave enough to wait a few months before we release 1.17?
>
> I found a zip/mime detection issue that we need to fix at the Tika level,
> but that fix is trivial.
>
> -----Original Message-----
> From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
> Sent: Wednesday, December 6, 2017 9:30 AM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.17?
>
> Hi Tim,
>
> I've had a brief look at the exceptions folder; it seems we are much
> better with ppt (4677 fixed exceptions) and pdf (7798), but there are 208
> new exceptions with ppt. I did not check the files to see if they are
> corrupted, but some common tokens were lost. Below is the most common new
> stack trace:
>
> org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1000 on class class org.apache.poi.hslf.record.Document : java.lang.reflect.InvocationTargetException
> Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class class org.apache.poi.hslf.record.Environment : java.lang.reflect.InvocationTargetException
> Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection : java.lang.reflect.InvocationTargetException
> Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
>     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
>     at org.apache.poi.hslf.record.Record.buildRecordAtOffset(Record.java:104)
>     at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:279)
>     at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:260)
>     at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:166)
>     at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:181)
>     at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:78)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:179)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
>     at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
>     at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
>     at org.apache.tika.batch.FileResourceConsumer.parse(FileResourceConsumer.java:406)
>     at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
>     at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:181)
>     at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
>     at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedConstructorAccessor283.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
>     ... 25 more
> Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 1010 on class class org.apache.poi.hslf.record.Environment : java.lang.reflect.InvocationTargetException
> Cause was : org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection : java.lang.reflect.InvocationTargetException
> Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
>     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
>     at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
>     at org.apache.poi.hslf.record.Document.<init>(Document.java:133)
>     ... 29 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedConstructorAccessor285.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
>     ... 31 more
> Caused by: org.apache.poi.hslf.exceptions.HSLFException: Couldn't instantiate the class for type with id 2005 on class class org.apache.poi.hslf.record.FontCollection : java.lang.reflect.InvocationTargetException
> Cause was : java.lang.IllegalArgumentException: typeface can't be null nor empty
>     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:186)
>     at org.apache.poi.hslf.record.Record.findChildRecords(Record.java:129)
>     at org.apache.poi.hslf.record.Environment.<init>(Environment.java:54)
>     ... 35 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedConstructorAccessor286.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>     at org.apache.poi.hslf.record.Record.createRecordForType(Record.java:182)
>     ... 37 more
> Caused by: java.lang.IllegalArgumentException: typeface can't be null nor empty
>     at org.apache.poi.hslf.usermodel.HSLFFontInfo.setTypeface(HSLFFontInfo.java:129)
>     at org.apache.poi.hslf.usermodel.HSLFFontInfo.<init>(HSLFFontInfo.java:74)
>     at org.apache.poi.hslf.record.FontCollection.<init>(FontCollection.java:47)
>     ... 41 more
>
> 2017-12-05 21:44 GMT-02:00 Allison, Timothy B. <talli...@mitre.org>:
>
> > Reports are here:
> >
> > http://162.242.228.174/reports/reports_Tika1_16V1_17.zip
> >
> > I haven't had a chance to look. Tomorrow...
> >
> > Let me know what you find.
> >
> > -----Original Message-----
> > From: Allison, Timothy B. [mailto:talli...@mitre.org]
> > Sent: Wednesday, November 29, 2017 1:08 PM
> > To: dev@tika.apache.org
> > Subject: RE: Tika 1.17?
> >
> > +1
> >
> > -----Original Message-----
> > From: Chris Mattmann [mailto:mattm...@apache.org]
> > Sent: Wednesday, November 29, 2017 12:57 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.17?
> >
> > Thanks so much for fixing this. It worked during MEMEX and then I
> > think has since fallen out of date and perhaps I committed Zarana’s
> > code wrong or something. Will be great to get this working!
> >
> > On 11/29/17, 9:54 AM, "David Meikle" <loo...@gmail.com> wrote:
> >
> > I am thinking TIKA-2385. I've got a resized image that I can commit
> > tonight that should close this one off.
> >
> > Cheers,
> > Dave
> >
> > On 29 Nov 2017 14:42, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > Many thanks to Bob for help on TIKA-2502!
> >
> > Anything else we want to put into 1.17 before I run the regression
> > tests?
> >
> > -----Original Message-----
> > From: Allison, Timothy B. [mailto:talli...@mitre.org]
> > Sent: Monday, November 13, 2017 1:42 PM
> > To: dev@tika.apache.org
> > Subject: RE: Tika 1.17?
> >
> > Y. You're right. Thank you!
> >
> > I think I've been avoiding that because there were some regressions in
> > metadata-extractor last I looked at this. Let's hope those are gone in
> > 2.10.1.
> >
> > -----Original Message-----
> > From: Tyler Bui-Palsulich [mailto:tpalsul...@apache.org]
> > Sent: Sunday, November 12, 2017 2:54 PM
> > To: dev@tika.apache.org
> > Subject: RE: Tika 1.17?
> >
> > TIKA-2486 might be worth blocking on since there is a CVE.
> >
> > Tyler
> >
> > On Nov 6, 2017 5:26 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > > Y. I'm happy enough to wait a few more days. I wasn't able to kick
> > > off the regression tests last week. Should I wait for the new parsers
> > > to run the regression tests?
> > >
> > > -----Original Message-----
> > > From: David Meikle [mailto:loo...@gmail.com]
> > > Sent: Friday, November 3, 2017 7:42 PM
> > > To: dev@tika.apache.org
> > > Subject: Re: Tika 1.17?
> > >
> > > Sounds good. I have a couple of new parsers I would like to slot in
> > > but not had a chance the last few months. Will go for it over the
> > > weekend, if that works for you Tim.
> > >
> > > Cheers,
> > > Dave
> > >
> > > On 3 November 2017 at 15:19, Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote:
> > >
> > > > Let’s make it so (
> > > >
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Chris Mattmann, Ph.D.
> > > > Principal Data Scientist, Engineering Administrative Office (3010)
> > > > Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> > > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > > Office: 180-503E, Mailstop: 180-503
> > > > Email: chris.a.mattm...@nasa.gov
> > > > WWW: http://sunset.usc.edu/~mattmann/
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > Director, Information Retrieval and Data Science Group (IRDS)
> > > > Adjunct Associate Professor, Computer Science Department
> > > > University of Southern California, Los Angeles, CA 90089 USA
> > > > WWW: http://irds.usc.edu/
> > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >
> > > > On 11/3/17, 7:35 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> > > >
> > > > All,
> > > >
> > > > PDFBox 2.0.8 is now integrated. I want to fix TIKA-2490 before we
> > > > release 1.17. Are there other issues that are blockers or you'd like
> > > > to fix before 1.17 (TIKA-2471, maybe?)?
> > > >
> > > > I plan to run initial large scale regression tests shortly for rfc822
> > > > and mbox because of TIKA-2478. I'll run the full regression tests
> > > > before cutting the RC, but I want to focus on those for now. Other
> > > > requests?
> > > >
> > > > Cheers,
> > > >
> > > > Tim