+1 Thank you!

-----Original Message-----
From: Chris Mattmann [mailto:mattm...@apache.org]
Sent: Thursday, May 18, 2017 10:15 AM
To: dev@tika.apache.org
Subject: Re: Tika 1.15

Hey Tim,

I am, Luis is, you are, that’s probably a good enough start. I’ll roll
the RC this afternoon, early AM pacific tomorrow!

Cheers,
Chris

On 5/18/17, 3:56 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:

Yes, yes we are...if you and fellow devs are ok with the log message in
TIKA-2359. Happy to change that message if there are any
concerns/recommendations.

Onward! Thank you!

Cheers,
Tim

-----Original Message-----
From: Chris Mattmann [mailto:mattm...@apache.org]
Sent: Wednesday, May 17, 2017 10:01 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.15

Tim, are we good for 1.15? Should I roll the RC?

Thanks!

On 5/17/17, 3:50 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:

Full report on attachment # diffs:
http://162.242.228.174/reports/attachment_diffs_complete_20170516.xlsx

Still need to look through contents diffs.

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Tuesday, May 16, 2017 3:11 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.15

I reran the eval with some updates, including rc1 of PDFBox 2.0.6, which
is now integrated.

http://162.242.228.174/reports/reports_tika_20170515.tar.gz

I need to do some more digging on attachments -- hit max limit. The
decrease in attachments from the few docs I reviewed is explained by
change in default behavior of macro extraction -- in 1.14 we were
extracting macros by default, but we aren't doing this in 1.15. However,
I want to look at more than the first x diffs because there may be other
file formats further down the results that weren't included in the
report.

I also want to look at the contents...haven't had a chance.

> On May 1, 2017 3:59 PM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > Sounds good. W00t!
> >
> > -----Original Message-----
> > From: Chris Mattmann [mailto:mattm...@apache.org]
> > Sent: Monday, May 1, 2017 4:57 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15
> >
> > Thanks Tim.
> > I am going to try and get tika-dl added (if possible), and also try
> > the Sentiment Parser next. If I can get one or both of those (in the
> > next day or so), then I will give you the heads up to begin testing.
> > Video recognition is in!
> >
> > On 5/1/17, 12:42 PM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > I finally had a chance to look through the results of the first
> > regression run.
> >
> > I made a few trivial changes to our parsers and to tika-eval.
> >
> > We appear to have many more exceptions in files parsed by our
> > CompressorParser, but this is because of reporting...not because of
> > reality -- the exception is now coming in the container file, not an
> > attachment...and tika-eval wasn't matching A and B correctly.
> >
> > There is a regression that's been fixed in PDFBox trunk
> > (PDFBOX-3717), but I don't see that as a blocker.
> >
> > We have new exceptions in the new parsers, EMF, WMF, .xlsb,
> > wordperfect, but that's because we're actually parsing those now. :)
> >
> > All else looks to be in decent shape.
> >
> > Chris and Team and All,
> > Let me know when you're ready for me to kick off the next
> > regression run.
> >
> > Cheers,
> >
> > Tim
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Wednesday, April 26, 2017 12:48 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15
> >
> > Thank you!
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Principal Data Scientist, Engineering Administrative Office (3010)
> > Manager, NSF & Open Source Projects Formulation and Development
> > Offices (8212)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 180-503E, Mailstop: 180-503
> > Email: chris.a.mattm...@nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > On 4/26/17, 9:35 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > Oh. Ok. Will wait, then?
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Wednesday, April 26, 2017 11:38 AM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15
> >
> > I want to see if I can get in the VideoRecognition parser, and
> > also the Sentiment one.
> >
> > I hope to get it done in the next day or so. Thanks.
> >
> > Chris Mattmann, Ph.D.
> >
> > On 4/26/17, 7:54 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
> >
> > With the added TSD parser, I think I should rerun the regression
> > testing. Given that, I also fixed 2099, and we'll benefit from a
> > rerun.
> >
> > Anything else before I rerun the regression testing?
> >
> > Any problems observed in first run?