Thanks! +1

BR,
Oleg
On Tue, Aug 4, 2015 at 5:37 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:

> +1
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: "Allison, Timothy B." <talli...@mitre.org>
> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> Date: Tuesday, July 28, 2015 at 11:08 AM
> To: "dev@tika.apache.org" <dev@tika.apache.org>
> Subject: RE: release Tika 1.10?
>
> >Just finished the run against ~2.8 million docs (4.8 million
> >including attachments) from a combination of govdocs1 and Common
> >Crawl. I compared 1.9 with trunk.
> >
> >Most looks good.
> >
> >Some highlights:
> >* Thanks to Andrew Jackson and TIKA-1678, we're now getting better
> >  metadata out of ~1300 from 550k PDFs. This appears to be far more
> >  common in Common Crawl PDFs than in govdocs1 PDFs.
> >* No significant changes found in the handful of msg files...I wanted
> >  to check after the work on TIKA-1238.
> >* Thanks to Andreas Beeker and TIKA-1046/POI 54332, there are far
> >  fewer PPT exceptions
> >* There are a very few more files in CommonCrawl that are now
> >  incorrectly identified as RFC vs text (TIKA-1602), but this is a
> >  tiny handful (total of 4 documents in both CC and govdocs1)
> >
> >A regret:
> >This run used the digesting parser for both container and embedded
> >files. This causes some truncated (=corrupt) package files to throw
> >an exception before they otherwise would. The opposite happens, too
> >(more embedded files when using the digester), but this is extremely
> >rare. This means that for truncated gz, x-xz and x-archive files
> >there are many more with fewer attachments in Tika 1.10-SNAPSHOT than
> >in Tika 1.9.
> >
> >With Konstantin's and Bob's fix of TIKA-1524, I think we're in good
> >shape for 1.10...from my perspective.
> >
> > Best,
> >
> > Tim
> >
> >-----Original Message-----
> >From: David Meikle [mailto:loo...@gmail.com]
> >Sent: Sunday, July 26, 2015 10:50 AM
> >To: dev@tika.apache.org
> >Subject: Re: release Tika 1.10?
> >
> >> On 23 Jul 2015, at 14:07, Allison, Timothy B. <talli...@mitre.org>
> >> wrote:
> >>
> >> With the fix of TIKA-1690, I think it makes sense to roll a new
> >> release (1.10) in the next week or so. I'd like to get TIKA-1667
> >> (upgrade poi) in before the release. Are there any other blockers
> >> on 1.10?
> >
> >+1 from me too. As discussed on private, I will roll the release on
> >Tuesday night (UK Time) to give people time to shout for other
> >candidates.
> >
> >Cheers,
> >Dave