Y. It is daunting at this point, and please do help! The key sheets I look at:
exceptions/exceptions_compared_by_mime_type.xlsx
exceptions/new_exceptions_in_B_by_mime.xlsx
mimes/mime_diffs_A_to_B.xlsx
attachments/attachment_diffs.xlsx
metadata/metadata_value_count_diffs.xlsx

I can dump json, but wouldn't it be easier for you to pull directly from the db?

My vision is to put a gui on the db that would allow you to visualize the reports/see the data and have links to the original (binary) files plus the extract files for both A and B (perhaps with a diff visualization). Three cheers for d3.

-----Original Message-----
From: Tyler Bui-Palsulich [mailto:tpalsul...@apache.org]
Sent: Monday, May 1, 2017 11:39 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.15

How exactly did you "evaluate" the results? I opened the zip and looked at a few of the sheets, but it's a bit daunting. Any way we could dump JSON? That's a bit easier to build visualizations for.

Tyler

On May 1, 2017 3:59 PM, "Allison, Timothy B." <talli...@mitre.org> wrote:

> Sounds good. W00t!
>
> -----Original Message-----
> From: Chris Mattmann [mailto:mattm...@apache.org]
> Sent: Monday, May 1, 2017 4:57 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.15
>
> Thanks Tim. I am going to try and get tika-dl added (if possible), and
> also try the Sentiment Parser next. If I can get one or both of those
> (in the next day or so), then I will give you the heads up to begin testing.
> Video recognition is in!
>
> On 5/1/17, 12:42 PM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
> I finally had a chance to look through the results of the first
> regression run.
>
> I made a few trivial changes to our parsers and to tika-eval.
>
> We appear to have many more exceptions in files parsed by our
> CompressorParser, but this is because of reporting...not because of
> reality -- the exception is now coming in the container file, not an
> attachment...and tika-eval wasn't matching A and B correctly.
>
> There is a regression that's been fixed in PDFBox trunk
> (PDFBOX-3717), but I don't see that as a blocker.
>
> We have new exceptions in the new parsers, EMF, WMF, .xlsb,
> wordperfect, but that's because we're actually parsing those now. :)
>
> All else looks to be in decent shape.
>
> Chris and Team and All,
> Let me know when you're ready for me to kick off the next
> regression run.
>
> Cheers,
>
> Tim
>
> -----Original Message-----
> From: Mattmann, Chris A (3010) [mailto:chris.a.mattm...@jpl.nasa.gov]
> Sent: Wednesday, April 26, 2017 12:48 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.15
>
> Thank you!
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> On 4/26/17, 9:35 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
> Oh. Ok. Will wait, then?
>
> -----Original Message-----
> From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Wednesday, April 26, 2017 11:38 AM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.15
>
> I want to see if I can get in the VideoRecognition parser, and
> also the Sentiment one.
>
> I hope to get it done in the next day or so. Thanks.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Principal Data Scientist, Engineering Administrative Office (3010)
> Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 180-503E, Mailstop: 180-503
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> On 4/26/17, 7:54 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
> With the added TSD parser, I think I should rerun the
> regression testing. Given that, I also fixed 2099, and we'll benefit
> from a rerun.
>
> Anything else before I rerun the regression testing?
>
> Any problems observed in first run?