Sounds good. As long as the default behavior remains the same, I'm happy. I'm going to play with a combination of your patch and Tyler's and see what the ramifications are for embedded docs.
To confirm, the OCR integration is fantastic. Thank you and Tyler!

Best,

Tim

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Friday, October 24, 2014 5:36 PM
To: dev@tika.apache.org
Subject: Re: 1.7 release?

Hey Tim,

What do you think about my existing patch for 1445? For example to just call all the parsers? I thought I was seeing behavior that was slow because of that, but it turned out to be Tesseract and my machine at the time?

I think my patch for 1445 may be enough, and we should get the metadata I think? Thoughts? I honestly think we need to deliver Tesseract in 1.7. We're close.

I'll even take it upon myself to try and experiment with the idea of multiple parsers being called. I think a simple solution to the metadata key conflict issue is simply to have a policy to add values (by default) and replace if a property is set in ParseContext. Some simple updates to CompositeParser would allow this. Thoughts?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: <Allison>, "Timothy B." <talli...@mitre.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Friday, October 24, 2014 at 2:24 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: RE: 1.7 release?

>Sorry for coming late to the game on the implications of TIKA-1445. I
>don't want to hold up the release of 1.7.
>
>However, would it be possible to return to the legacy default behavior of
>extracting metadata from images?
>
>We can then document on the OCR parser page on the wiki that you need to
>install Tesseract _and_ make a change in the parser/mime config file. If
>you want this new capability, it will take a small bit of work until we
>solve TIKA-1445.
>
>I worry that the current behavior of 1.7 would be surprising to most
>non-dev users (well, even to at least one dev :) ).
>
>Cheers,
>
> Tim
>
>________________________________________
>From: Oleg Tikhonov [olegtikho...@gmail.com]
>Sent: Friday, October 24, 2014 2:24 PM
>To: dev@tika.apache.org
>Subject: Re: 1.7 release?
>
>Hi Tyler,
>don't mention.
>
>Cheers,
>Oleg
>On Oct 24, 2014 8:02 PM, "Tyler Palsulich" <tpalsul...@gmail.com> wrote:
>
>> Thank you for the help, Oleg! I just resolved TIKA-1422. So, are there any
>> other issues anyone would like to resolve before a new release?
>>
>> Thanks,
>> Tyler
>>
>> On Tue, Oct 21, 2014 at 2:42 AM, Oleg Tikhonov <olegtikho...@gmail.com> wrote:
>>
>> > Sorry!!!
>> >
>> > On Tue, Oct 21, 2014 at 9:37 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>> >
>> > > Thanks Oleg, will try tomorrow for me Los Angeles time!
>> > >
>> > > -----Original Message-----
>> > > From: Oleg Tikhonov <o...@apache.org>
>> > > Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > Date: Monday, October 20, 2014 at 11:20 PM
>> > > To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > Subject: Re: 1.7 release?
>> > >
>> > > >Please take a try with newest patch.
>> > > >Cheers,
>> > > >Oleg
>> > > >
>> > > >On Tue, Oct 21, 2014 at 9:08 AM, Oleg Tikhonov <olegtikho...@gmail.com> wrote:
>> > > >
>> > > >> Taken. Thanks. in progress ...
>> > > >>
>> > > >> On Tue, Oct 21, 2014 at 8:54 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > > >>
>> > > >>> Trunk is the current checkout/branch:
>> > > >>>
>> > > >>> http://svn.apache.org/repos/asf/tika/trunk
>> > > >>>
>> > > >>> -----Original Message-----
>> > > >>> From: Oleg Tikhonov <olegtikho...@gmail.com>
>> > > >>> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> Date: Monday, October 20, 2014 at 10:16 PM
>> > > >>> To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> Subject: Re: 1.7 release?
>> > > >>>
>> > > >>> >Hi, I can try this on.
>> > > >>> >What is a trunk?
>> > > >>> >
>> > > >>> >Thanks,
>> > > >>> >Oleg
>> > > >>> >
>> > > >>> >On Tue, Oct 21, 2014 at 6:21 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>> > > >>> >
>> > > >>> >> Hmm any idea why this is failing on Windows? Tyler P. and
>> > > >>> >> I were talking the other day - maybe we shouldn't run the
>> > > >>> >> tests from TIKA-1422 unless Tesseract is installed? Thoughts?
>> > > >>> >>
>> > > >>> >> -----Original Message-----
>> > > >>> >> From: Hong-Thai Nguyen <thaicha...@gmail.com>
>> > > >>> >> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> >> Date: Thursday, October 16, 2014 at 2:03 AM
>> > > >>> >> To: "dev@tika.apache.org" <dev@tika.apache.org>
>> > > >>> >> Subject: Re: 1.7 release?
>> > > >>> >>
>> > > >>> >> >Hi Andrzej,
>> > > >>> >> >
>> > > >>> >> >We are impatient for 1.7 release too.
>> > > >>> >> >I'm having compiling problem of TIKA-1422 on me. If anyone can build
>> > > >>> >> >successfully on Windows, I have no objection to release 1.7
>> > > >>> >> >
>> > > >>> >> >Thanks,
>> > > >>> >> >
>> > > >>> >> >On Thu, Oct 16, 2014 at 10:51 AM, Andrzej Białecki <a...@getopt.org> wrote:
>> > > >>> >> >
>> > > >>> >> >> Hi,
>> > > >>> >> >>
>> > > >>> >> >> Any news on the 1.7 release? or at least a 1.6.1 release that includes the
>> > > >>> >> >> fix for broken ODF parsing...
>> > > >>> >> >>
>> > > >>> >> >> ---
>> > > >>> >> >> Best regards,
>> > > >>> >> >>
>> > > >>> >> >> Andrzej Bialecki
>> > > >>> >> >
>> > > >>> >> >--
>> > > >>> >> >--------------
>> > > >>> >> >Hong-Thai