Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Thank you, everyone! I'll move forward now.

Lewis, the KEYS file is here: https://people.apache.org/keys/group/tika.asc.

Of course, I'm also +1.

Tyler

On Mon, Apr 20, 2015 at 3:47 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:

> Where is the KEYS?
> All signatures are fine.
> Tests are A-OK.
> The remaining issue is with the Tika 1616 issue, which was patched and
> committed to trunk.
> IMHO this is not a blocker. We could probably release 1.9 on a shorter
> release cycle to accommodate the change.
>
> > [X] +1 Release this package as Apache Tika 1.8
>
> I am +1 for releasing this as 1.8.
> Lewis
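The passing rule quoted in the announcement ("a majority of at least three +1 Tika PMC votes") can be sketched as a small check. This is only one reading of that sentence: at least three +1 votes from PMC members, and more +1 than -1 votes. The function name and the sample ballots are hypothetical and do not reflect the actual tally of this vote.

```python
def release_vote_passes(pmc_votes):
    """Return True if the PMC ballots satisfy the quoted rule.

    pmc_votes: iterable of +1, 0 or -1, one entry per PMC member.
    Assumption: "majority of at least three +1" means at least three
    +1 votes and strictly more +1 than -1 votes.
    """
    votes = list(pmc_votes)
    plus = sum(1 for v in votes if v == 1)
    minus = sum(1 for v in votes if v == -1)
    return plus >= 3 and plus > minus


if __name__ == "__main__":
    # Hypothetical ballots, for illustration only.
    print(release_vote_passes([1, 1, 1, 0]))
    print(release_vote_passes([1, 1, -1]))
```

Votes from non-PMC members are simply left out of the list under this reading.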
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi Folks,

On Thu, Apr 16, 2015 at 2:42 PM, wrote:

> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
> 5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged Maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika
> PMC votes are cast.

Where is the KEYS?
All signatures are fine.
Tests are A-OK.
The remaining issue is with the Tika 1616 issue, which was patched and committed to trunk.
IMHO this is not a blocker. We could probably release 1.9 on a shorter release cycle to accommodate the change.

> [X] +1 Release this package as Apache Tika 1.8

I am +1 for releasing this as 1.8.

Lewis
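For anyone repeating the checksum part of the verification locally, a minimal sketch follows. It only covers the SHA1 comparison, not the signature check against KEYS. The expected digest is the one from the announcement; the archive file name in the usage section is an assumption, since the thread does not name the file.

```python
import hashlib

# Digest quoted in the release announcement.
EXPECTED_SHA1 = "5e22fee9079370398472e59082d171ae2d7fdd31"


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare a file's SHA1 with the expected digest, ignoring case."""
    return sha1_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    # Hypothetical file name; pass the real archive path as an argument.
    archive = sys.argv[1] if len(sys.argv) > 1 else "tika-1.8-src.zip"
    print("checksum OK" if matches_expected(archive) else "checksum MISMATCH")
```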
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Both Nutch and Behemoth declare Hadoop 1.2.1 as a dependency, and since that Hadoop version does not use Guava they won't have the same issue. However, this is just the default version, and some people use them on Hadoop 2.x, in which case they might need to find a workaround.

On 20 April 2015 at 15:56, Julien Nioche wrote:

> and I haven't tested it with Nutch either...
>
> On 20 April 2015 at 15:46, Julien Nioche wrote:
>
>> I haven't tested the RC with Behemoth, it will probably have the same
>> issue but I'll do like you and defer the update if that's the case.
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
Um... OK. If no one else is concerned... off we go?

-----Original Message-----
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com]
Sent: Monday, April 20, 2015 10:56 AM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2

and I haven't tested it with Nutch either...

On 20 April 2015 at 15:46, Julien Nioche wrote:

> I haven't tested the RC with Behemoth, it will probably have the same
> issue but I'll do like you and defer the update if that's the case.
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
and I haven't tested it with Nutch either...

On 20 April 2015 at 15:46, Julien Nioche wrote:

> I haven't tested the RC with Behemoth, it will probably have the same
> issue but I'll do like you and defer the update if that's the case.
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
I haven't tested the RC with Behemoth, it will probably have the same issue but I'll do like you and defer the update if that's the case.

On 20 April 2015 at 15:23, Ken Krugler wrote:

> > From: Allison, Timothy B.
> > Sent: April 20, 2015 5:11:04am PDT
> > To: dev@tika.apache.org
> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> >
> > If I understand correctly, if we release rc2, Tika 1.8 will break in
> > Hadoop clusters across the land?!
> > Or, Hadoop folks will have to apply a classloading workaround or rebuild
> > 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
> >
> > For most Hadoopites, this will be a straightforward fix, and I'm
> > assuming that's why Ken is not more outspoken against releasing rc2 as is
> > (Ken, let me know if I'm wrong!).
>
> Usually it's straightforward. Though whenever you start manipulating the
> classloader logic, you can get odd results.
>
> E.g. by forcing your job jar's dependencies to show up first, now you can
> have an issue where one of your jars masks an older/newer version that
> Hadoop needs, so the job fails for some other reason.
>
> But yes, I don't feel strongly enough about this to vote -1, as I don't
> think there are that many people using Tika with Hadoop.
>
> For Bixo, I'd defer updating the Tika dependency until another version is
> released.
>
> Don't know about Behemoth - Julien?
>
> -- Ken
>
> > For other users, though, say, in healthcare, where code security review
> > is stringent, this could be a real pain, no?
> >
> > Am I understanding correctly what will happen? If so, do we really want
> > to do this?
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> From: Allison, Timothy B.
> Sent: April 20, 2015 5:11:04am PDT
> To: dev@tika.apache.org
> Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> If I understand correctly, if we release rc2, Tika 1.8 will break in Hadoop
> clusters across the land?!
> Or, Hadoop folks will have to apply a classloading workaround or rebuild
> 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
>
> For most Hadoopites, this will be a straightforward fix, and I'm assuming
> that's why Ken is not more outspoken against releasing rc2 as is (Ken, let me
> know if I'm wrong!).

Usually it's straightforward. Though whenever you start manipulating the classloader logic, you can get odd results.

E.g. by forcing your job jar's dependencies to show up first, now you can have an issue where one of your jars masks an older/newer version that Hadoop needs, so the job fails for some other reason.

But yes, I don't feel strongly enough about this to vote -1, as I don't think there are that many people using Tika with Hadoop.

For Bixo, I'd defer updating the Tika dependency until another version is released.

Don't know about Behemoth - Julien?

-- Ken

> For other users, though, say, in healthcare, where code security review is
> stringent, this could be a real pain, no?
>
> Am I understanding correctly what will happen? If so, do we really want to
> do this?
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Hey Tim,

Yeah, I think you understood it correctly. However, someone in, e.g., healthcare, or at NASA for example, can always grab the latest trunk SNAPSHOT, which works fine and includes Ken’s TIKA-1606 fix.

If we find many users and others complaining about 1.8, we can always rapidly release 1.9-SNAPSHOT and go through the VOTE’ing process on that too, right?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-----Original Message-----
From: Allison, Timothy B.
Reply-To: "dev@tika.apache.org"
Date: Monday, April 20, 2015 at 8:11 AM
To: "dev@tika.apache.org"
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

>If I understand correctly, if we release rc2, Tika 1.8 will break in
>Hadoop clusters across the land?!
>Or, Hadoop folks will have to apply a classloading workaround or rebuild
>1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
>
>For most Hadoopites, this will be a straightforward fix, and I'm assuming
>that's why Ken is not more outspoken against releasing rc2 as is (Ken,
>let me know if I'm wrong!). For other users, though, say, in healthcare,
>where code security review is stringent, this could be a real pain, no?
>
>Am I understanding correctly what will happen? If so, do we really want
>to do this?
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
If I understand correctly, if we release rc2, Tika 1.8 will break in Hadoop clusters across the land?! Or, Hadoop folks will have to apply a classloading workaround or rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.

For most Hadoopites, this will be a straightforward fix, and I'm assuming that's why Ken is not more outspoken against releasing rc2 as is (Ken, let me know if I'm wrong!). For other users, though, say, in healthcare, where code security review is stringent, this could be a real pain, no?

Am I understanding correctly what will happen? If so, do we really want to do this?

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Saturday, April 18, 2015 11:48 PM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2

+1 to pushing on Monday - if we have to roll a 1.9 quickly after, we can :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-----Original Message-----
From: Tyler Palsulich
Reply-To: "dev@tika.apache.org"
Date: Saturday, April 18, 2015 at 11:29 PM
To: "dev@tika.apache.org"
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

>Hi Folks,
>
>If there are no blocking complaints (OSGi?) by Monday (a little longer than
>3 days, I realize), I'll mark this as passed and finish the release process.
>
>Of course, it's no problem for me to cut another RC, if it's needed.
>
>Have a great weekend!
>Tyler
>
>I've run into one problem while testing Tika 1.8 with Bixo
>
>It involves a dependency issue involving (of course) Guava, since that
>project loves to break their API :(
>
>The bixo-core jar has these transitive dependencies on various versions of
>Guava:
>
>Hadoop - 11.0.2
>Cascading - 14.0.1
>Tika-parsers - 10.0.1
>cdm - 17.0
>
>Everyone winds up using version 10.0.1 (note that Tika has a dependency on
>cdm, which wants to use 17.0)
>
>The problem is that Hadoop (for any recent version) uses an API from
>Guava's cache implementation that no longer exists:
>
>com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>java.lang.NoSuchMethodError:
>com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
>    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
>    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
>    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
>
>So what this means is that anyone trying to use Tika with Hadoop will need
>to play games with the class loader to get the older version of Guava -
>though that can cause other issues if Hadoop (or Cascading, etc) rely on
>anything that's only in the newer Guava API.
>
>Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
>years ago. So it seems like we should upgrade to at least 11.0.2
>
>But I don't know if this is enough of an issue to require another RC.
>
>-- Ken
>
>PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
>this.
>
>> From: Tyler Palsulich
>> Sent: April 13, 2015 10:56:29am PDT
>> To: dev@tika.apache.org, u...@tika.apache.org
>> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>>
>> Hi Folks,
>>
>> A candidate for the Tika 1.8 release is available at:
>> https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>> http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>>
>> The SHA1 checksum of the archive is
>> 5e22fee9079370398472e59082d171ae2d7fdd31.
>>
>> In addition, a staged maven repository is available here:
>> https://repository.apache.org/content/repositories/orgapachetika-1009
>>
>> Please vote on releasing this package as Apache Tika 1.8. The vote is
>> open for the next 72 hours and passes if a majority of at least three +1
>> Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.8
>> [ ] ±0 I don't object to this release, but I haven't checked it
>> [ ] -1 Do not release this package because...
>>
>> Thanks,
>> Tyler
>
>--
>Ken Krugler
>+1 530-210-6378
>http://www.scaleunlimited.com
>custom big data solutions & training
>Hadoop, Cascading, Cassandra & Solr
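To make the version mismatch in Ken's report concrete, here is a rough sketch that flags which consumers declared a newer Guava than the one that actually ended up on the classpath. The declared versions and the resolved version (10.0.1) are the ones listed in the thread; the function names are hypothetical. It is only a heuristic: a newer declared version does not prove a failure, and since Guava also removes APIs, an older declared version is not a guarantee of safety either.

```python
# Guava versions each dependency asks for, as listed in the thread.
DECLARED_GUAVA = {
    "Hadoop": "11.0.2",
    "Cascading": "14.0.1",
    "Tika-parsers": "10.0.1",
    "cdm": "17.0",
}

# Version that everyone winds up using, per the thread.
RESOLVED_GUAVA = "10.0.1"


def parse_version(text):
    """Turn a dotted version such as '11.0.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def consumers_at_risk(declared, resolved):
    """List consumers that asked for a newer version than the resolved one."""
    resolved_key = parse_version(resolved)
    return sorted(
        name
        for name, version in declared.items()
        if parse_version(version) > resolved_key
    )


if __name__ == "__main__":
    print(consumers_at_risk(DECLARED_GUAVA, RESOLVED_GUAVA))
```

With the numbers from the thread, Hadoop shows up in the flagged list, which is consistent with the NoSuchMethodError raised from CodecPool.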
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi Ken,

Sorry for the delayed response. No, that patch is not included in this RC (as I think you know, given your resolution of TIKA-1606).

Have a good night,
Tyler

On Sun, Apr 19, 2015 at 10:49 AM, Ken Krugler wrote:
> Hi Tyler,
>
> Does this include Lewis's fix for
> https://issues.apache.org/jira/browse/TIKA-1606?
>
> It's a simple change (bumping the Guava version), but as seen this can
> have unexpected consequences.
>
> I'm fine either way.
>
> -- Ken
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi Tyler,

Does this include Lewis's fix for https://issues.apache.org/jira/browse/TIKA-1606?

It's a simple change (bumping the Guava version), but as seen this can have unexpected consequences.

I'm fine either way.

-- Ken

> From: Tyler Palsulich
> Sent: April 18, 2015 8:29:22pm PDT
> To: dev@tika.apache.org
> Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> Hi Folks,
>
> If there are no blocking complaints (OSGi?) by Monday (a little longer than
> 3 days, I realize), I'll mark this as passed and finish the release process.
>
> Of course, it's no problem for me to cut another RC, if it's needed.
>
> Have a great weekend!
> Tyler

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
+1 to pushing on Monday - if we have to roll a 1.9 quickly after, we can :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-----Original Message-----
From: Tyler Palsulich
Reply-To: "dev@tika.apache.org"
Date: Saturday, April 18, 2015 at 11:29 PM
To: "dev@tika.apache.org"
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

>Hi Folks,
>
>If there are no blocking complaints (OSGi?) by Monday (a little longer
>than 3 days, I realize), I'll mark this as passed and finish the release
>process.
>
>Of course, it's no problem for me to cut another RC, if it's needed.
>
>Have a great weekend!
>Tyler
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi Folks,

If there are no blocking complaints (OSGi?) by Monday (a little longer than 3 days, I realize), I'll mark this as passed and finish the release process.

Of course, it's no problem for me to cut another RC, if it's needed.

Have a great weekend!
Tyler
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
I've run into one problem while testing Tika 1.8 with Bixo.

It involves a dependency issue involving (of course) Guava, since that project loves to break their API :(

The bixo-core jar has these transitive dependencies on various versions of Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
  cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency on cdm, which wants to use 17.0).

The problem is that Hadoop (for any recent version) uses an API from Guava's cache implementation that no longer exists:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will need to play games with the class loader to get the older version of Guava - though that can cause other issues if Hadoop (or Cascading, etc) rely on anything that's only in the newer Guava API.

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3 years ago. So it seems like we should upgrade to at least 11.0.2.

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track this.

> From: Tyler Palsulich
> Sent: April 13, 2015 10:56:29am PDT
> To: dev@tika.apache.org, u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
> 5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika PMC
> votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
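For anyone who wants to confirm which Guava actually ended up on the classpath before resorting to class loader tricks, here is a minimal sketch (it is not part of Tika, Hadoop or Bixo). It uses plain reflection to ask whether `CacheBuilder.build(CacheLoader)` with a `LoadingCache` return type, the signature named in the stack trace, is available, and where the class was loaded from. The class name `GuavaProbe` and its two helper methods are made up for illustration.

```java
import java.security.CodeSource;

public class GuavaProbe {

    /**
     * Returns true if className has a public method with the given name,
     * parameter types and return type. Types are given as fully qualified
     * class names (primitives are not handled in this sketch).
     */
    public static boolean hasMethod(String className, String methodName,
                                    String returnType, String... paramTypes) {
        try {
            Class<?> owner = Class.forName(className);
            Class<?>[] params = new Class<?>[paramTypes.length];
            for (int i = 0; i < paramTypes.length; i++) {
                params[i] = Class.forName(paramTypes[i]);
            }
            // getMethod matches on name and parameters; the return type is compared by name.
            return owner.getMethod(methodName, params)
                        .getReturnType().getName().equals(returnType);
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Returns the location (usually a jar URL) the class was loaded from. */
    public static String origin(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String builder = "com.google.common.cache.CacheBuilder";
        boolean ok = hasMethod(builder, "build",
                "com.google.common.cache.LoadingCache",
                "com.google.common.cache.CacheLoader");
        System.out.println("CacheBuilder loaded from: " + origin(builder));
        System.out.println("build(CacheLoader) returning LoadingCache present: " + ok);
    }
}
```

Run with the same classpath as the failing job; a `false` on the second line would point at the same mismatch that produces the `NoSuchMethodError`.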
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi, folks.

All tests pass, checksum and gpg signature for tika-1.8-src.zip are fine. Checked on ArchLinux x86_64, openjdk 7u75, w/ tesseract.

Thank you, Tyler.

[x] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

--
Best regards,
Konstantin Gribov

On Mon, Apr 13, 2015 at 20:56, Tyler Palsulich wrote:
> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
> 5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika
> PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler
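For readers who want to repeat the checksum part of this check without extra tooling, a minimal sketch follows. It computes the SHA-1 of a local file with the JDK's `MessageDigest` and compares it to the value from the vote email. The class name `Sha1Check` is illustrative, and the default file name assumes the source archive was downloaded as tika-1.8-src.zip into the current directory.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Check {

    /** Returns the SHA-1 digest of data as a lowercase hex string. */
    public static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Checksum quoted in the vote email.
        String expected = "5e22fee9079370398472e59082d171ae2d7fdd31";
        // Assumed local file name; pass another path as the first argument.
        String file = args.length > 0 ? args[0] : "tika-1.8-src.zip";
        // Reads the whole archive into memory, which is fine for a file of this size.
        String actual = sha1Hex(Files.readAllBytes(Paths.get(file)));
        System.out.println(actual.equalsIgnoreCase(expected)
                ? "SHA1 OK"
                : "SHA1 MISMATCH: " + actual);
    }
}
```

This only covers the checksum; the gpg signature still has to be verified separately against the project's KEYS file.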
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi Tyler,

good job, indeed !!!

[x] +1 Release this package as Apache Tika 1.8

On Wed, Apr 15, 2015 at 8:22 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
> Thanks Tyler! +1 from me:
>
> SIGS, checksums check out:
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
Thanks Tyler! +1 from me:

SIGS, checksums check out:

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika 1.8-src https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69.2M  100 69.2M    0     0  1524k      0  0:00:46  0:00:46 --:--:-- 1661k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    874      0 --:--:-- --:--:-- --:--:--   874
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62
[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-app 1.8 https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.0M  100 44.0M    0     0  1742k      0  0:00:25  0:00:25 --:--:-- 1825k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    922      0 --:--:-- --:--:-- --:--:--   922
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     63      0 --:--:-- --:--:-- --:--:--    63
[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-server 1.8 https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48.3M  100 48.3M    0     0  1379k      0  0:00:35  0:00:35 --:--:-- 1569k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    891      0 --:--:-- --:--:-- --:--:--   892
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62
[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_gpg_sigs
Verifying Signature for file tika-1.8-src.zip.asc
gpg: Signature made Mon Apr 13 13:46:39 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4 183E 8810 BB19 D4F1 0117
Verifying Signature for file tika-app-1.8.jar.asc
gpg: Signature made Mon Apr 13 13:43:13 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4 183E 8810 BB19 D4F1 0117
Verifying Signature for file tika-server-1.8.jar.asc
gpg: Signature made Mon Apr 13 13:45:00 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4 183E 8810 BB19 D4F1 0117
[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_md5_checksums
md5sum: stat '*.tar.gz': No such file or directory
md5sum: stat '*.bz2': No such file or directory
md5sum: stat '*.tgz': No such file or directory
tika-1.8-src.zip: OK
[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%

Cheers!
Chris

From: Tyler Palsulich [tpalsul...@apache.org]
Sent: Monday, April 13, 2015 10:56 AM
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for the next 72 hours and
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi Tim

Great to hear that you managed to use the dataset from CommonCrawl. Thanks!

Julien

On 14 April 2015 at 14:15, Allison, Timothy B. wrote:
> +1
>
> Thank you, Tyler!
>
> Apologies to Hong-Thai and community for not recognizing the severity of
> TIKA-1600 when I voted in favor of rc1!
>
> Details...
>
> I reran against govdocs1, and there aren't any major surprises.
>
> On our Rackspace vm, I _finally_ unzipped the Common Crawl slice that
> Julien Nioche created for us, and I ran against that as well. That turned
> up TIKA-1605 and another exceedingly rare NPE in the PDFParser. I don't
> think either of these are blockers, and they're now fixed in trunk.
>
> There are slightly fewer metadata values for some jpegs. For the one file
> that I manually reviewed, 1.8-rc was missing these values (that were
> available in 1.7):
>
> JPEG quality
> IPTC-NAA record
> Plug-in 1 Data
>
> Comparison reports are available here (much more work remains to be done
> on tika-eval):
>
> https://github.com/tballison/share/tree/master/tika_comparisons

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
Re: [VOTE] Apache Tika 1.8 Release Candidate #2
+1

Thank you, Tyler!

Apologies to Hong-Thai and community for not recognizing the severity of TIKA-1600 when I voted in favor of rc1!

Details...

I reran against govdocs1, and there aren't any major surprises.

On our Rackspace vm, I _finally_ unzipped the Common Crawl slice that Julien Nioche created for us, and I ran against that as well. That turned up TIKA-1605 and another exceedingly rare NPE in the PDFParser. I don't think either of these are blockers, and they're now fixed in trunk.

There are slightly fewer metadata values for some jpegs. For the one file that I manually reviewed, 1.8-rc was missing these values (that were available in 1.7):

JPEG quality
IPTC-NAA record
Plug-in 1 Data

Comparison reports are available here (much more work remains to be done on tika-eval):

https://github.com/tballison/share/tree/master/tika_comparisons

From: Tyler Palsulich
Sent: Monday, April 13, 2015 1:56 PM
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for the next 72 hours and passes if a majority of at least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler
RE: [VOTE] Apache Tika 1.8 Release Candidate #2
Hi,

+1 for me.
Great work, Tyler !

Hong-Thai

-----Original Message-----
From: Tyler Palsulich [mailto:tpalsul...@apache.org]
Sent: Monday, April 13, 2015 19:56
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for the next 72 hours and passes if a majority of at least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler
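The passing rule in the vote email ("a majority of at least three +1 Tika PMC votes") can be read as a simple predicate. Below is a minimal sketch under that reading, assuming only binding PMC votes are counted and that ±0 votes do not count either way. The class and method names are illustrative, and the numbers in `main` are hypothetical, not a tally of this thread.

```java
public class VoteTally {

    /**
     * One reading of the rule: at least three binding +1 votes,
     * and more +1 votes than -1 votes.
     */
    public static boolean passes(int plusOnes, int minusOnes) {
        return plusOnes >= 3 && plusOnes > minusOnes;
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        System.out.println("4 x +1, 0 x -1 passes: " + passes(4, 0));
        System.out.println("2 x +1, 0 x -1 passes: " + passes(2, 0));
    }
}
```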