Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Tyler Palsulich
Thank you, Everyone! I'll move forward now.

Lewis, KEYS are here: https://people.apache.org/keys/group/tika.asc.

Of course, I'm also +1.

Tyler

On Mon, Apr 20, 2015 at 3:47 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Folks,
>
> On Thu, Apr 16, 2015 at 2:42 PM,  wrote:
>
> >
> > > Hi Folks,
> > >
> > > A candidate for the Tika 1.8 release is available at:
> > >   https://dist.apache.org/repos/dist/dev/tika/
> > >
> > > The release candidate is a zip archive of the sources in:
> > >   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
> > >
> > > The SHA1 checksum of the archive is
> > >   5e22fee9079370398472e59082d171ae2d7fdd31.
> > >
> > > In addition, a staged maven repository is available here:
> > >
> https://repository.apache.org/content/repositories/orgapachetika-1009
> > >
> > > Please vote on releasing this package as Apache Tika 1.8. The vote is
> > open
> > > for the next 72 hours and passes if a majority of at least three +1
> Tika
> > > PMC votes are cast.
> >
>
>
> Where is the KEYS file?
> All signatures are fine.
> Tests are A-OK.
> The remaining issue is the Tika 1616 issue, which was patched and
> committed to trunk.
> IMHO this is not a blocker. We could probably release 1.9 in a shorter
> release cycle to accommodate the change.
>
>
> > >
> > > [X] +1 Release this package as Apache Tika 1.8
>
>
> I am +1 for releasing this as 1.8.
> Lewis
>


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Lewis John Mcgibbney
Hi Folks,

On Thu, Apr 16, 2015 at 2:42 PM,  wrote:

>
> > Hi Folks,
> >
> > A candidate for the Tika 1.8 release is available at:
> >   https://dist.apache.org/repos/dist/dev/tika/
> >
> > The release candidate is a zip archive of the sources in:
> >   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
> >
> > The SHA1 checksum of the archive is
> >   5e22fee9079370398472e59082d171ae2d7fdd31.
> >
> > In addition, a staged maven repository is available here:
> >   https://repository.apache.org/content/repositories/orgapachetika-1009
> >
> > Please vote on releasing this package as Apache Tika 1.8. The vote is
> open
> > for the next 72 hours and passes if a majority of at least three +1 Tika
> > PMC votes are cast.
>


Where is the KEYS file?
All signatures are fine.
Tests are A-OK.
The remaining issue is the Tika 1616 issue, which was patched and
committed to trunk.
IMHO this is not a blocker. We could probably release 1.9 in a shorter
release cycle to accommodate the change.


> >
> > [X] +1 Release this package as Apache Tika 1.8


I am +1 for releasing this as 1.8.
Lewis


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Julien Nioche
Both Nutch and Behemoth declare Hadoop 1.2.1 as a dependency, and since it
does not use Guava they won't have the same issue. However, this is just the
default version and some people use them on Hadoop 2.x, in which case
they might need to find a workaround.

On 20 April 2015 at 15:56, Julien Nioche 
wrote:

> and I haven't tested it with Nutch either...
>
> On 20 April 2015 at 15:46, Julien Nioche 
> wrote:
>
>> I haven't tested the RC with Behemoth, it will probably have the same
>> issue but I'll do like you and defer the update if that's the case.
>>
>> On 20 April 2015 at 15:23, Ken Krugler 
>> wrote:
>>
>>>
>>> > From: Allison, Timothy B.
>>> > Sent: April 20, 2015 5:11:04am PDT
>>> > To: dev@tika.apache.org
>>> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>>> >
>>> > If I understand correctly, if we release rc2, Tika 1.8 will break in
>>> Hadoop clusters across the land?!
>>> > Or, Hadoop folks will have to apply a classloading workaround or
>>> rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
>>> >
>>> > For most Hadoopites, this will be a straightforward fix, and I'm
>>> assuming that's why Ken is not more outspoken against releasing rc2 as is
>>> (Ken, let me know if I'm wrong!).
>>>
>>> Usually it's straightforward. Though whenever you start manipulating the
>>> classloader logic, you can get odd results.
>>>
>>> E.g. by forcing your job jar's dependencies to show up first, now you
>>> can have an issue where one of your jars masks an older/newer version that
>>> Hadoop needs, so the job fails for some other reason.
>>>
>>> But yes, I don't feel strongly enough about this to vote -1, as I don't
>>> think there are that many people using Tika with Hadoop.
>>>
>>> For Bixo, I'd defer updating the Tika dependency until another version
>>> is released.
>>>
>>> Don't know about Behemoth - Julien?
>>>
>>> -- Ken
>>>
>>>
>>> > For other users, though, say, in healthcare, where code security
>>> review is stringent, this could be a real pain, no?
>>> >
>>> > Am I understanding correctly what will happen?  If so, do we really
>>> want to do this?
>>> >
>>> >
>>> > -----Original Message-----
>>> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>>> > Sent: Saturday, April 18, 2015 11:48 PM
>>> > To: dev@tika.apache.org
>>> > Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
>>> >
>>> > +1 to pushing on Monday - if we have to roll a 1.9 quickly
>>> > after, we can :)
>>> >
>>> > ++
>>> > Chris Mattmann, Ph.D.
>>> > Chief Architect
>>> > Instrument Software and Science Data Systems Section (398)
>>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> > Office: 168-519, Mailstop: 168-527
>>> > Email: chris.a.mattm...@nasa.gov
>>> > WWW:  http://sunset.usc.edu/~mattmann/
>>> > ++
>>> > Adjunct Associate Professor, Computer Science Department
>>> > University of Southern California, Los Angeles, CA 90089 USA
>>> > ++
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > -----Original Message-----
>>> > From: Tyler Palsulich 
>>> > Reply-To: "dev@tika.apache.org" 
>>> > Date: Saturday, April 18, 2015 at 11:29 PM
>>> > To: "dev@tika.apache.org" 
>>> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>>> >
>>> >> Hi Folks,
>>> >>
>>> >> If there are no blocking complaints (OSGi?) by Monday (a little longer
>>> >> than
>>> >> 3 days, I realize), I'll mark this as passed and finish the release
>>> >> process.
>>> >>
>>> >> Of course, it's no problem for me to cut another RC, if it's needed.
>>> >>
>>> >> Have a great weekend!
>>> >> Tyler
>>> >> I've run into one problem while testing Tika 1.8 with Bixo

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Allison, Timothy B.
Um...Ok.  If no one else is concerned...  off we go?

-----Original Message-----
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] 
Sent: Monday, April 20, 2015 10:56 AM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2

and I haven't tested it with Nutch either...

On 20 April 2015 at 15:46, Julien Nioche 
wrote:

> I haven't tested the RC with Behemoth, it will probably have the same
> issue but I'll do like you and defer the update if that's the case.
>
> On 20 April 2015 at 15:23, Ken Krugler 
> wrote:
>
>>
>> > From: Allison, Timothy B.
>> > Sent: April 20, 2015 5:11:04am PDT
>> > To: dev@tika.apache.org
>> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>> >
>> > If I understand correctly, if we release rc2, Tika 1.8 will break in
>> Hadoop clusters across the land?!
>> > Or, Hadoop folks will have to apply a classloading workaround or
>> rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
>> >
>> > For most Hadoopites, this will be a straightforward fix, and I'm
>> assuming that's why Ken is not more outspoken against releasing rc2 as is
>> (Ken, let me know if I'm wrong!).
>>
>> Usually it's straightforward. Though whenever you start manipulating the
>> classloader logic, you can get odd results.
>>
>> E.g. by forcing your job jar's dependencies to show up first, now you can
>> have an issue where one of your jars masks an older/newer version that
>> Hadoop needs, so the job fails for some other reason.
>>
>> But yes, I don't feel strongly enough about this to vote -1, as I don't
>> think there are that many people using Tika with Hadoop.
>>
>> For Bixo, I'd defer updating the Tika dependency until another version is
>> released.
>>
>> Don't know about Behemoth - Julien?
>>
>> -- Ken
>>
>>
>> > For other users, though, say, in healthcare, where code security review
>> is stringent, this could be a real pain, no?
>> >
>> > Am I understanding correctly what will happen?  If so, do we really
>> want to do this?
>> >
>> >
>> > -----Original Message-----
>> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>> > Sent: Saturday, April 18, 2015 11:48 PM
>> > To: dev@tika.apache.org
>> > Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
>> >
>> > +1 to pushing on Monday - if we have to roll a 1.9 quickly
>> > after, we can :)
>> >
>> > ++
>> > Chris Mattmann, Ph.D.
>> > Chief Architect
>> > Instrument Software and Science Data Systems Section (398)
>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > Office: 168-519, Mailstop: 168-527
>> > Email: chris.a.mattm...@nasa.gov
>> > WWW:  http://sunset.usc.edu/~mattmann/
>> > ++++++
>> > Adjunct Associate Professor, Computer Science Department
>> > University of Southern California, Los Angeles, CA 90089 USA
>> > ++
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Tyler Palsulich 
>> > Reply-To: "dev@tika.apache.org" 
>> > Date: Saturday, April 18, 2015 at 11:29 PM
>> > To: "dev@tika.apache.org" 
>> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>> >
>> >> Hi Folks,
>> >>
>> >> If there are no blocking complaints (OSGi?) by Monday (a little longer
>> >> than
>> >> 3 days, I realize), I'll mark this as passed and finish the release
>> >> process.
>> >>
>> >> Of course, it's no problem for me to cut another RC, if it's needed.
>> >>
>> >> Have a great weekend!
>> >> Tyler
>> >> I've run into one problem while testing Tika 1.8 with Bixo
>> >>
>> >> It involves a dependency issue involving (of course) Guava, since that
>> >> project loves to break their API :(
>> >>
>> >> The bixo-core jar has these transitive dependencies on various
>> versions of
>> >> Guava:
>> >>
>> >> Hadoop - 11.0.2
>> >> Cascading - 14.0.1
>> >> Tika-parsers - 10.0.1
>> >>   cdm - 17

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Julien Nioche
and I haven't tested it with Nutch either...

On 20 April 2015 at 15:46, Julien Nioche 
wrote:

> I haven't tested the RC with Behemoth, it will probably have the same
> issue but I'll do like you and defer the update if that's the case.
>
> On 20 April 2015 at 15:23, Ken Krugler 
> wrote:
>
>>
>> > From: Allison, Timothy B.
>> > Sent: April 20, 2015 5:11:04am PDT
>> > To: dev@tika.apache.org
>> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>> >
>> > If I understand correctly, if we release rc2, Tika 1.8 will break in
>> Hadoop clusters across the land?!
>> > Or, Hadoop folks will have to apply a classloading workaround or
>> rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
>> >
>> > For most Hadoopites, this will be a straightforward fix, and I'm
>> assuming that's why Ken is not more outspoken against releasing rc2 as is
>> (Ken, let me know if I'm wrong!).
>>
>> Usually it's straightforward. Though whenever you start manipulating the
>> classloader logic, you can get odd results.
>>
>> E.g. by forcing your job jar's dependencies to show up first, now you can
>> have an issue where one of your jars masks an older/newer version that
>> Hadoop needs, so the job fails for some other reason.
>>
>> But yes, I don't feel strongly enough about this to vote -1, as I don't
>> think there are that many people using Tika with Hadoop.
>>
>> For Bixo, I'd defer updating the Tika dependency until another version is
>> released.
>>
>> Don't know about Behemoth - Julien?
>>
>> -- Ken
>>
>>
>> > For other users, though, say, in healthcare, where code security review
>> is stringent, this could be a real pain, no?
>> >
>> > Am I understanding correctly what will happen?  If so, do we really
>> want to do this?
>> >
>> >
>> > -----Original Message-----
>> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>> > Sent: Saturday, April 18, 2015 11:48 PM
>> > To: dev@tika.apache.org
>> > Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
>> >
>> > +1 to pushing on Monday - if we have to roll a 1.9 quickly
>> > after, we can :)
>> >
>> > ++
>> > Chris Mattmann, Ph.D.
>> > Chief Architect
>> > Instrument Software and Science Data Systems Section (398)
>> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > Office: 168-519, Mailstop: 168-527
>> > Email: chris.a.mattm...@nasa.gov
>> > WWW:  http://sunset.usc.edu/~mattmann/
>> > ++++++
>> > Adjunct Associate Professor, Computer Science Department
>> > University of Southern California, Los Angeles, CA 90089 USA
>> > ++
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Tyler Palsulich 
>> > Reply-To: "dev@tika.apache.org" 
>> > Date: Saturday, April 18, 2015 at 11:29 PM
>> > To: "dev@tika.apache.org" 
>> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>> >
>> >> Hi Folks,
>> >>
>> >> If there are no blocking complaints (OSGi?) by Monday (a little longer
>> >> than
>> >> 3 days, I realize), I'll mark this as passed and finish the release
>> >> process.
>> >>
>> >> Of course, it's no problem for me to cut another RC, if it's needed.
>> >>
>> >> Have a great weekend!
>> >> Tyler
>> >> I've run into one problem while testing Tika 1.8 with Bixo
>> >>
>> >> It involves a dependency issue involving (of course) Guava, since that
>> >> project loves to break their API :(
>> >>
>> >> The bixo-core jar has these transitive dependencies on various
>> versions of
>> >> Guava:
>> >>
>> >> Hadoop - 11.0.2
>> >> Cascading - 14.0.1
>> >> Tika-parsers - 10.0.1
>> >>   cdm - 17.0
>> >>
>> >> Everyone winds up using version 10.0.1 (note that Tika has a
>> dependency on
>> >> cdm, which wants to use 17.0)
>> >>
>> >> The problem is that Hadoop (for any recent version) 

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Julien Nioche
I haven't tested the RC with Behemoth, it will probably have the same issue
but I'll do like you and defer the update if that's the case.

On 20 April 2015 at 15:23, Ken Krugler  wrote:

>
> > From: Allison, Timothy B.
> > Sent: April 20, 2015 5:11:04am PDT
> > To: dev@tika.apache.org
> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> >
> > If I understand correctly, if we release rc2, Tika 1.8 will break in
> Hadoop clusters across the land?!
> > Or, Hadoop folks will have to apply a classloading workaround or rebuild
> 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
> >
> > For most Hadoopites, this will be a straightforward fix, and I'm
> assuming that's why Ken is not more outspoken against releasing rc2 as is
> (Ken, let me know if I'm wrong!).
>
> Usually it's straightforward. Though whenever you start manipulating the
> classloader logic, you can get odd results.
>
> E.g. by forcing your job jar's dependencies to show up first, now you can
> have an issue where one of your jars masks an older/newer version that
> Hadoop needs, so the job fails for some other reason.
>
> But yes, I don't feel strongly enough about this to vote -1, as I don't
> think there are that many people using Tika with Hadoop.
>
> For Bixo, I'd defer updating the Tika dependency until another version is
> released.
>
> Don't know about Behemoth - Julien?
>
> -- Ken
>
>
> > For other users, though, say, in healthcare, where code security review
> is stringent, this could be a real pain, no?
> >
> > Am I understanding correctly what will happen?  If so, do we really want
> to do this?
> >
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> > Sent: Saturday, April 18, 2015 11:48 PM
> > To: dev@tika.apache.org
> > Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
> >
> > +1 to pushing on Monday - if we have to roll a 1.9 quickly
> > after, we can :)
> >
> > ++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++
> >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Tyler Palsulich 
> > Reply-To: "dev@tika.apache.org" 
> > Date: Saturday, April 18, 2015 at 11:29 PM
> > To: "dev@tika.apache.org" 
> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> >
> >> Hi Folks,
> >>
> >> If there are no blocking complaints (OSGi?) by Monday (a little longer
> >> than
> >> 3 days, I realize), I'll mark this as passed and finish the release
> >> process.
> >>
> >> Of course, it's no problem for me to cut another RC, if it's needed.
> >>
> >> Have a great weekend!
> >> Tyler
> >> I've run into one problem while testing Tika 1.8 with Bixo
> >>
> >> It involves a dependency issue involving (of course) Guava, since that
> >> project loves to break their API :(
> >>
> >> The bixo-core jar has these transitive dependencies on various versions
> of
> >> Guava:
> >>
> >> Hadoop - 11.0.2
> >> Cascading - 14.0.1
> >> Tika-parsers - 10.0.1
> >>   cdm - 17.0
> >>
> >> Everyone winds up using version 10.0.1 (note that Tika has a dependency
> on
> >> cdm, which wants to use 17.0)
> >>
> >> The problem is that Hadoop (for any recent version) uses an API from
> >> Guava's cache implementation that no longer exists:
> >>
> >>
> >> com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> >> java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> >>   at org.apache.hadoop.io.compress.CodecPool.createCache(C

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Ken Krugler

> From: Allison, Timothy B.
> Sent: April 20, 2015 5:11:04am PDT
> To: dev@tika.apache.org
> Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> 
> If I understand correctly, if we release rc2, Tika 1.8 will break in Hadoop 
> clusters across the land?!
> Or, Hadoop folks will have to apply a classloading workaround or rebuild 
> 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
> 
> For most Hadoopites, this will be a straightforward fix, and I'm assuming 
> that's why Ken is not more outspoken against releasing rc2 as is (Ken, let me 
> know if I'm wrong!).  

Usually it's straightforward. Though whenever you start manipulating the 
classloader logic, you can get odd results.

E.g. by forcing your job jar's dependencies to show up first, now you can have 
an issue where one of your jars masks an older/newer version that Hadoop needs, 
so the job fails for some other reason.

But yes, I don't feel strongly enough about this to vote -1, as I don't think 
there are that many people using Tika with Hadoop.

For Bixo, I'd defer updating the Tika dependency until another version is 
released.

Don't know about Behemoth - Julien?

-- Ken


> For other users, though, say, in healthcare, where code security review is 
> stringent, this could be a real pain, no?
> 
> Am I understanding correctly what will happen?  If so, do we really want to 
> do this?
> 
> 
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
> Sent: Saturday, April 18, 2015 11:48 PM
> To: dev@tika.apache.org
> Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
> 
> +1 to pushing on Monday - if we have to roll a 1.9 quickly
> after, we can :)
> 
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Tyler Palsulich 
> Reply-To: "dev@tika.apache.org" 
> Date: Saturday, April 18, 2015 at 11:29 PM
> To: "dev@tika.apache.org" 
> Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> 
>> Hi Folks,
>> 
>> If there are no blocking complaints (OSGi?) by Monday (a little longer
>> than
>> 3 days, I realize), I'll mark this as passed and finish the release
>> process.
>> 
>> Of course, it's no problem for me to cut another RC, if it's needed.
>> 
>> Have a great weekend!
>> Tyler
>> I've run into one problem while testing Tika 1.8 with Bixo
>> 
>> It involves a dependency issue involving (of course) Guava, since that
>> project loves to break their API :(
>> 
>> The bixo-core jar has these transitive dependencies on various versions of
>> Guava:
>> 
>> Hadoop - 11.0.2
>> Cascading - 14.0.1
>> Tika-parsers - 10.0.1
>>   cdm - 17.0
>> 
>> Everyone winds up using version 10.0.1 (note that Tika has a dependency on
>> cdm, which wants to use 17.0)
>> 
>> The problem is that Hadoop (for any recent version) uses an API from
>> Guava's cache implementation that no longer exists:
>> 
>> com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>> java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>>   at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
>>   at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
>>   at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
>>   at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
>> 
>> So what this means is that anyone trying to use Tika with Hadoop will need
>> to play games with the class loader to get the older version of Guava -
>> though that can cause other issues if Hadoop (or Cascading, etc) rely on
>> anything that's only in the newer Guava API.
>> 
>> Guava 10.0.1 was released about

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Mattmann, Chris A (3980)
Hey Tim,

Yeah I think you understood it correctly - however, someone
in e.g., healthcare, or at NASA for example, can always grab
the latest trunk SNAPSHOT which works fine and includes Ken’s
TIKA-1606 fix. If we find many users and others complaining
about 1.8, we can always rapidly release 1.9-SNAPSHOT and go
through the VOTE’ing process on that too, right?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-----Original Message-----
From: "Allison, Timothy B."
Reply-To: "dev@tika.apache.org" 
Date: Monday, April 20, 2015 at 8:11 AM
To: "dev@tika.apache.org" 
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

>If I understand correctly, if we release rc2, Tika 1.8 will break in
>Hadoop clusters across the land?!
>Or, Hadoop folks will have to apply a classloading workaround or rebuild
>1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
>
>For most Hadoopites, this will be a straightforward fix, and I'm assuming
>that's why Ken is not more outspoken against releasing rc2 as is (Ken,
>let me know if I'm wrong!).  For other users, though, say, in healthcare,
>where code security review is stringent, this could be a real pain, no?
>
>Am I understanding correctly what will happen?  If so, do we really want
>to do this?
>
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
>Sent: Saturday, April 18, 2015 11:48 PM
>To: dev@tika.apache.org
>Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
>
>+1 to pushing on Monday - if we have to roll a 1.9 quickly
>after, we can :)
>
>++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: chris.a.mattm...@nasa.gov
>WWW:  http://sunset.usc.edu/~mattmann/
>++
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++
>
>
>
>
>
>
>-----Original Message-----
>From: Tyler Palsulich 
>Reply-To: "dev@tika.apache.org" 
>Date: Saturday, April 18, 2015 at 11:29 PM
>To: "dev@tika.apache.org" 
>Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
>
>>Hi Folks,
>>
>>If there are no blocking complaints (OSGi?) by Monday (a little longer
>>than
>>3 days, I realize), I'll mark this as passed and finish the release
>>process.
>>
>>Of course, it's no problem for me to cut another RC, if it's needed.
>>
>>Have a great weekend!
>>Tyler
>>I've run into one problem while testing Tika 1.8 with Bixo
>>
>>It involves a dependency issue involving (of course) Guava, since that
>>project loves to break their API :(
>>
>>The bixo-core jar has these transitive dependencies on various versions
>>of
>>Guava:
>>
>>Hadoop - 11.0.2
>>Cascading - 14.0.1
>>Tika-parsers - 10.0.1
>>cdm - 17.0
>>
>>Everyone winds up using version 10.0.1 (note that Tika has a dependency
>>on
>>cdm, which wants to use 17.0)
>>
>>The problem is that Hadoop (for any recent version) uses an API from
>>Guava's cache implementation that no longer exists:
>>
>>com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>>java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>>at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
>>at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
>>at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
>>at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
>>

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Allison, Timothy B.
If I understand correctly, if we release rc2, Tika 1.8 will break in Hadoop 
clusters across the land?!
Or, Hadoop folks will have to apply a classloading workaround or rebuild 
1.8/trunk with small version mod in TIKA-1606 to get Tika to work.

For most Hadoopites, this will be a straightforward fix, and I'm assuming 
that's why Ken is not more outspoken against releasing rc2 as is (Ken, let me 
know if I'm wrong!).  For other users, though, say, in healthcare, where code 
security review is stringent, this could be a real pain, no?

Am I understanding correctly what will happen?  If so, do we really want to do 
this?


-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Saturday, April 18, 2015 11:48 PM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2

+1 to pushing on Monday - if we have to roll a 1.9 quickly
after, we can :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-----Original Message-----
From: Tyler Palsulich 
Reply-To: "dev@tika.apache.org" 
Date: Saturday, April 18, 2015 at 11:29 PM
To: "dev@tika.apache.org" 
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

>Hi Folks,
>
>If there are no blocking complaints (OSGi?) by Monday (a little longer
>than
>3 days, I realize), I'll mark this as passed and finish the release
>process.
>
>Of course, it's no problem for me to cut another RC, if it's needed.
>
>Have a great weekend!
>Tyler
>I've run into one problem while testing Tika 1.8 with Bixo
>
>It involves a dependency issue involving (of course) Guava, since that
>project loves to break their API :(
>
>The bixo-core jar has these transitive dependencies on various versions of
>Guava:
>
>Hadoop - 11.0.2
>Cascading - 14.0.1
>Tika-parsers - 10.0.1
>cdm - 17.0
>
>Everyone winds up using version 10.0.1 (note that Tika has a dependency on
>cdm, which wants to use 17.0)
>
>The problem is that Hadoop (for any recent version) uses an API from
>Guava's cache implementation that no longer exists:
>
>com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
>at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
>at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
>at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
>
>So what this means is that anyone trying to use Tika with Hadoop will need
>to play games with the class loader to get the older version of Guava -
>though that can cause other issues if Hadoop (or Cascading, etc) rely on
>anything that's only in the newer Guava API.
>
>Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
>years ago. So it seems like we should upgrade to at least 11.0.2.
>
>But I don't know if this is enough of an issue to require another RC.
>
>-- Ken
>
>PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
>this.
>
>
>> From: Tyler Palsulich
>> Sent: April 13, 2015 10:56:29am PDT
>> To: dev@tika.apache.org, u...@tika.apache.org
>> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>>
>> Hi Folks,
>>
>> A candidate for the Tika 1.8 release is available at:
>>   https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>>
>> The SHA1 checksum of the archive is
>>   5e22fee9079370398472e59082d171ae2d7fdd31.
>>
>> In addition, a staged maven repository is available here:
>>   https://repository.apache.org/content/repositories/orgapachetika-1009
>>
>> Please vote on releasing this package as Apache Tika 1.8. The vote is
>open for the next 72 hours and passes if a majority of at least three +1
>Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.8
>> [ ] ±0 I don't object to this release, but I haven't checked it
>> [ ] -1 Do not release this package because...
>>
>> Thanks,
>> Tyler
>
>
>--
>Ken Krugler
>+1 530-210-6378
>http://www.scaleunlimited.com
>custom big data solutions & training
>Hadoop, Cascading, Cassandra & Solr
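
The NoSuchMethodError quoted above names the method by its full descriptor,
including the LoadingCache return type, so a Guava jar whose
CacheBuilder.build has a different return type will not satisfy the call. A
small reflection sketch, assuming nothing beyond the JDK, can report which
jar a class was loaded from and whether a one-argument method with a given
parameter and return type is present. GuavaCheck and describe are
hypothetical helper names, not part of Tika, Hadoop, or Guava.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Tika, Hadoop, or Guava.
final class GuavaCheck {

    /**
     * Reports where className was loaded from and whether it exposes a public
     * one-argument method with exactly this parameter type and return type.
     */
    static String describe(String className, String methodName,
                           String paramType, String returnType) {
        Class<?> owner;
        try {
            // initialize=false: inspect the class without running its static initializers.
            owner = Class.forName(className, false, GuavaCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return "class not on classpath";
        }
        CodeSource source = owner.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "bootstrap or unknown location"
                : source.getLocation().toString();
        for (Method m : owner.getMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (m.getName().equals(methodName)
                    && params.length == 1
                    && params[0].getName().equals(paramType)
                    && m.getReturnType().getName().equals(returnType)) {
                return "found in " + location;
            }
        }
        return "missing in " + location;
    }

    public static void main(String[] args) {
        // The signature Hadoop's CodecPool is looking for, per the stack trace.
        System.out.println(describe(
                "com.google.common.cache.CacheBuilder", "build",
                "com.google.common.cache.CacheLoader",
                "com.google.common.cache.LoadingCache"));
    }
}
```

Run with the same classpath as the failing job, a "missing in ..." result
points at the Guava jar that actually won, which is the one to replace or
reorder. The sketch uses the helper's own class loader; in a Hadoop task the
thread context class loader may be the more relevant one to inspect.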



Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-19 Thread Tyler Palsulich
Hi Ken,

Sorry for the delayed response. No, that patch is not included in this RC
(as I think you know, given your resolution of TIKA-1606).

Have a good night,
Tyler

On Sun, Apr 19, 2015 at 10:49 AM, Ken Krugler 
wrote:

> Hi Tyler,
>
> Does this include Lewis's fix for
> https://issues.apache.org/jira/browse/TIKA-1606?
>
> It's a simple change (bumping the Guava version), but as seen this can
> have unexpected consequences.
>
> I'm fine either way.
>
> -- Ken
>
> > From: Tyler Palsulich
> > Sent: April 18, 2015 8:29:22pm PDT
> > To: dev@tika.apache.org
> > Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> >
> > Hi Folks,
> >
> > If there are no blocking complaints (OSGi?) by Monday (a little longer
> than
> > 3 days, I realize), I'll mark this as passed and finish the release
> process.
> >
> > Of course, it's no problem for me to cut another RC, if it's needed.
> >
> > Have a great weekend!
> > Tyler
> > I've run into one problem while testing Tika 1.8 with Bixo
> >
> > It involves a dependency issue involving (of course) Guava, since that
> > project loves to break their API :(
> >
> > The bixo-core jar has these transitive dependencies on various versions
> of
> > Guava:
> >
> > Hadoop - 11.0.2
> > Cascading - 14.0.1
> > Tika-parsers - 10.0.1
> >cdm - 17.0
> >
> > Everyone winds up using version 10.0.1 (note that Tika has a dependency
> on
> > cdm, which wants to use 17.0)
> >
> > The problem is that Hadoop (for any recent version) uses an API from
> > Guava's cache implementation that no longer exists:
> >
> >
> > com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> > java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> >     at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
> >     at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
> >     at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
> >     at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
> >
> > So what this means is that anyone trying to use Tika with Hadoop will
> need
> > to play games with the class loader to get the older version of Guava -
> > though that can cause other issues if Hadoop (or Cascading, etc) rely on
> > anything that's only in the newer Guava API.
> >
> > Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
> > years ago. So it seems like we should upgrade to at least 11.0.2
> >
> > But I don't know if this is enough of an issue to require another RC.
> >
> > -- Ken
> >
> > PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to
> track
> > this.
> >
> >
> >> From: Tyler Palsulich
> >> Sent: April 13, 2015 10:56:29am PDT
> >> To: dev@tika.apache.org, u...@tika.apache.org
> >> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
> >>
> >> Hi Folks,
> >>
> >> A candidate for the Tika 1.8 release is available at:
> >>  https://dist.apache.org/repos/dist/dev/tika/
> >>
> >> The release candidate is a zip archive of the sources in:
> >>  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
> >>
> >> The SHA1 checksum of the archive is
> >>  5e22fee9079370398472e59082d171ae2d7fdd31.
> >>
> >> In addition, a staged maven repository is available here:
> >>  https://repository.apache.org/content/repositories/orgapachetika-1009
> >>
> >> Please vote on releasing this package as Apache Tika 1.8. The vote is
> >> open for the next 72 hours and passes if a majority of at least three +1
> >> Tika PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Tika 1.8
> >> [ ] ±0 I don't object to this release, but I haven't checked it
> >> [ ] -1 Do not release this package because...
> >>
> >> Thanks,
> >> Tyler
>
>
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>


RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-19 Thread Ken Krugler
Hi Tyler,

Does this include Lewis's fix for 
https://issues.apache.org/jira/browse/TIKA-1606?

It's a simple change (bumping the Guava version), but as seen this can have 
unexpected consequences.

I'm fine either way.

-- Ken

> From: Tyler Palsulich
> Sent: April 18, 2015 8:29:22pm PDT
> To: dev@tika.apache.org
> Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
> 
> Hi Folks,
> 
> If there are no blocking complaints (OSGi?) by Monday (a little longer than
> 3 days, I realize), I'll mark this as passed and finish the release process.
> 
> Of course, it's no problem for me to cut another RC, if it's needed.
> 
> Have a great weekend!
> Tyler
> I've run into one problem while testing Tika 1.8 with Bixo
> 
> It involves a dependency issue involving (of course) Guava, since that
> project loves to break their API :(
> 
> The bixo-core jar has these transitive dependencies on various versions of
> Guava:
> 
> Hadoop - 11.0.2
> Cascading - 14.0.1
> Tika-parsers - 10.0.1
>cdm - 17.0
> 
> Everyone winds up using version 10.0.1 (note that Tika has a dependency on
> cdm, which wants to use 17.0)
> 
> The problem is that Hadoop (for any recent version) uses an API from
> Guava's cache implementation that no longer exists:
> 
> com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
> java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>     at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
>     at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
>     at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
>     at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
> 
> So what this means is that anyone trying to use Tika with Hadoop will need
> to play games with the class loader to get the older version of Guava -
> though that can cause other issues if Hadoop (or Cascading, etc) rely on
> anything that's only in the newer Guava API.
> 
> Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
> years ago. So it seems like we should upgrade to at least 11.0.2
> 
> But I don't know if this is enough of an issue to require another RC.
> 
> -- Ken
> 
> PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
> this.
> 
> 
>> From: Tyler Palsulich
>> Sent: April 13, 2015 10:56:29am PDT
>> To: dev@tika.apache.org, u...@tika.apache.org
>> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>> 
>> Hi Folks,
>> 
>> A candidate for the Tika 1.8 release is available at:
>>  https://dist.apache.org/repos/dist/dev/tika/
>> 
>> The release candidate is a zip archive of the sources in:
>>  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>> 
>> The SHA1 checksum of the archive is
>>  5e22fee9079370398472e59082d171ae2d7fdd31.
>> 
>> In addition, a staged maven repository is available here:
>>  https://repository.apache.org/content/repositories/orgapachetika-1009
>> 
>> Please vote on releasing this package as Apache Tika 1.8. The vote is
>> open for the next 72 hours and passes if a majority of at least three +1
>> Tika PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Tika 1.8
>> [ ] ±0 I don't object to this release, but I haven't checked it
>> [ ] -1 Do not release this package because...
>> 
>> Thanks,
>> Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-18 Thread Mattmann, Chris A (3980)
+1 to pushing on Monday - if we have to roll a 1.9 quickly
after, we can :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Tyler Palsulich 
Reply-To: "dev@tika.apache.org" 
Date: Saturday, April 18, 2015 at 11:29 PM
To: "dev@tika.apache.org" 
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

>Hi Folks,
>
>If there are no blocking complaints (OSGi?) by Monday (a little longer
>than
>3 days, I realize), I'll mark this as passed and finish the release
>process.
>
>Of course, it's no problem for me to cut another RC, if it's needed.
>
>Have a great weekend!
>Tyler
>I've run into one problem while testing Tika 1.8 with Bixo
>
>It involves a dependency issue involving (of course) Guava, since that
>project loves to break their API :(
>
>The bixo-core jar has these transitive dependencies on various versions of
>Guava:
>
>Hadoop - 11.0.2
>Cascading - 14.0.1
>Tika-parsers - 10.0.1
>cdm - 17.0
>
>Everyone winds up using version 10.0.1 (note that Tika has a dependency on
>cdm, which wants to use 17.0)
>
>The problem is that Hadoop (for any recent version) uses an API from
>Guava's cache implementation that no longer exists:
>
>com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
>    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
>    at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
>    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
>    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
>
>So what this means is that anyone trying to use Tika with Hadoop will need
>to play games with the class loader to get the older version of Guava -
>though that can cause other issues if Hadoop (or Cascading, etc) rely on
>anything that's only in the newer Guava API.
>
>Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
>years ago. So it seems like we should upgrade to at least 11.0.2
>
>But I don't know if this is enough of an issue to require another RC.
>
>-- Ken
>
>PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
>this.
>
>
>> From: Tyler Palsulich
>> Sent: April 13, 2015 10:56:29am PDT
>> To: dev@tika.apache.org, u...@tika.apache.org
>> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>>
>> Hi Folks,
>>
>> A candidate for the Tika 1.8 release is available at:
>>   https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>>
>> The SHA1 checksum of the archive is
>>   5e22fee9079370398472e59082d171ae2d7fdd31.
>>
>> In addition, a staged maven repository is available here:
>>   https://repository.apache.org/content/repositories/orgapachetika-1009
>>
>> Please vote on releasing this package as Apache Tika 1.8. The vote is
>> open for the next 72 hours and passes if a majority of at least three +1
>> Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.8
>> [ ] ±0 I don't object to this release, but I haven't checked it
>> [ ] -1 Do not release this package because...
>>
>> Thanks,
>> Tyler
>
>
>--
>Ken Krugler
>+1 530-210-6378
>http://www.scaleunlimited.com
>custom big data solutions & training
>Hadoop, Cascading, Cassandra & Solr



RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-18 Thread Tyler Palsulich
Hi Folks,

If there are no blocking complaints (OSGi?) by Monday (a little longer than
3 days, I realize), I'll mark this as passed and finish the release process.

Of course, it's no problem for me to cut another RC, if it's needed.

Have a great weekend!
Tyler

On April 15, 2015, Ken Krugler wrote:

I've run into one problem while testing Tika 1.8 with Bixo

It involves a dependency issue involving (of course) Guava, since that
project loves to break their API :(

The bixo-core jar has these transitive dependencies on various versions of
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency on
cdm, which wants to use 17.0)

The problem is that Hadoop (for any recent version) uses an API from
Guava's cache implementation that no longer exists:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will need
to play games with the class loader to get the older version of Guava -
though that can cause other issues if Hadoop (or Cascading, etc) rely on
anything that's only in the newer Guava API.

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
years ago. So it seems like we should upgrade to at least 11.0.2

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
this.


> From: Tyler Palsulich
> Sent: April 13, 2015 10:56:29am PDT
> To: dev@tika.apache.org, u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
>   5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is
> open for the next 72 hours and passes if a majority of at least three +1
> Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr


RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-15 Thread Ken Krugler
I've run into one problem while testing Tika 1.8 with Bixo

It involves a dependency issue involving (of course) Guava, since that project 
loves to break their API :(

The bixo-core jar has these transitive dependencies on various versions of 
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
    cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency on cdm, 
which wants to use 17.0)

The problem is that Hadoop (for any recent version) uses an API from Guava's 
cache implementation that does not exist in 10.0.1:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will need to 
play games with the class loader to get the older version of Guava - though 
that can cause other issues if Hadoop (or Cascading, etc) rely on anything 
that's only in the newer Guava API.
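
To make the version mismatch concrete, here is a minimal Python sketch (not part of the original message) that compares the Guava versions listed above with the version that ends up on the classpath. The names `requested`, `parse_version`, and `needs_newer` are illustrative assumptions. The sketch only flags components that asked for a newer release than the one resolved; it does not model how Maven actually chooses a version.

```python
# Guava versions requested by each dependency, as listed in the message above.
requested = {
    "Hadoop": "11.0.2",
    "Cascading": "14.0.1",
    "Tika-parsers": "10.0.1",
    "cdm": "17.0",
}


def parse_version(text):
    """Turn a dotted version such as '11.0.2' into (11, 0, 2)."""
    return tuple(int(part) for part in text.split("."))


def needs_newer(requested_versions, resolved):
    """Return the components that asked for a release newer than `resolved`."""
    floor = parse_version(resolved)
    return sorted(
        name
        for name, version in requested_versions.items()
        if parse_version(version) > floor
    )


# With 10.0.1 on the classpath, every other component is a candidate for a
# NoSuchMethodError like the one shown in the stack trace.
print(needs_newer(requested, "10.0.1"))
```

Running the same check with "11.0.2" as the resolved version would still list Cascading and cdm, which is why the message asks for "at least" 11.0.2.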

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3 years 
ago. So it seems like we should upgrade to at least 11.0.2

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track this.


> From: Tyler Palsulich
> Sent: April 13, 2015 10:56:29am PDT
> To: dev@tika.apache.org, u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
> 
> Hi Folks,
> 
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
> 
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
> 
> The SHA1 checksum of the archive is
>   5e22fee9079370398472e59082d171ae2d7fdd31.
> 
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1009
> 
> Please vote on releasing this package as Apache Tika 1.8. The vote is open 
> for the next 72 hours and passes if a majority of at least three +1 Tika PMC 
> votes are cast.
> 
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
> 
> Thanks,
> Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-15 Thread Konstantin Gribov
Hi, folks.

All tests pass, checksum and gpg signature for tika-1.8-src.zip are fine.
Checked on ArchLinux x86_64, openjdk 7u75, w/ tesseract.
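
For readers who want to repeat the checksum part of this check, the following is a minimal Python sketch, assuming the archive has already been downloaded to a local path. The function names are hypothetical; the expected value is the SHA1 quoted in the vote announcement. It does not cover the GPG signature.

```python
import hashlib

# SHA1 of the source archive, as quoted in the vote announcement.
EXPECTED_SHA1 = "5e22fee9079370398472e59082d171ae2d7fdd31"


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_announced(path, expected=EXPECTED_SHA1):
    """Return True when the file's SHA1 equals the expected digest."""
    return sha1_of_file(path) == expected.strip().lower()
```

A call such as `matches_announced("tika-1.8-src.zip")` would return True only if the local file has the announced digest; the file name here is taken from the message above, and its location on disk is an assumption.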

Thank you, Tyler.

[x] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...
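
The ballot format used in this thread is regular enough to tally mechanically. The sketch below is one possible reading, not the project's tooling: `marked_choice` and `vote_passes` are hypothetical helpers, quoted lines (those starting with ">") are deliberately ignored, and the pass rule is an interpretation of "a majority of at least three +1 Tika PMC votes", namely at least three binding +1 votes and more +1 than -1.

```python
import re

# Matches a ticked ballot line such as "[x] +1 Release this package ...".
BALLOT_LINE = re.compile(r"^\s*\[\s*[xX]\s*\]\s*(\+1|±0|-1)")


def marked_choice(message):
    """Return '+1', '±0' or '-1' if exactly one box is ticked, else None."""
    choices = []
    for line in message.splitlines():
        match = BALLOT_LINE.match(line)
        if match:
            choices.append(match.group(1))
    return choices[0] if len(choices) == 1 else None


def vote_passes(binding_votes):
    """Assumed rule: at least three +1 votes and more +1 than -1."""
    plus = binding_votes.count("+1")
    minus = binding_votes.count("-1")
    return plus >= 3 and plus > minus


ballot = """[x] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because..."""
print(marked_choice(ballot))
```

Which votes count as binding depends on PMC membership, which this sketch leaves to the caller.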

-- 
Best regards,
Konstantin Gribov

On Mon, 13 Apr 2015 at 20:56, Tyler Palsulich wrote:

> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
>   5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika
> PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler
>


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-15 Thread Oleg Tikhonov
Hi Tyler,

Good job, indeed!

[x] +1 Release this package as Apache Tika 1.8

On Wed, Apr 15, 2015 at 8:22 AM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Thanks Tyler! +1 from me:
>
> SIGS, checksums check out:
>
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc
> tika 1.8-src https://dist.apache.org/repos/dist/dev/tika/
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 69.2M  100 69.2M    0     0  1524k      0  0:00:46  0:00:46 --:--:-- 1661k
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   473  100   473    0     0    874      0 --:--:-- --:--:-- --:--:--   874
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc
> tika-app 1.8 https://dist.apache.org/repos/dist/dev/tika/
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 44.0M  100 44.0M    0     0  1742k      0  0:00:25  0:00:25 --:--:-- 1825k
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   473  100   473    0     0    922      0 --:--:-- --:--:-- --:--:--   922
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100    33  100    33    0     0     63      0 --:--:-- --:--:-- --:--:--    63
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc
> tika-server 1.8 https://dist.apache.org/repos/dist/dev/tika/
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 48.3M  100 48.3M    0     0  1379k      0  0:00:35  0:00:35 --:--:-- 1569k
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100   473  100   473    0     0    891      0 --:--:-- --:--:-- --:--:--   892
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%
>
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_gpg_sigs
> Verifying Signature for file tika-1.8-src.zip.asc
> gpg: Signature made Mon Apr 13 13:46:39 2015 EDT using RSA key ID D4F10117
> gpg: Good signature from "Tyler Palsulich "
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:          There is no indication that the signature belongs to the owner.
> Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117
> Verifying Signature for file tika-app-1.8.jar.asc
> gpg: Signature made Mon Apr 13 13:43:13 2015 EDT using RSA key ID D4F10117
> gpg: Good signature from "Tyler Palsulich "
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:          There is no indication that the signature belongs to the owner.
> Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117
> Verifying Signature for file tika-server-1.8.jar.asc
> gpg: Signature made Mon Apr 13 13:45:00 2015 EDT using RSA key ID D4F10117
> gpg: Good signature from "Tyler Palsulich "
> gpg: WARNING: This key is not certified with a trusted signature!
> gpg:          There is no indication that the signature belongs to the owner.
> Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_md5_checksums
> md5sum: stat '*.tar.gz': No such file or directory
> md5sum: stat '*.bz2': No such file or directory
> md5sum: stat '*.tgz': No such file or directory
> tika-1.8-src.zip: OK
>
> [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%
>
> Cheers!
>
> Chris
>
> 
> From: Tyler Palsulich [tpalsul...@apache.org]
> Sent: Monday, April 13, 2015 10:56 AM
> To: dev@tika.apache.org; u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> T

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Mattmann, Chris A (3980)
Thanks Tyler! +1 from me:

SIGS, checksums check out:


[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika 1.8-src https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69.2M  100 69.2M    0     0  1524k      0  0:00:46  0:00:46 --:--:-- 1661k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    874      0 --:--:-- --:--:-- --:--:--   874
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-app 1.8 https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.0M  100 44.0M    0     0  1742k      0  0:00:25  0:00:25 --:--:-- 1825k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    922      0 --:--:-- --:--:-- --:--:--   922
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     63      0 --:--:-- --:--:-- --:--:--    63

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-server 1.8 https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48.3M  100 48.3M    0     0  1379k      0  0:00:35  0:00:35 --:--:-- 1569k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    891      0 --:--:-- --:--:-- --:--:--   892
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%


[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_gpg_sigs
Verifying Signature for file tika-1.8-src.zip.asc
gpg: Signature made Mon Apr 13 13:46:39 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117
Verifying Signature for file tika-app-1.8.jar.asc
gpg: Signature made Mon Apr 13 13:43:13 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117
Verifying Signature for file tika-server-1.8.jar.asc
gpg: Signature made Mon Apr 13 13:45:00 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_md5_checksums
md5sum: stat '*.tar.gz': No such file or directory
md5sum: stat '*.bz2': No such file or directory
md5sum: stat '*.tgz': No such file or directory
tika-1.8-src.zip: OK

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%
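
One cross-check the gpg output above allows is that the short key ID on each "Signature made" line is the last eight hex digits of the primary key fingerprint. The Python sketch below illustrates that check under stated assumptions: the helper names are hypothetical, the input is gpg's human-readable output as pasted in this message, and a matching ID says nothing about whether the key is trusted (the WARNING lines still apply).

```python
import re

# Fingerprint as printed in the gpg output above.
FINGERPRINT = "1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117"


def short_key_id(fingerprint):
    """Return the last 8 hex digits of a fingerprint (the short key ID)."""
    hex_digits = "".join(fingerprint.split()).upper()
    return hex_digits[-8:]


def key_ids_in(output):
    """Collect the short key IDs mentioned on 'Signature made' lines."""
    return {key_id.upper() for key_id in re.findall(r"key ID ([0-9A-Fa-f]{8})", output)}


def fingerprint_matches(output, fingerprint):
    """True when every key ID in the output matches the fingerprint."""
    return key_ids_in(output) == {short_key_id(fingerprint)}


print(short_key_id(FINGERPRINT))
```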

Cheers!

Chris


From: Tyler Palsulich [tpalsul...@apache.org]
Sent: Monday, April 13, 2015 10:56 AM
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for 
the next 72 hours and

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Julien Nioche
Hi Tim

Great to hear that you managed to use the dataset from CommonCrawl. Thanks!

Julien

On 14 April 2015 at 14:15, Allison, Timothy B.  wrote:

> +1
>
> Thank you, Tyler!
>
> Apologies to Hong-Thai and community for not recognizing the severity of
> TIKA-1600 when I voted in favor of rc1!
>
> Details...
>
> I reran against govdocs1, and there aren't any major surprises.
>
> On our Rackspace vm, I  _finally_ unzipped the Common Crawl slice that
> Julien Nioche created for us, and I ran against that as well.  That turned
> up TIKA-1605 and another exceedingly rare NPE in the PDFParser.  I don't
> think either of these are blockers, and they're now fixed in trunk.
>
> There are slightly fewer metadata values for some jpegs.  For the one file
> that I manually reviewed, 1.8-rc was missing these values (that were
> available in 1.7):
>
> JPEG quality
> IPTC-NAA record
> Plug-in 1 Data
>
> Comparison reports are available here (much more work remains to be done
> on tika-eval):
>
> https://github.com/tballison/share/tree/master/tika_comparisons
>
> 
> From: Tyler Palsulich 
> Sent: Monday, April 13, 2015 1:56 PM
> To: dev@tika.apache.org; u...@tika.apache.org
> Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
>
> Hi Folks,
>
> A candidate for the Tika 1.8 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
>
> The SHA1 checksum of the archive is
>   5e22fee9079370398472e59082d171ae2d7fdd31.
>
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1009
>
> Please vote on releasing this package as Apache Tika 1.8. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika
> PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] ±0 I don't object to this release, but I haven't checked it
> [ ] -1 Do not release this package because...
>
> Thanks,
> Tyler
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Allison, Timothy B.
+1

Thank you, Tyler!

Apologies to Hong-Thai and community for not recognizing the severity of 
TIKA-1600 when I voted in favor of rc1!

Details...

I reran against govdocs1, and there aren't any major surprises.

On our Rackspace VM, I _finally_ unzipped the Common Crawl slice that Julien 
Nioche created for us, and I ran against that as well.  That turned up 
TIKA-1605 and another exceedingly rare NPE in the PDFParser.  I don't think 
either of these is a blocker, and they're now fixed in trunk.

There are slightly fewer metadata values for some jpegs.  For the one file that 
I manually reviewed, 1.8-rc was missing these values (that were available in 
1.7):

JPEG quality
IPTC-NAA record
Plug-in 1 Data

Comparison reports are available here (much more work remains to be done on 
tika-eval):

https://github.com/tballison/share/tree/master/tika_comparisons 


From: Tyler Palsulich 
Sent: Monday, April 13, 2015 1:56 PM
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open
for the next 72 hours and passes if a majority of at least three +1 Tika
PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler


RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Hong-Thai Nguyen
Hi,

+1 for me.

Great work, Tyler !

Hong-Thai

-Original Message-
From: Tyler Palsulich [mailto:tpalsul...@apache.org] 
Sent: Monday, April 13, 2015 19:56
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for 
the next 72 hours and passes if a majority of at least three +1 Tika PMC votes 
are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler