[RESULT] [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Tyler Palsulich
Hi Everyone,

The VOTE to release Tika 1.8 RC #2 has passed with the following tally:

+1:
Chris Mattmann
Hong-Thai Nguyen
Konstantin Gribov
Lewis John Mcgibbney
Oleg Tikhonov
Tim Allison
Tyler Palsulich

±0:
None

-1:
None

I'll move forward with the release process now.

Thank you all for your VOTE and collaboration,
Tyler


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Julien Nioche
I haven't tested the RC with Behemoth. It will probably have the same issue,
but I'll do as you did and defer the update if that's the case.

On 20 April 2015 at 15:23, Ken Krugler kkrugler_li...@transpac.com wrote:


  From: Allison, Timothy B.
  Sent: April 20, 2015 5:11:04am PDT
  To: dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  If I understand correctly, if we release rc2, Tika 1.8 will break in
 Hadoop clusters across the land?!
  Or, Hadoop folks will have to apply a classloading workaround or rebuild
 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
 
  For most Hadoopites, this will be a straightforward fix, and I'm
 assuming that's why Ken is not more outspoken against releasing rc2 as is
 (Ken, let me know if I'm wrong!).

 Usually it's straightforward. Though whenever you start manipulating the
 classloader logic, you can get odd results.

 E.g. by forcing your job jar's dependencies to show up first, now you can
 have an issue where one of your jars masks an older/newer version that
 Hadoop needs, so the job fails for some other reason.

 But yes, I don't feel strongly enough about this to vote -1, as I don't
 think there are that many people using Tika with Hadoop.

 For Bixo, I'd defer updating the Tika dependency until another version is
 released.

 Don't know about Behemoth - Julien?

 -- Ken


  For other users, though, say, in healthcare, where code security review
 is stringent, this could be a real pain, no?
 
  Am I understanding correctly what will happen?  If so, do we really want
 to do this?
 
 
  -Original Message-
  From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
  Sent: Saturday, April 18, 2015 11:48 PM
  To: dev@tika.apache.org
  Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  +1 to pushing on Monday - if we have to roll a 1.9 quickly
  after, we can :)
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 168-519, Mailstop: 168-527
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  Adjunct Associate Professor, Computer Science Department
  University of Southern California, Los Angeles, CA 90089 USA
  ++
 
 
 
 
 
 
  -Original Message-
  From: Tyler Palsulich tpalsul...@gmail.com
  Reply-To: dev@tika.apache.org dev@tika.apache.org
  Date: Saturday, April 18, 2015 at 11:29 PM
  To: dev@tika.apache.org dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  Hi Folks,
 
  If there are no blocking complaints (OSGi?) by Monday (a little longer
  than
  3 days, I realize), I'll mark this as passed and finish the release
  process.
 
  Of course, it's no problem for me to cut another RC, if it's needed.
 
  Have a great weekend!
  Tyler
  I've run into one problem while testing Tika 1.8 with Bixo
 
  It involves a dependency issue involving (of course) Guava, since that
  project loves to break their API :(
 
  The bixo-core jar has these transitive dependencies on various versions
 of
  Guava:
 
  Hadoop - 11.0.2
  Cascading - 14.0.1
  Tika-parsers - 10.0.1
cdm - 17.0
 
  Everyone winds up using version 10.0.1 (note that Tika has a dependency
 on
  cdm, which wants to use 17.0)
 
  The problem is that Hadoop (for any recent version) uses an API from
  Guava's cache implementation that no longer exists:
 
 
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
  java.lang.NoSuchMethodError:
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
  So what this means is that anyone trying to use Tika with Hadoop will
 need
  to play games with the class loader to get the older version of Guava -
  though that can cause other issues if Hadoop (or Cascading, etc) rely on
  anything that's only in the newer Guava API.
 
  Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
  years ago. So it seems like we should upgrade to at least 11.0.2.
 
  But I don't know if this is enough of an issue to require another RC.
 
  -- Ken
 
  PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to
 track
  this.
 
 
  From: Tyler Palsulich
  Sent: April 13, 2015 10:56:29am PDT
  To: dev@tika.apache.org, u

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Mattmann, Chris A (3980)
Hey Tim,

Yeah I think you understood it correctly - however, someone
in e.g., healthcare, or at NASA for example, can always grab
the latest trunk SNAPSHOT which works fine and includes Ken’s
TIKA-1606 fix. If we find many users and others complaining
about 1.8, we can always rapidly release 1.9-SNAPSHOT and go
through the VOTE’ing process on that too, right?

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Allison, Timothy B. talli...@mitre.org
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Monday, April 20, 2015 at 8:11 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

If I understand correctly, if we release rc2, Tika 1.8 will break in
Hadoop clusters across the land?!
Or, Hadoop folks will have to apply a classloading workaround or rebuild
1.8/trunk with small version mod in TIKA-1606 to get Tika to work.

For most Hadoopites, this will be a straightforward fix, and I'm assuming
that's why Ken is not more outspoken against releasing rc2 as is (Ken,
let me know if I'm wrong!).  For other users, though, say, in healthcare,
where code security review is stringent, this could be a real pain, no?

Am I understanding correctly what will happen?  If so, do we really want
to do this?


-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Saturday, April 18, 2015 11:48 PM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2

+1 to pushing on Monday - if we have to roll a 1.9 quickly
after, we can :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Tyler Palsulich tpalsul...@gmail.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Saturday, April 18, 2015 at 11:29 PM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

If there are no blocking complaints (OSGi?) by Monday (a little longer
than
3 days, I realize), I'll mark this as passed and finish the release
process.

Of course, it's no problem for me to cut another RC, if it's needed.

Have a great weekend!
Tyler
I've run into one problem while testing Tika 1.8 with Bixo

It involves a dependency issue involving (of course) Guava, since that
project loves to break their API :(

The bixo-core jar has these transitive dependencies on various versions
of
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency
on
cdm, which wants to use 17.0)

The problem is that Hadoop (for any recent version) uses an API from
Guava's cache implementation that no longer exists:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError:
com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
  at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
  at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
  at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
  at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will
need
to play games with the class loader to get the older version of Guava -
though that can cause other issues if Hadoop (or Cascading, etc) rely on
anything that's only in the newer Guava API.

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
years ago. So it seems like we should upgrade to at least 11.0.2.

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to
track

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Ken Krugler

 From: Allison, Timothy B.
 Sent: April 20, 2015 5:11:04am PDT
 To: dev@tika.apache.org
 Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 If I understand correctly, if we release rc2, Tika 1.8 will break in Hadoop 
 clusters across the land?!
 Or, Hadoop folks will have to apply a classloading workaround or rebuild 
 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
 
 For most Hadoopites, this will be a straightforward fix, and I'm assuming 
 that's why Ken is not more outspoken against releasing rc2 as is (Ken, let me 
 know if I'm wrong!).  

Usually it's straightforward. Though whenever you start manipulating the 
classloader logic, you can get odd results.

E.g. by forcing your job jar's dependencies to show up first, now you can have 
an issue where one of your jars masks an older/newer version that Hadoop needs, 
so the job fails for some other reason.
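
The masking effect described above can be sketched with a toy model. This is only an illustration of "the first classpath entry that provides a class wins"; it is not how Hadoop's class loading is implemented, and the class name ClasspathOrderDemo, its methods, and the jar file names are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of classpath ordering; every name here is illustrative. */
final class ClasspathOrderDemo {

    /** Returns the first jar, in classpath order, that provides className, or null. */
    static String resolve(LinkedHashMap<String, Set<String>> classpath, String className) {
        for (Map.Entry<String, Set<String>> jar : classpath.entrySet()) {
            if (jar.getValue().contains(className)) {
                return jar.getKey();
            }
        }
        return null;
    }

    /** Builds an ordered classpath in which every listed jar provides sharedClass. */
    static LinkedHashMap<String, Set<String>> classpathOf(String sharedClass, String... jars) {
        LinkedHashMap<String, Set<String>> classpath = new LinkedHashMap<>();
        for (String jar : jars) {
            classpath.put(jar, new HashSet<>(Arrays.asList(sharedClass)));
        }
        return classpath;
    }

    public static void main(String[] args) {
        String cls = "com.google.common.cache.CacheBuilder";
        // Framework jars listed first: the framework's copy of the class is used.
        System.out.println(resolve(classpathOf(cls, "guava-11.0.2.jar", "guava-10.0.1.jar"), cls));
        // Job jar's dependencies forced first: the job's copy masks the framework's.
        System.out.println(resolve(classpathOf(cls, "guava-10.0.1.jar", "guava-11.0.2.jar"), cls));
    }
}
```

Whichever copy is listed first is used by every caller in the JVM, which is why reordering can fix one component while breaking another.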

But yes, I don't feel strongly enough about this to vote -1, as I don't think 
there are that many people using Tika with Hadoop.

For Bixo, I'd defer updating the Tika dependency until another version is 
released.

Don't know about Behemoth - Julien?

-- Ken


 For other users, though, say, in healthcare, where code security review is 
 stringent, this could be a real pain, no?
 
 Am I understanding correctly what will happen?  If so, do we really want to 
 do this?
 
 
 -Original Message-
 From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
 Sent: Saturday, April 18, 2015 11:48 PM
 To: dev@tika.apache.org
 Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 +1 to pushing on Monday - if we have to roll a 1.9 quickly
 after, we can :)
 
 ++
 Chris Mattmann, Ph.D.
 Chief Architect
 Instrument Software and Science Data Systems Section (398)
 NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 168-519, Mailstop: 168-527
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
 ++
 Adjunct Associate Professor, Computer Science Department
 University of Southern California, Los Angeles, CA 90089 USA
 ++
 
 
 
 
 
 
 -Original Message-
 From: Tyler Palsulich tpalsul...@gmail.com
 Reply-To: dev@tika.apache.org dev@tika.apache.org
 Date: Saturday, April 18, 2015 at 11:29 PM
 To: dev@tika.apache.org dev@tika.apache.org
 Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 Hi Folks,
 
 If there are no blocking complaints (OSGi?) by Monday (a little longer
 than
 3 days, I realize), I'll mark this as passed and finish the release
 process.
 
 Of course, it's no problem for me to cut another RC, if it's needed.
 
 Have a great weekend!
 Tyler
 I've run into one problem while testing Tika 1.8 with Bixo
 
 It involves a dependency issue involving (of course) Guava, since that
 project loves to break their API :(
 
 The bixo-core jar has these transitive dependencies on various versions of
 Guava:
 
 Hadoop - 11.0.2
 Cascading - 14.0.1
 Tika-parsers - 10.0.1
   cdm - 17.0
 
 Everyone winds up using version 10.0.1 (note that Tika has a dependency on
 cdm, which wants to use 17.0)
 
 The problem is that Hadoop (for any recent version) uses an API from
 Guava's cache implementation that no longer exists:
 
 com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
 java.lang.NoSuchMethodError:
 com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
   at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
   at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
   at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
   at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
 So what this means is that anyone trying to use Tika with Hadoop will need
 to play games with the class loader to get the older version of Guava -
 though that can cause other issues if Hadoop (or Cascading, etc) rely on
 anything that's only in the newer Guava API.
 
 Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
 years ago. So it seems like we should upgrade to at least 11.0.2.
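
For anyone wanting to confirm which Guava a job actually sees, a reflection check along these lines may help. It is a minimal sketch, not part of Tika, Hadoop, or Guava: the class GuavaApiCheck and its methods are hypothetical names, and the only thing taken from the thread is the method descriptor in the stack trace (a build(CacheLoader) overload returning LoadingCache).

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical diagnostic helper; uses only JDK reflection. */
final class GuavaApiCheck {

    /** Returns where a class was loaded from, or a placeholder if the JVM does not say. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "(bootstrap/unknown)"
                : source.getLocation().toString();
    }

    /** True if className has a public method with exactly this name, return type and parameters. */
    static boolean hasMethod(String className, String methodName,
                             String returnTypeName, String... parameterTypeNames) {
        Method[] methods;
        try {
            // 'false' avoids running static initializers just to inspect the class.
            Class<?> type = Class.forName(className, false, GuavaApiCheck.class.getClassLoader());
            methods = type.getMethods();
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
        for (Method m : methods) {
            if (m.getName().equals(methodName)
                    && m.getReturnType().getName().equals(returnTypeName)
                    && parameterNames(m).equals(Arrays.asList(parameterTypeNames))) {
                return true;
            }
        }
        return false;
    }

    private static List<String> parameterNames(Method m) {
        List<String> names = new ArrayList<>();
        for (Class<?> p : m.getParameterTypes()) {
            names.add(p.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Descriptor copied from the NoSuchMethodError in the stack trace.
        boolean present = hasMethod("com.google.common.cache.CacheBuilder", "build",
                "com.google.common.cache.LoadingCache",
                "com.google.common.cache.CacheLoader");
        System.out.println("build(CacheLoader) returning LoadingCache present: " + present);
        try {
            Class<?> builder = Class.forName("com.google.common.cache.CacheBuilder",
                    false, GuavaApiCheck.class.getClassLoader());
            System.out.println("Guava loaded from: " + locationOf(builder));
        } catch (ClassNotFoundException e) {
            System.out.println("Guava is not on the classpath");
        }
    }
}
```

Running main inside the same classpath as the failing job should show both whether the expected overload is present and which jar supplied CacheBuilder.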
 
 But I don't know if this is enough of an issue to require another RC.
 
 -- Ken
 
 PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
 this.
 
 
 From: Tyler Palsulich
 Sent: April 13, 2015 10:56:29am PDT
 To: dev@tika.apache.org, u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 Hi Folks,
 
 A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/
 
 The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Julien Nioche
and I haven't tested it with Nutch either...

On 20 April 2015 at 15:46, Julien Nioche lists.digitalpeb...@gmail.com
wrote:

 I haven't tested the RC with Behemoth, it will probably have the same
 issue but I'll do like you and defer the update if that's the case.

 On 20 April 2015 at 15:23, Ken Krugler kkrugler_li...@transpac.com
 wrote:


  From: Allison, Timothy B.
  Sent: April 20, 2015 5:11:04am PDT
  To: dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  If I understand correctly, if we release rc2, Tika 1.8 will break in
 Hadoop clusters across the land?!
  Or, Hadoop folks will have to apply a classloading workaround or
 rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
 
  For most Hadoopites, this will be a straightforward fix, and I'm
 assuming that's why Ken is not more outspoken against releasing rc2 as is
 (Ken, let me know if I'm wrong!).

 Usually it's straightforward. Though whenever you start manipulating the
 classloader logic, you can get odd results.

 E.g. by forcing your job jar's dependencies to show up first, now you can
 have an issue where one of your jars masks an older/newer version that
 Hadoop needs, so the job fails for some other reason.

 But yes, I don't feel strongly enough about this to vote -1, as I don't
 think there are that many people using Tika with Hadoop.

 For Bixo, I'd defer updating the Tika dependency until another version is
 released.

 Don't know about Behemoth - Julien?

 -- Ken


  For other users, though, say, in healthcare, where code security review
 is stringent, this could be a real pain, no?
 
  Am I understanding correctly what will happen?  If so, do we really
 want to do this?
 
 
  -Original Message-
  From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
  Sent: Saturday, April 18, 2015 11:48 PM
  To: dev@tika.apache.org
  Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  +1 to pushing on Monday - if we have to roll a 1.9 quickly
  after, we can :)
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 168-519, Mailstop: 168-527
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  Adjunct Associate Professor, Computer Science Department
  University of Southern California, Los Angeles, CA 90089 USA
  ++
 
 
 
 
 
 
  -Original Message-
  From: Tyler Palsulich tpalsul...@gmail.com
  Reply-To: dev@tika.apache.org dev@tika.apache.org
  Date: Saturday, April 18, 2015 at 11:29 PM
  To: dev@tika.apache.org dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  Hi Folks,
 
  If there are no blocking complaints (OSGi?) by Monday (a little longer
  than
  3 days, I realize), I'll mark this as passed and finish the release
  process.
 
  Of course, it's no problem for me to cut another RC, if it's needed.
 
  Have a great weekend!
  Tyler
  I've run into one problem while testing Tika 1.8 with Bixo
 
  It involves a dependency issue involving (of course) Guava, since that
  project loves to break their API :(
 
  The bixo-core jar has these transitive dependencies on various
 versions of
  Guava:
 
  Hadoop - 11.0.2
  Cascading - 14.0.1
  Tika-parsers - 10.0.1
cdm - 17.0
 
  Everyone winds up using version 10.0.1 (note that Tika has a
 dependency on
  cdm, which wants to use 17.0)
 
  The problem is that Hadoop (for any recent version) uses an API from
  Guava's cache implementation that no longer exists:
 
 
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
  java.lang.NoSuchMethodError:
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
  So what this means is that anyone trying to use Tika with Hadoop will
 need
  to play games with the class loader to get the older version of Guava -
  though that can cause other issues if Hadoop (or Cascading, etc) rely
 on
  anything that's only in the newer Guava API.
 
  Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
  years ago. So it seems like we should upgrade to at least 11.0.2.
 
  But I don't know if this is enough of an issue to require another RC.
 
  -- Ken
 
  PS - I've created https://issues.apache.org/jira

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Julien Nioche
Both Nutch and Behemoth declare Hadoop 1.2.1 as a dependency, and since it
does not use Guava they won't have the same issue. However, this is just the
default version and some people use them on Hadoop 2.x, in which case
they might need to find a workaround.

On 20 April 2015 at 15:56, Julien Nioche lists.digitalpeb...@gmail.com
wrote:

 and I haven't tested it with Nutch either...

 On 20 April 2015 at 15:46, Julien Nioche lists.digitalpeb...@gmail.com
 wrote:

 I haven't tested the RC with Behemoth, it will probably have the same
 issue but I'll do like you and defer the update if that's the case.

 On 20 April 2015 at 15:23, Ken Krugler kkrugler_li...@transpac.com
 wrote:


  From: Allison, Timothy B.
  Sent: April 20, 2015 5:11:04am PDT
  To: dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  If I understand correctly, if we release rc2, Tika 1.8 will break in
 Hadoop clusters across the land?!
  Or, Hadoop folks will have to apply a classloading workaround or
 rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
 
  For most Hadoopites, this will be a straightforward fix, and I'm
 assuming that's why Ken is not more outspoken against releasing rc2 as is
 (Ken, let me know if I'm wrong!).

 Usually it's straightforward. Though whenever you start manipulating the
 classloader logic, you can get odd results.

 E.g. by forcing your job jar's dependencies to show up first, now you
 can have an issue where one of your jars masks an older/newer version that
 Hadoop needs, so the job fails for some other reason.

 But yes, I don't feel strongly enough about this to vote -1, as I don't
 think there are that many people using Tika with Hadoop.

 For Bixo, I'd defer updating the Tika dependency until another version
 is released.

 Don't know about Behemoth - Julien?

 -- Ken


  For other users, though, say, in healthcare, where code security
 review is stringent, this could be a real pain, no?
 
  Am I understanding correctly what will happen?  If so, do we really
 want to do this?
 
 
  -Original Message-
  From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
  Sent: Saturday, April 18, 2015 11:48 PM
  To: dev@tika.apache.org
  Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  +1 to pushing on Monday - if we have to roll a 1.9 quickly
  after, we can :)
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 168-519, Mailstop: 168-527
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  Adjunct Associate Professor, Computer Science Department
  University of Southern California, Los Angeles, CA 90089 USA
  ++
 
 
 
 
 
 
  -Original Message-
  From: Tyler Palsulich tpalsul...@gmail.com
  Reply-To: dev@tika.apache.org dev@tika.apache.org
  Date: Saturday, April 18, 2015 at 11:29 PM
  To: dev@tika.apache.org dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  Hi Folks,
 
  If there are no blocking complaints (OSGi?) by Monday (a little longer
  than
  3 days, I realize), I'll mark this as passed and finish the release
  process.
 
  Of course, it's no problem for me to cut another RC, if it's needed.
 
  Have a great weekend!
  Tyler
  I've run into one problem while testing Tika 1.8 with Bixo
 
  It involves a dependency issue involving (of course) Guava, since that
  project loves to break their API :(
 
  The bixo-core jar has these transitive dependencies on various
 versions of
  Guava:
 
  Hadoop - 11.0.2
  Cascading - 14.0.1
  Tika-parsers - 10.0.1
cdm - 17.0
 
  Everyone winds up using version 10.0.1 (note that Tika has a
 dependency on
  cdm, which wants to use 17.0)
 
  The problem is that Hadoop (for any recent version) uses an API from
  Guava's cache implementation that no longer exists:
 
 
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
  java.lang.NoSuchMethodError:
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
  So what this means is that anyone trying to use Tika with Hadoop will
 need
  to play games with the class loader to get the older version of Guava
 -
  though that can cause other issues if Hadoop

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Tyler Palsulich
Thank you, Everyone! I'll move forward now.

Lewis, KEYS are here: https://people.apache.org/keys/group/tika.asc.

Of course, I'm also +1.

Tyler

On Mon, Apr 20, 2015 at 3:47 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Folks,

 On Thu, Apr 16, 2015 at 2:42 PM, dev-digest-h...@tika.apache.org wrote:

 
   Hi Folks,
  
   A candidate for the Tika 1.8 release is available at:
 https://dist.apache.org/repos/dist/dev/tika/
  
   The release candidate is a zip archive of the sources in:
 http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
  
   The SHA1 checksum of the archive is
 5e22fee9079370398472e59082d171ae2d7fdd31.
  
   In addition, a staged maven repository is available here:
  
 https://repository.apache.org/content/repositories/orgapachetika-1009
  
   Please vote on releasing this package as Apache Tika 1.8. The vote is
  open
   for the next 72 hours and passes if a majority of at least three +1
 Tika
   PMC votes are cast.
 


 Where is the KEYS?
 All signatures are fine.
 Tests are A-OK.
 The only remaining issue is Tika 1616, which was patched and
 committed to trunk.
 IMHO this is not a blocker. We could probably release 1.9 in a shorter
 release cycle to accommodate the change.


  
   [X] +1 Release this package as Apache Tika 1.8


 I am +1 for releasing this as 1.8.
 Lewis



RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Allison, Timothy B.
Um...Ok.  If no one else is concerned...  off we go?

-Original Message-
From: Julien Nioche [mailto:lists.digitalpeb...@gmail.com] 
Sent: Monday, April 20, 2015 10:56 AM
To: dev@tika.apache.org
Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2

and I haven't tested it with Nutch either...

On 20 April 2015 at 15:46, Julien Nioche lists.digitalpeb...@gmail.com
wrote:

 I haven't tested the RC with Behemoth, it will probably have the same
 issue but I'll do like you and defer the update if that's the case.

 On 20 April 2015 at 15:23, Ken Krugler kkrugler_li...@transpac.com
 wrote:


  From: Allison, Timothy B.
  Sent: April 20, 2015 5:11:04am PDT
  To: dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  If I understand correctly, if we release rc2, Tika 1.8 will break in
 Hadoop clusters across the land?!
  Or, Hadoop folks will have to apply a classloading workaround or
 rebuild 1.8/trunk with small version mod in TIKA-1606 to get Tika to work.
 
  For most Hadoopites, this will be a straightforward fix, and I'm
 assuming that's why Ken is not more outspoken against releasing rc2 as is
 (Ken, let me know if I'm wrong!).

 Usually it's straightforward. Though whenever you start manipulating the
 classloader logic, you can get odd results.

 E.g. by forcing your job jar's dependencies to show up first, now you can
 have an issue where one of your jars masks an older/newer version that
 Hadoop needs, so the job fails for some other reason.

 But yes, I don't feel strongly enough about this to vote -1, as I don't
 think there are that many people using Tika with Hadoop.

 For Bixo, I'd defer updating the Tika dependency until another version is
 released.

 Don't know about Behemoth - Julien?

 -- Ken


  For other users, though, say, in healthcare, where code security review
 is stringent, this could be a real pain, no?
 
  Am I understanding correctly what will happen?  If so, do we really
 want to do this?
 
 
  -Original Message-
  From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
  Sent: Saturday, April 18, 2015 11:48 PM
  To: dev@tika.apache.org
  Subject: Re: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  +1 to pushing on Monday - if we have to roll a 1.9 quickly
  after, we can :)
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 168-519, Mailstop: 168-527
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  Adjunct Associate Professor, Computer Science Department
  University of Southern California, Los Angeles, CA 90089 USA
  ++
 
 
 
 
 
 
  -Original Message-
  From: Tyler Palsulich tpalsul...@gmail.com
  Reply-To: dev@tika.apache.org dev@tika.apache.org
  Date: Saturday, April 18, 2015 at 11:29 PM
  To: dev@tika.apache.org dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  Hi Folks,
 
  If there are no blocking complaints (OSGi?) by Monday (a little longer
  than
  3 days, I realize), I'll mark this as passed and finish the release
  process.
 
  Of course, it's no problem for me to cut another RC, if it's needed.
 
  Have a great weekend!
  Tyler
  I've run into one problem while testing Tika 1.8 with Bixo
 
  It involves a dependency issue involving (of course) Guava, since that
  project loves to break their API :(
 
  The bixo-core jar has these transitive dependencies on various
 versions of
  Guava:
 
  Hadoop - 11.0.2
  Cascading - 14.0.1
  Tika-parsers - 10.0.1
cdm - 17.0
 
  Everyone winds up using version 10.0.1 (note that Tika has a
 dependency on
  cdm, which wants to use 17.0)
 
  The problem is that Hadoop (for any recent version) uses an API from
  Guava's cache implementation that no longer exists:
 
 
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
  java.lang.NoSuchMethodError:
  com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
  So what this means is that anyone trying to use Tika with Hadoop will
 need
  to play games with the class loader to get the older version of Guava -
  though that can cause other issues if Hadoop (or Cascading, etc) rely
 on
  anything that's only in the newer Guava API.
 
  Guava 10.0.1

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Lewis John Mcgibbney
Hi Folks,

On Thu, Apr 16, 2015 at 2:42 PM, dev-digest-h...@tika.apache.org wrote:


  Hi Folks,
 
  A candidate for the Tika 1.8 release is available at:
https://dist.apache.org/repos/dist/dev/tika/
 
  The release candidate is a zip archive of the sources in:
http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
 
  The SHA1 checksum of the archive is
5e22fee9079370398472e59082d171ae2d7fdd31.
 
  In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1009
 
  Please vote on releasing this package as Apache Tika 1.8. The vote is
 open
  for the next 72 hours and passes if a majority of at least three +1 Tika
  PMC votes are cast.
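
Checking the SHA1 quoted above against a downloaded archive can be done with the JDK alone. The sketch below is one way to do it; the class name Sha1Check and its command-line arguments are assumptions, and for simplicity it reads the whole file into memory rather than streaming it.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: prints whether a file's SHA-1 matches an expected hex string. */
final class Sha1Check {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java Sha1Check <archive-file> <expected-sha1>");
            return;
        }
        String actual = sha1Hex(Files.readAllBytes(Paths.get(args[0])));
        System.out.println((actual.equalsIgnoreCase(args[1]) ? "OK " : "MISMATCH ") + actual);
    }
}
```

The expected value would be the checksum given in the vote mail; the archive path is whatever file was downloaded from the dist area. A checksum match only shows the download is intact, so the signatures still need to be verified against KEYS separately.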



Where is the KEYS?
All signatures are fine.
Tests are A-OK.
The only remaining issue is Tika 1616, which was patched and
committed to trunk.
IMHO this is not a blocker. We could probably release 1.9 in a shorter
release cycle to accommodate the change.


 
  [X] +1 Release this package as Apache Tika 1.8


I am +1 for releasing this as 1.8.
Lewis


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-19 Thread Tyler Palsulich
Hi Ken,

Sorry for the delayed response. No, that patch is not included in this RC
(as I think you know, given your resolution of TIKA-1606).

Have a good night,
Tyler

On Sun, Apr 19, 2015 at 10:49 AM, Ken Krugler kkrugler_li...@transpac.com
wrote:

 Hi Tyler,

 Does this include Lewis's fix for
 https://issues.apache.org/jira/browse/TIKA-1606?

 It's a simple change (bumping the Guava version), but as seen this can
 have unexpected consequences.

 I'm fine either way.

 -- Ken

  From: Tyler Palsulich
  Sent: April 18, 2015 8:29:22pm PDT
  To: dev@tika.apache.org
  Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  Hi Folks,
 
  If there are no blocking complaints (OSGi?) by Monday (a little longer
 than
  3 days, I realize), I'll mark this as passed and finish the release
 process.
 
  Of course, it's no problem for me to cut another RC, if it's needed.
 
  Have a great weekend!
  Tyler
  I've run into one problem while testing Tika 1.8 with Bixo
 
  It involves a dependency issue involving (of course) Guava, since that
  project loves to break their API :(
 
  The bixo-core jar has these transitive dependencies on various versions
 of
  Guava:
 
  Hadoop - 11.0.2
  Cascading - 14.0.1
  Tika-parsers - 10.0.1
 cdm - 17.0
 
  Everyone winds up using version 10.0.1 (note that Tika has a dependency
 on
  cdm, which wants to use 17.0)
 
  The problem is that Hadoop (for any recent version) uses an API from
  Guava's cache implementation that no longer exists:
 
 
 com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
  java.lang.NoSuchMethodError:
 
 com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
 at
  org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
 at
  org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
 at
  org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
 at
 
 org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
  So what this means is that anyone trying to use Tika with Hadoop will
 need
  to play games with the class loader to get the older version of Guava -
  though that can cause other issues if Hadoop (or Cascading, etc) rely on
  anything that's only in the newer Guava API.
 
  Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
  years ago. So it seems like we should upgrade to at least 11.0.2
 
  But I don't know if this is enough of an issue to require another RC.
 
  -- Ken
 
  PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to
 track
  this.
 
 
  From: Tyler Palsulich
  Sent: April 13, 2015 10:56:29am PDT
  To: dev@tika.apache.org, u...@tika.apache.org
  Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
 
  Hi Folks,
 
  A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/
 
  The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
 
  The SHA1 checksum of the archive is
   5e22fee9079370398472e59082d171ae2d7fdd31.
 
  In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachetika-1009
 
  Please vote on releasing this package as Apache Tika 1.8. The vote is
  open for the next 72 hours and passes if a majority of at least three +1
  Tika PMC votes are cast.
 
  [ ] +1 Release this package as Apache Tika 1.8
  [ ] ±0 I don't object to this release, but I haven't checked it
  [ ] -1 Do not release this package because...
 
  Thanks,
  Tyler


 --
 Ken Krugler
 +1 530-210-6378
 http://www.scaleunlimited.com
 custom big data solutions & training
 Hadoop, Cascading, Cassandra & Solr








RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-19 Thread Ken Krugler
Hi Tyler,

Does this include Lewis's fix for 
https://issues.apache.org/jira/browse/TIKA-1606?

It's a simple change (bumping the Guava version), but as seen this can have 
unexpected consequences.

I'm fine either way.

-- Ken

 From: Tyler Palsulich
 Sent: April 18, 2015 8:29:22pm PDT
 To: dev@tika.apache.org
 Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 Hi Folks,
 
 If there are no blocking complaints (OSGi?) by Monday (a little longer than
 3 days, I realize), I'll mark this as passed and finish the release process.
 
 Of course, it's no problem for me to cut another RC, if it's needed.
 
 Have a great weekend!
 Tyler
 I've run into one problem while testing Tika 1.8 with Bixo
 
 It involves a dependency issue involving (of course) Guava, since that
 project loves to break their API :(
 
 The bixo-core jar has these transitive dependencies on various versions of
 Guava:
 
 Hadoop - 11.0.2
 Cascading - 14.0.1
 Tika-parsers - 10.0.1
cdm - 17.0
 
 Everyone winds up using version 10.0.1 (note that Tika has a dependency on
 cdm, which wants to use 17.0)
 
 The problem is that Hadoop (for any recent version) uses an API from
 Guava's cache implementation that no longer exists:
 
 com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
 java.lang.NoSuchMethodError:
 com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
at
 org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
at
 org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
at
 org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
at
 org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)
 
 So what this means is that anyone trying to use Tika with Hadoop will need
 to play games with the class loader to get the older version of Guava -
 though that can cause other issues if Hadoop (or Cascading, etc) rely on
 anything that's only in the newer Guava API.
 
 Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
 years ago. So it seems like we should upgrade to at least 11.0.2
 
 But I don't know if this is enough of an issue to require another RC.
 
 -- Ken
 
 PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
 this.
 
 
 From: Tyler Palsulich
 Sent: April 13, 2015 10:56:29am PDT
 To: dev@tika.apache.org, u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 Hi Folks,
 
 A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/
 
 The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
 
 The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.
 
 In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009
 
 Please vote on releasing this package as Apache Tika 1.8. The vote is
 open for the next 72 hours and passes if a majority of at least three +1
 Tika PMC votes are cast.
 
 [ ] +1 Release this package as Apache Tika 1.8
 [ ] ±0 I don't object to this release, but I haven't checked it
 [ ] -1 Do not release this package because...
 
 Thanks,
 Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-18 Thread Tyler Palsulich
Hi Folks,

If there are no blocking complaints (OSGi?) by Monday (a little longer than
3 days, I realize), I'll mark this as passed and finish the release process.

Of course, it's no problem for me to cut another RC, if it's needed.

Have a great weekend!
Tyler
I've run into one problem while testing Tika 1.8 with Bixo

It involves a dependency issue involving (of course) Guava, since that
project loves to break their API :(

The bixo-core jar has these transitive dependencies on various versions of
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency on
cdm, which wants to use 17.0)

The problem is that Hadoop (for any recent version) uses an API from
Guava's cache implementation that no longer exists:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError:
com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
at
org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
at
org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
at
org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
at
org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will need
to play games with the class loader to get the older version of Guava -
though that can cause other issues if Hadoop (or Cascading, etc) rely on
anything that's only in the newer Guava API.

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
years ago. So it seems like we should upgrade to at least 11.0.2

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
this.


 From: Tyler Palsulich
 Sent: April 13, 2015 10:56:29am PDT
 To: dev@tika.apache.org, u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

 Hi Folks,

 A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/

 The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

 The SHA1 checksum of the archive is
   5e22fee9079370398472e59082d171ae2d7fdd31.

 In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachetika-1009

 Please vote on releasing this package as Apache Tika 1.8. The vote is
open for the next 72 hours and passes if a majority of at least three +1
Tika PMC votes are cast.

 [ ] +1 Release this package as Apache Tika 1.8
 [ ] ±0 I don't object to this release, but I haven't checked it
 [ ] -1 Do not release this package because...

 Thanks,
 Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-18 Thread Mattmann, Chris A (3980)
+1 to pushing on Monday - if we have to roll a 1.9 quickly
after, we can :)

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-----Original Message-----
From: Tyler Palsulich tpalsul...@gmail.com
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Saturday, April 18, 2015 at 11:29 PM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

If there are no blocking complaints (OSGi?) by Monday (a little longer
than
3 days, I realize), I'll mark this as passed and finish the release
process.

Of course, it's no problem for me to cut another RC, if it's needed.

Have a great weekend!
Tyler
I've run into one problem while testing Tika 1.8 with Bixo

It involves a dependency issue involving (of course) Guava, since that
project loves to break their API :(

The bixo-core jar has these transitive dependencies on various versions of
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency on
cdm, which wants to use 17.0)

The problem is that Hadoop (for any recent version) uses an API from
Guava's cache implementation that no longer exists:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will need
to play games with the class loader to get the older version of Guava -
though that can cause other issues if Hadoop (or Cascading, etc) rely on
anything that's only in the newer Guava API.

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3
years ago. So it seems like we should upgrade to at least 11.0.2

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track
this.


 From: Tyler Palsulich
 Sent: April 13, 2015 10:56:29am PDT
 To: dev@tika.apache.org, u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

 Hi Folks,

 A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/

 The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

 The SHA1 checksum of the archive is
   5e22fee9079370398472e59082d171ae2d7fdd31.

 In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachetika-1009

 Please vote on releasing this package as Apache Tika 1.8. The vote is
open for the next 72 hours and passes if a majority of at least three +1
Tika PMC votes are cast.

 [ ] +1 Release this package as Apache Tika 1.8
 [ ] ±0 I don't object to this release, but I haven't checked it
 [ ] -1 Do not release this package because...

 Thanks,
 Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-15 Thread Konstantin Gribov
Hi, folks.

All tests pass, checksum and gpg signature for tika-1.8-src.zip are fine.
Checked on ArchLinux x86_64, openjdk 7u75, w/ tesseract.

Thank you, Tyler.

[x] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

-- 
Best regards,
Konstantin Gribov

Mon, Apr 13, 2015 at 20:56, Tyler Palsulich tpalsul...@apache.org:

 Hi Folks,

 A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/

 The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

 The SHA1 checksum of the archive is
   5e22fee9079370398472e59082d171ae2d7fdd31.

 In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachetika-1009

 Please vote on releasing this package as Apache Tika 1.8. The vote is open
 for the next 72 hours and passes if a majority of at least three +1 Tika
 PMC votes are cast.

 [ ] +1 Release this package as Apache Tika 1.8
 [ ] ±0 I don't object to this release, but I haven't checked it
 [ ] -1 Do not release this package because...

 Thanks,
 Tyler



RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-15 Thread Ken Krugler
I've run into one problem while testing Tika 1.8 with Bixo.

It's a dependency issue involving (of course) Guava, since that project 
loves to break its API :(

The bixo-core jar has these transitive dependencies on various versions of 
Guava:

Hadoop - 11.0.2
Cascading - 14.0.1
Tika-parsers - 10.0.1
cdm - 17.0

Everyone winds up using version 10.0.1 (note that Tika has a dependency on cdm, 
which wants to use 17.0)

The problem is that Hadoop (for any recent version) uses an API from Guava's 
cache implementation that does not exist in version 10.0.1:

com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build(Lcom/google/common/cache/CacheLoader;)Lcom/google/common/cache/LoadingCache;
    at org.apache.hadoop.io.compress.CodecPool.createCache(CodecPool.java:62)
    at org.apache.hadoop.io.compress.CodecPool.<clinit>(CodecPool.java:74)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:1272)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:79)

So what this means is that anyone trying to use Tika with Hadoop will need to 
play games with the class loader to get the older version of Guava - though 
that can cause other issues if Hadoop (or Cascading, etc) rely on anything 
that's only in the newer Guava API.
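
When chasing this kind of NoSuchMethodError, it can help to ask the running JVM which jar a class actually came from and whether the expected method signature is visible. The following is only a rough sketch using plain JDK reflection; the class name ClasspathProbe and its helper methods are made up for illustration, and the Guava class names are simply the ones from the stack trace above. As far as I know, LoadingCache only appeared in Guava 11.0, so the second check would be expected to report false when 10.0.1 wins on the classpath.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Tika, Hadoop or Guava.
final class ClasspathProbe {

    /** Returns where a class was loaded from, or a short note if that is unknown. */
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes usually have no code source.
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    /** True if the class has a public method with this name and return type. */
    static boolean hasMethod(String className, String methodName, String returnTypeName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)
                        && m.getReturnType().getName().equals(returnTypeName)) {
                    return true;
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or not loadable: treat as "method not available".
        }
        return false;
    }

    public static void main(String[] args) {
        String builder = "com.google.common.cache.CacheBuilder";
        System.out.println("CacheBuilder loaded from: " + locationOf(builder));
        System.out.println("build(...) returning LoadingCache present: "
                + hasMethod(builder, "build", "com.google.common.cache.LoadingCache"));
    }
}
```

Running the main method inside the same job (or with the same classpath) as the failing code should show which Guava jar took precedence.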

Guava 10.0.1 was released about 3.5 years ago; 11.0.2 was from about 3 years 
ago. So it seems like we should upgrade to at least 11.0.2.

But I don't know if this is enough of an issue to require another RC.

-- Ken

PS - I've created https://issues.apache.org/jira/browse/TIKA-1606 to track this.


 From: Tyler Palsulich
 Sent: April 13, 2015 10:56:29am PDT
 To: dev@tika.apache.org, u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2
 
 Hi Folks,
 
 A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/
 
 The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/
 
 The SHA1 checksum of the archive is
   5e22fee9079370398472e59082d171ae2d7fdd31.
 
 In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachetika-1009
 
 Please vote on releasing this package as Apache Tika 1.8. The vote is open 
 for the next 72 hours and passes if a majority of at least three +1 Tika PMC 
 votes are cast.
 
 [ ] +1 Release this package as Apache Tika 1.8
 [ ] ±0 I don't object to this release, but I haven't checked it
 [ ] -1 Do not release this package because...
 
 Thanks,
 Tyler


--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-15 Thread Oleg Tikhonov
Hi Tyler,

good job, indeed !!!

[x] +1 Release this package as Apache Tika 1.8

On Wed, Apr 15, 2015 at 8:22 AM, Mattmann, Chris A (3980) 
chris.a.mattm...@jpl.nasa.gov wrote:

 Thanks Tyler! +1 from me:

 SIGS, checksums check out:


 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika 1.8-src https://dist.apache.org/repos/dist/dev/tika/
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100 69.2M  100 69.2M    0     0  1524k      0  0:00:46  0:00:46 --:--:-- 1661k
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100   473  100   473    0     0    874      0 --:--:-- --:--:-- --:--:--   874
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62

 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-app 1.8 https://dist.apache.org/repos/dist/dev/tika/
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100 44.0M  100 44.0M    0     0  1742k      0  0:00:25  0:00:25 --:--:-- 1825k
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100   473  100   473    0     0    922      0 --:--:-- --:--:-- --:--:--   922
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100    33  100    33    0     0     63      0 --:--:-- --:--:-- --:--:--    63

 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-server 1.8 https://dist.apache.org/repos/dist/dev/tika/
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100 48.3M  100 48.3M    0     0  1379k      0  0:00:35  0:00:35 --:--:-- 1569k
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100   473  100   473    0     0    891      0 --:--:-- --:--:-- --:--:--   892
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
 100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62

 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%


 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_gpg_sigs

 Verifying Signature for file tika-1.8-src.zip.asc

 gpg: Signature made Mon Apr 13 13:46:39 2015 EDT using RSA key ID D4F10117

 gpg: Good signature from Tyler Palsulich tpalsul...@apache.org

 gpg: WARNING: This key is not certified with a trusted signature!

 gpg:  There is no indication that the signature belongs to the
 owner.

 Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

 Verifying Signature for file tika-app-1.8.jar.asc

 gpg: Signature made Mon Apr 13 13:43:13 2015 EDT using RSA key ID D4F10117

 gpg: Good signature from Tyler Palsulich tpalsul...@apache.org

 gpg: WARNING: This key is not certified with a trusted signature!

 gpg:  There is no indication that the signature belongs to the
 owner.

 Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

 Verifying Signature for file tika-server-1.8.jar.asc

 gpg: Signature made Mon Apr 13 13:45:00 2015 EDT using RSA key ID D4F10117

 gpg: Good signature from Tyler Palsulich tpalsul...@apache.org

 gpg: WARNING: This key is not certified with a trusted signature!

 gpg:  There is no indication that the signature belongs to the
 owner.

 Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%
 $HOME/bin/verify_md5_checksums

 md5sum: stat '*.tar.gz': No such file or directory

 md5sum: stat '*.bz2': No such file or directory

 md5sum: stat '*.tgz': No such file or directory

 tika-1.8-src.zip: OK

 [chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%

 Cheers!

 Chris

 
 From: Tyler Palsulich [tpalsul...@apache.org]
 Sent: Monday, April 13, 2015 10:56 AM
 To: dev@tika.apache.org; u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

 Hi Folks,

 A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/

 The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Julien Nioche
Hi Tim

Great to hear that you managed to use the dataset from CommonCrawl. Thanks!

Julien

On 14 April 2015 at 14:15, Allison, Timothy B. talli...@mitre.org wrote:

 +1

 Thank you, Tyler!

 Apologies to Hong-Thai and community for not recognizing the severity of
 TIKA-1600 when I voted in favor of rc1!

 Details...

 I reran against govdocs1, and there aren't any major surprises.

 On our Rackspace vm, I  _finally_ unzipped the Common Crawl slice that
 Julien Nioche created for us, and I ran against that as well.  That turned
 up TIKA-1605 and another exceedingly rare NPE in the PDFParser.  I don't
 think either of these are blockers, and they're now fixed in trunk.

 There are slightly fewer metadata values for some jpegs.  For the one file
 that I manually reviewed, 1.8-rc was missing these values (that were
 available in 1.7):

 JPEG quality
 IPTC-NAA record
 Plug-in 1 Data

 Comparison reports are available here (much more work remains to be done
 on tika-eval):

 https://github.com/tballison/share/tree/master/tika_comparisons

 
 From: Tyler Palsulich tpalsul...@apache.org
 Sent: Monday, April 13, 2015 1:56 PM
 To: dev@tika.apache.org; u...@tika.apache.org
 Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

 Hi Folks,

 A candidate for the Tika 1.8 release is available at:
   https://dist.apache.org/repos/dist/dev/tika/

 The release candidate is a zip archive of the sources in:
   http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

 The SHA1 checksum of the archive is
   5e22fee9079370398472e59082d171ae2d7fdd31.

 In addition, a staged maven repository is available here:
   https://repository.apache.org/content/repositories/orgapachetika-1009

 Please vote on releasing this package as Apache Tika 1.8. The vote is open
 for the next 72 hours and passes if a majority of at least three +1 Tika
 PMC votes are cast.

 [ ] +1 Release this package as Apache Tika 1.8
 [ ] ±0 I don't object to this release, but I haven't checked it
 [ ] -1 Do not release this package because...

 Thanks,
 Tyler




-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Mattmann, Chris A (3980)
Thanks Tyler! +1 from me:

SIGS, checksums check out:


[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika 1.8-src https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 69.2M  100 69.2M    0     0  1524k      0  0:00:46  0:00:46 --:--:-- 1661k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    874      0 --:--:-- --:--:-- --:--:--   874
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-app 1.8 https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.0M  100 44.0M    0     0  1742k      0  0:00:25  0:00:25 --:--:-- 1825k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    922      0 --:--:-- --:--:-- --:--:--   922
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     63      0 --:--:-- --:--:-- --:--:--    63

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/stage_apache_rc tika-server 1.8 https://dist.apache.org/repos/dist/dev/tika/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 48.3M  100 48.3M    0     0  1379k      0  0:00:35  0:00:35 --:--:-- 1569k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   473  100   473    0     0    891      0 --:--:-- --:--:-- --:--:--   892
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    33  100    33    0     0     62      0 --:--:-- --:--:-- --:--:--    62

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%


[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_gpg_sigs

Verifying Signature for file tika-1.8-src.zip.asc

gpg: Signature made Mon Apr 13 13:46:39 2015 EDT using RSA key ID D4F10117

gpg: Good signature from Tyler Palsulich tpalsul...@apache.org

gpg: WARNING: This key is not certified with a trusted signature!

gpg:  There is no indication that the signature belongs to the owner.

Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

Verifying Signature for file tika-app-1.8.jar.asc

gpg: Signature made Mon Apr 13 13:43:13 2015 EDT using RSA key ID D4F10117

gpg: Good signature from Tyler Palsulich tpalsul...@apache.org

gpg: WARNING: This key is not certified with a trusted signature!

gpg:  There is no indication that the signature belongs to the owner.

Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

Verifying Signature for file tika-server-1.8.jar.asc

gpg: Signature made Mon Apr 13 13:45:00 2015 EDT using RSA key ID D4F10117

gpg: Good signature from Tyler Palsulich tpalsul...@apache.org

gpg: WARNING: This key is not certified with a trusted signature!

gpg:  There is no indication that the signature belongs to the owner.

Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann% $HOME/bin/verify_md5_checksums

md5sum: stat '*.tar.gz': No such file or directory

md5sum: stat '*.bz2': No such file or directory

md5sum: stat '*.tgz': No such file or directory

tika-1.8-src.zip: OK

[chipotle:~/tmp/apache-tika-1.8-rc2] mattmann%

Cheers!

Chris


From: Tyler Palsulich [tpalsul...@apache.org]
Sent: Monday, April 13, 2015 10:56 AM
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Hong-Thai Nguyen
Hi,

+1 for me.

Great work, Tyler !

Hong-Thai

-----Original Message-----
From: Tyler Palsulich [mailto:tpalsul...@apache.org]
Sent: Monday, April 13, 2015 19:56
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open for 
the next 72 hours and passes if a majority of at least three +1 Tika PMC votes 
are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler


Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Allison, Timothy B.
+1

Thank you, Tyler!

Apologies to Hong-Thai and community for not recognizing the severity of 
TIKA-1600 when I voted in favor of rc1!

Details...

I reran against govdocs1, and there aren't any major surprises.

On our Rackspace vm, I  _finally_ unzipped the Common Crawl slice that Julien 
Nioche created for us, and I ran against that as well.  That turned up 
TIKA-1605 and another exceedingly rare NPE in the PDFParser.  I don't think 
either of these are blockers, and they're now fixed in trunk.

There are slightly fewer metadata values for some jpegs.  For the one file that 
I manually reviewed, 1.8-rc was missing these values (that were available in 
1.7):

JPEG quality
IPTC-NAA record
Plug-in 1 Data

Comparison reports are available here (much more work remains to be done on 
tika-eval):

https://github.com/tballison/share/tree/master/tika_comparisons 


From: Tyler Palsulich tpalsul...@apache.org
Sent: Monday, April 13, 2015 1:56 PM
To: dev@tika.apache.org; u...@tika.apache.org
Subject: [VOTE] Apache Tika 1.8 Release Candidate #2

Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open
for the next 72 hours and passes if a majority of at least three +1 Tika
PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...

Thanks,
Tyler


[VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-13 Thread Tyler Palsulich
Hi Folks,

A candidate for the Tika 1.8 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  http://svn.apache.org/repos/asf/tika/tags/1.8-rc2/

The SHA1 checksum of the archive is
  5e22fee9079370398472e59082d171ae2d7fdd31.
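
For anyone who wants to check the archive without a command-line sha1 tool, here is a minimal sketch that computes a SHA-1 digest with the JDK's MessageDigest and compares it to the value quoted above. The class name Sha1Check and the command-line handling are illustrative assumptions; the file name would be whatever you downloaded (for example tika-1.8-src.zip).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for checking a downloaded release archive.
final class Sha1Check {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Reads the whole file into memory (fine for an archive of this size) and compares digests. */
    static boolean matches(Path file, String expectedHex) throws IOException {
        return sha1Hex(Files.readAllBytes(file)).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Sha1Check <file> <expected-sha1>");
            return;
        }
        System.out.println(matches(Paths.get(args[0]), args[1]) ? "OK" : "MISMATCH");
    }
}
```

Note that a matching checksum only shows the download is intact; the GPG signatures still need to be verified separately.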

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1009

Please vote on releasing this package as Apache Tika 1.8. The vote is open
for the next 72 hours and passes if a majority of at least three +1 Tika
PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.8
[ ] ±0 I don't object to this release, but I haven't checked it
[ ] -1 Do not release this package because...
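
The passing rule above can be sketched in a few lines. This assumes the usual reading of "a majority of at least three +1": at least three binding (PMC) +1 votes and more +1 than -1 votes, with ±0 not counted either way. The VoteTally class and the Vote enum are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the release vote rule; not an official tool.
final class VoteTally {

    enum Vote { PLUS_ONE, ZERO, MINUS_ONE }

    /** True if there are at least three +1 votes and more +1 than -1 votes. */
    static boolean passes(List<Vote> bindingVotes) {
        long plus = bindingVotes.stream().filter(v -> v == Vote.PLUS_ONE).count();
        long minus = bindingVotes.stream().filter(v -> v == Vote.MINUS_ONE).count();
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        List<Vote> votes = Arrays.asList(Vote.PLUS_ONE, Vote.PLUS_ONE, Vote.PLUS_ONE, Vote.ZERO);
        System.out.println("Vote passes: " + passes(votes));
    }
}
```

Only binding votes should be passed in; non-binding votes are welcome on the list but do not count toward the threshold.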

Thanks,
Tyler