Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-15 Thread Tamás Cservenák
Howdy,

There were some Maven Central issues in the past few days, hopefully fixed.
https://status.maven.org/#week

Thanks
Tamas


On Mon, Dec 13, 2021 at 11:18 PM Lewis John McGibbney 
wrote:

> I performed another build of the tika-2.2.0-src.zip artifact which failed.
> I've captured the failure output
>
> https://paste.apache.org/o9iju
>
> % mvn -version
> Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
> Maven home: /usr/local/Cellar/maven/3.8.4/libexec
> Java version: 11.0.10, vendor: Oracle Corporation, runtime:
> /Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
>
> Can anyone else reproduce this failure?
>
> lewismc
>
> On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> > Hi Tim,
> >
> > On 2021/12/13 21:37:47 Tim Allison wrote:
> > > A candidate for the Tika 2.2.0 release is available at:
> > > https://dist.apache.org/repos/dist/dev/tika/
> >
> > I downloaded the tika-2.2.0-src.zip artifact
> > >
> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> >
> > .sha512 signature good
> > .asc signature is good
> > pom.xml versions all match
> > good NOTICE.txt
> > good CHANGES.txt
> >
> > >
> > > In addition, a staged maven repository is available here:
> > >
> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> > >
> >
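> > I added the following to Any23 master pom.xml and ran our unit test suite
> >
> > <repositories>
> >   <repository>
> >     <id>apache-repo-snapshots</id>
> >     <url>https://repository.apache.org/content/repositories/snapshots/</url>
> >     <releases>
> >       <enabled>false</enabled>
> >     </releases>
> >     <snapshots>
> >       <enabled>true</enabled>
> >     </snapshots>
> >   </repository>
> > </repositories>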
> >
> > Everything passes successfully.
> >
> > >
> > > [X] +1 Release this package as Apache Tika 2.2.0
> >
> > I did notice that the tika DL's module(s) are pulling in the entire
> Hadoop dependency chain. I wonder if we can cut down on this... that is
> however a concern outside of this release candidate review.
> >
> > Thanks for the quick turnaround.
> > lewismc
> >
>


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Konstantin Gribov
Hi, folks.

Built successfully on ArchLinux, OpenJDK 11 & 17 (Temurin-11.0.13+8 &
17.0.1+12) w/ Tesseract 4.1.1, Leptonica 1.82.0 except:
*
org.apache.tika.parser.ocr.TesseractOCRParserTest.confirmMultiPageTiffHandling
(still extracts "Page?2" instead of "Page 2" on my laptop);
* a bunch of potential CVEs reported in age-recognizer due to old Netty,
Hadoop, Avro, Mesos, Spark (web framework), Log4j 1.x, Jackson, Commons
BeanUtils, Scala, Commons Collections, and Zookeeper; I'm not sure if any
affect Tika;
* some slf4j and log4j2 issues in tests (multiple bindings or absent
implementation).

I think we can ignore CVE-2021-45046 [1] for now and update to log4j
2.16.0 in a few weeks; it has a much narrower scope, and we don't use
MDC/ThreadContext in a vulnerable way from what I see.

Checksums and GPG signatures seem fine.

[x] +1 Release this package as Apache Tika 2.2.0
[ ] -1 Do not release this package because...

[1]: https://www.cve.org/CVERecord?id=CVE-2021-45046

-- 
Best regards,
Konstantin Gribov.


On Wed, Dec 15, 2021 at 1:04 AM Oleg Tikhonov 
wrote:

> +1
>
> > On 15 Dec 2021, at 0:01, Tim Allison  wrote:
> >
> > +1
> >
> > On Tue, Dec 14, 2021 at 4:31 PM Lewis John McGibbney
> > wrote:
> >
> >> I'll submit a PR for the README but I think it's also worthwhile to
> augment
> >> the release management guide so that the message to review the release
> >> candidate includes this information.
> >> lewismc
> >>
> >> On 2021/12/14 20:17:05 Tim Allison wrote:
> >>> Y, you're right. Lewis, where should we mention the Docker requirement
> >>> on our site?
> >>>
> >>> On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney <
> lewi...@apache.org>
> >> wrote:
> 
>  Hi Ken,
> 
>  On 2021/12/13 22:38:49 Ken Krugler wrote:
> > That error looks like you’ve got a connection issue with the Maven
> >> central repo…
> >
> > — Ken
> 
>  Yes you are correct :)
> 
>  Once that issue sorted itself out my local build passed so my +1
> >> stands.
> 
>  I think it is worthwhile for us to state that Docker is a prerequisite for
> >> installing from source. This is required for the tika-pipes* modules.
> 
>  lewismc
> >>>
> >>
>
>


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Oleg Tikhonov
+1 

> On 15 Dec 2021, at 0:01, Tim Allison  wrote:
> 
> +1
> 
> On Tue, Dec 14, 2021 at 4:31 PM Lewis John McGibbney 
> wrote:
> 
>> I'll submit a PR for the README but I think it's also worthwhile to augment
>> the release management guide so that the message to review the release
>> candidate includes this information.
>> lewismc
>> 
>> On 2021/12/14 20:17:05 Tim Allison wrote:
>>> Y, you're right. Lewis, where should we mention the Docker requirement
>>> on our site?
>>> 
>>> On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney 
>> wrote:
 
 Hi Ken,
 
 On 2021/12/13 22:38:49 Ken Krugler wrote:
> That error looks like you’ve got a connection issue with the Maven
>> central repo…
> 
> — Ken
 
 Yes you are correct :)
 
 Once that issue sorted itself out my local build passed so my +1
>> stands.
 
 I think it is worthwhile for us to state that Docker is a prerequisite for
>> installing from source. This is required for the tika-pipes* modules.
 
 lewismc
>>> 
>> 



Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Tim Allison
+1

On Tue, Dec 14, 2021 at 4:31 PM Lewis John McGibbney 
wrote:

> I'll submit a PR for the README but I think it's also worthwhile to augment
> the release management guide so that the message to review the release
> candidate includes this information.
> lewismc
>
> On 2021/12/14 20:17:05 Tim Allison wrote:
> > Y, you're right. Lewis, where should we mention the Docker requirement
> > on our site?
> >
> > On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney 
> wrote:
> > >
> > > Hi Ken,
> > >
> > > On 2021/12/13 22:38:49 Ken Krugler wrote:
> > > > That error looks like you’ve got a connection issue with the Maven
> central repo…
> > > >
> > > > — Ken
> > >
> > > Yes you are correct :)
> > >
> > > Once that issue sorted itself out my local build passed so my +1
> stands.
> > >
> > > I think it is worthwhile for us to state that Docker is a prerequisite for
> installing from source. This is required for the tika-pipes* modules.
> > >
> > > lewismc
> >
>


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Lewis John McGibbney
I'll submit a PR for the README but I think it's also worthwhile to augment the 
release management guide so that the message to review the release candidate 
includes this information.
lewismc

On 2021/12/14 20:17:05 Tim Allison wrote:
> Y, you're right. Lewis, where should we mention the Docker requirement
> on our site?
> 
> On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney  
> wrote:
> >
> > Hi Ken,
> >
> > On 2021/12/13 22:38:49 Ken Krugler wrote:
> > > That error looks like you’ve got a connection issue with the Maven 
> > > central repo…
> > >
> > > — Ken
> >
> > Yes you are correct :)
> >
> > Once that issue sorted itself out my local build passed so my +1 stands.
> >
> > I think it is worthwhile for us to state that Docker is a prerequisite for 
> > installing from source. This is required for the tika-pipes* modules.
> >
> > lewismc
> 


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Tim Allison
Y, you're right. Lewis, where should we mention the Docker requirement
on our site?

On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney  wrote:
>
> Hi Ken,
>
> On 2021/12/13 22:38:49 Ken Krugler wrote:
> > That error looks like you’ve got a connection issue with the Maven central 
> > repo…
> >
> > — Ken
>
> Yes you are correct :)
>
> Once that issue sorted itself out my local build passed so my +1 stands.
>
> I think it is worthwhile for us to state that Docker is a prerequisite for 
> installing from source. This is required for the tika-pipes* modules.
>
> lewismc


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Lewis John McGibbney
Hi Ken,

On 2021/12/13 22:38:49 Ken Krugler wrote:
> That error looks like you’ve got a connection issue with the Maven central 
> repo…
> 
> — Ken

Yes you are correct :)

Once that issue sorted itself out my local build passed so my +1 stands.

I think it is worthwhile for us to state that Docker is a prerequisite for installing 
from source. This is required for the tika-pipes* modules.

lewismc


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Tim Allison
Agreed

On Mon, Dec 13, 2021 at 5:39 PM Ken Krugler 
wrote:

> That error looks like you’ve got a connection issue with the Maven central
> repo…
>
> — Ken
>
>
> > On Dec 13, 2021, at 2:18 PM, Lewis John McGibbney 
> wrote:
> >
> > I performed another build of the tika-2.2.0-src.zip artifact which
> failed. I've captured the failure output
> >
> > https://paste.apache.org/o9iju
> >
> > % mvn -version
> > Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
> > Maven home: /usr/local/Cellar/maven/3.8.4/libexec
> > Java version: 11.0.10, vendor: Oracle Corporation, runtime:
> /Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
> > Default locale: en_US, platform encoding: UTF-8
> > OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
> >
> > Can anyone else reproduce this failure?
> >
> > lewismc
> >
> > On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> >> Hi Tim,
> >>
> >> On 2021/12/13 21:37:47 Tim Allison wrote:
> >>> A candidate for the Tika 2.2.0 release is available at:
> >>> https://dist.apache.org/repos/dist/dev/tika/
> >>
> >> I downloaded the tika-2.2.0-src.zip artifact
> >>>
> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> >>
> >> .sha512 signature good
> >> .asc signature is good
> >> pom.xml versions all match
> >> good NOTICE.txt
> >> good CHANGES.txt
> >>
> >>>
> >>> In addition, a staged maven repository is available here:
> >>>
> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> >>>
> >>
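> >> I added the following to Any23 master pom.xml and ran our unit test
> suite
> >>
> >> <repositories>
> >>   <repository>
> >>     <id>apache-repo-snapshots</id>
> >>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
> >>     <releases>
> >>       <enabled>false</enabled>
> >>     </releases>
> >>     <snapshots>
> >>       <enabled>true</enabled>
> >>     </snapshots>
> >>   </repository>
> >> </repositories>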
> >>
> >> Everything passes successfully.
> >>
> >>>
> >>> [X] +1 Release this package as Apache Tika 2.2.0
> >>
> >> I did notice that the tika DL's module(s) are pulling in the entire
> Hadoop dependency chain. I wonder if we can cut down on this... that is
> however a concern outside of this release candidate review.
> >>
> >> Thanks for the quick turnaround.
> >> lewismc
> >>
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
>
>
>
>


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Ken Krugler
That error looks like you’ve got a connection issue with the Maven central repo…

— Ken


> On Dec 13, 2021, at 2:18 PM, Lewis John McGibbney  wrote:
> 
> I performed another build of the tika-2.2.0-src.zip artifact which failed. 
> I've captured the failure output
> 
> https://paste.apache.org/o9iju
> 
> % mvn -version
> Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
> Maven home: /usr/local/Cellar/maven/3.8.4/libexec
> Java version: 11.0.10, vendor: Oracle Corporation, runtime: 
> /Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
> 
> Can anyone else reproduce this failure?
> 
> lewismc
> 
> On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
>> Hi Tim,
>> 
>> On 2021/12/13 21:37:47 Tim Allison wrote:
>>> A candidate for the Tika 2.2.0 release is available at:
>>> https://dist.apache.org/repos/dist/dev/tika/
>> 
>> I downloaded the tika-2.2.0-src.zip artifact
>>> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
>> 
>> .sha512 signature good
>> .asc signature is good
>> pom.xml versions all match
>> good NOTICE.txt
>> good CHANGES.txt
>> 
>>> 
>>> In addition, a staged maven repository is available here:
>>> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
>>> 
>> 
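>> I added the following to Any23 master pom.xml and ran our unit test suite
>> 
>> <repositories>
>>   <repository>
>>     <id>apache-repo-snapshots</id>
>>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>>     <releases>
>>       <enabled>false</enabled>
>>     </releases>
>>     <snapshots>
>>       <enabled>true</enabled>
>>     </snapshots>
>>   </repository>
>> </repositories>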
>> 
>> Everything passes successfully.
>> 
>>> 
>>> [X] +1 Release this package as Apache Tika 2.2.0
>> 
>> I did notice that the tika DL's module(s) are pulling in the entire Hadoop 
>> dependency chain. I wonder if we can cut down on this... that is however a 
>> concern outside of this release candidate review.
>> 
>> Thanks for the quick turnaround.
>> lewismc
>> 

--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch





Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Lewis John McGibbney
I performed another build of the tika-2.2.0-src.zip artifact which failed. I've 
captured the failure output

https://paste.apache.org/o9iju

% mvn -version
Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
Maven home: /usr/local/Cellar/maven/3.8.4/libexec
Java version: 11.0.10, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"

Can anyone else reproduce this failure?

lewismc

On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> Hi Tim,
> 
> On 2021/12/13 21:37:47 Tim Allison wrote:
> > A candidate for the Tika 2.2.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/tika/
> 
> I downloaded the tika-2.2.0-src.zip artifact
> > 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> 
> .sha512 signature good
> .asc signature is good
> pom.xml versions all match
> good NOTICE.txt
> good CHANGES.txt
> 
> > 
> > In addition, a staged maven repository is available here:
> > https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> > 
> 
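> I added the following to Any23 master pom.xml and ran our unit test suite
> 
> <repositories>
>   <repository>
>     <id>apache-repo-snapshots</id>
>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>     <releases>
>       <enabled>false</enabled>
>     </releases>
>     <snapshots>
>       <enabled>true</enabled>
>     </snapshots>
>   </repository>
> </repositories>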
> 
> Everything passes successfully.
> 
> > 
> > [X] +1 Release this package as Apache Tika 2.2.0
> 
> I did notice that the tika DL's module(s) are pulling in the entire Hadoop 
> dependency chain. I wonder if we can cut down on this... that is however a 
> concern outside of this release candidate review.
> 
> Thanks for the quick turnaround.
> lewismc
> 


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Lewis John McGibbney
I forgot to mention. I performed a full clean install on the tika-2.2.0-src.zip 
artifact and everything installed and tested successfully.

On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> Hi Tim,
> 
> On 2021/12/13 21:37:47 Tim Allison wrote:
> > A candidate for the Tika 2.2.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/tika/
> 
> I downloaded the tika-2.2.0-src.zip artifact
> > 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> 
> .sha512 signature good
> .asc signature is good
> pom.xml versions all match
> good NOTICE.txt
> good CHANGES.txt
> 
> > 
> > In addition, a staged maven repository is available here:
> > https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> > 
> 
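> I added the following to Any23 master pom.xml and ran our unit test suite
> 
> <repositories>
>   <repository>
>     <id>apache-repo-snapshots</id>
>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>     <releases>
>       <enabled>false</enabled>
>     </releases>
>     <snapshots>
>       <enabled>true</enabled>
>     </snapshots>
>   </repository>
> </repositories>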
> 
> Everything passes successfully.
> 
> > 
> > [X] +1 Release this package as Apache Tika 2.2.0
> 
> I did notice that the tika DL's module(s) are pulling in the entire Hadoop 
> dependency chain. I wonder if we can cut down on this... that is however a 
> concern outside of this release candidate review.
> 
> Thanks for the quick turnaround.
> lewismc
> 


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Lewis John McGibbney
Hi Tim,

On 2021/12/13 21:37:47 Tim Allison wrote:
> A candidate for the Tika 2.2.0 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/

I downloaded the tika-2.2.0-src.zip artifact
> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.

.sha512 signature good
.asc signature is good
pom.xml versions all match
good NOTICE.txt
good CHANGES.txt
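
For anyone wanting to repeat the checksum part of this check, here is a minimal sketch in Python. It is not the procedure used in this thread, only one possible way to do it: the helper names `sha512_hex` and `checksum_matches` are made up for illustration, and it assumes the archive has already been downloaded locally. It covers only the SHA-512 digest; the .asc signature has to be checked separately with GPG.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA-512 digest of the file at `path`."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so a large source archive is not held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the computed digest with a published one, ignoring case and
    surrounding whitespace in the published value."""
    return sha512_hex(path) == expected_hex.strip().lower()
```

Calling `checksum_matches("tika-2.2.0-src.zip", published_digest)`, with `published_digest` set to the digest string from the vote email, should return True when the download is intact.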

> 
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> 

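I added the following to Any23 master pom.xml and ran our unit test suite

<repositories>
  <repository>
    <id>apache-repo-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>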

Everything passes successfully.

> 
> [X] +1 Release this package as Apache Tika 2.2.0

I did notice that the tika DL's module(s) are pulling in the entire Hadoop 
dependency chain. I wonder if we can cut down on this... that is however a 
concern outside of this release candidate review.

Thanks for the quick turnaround.
lewismc