Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Tim Allison
Agreed

On Mon, Dec 13, 2021 at 5:39 PM Ken Krugler 
wrote:

> That error looks like you’ve got a connection issue with the Maven central
> repo…
>
> — Ken
>
>
> > On Dec 13, 2021, at 2:18 PM, Lewis John McGibbney 
> wrote:
> >
> > I performed another build of the tika-2.2.0-src.zip artifact which
> failed. I've captured the failure output
> >
> > https://paste.apache.org/o9iju
> >
> > % mvn -version
> > Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
> > Maven home: /usr/local/Cellar/maven/3.8.4/libexec
> > Java version: 11.0.10, vendor: Oracle Corporation, runtime:
> /Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
> > Default locale: en_US, platform encoding: UTF-8
> > OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
> >
> > Can anyone else reproduce this failure?
> >
> > lewismc
> >
> > On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> >> Hi Tim,
> >>
> >> On 2021/12/13 21:37:47 Tim Allison wrote:
> >>> A candidate for the Tika 2.2.0 release is available at:
> >>> https://dist.apache.org/repos/dist/dev/tika/
> >>
> >> I downloaded the tika-2.2.0-src.zip artifact
> >>>
> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> >>
> >> .sha512 signature good
> >> .asc signature is good
> >> pom.xml versions all match
> >> good NOTICE.txt
> >> good CHANGES.txt
> >>
> >>>
> >>> In addition, a staged maven repository is available here:
> >>>
> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> >>>
> >>
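> >> I added the following to Any23 master pom.xml and ran our unit test
> >> suite
> >>
> >> <repositories>
> >>   <repository>
> >>     <id>apache-repo-snapshots</id>
> >>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
> >>     <releases>
> >>       <enabled>false</enabled>
> >>     </releases>
> >>     <snapshots>
> >>       <enabled>true</enabled>
> >>     </snapshots>
> >>   </repository>
> >> </repositories>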
> >>
> >> Everything passes successfully.
> >>
> >>>
> >>> [X] +1 Release this package as Apache Tika 2.2.0
> >>
> >> I did notice that the tika DL module(s) are pulling in the entire
> >> Hadoop dependency chain. I wonder if we can cut down on this; that is,
> >> however, a concern outside of this release candidate review.
> >>
> >> Thanks for the quick turnaround.
> >> lewismc
> >>
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
>
>
>
>


Re: Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Thread Tim Allison
I'll dig deeper tomorrow, but I think we're ok with 2.15. I like what
they've done with 2.16.0. :D

On Mon, Dec 13, 2021 at 7:57 PM Dave Fisher  wrote:
>
> You’ll need to evaluate that yourself.
>
> Sent from my iPhone
>
> > On Dec 13, 2021, at 4:56 PM, Tim Allison  wrote:
> >
> > Do we have to do a respin of the release candidate or is this marginally 
> > better?
> >
> >> On Mon, Dec 13, 2021 at 7:43 PM Dave Fisher  wrote:
> >>
> >> https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4
>


Re: Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Thread Dave Fisher
You’ll need to evaluate that yourself.

Sent from my iPhone

> On Dec 13, 2021, at 4:56 PM, Tim Allison  wrote:
> 
> Do we have to do a respin of the release candidate or is this marginally 
> better?
> 
>> On Mon, Dec 13, 2021 at 7:43 PM Dave Fisher  wrote:
>> 
>> https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4



[GitHub] [tika] tballison merged pull request #463: log4j 2.16.0

2021-12-13 Thread GitBox


tballison merged pull request #463:
URL: https://github.com/apache/tika/pull/463


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Thread Tim Allison
Do we have to do a respin of the release candidate or is this marginally better?

On Mon, Dec 13, 2021 at 7:43 PM Dave Fisher  wrote:
>
> https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4


Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Thread Dave Fisher
https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4


[GitHub] [tika] sullis opened a new pull request #463: log4j 2.16.0

2021-12-13 Thread GitBox


sullis opened a new pull request #463:
URL: https://github.com/apache/tika/pull/463


   
   
   Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! 
Your help is appreciated!
   
   Before opening the pull request, please verify that
   * there is an open issue on the [Tika issue 
tracker](https://issues.apache.org/jira/projects/TIKA) which describes the 
problem or the improvement. We cannot accept pull requests without an issue 
because the change wouldn't be listed in the release notes.
   * the issue ID (`TIKA-`)
 - is referenced in the title of the pull request
 - and placed in front of your commit messages surrounded by square 
brackets (`[TIKA-] Issue or pull request title`)
   * commits are squashed into a single one (or few commits for larger changes)
   * Tika is successfully built and unit tests pass by running `mvn clean test`
   * there should be no conflicts when merging the pull request branch into the 
*recent* `main` branch. If there are conflicts, please try to rebase the pull 
request branch on top of a freshly pulled `main` branch.
   
   We will be able to integrate your pull request faster if these conditions 
are met. If you have any questions about how to fix your problem or about using 
Tika in general, please sign up for the [Tika mailing 
list](http://tika.apache.org/mail-lists.html). Thanks!
   






Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Ken Krugler
That error looks like you’ve got a connection issue with the Maven central repo…

— Ken


> On Dec 13, 2021, at 2:18 PM, Lewis John McGibbney  wrote:
> 
> I performed another build of the tika-2.2.0-src.zip artifact which failed. 
> I've captured the failure output
> 
> https://paste.apache.org/o9iju
> 
> % mvn -version
> Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
> Maven home: /usr/local/Cellar/maven/3.8.4/libexec
> Java version: 11.0.10, vendor: Oracle Corporation, runtime: 
> /Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
> 
> Can anyone else reproduce this failure?
> 
> lewismc
> 
> On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
>> Hi Tim,
>> 
>> On 2021/12/13 21:37:47 Tim Allison wrote:
>>> A candidate for the Tika 2.2.0 release is available at:
>>> https://dist.apache.org/repos/dist/dev/tika/
>> 
>> I downloaded the tika-2.2.0-src.zip artifact
>>> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
>> 
>> .sha512 signature good
>> .asc signature is good
>> pom.xml versions all match
>> good NOTICE.txt
>> good CHANGES.txt
>> 
>>> 
>>> In addition, a staged maven repository is available here:
>>> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
>>> 
>> 
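>> I added the following to Any23 master pom.xml and ran our unit test suite
>> 
>> <repositories>
>>   <repository>
>>     <id>apache-repo-snapshots</id>
>>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>>     <releases>
>>       <enabled>false</enabled>
>>     </releases>
>>     <snapshots>
>>       <enabled>true</enabled>
>>     </snapshots>
>>   </repository>
>> </repositories>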
>> 
>> Everything passes successfully.
>> 
>>> 
>>> [X] +1 Release this package as Apache Tika 2.2.0
>> 
>> I did notice that the tika DL module(s) are pulling in the entire Hadoop 
>> dependency chain. I wonder if we can cut down on this; that is, however, a 
>> concern outside of this release candidate review.
>> 
>> Thanks for the quick turnaround.
>> lewismc
>> 

--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch





[jira] [Commented] (TIKA-3417) Running tika-docker as non-root user

2021-12-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458753#comment-17458753
 ] 

ASF GitHub Bot commented on TIKA-3417:
--

dameikle commented on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-992974424


   Hi @wjwilson-ibm :wave: 
   
   Looks like the auto-build has been disabled on Docker Hub, so I just 
manually updated the 1.27 tag at logicalspark/docker-tikaserver. Can you see if 
this works for you please?
   
   
   
   




> Running tika-docker as non-root user
> 
>
> Key: TIKA-3417
> URL: https://issues.apache.org/jira/browse/TIKA-3417
> Project: Tika
>  Issue Type: Improvement
>  Components: docker, tika-docker
>Reporter: Lewis John McGibbney
>Assignee: Philip Southam
>Priority: Major
>
> The PR and context can be found at 
> https://github.com/apache/tika-docker/pull/4



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [tika-docker] dameikle commented on pull request #4: [TIKA-3417] Running tika-docker as non-root user

2021-12-13 Thread GitBox


dameikle commented on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-992974424


   Hi @wjwilson-ibm :wave: 
   
   Looks like the auto-build has been disabled on Docker Hub, so I just 
manually updated the 1.27 tag at logicalspark/docker-tikaserver. Can you see if 
this works for you please?
   
   
   
   






Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Lewis John McGibbney
I performed another build of the tika-2.2.0-src.zip artifact which failed. I've 
captured the failure output

https://paste.apache.org/o9iju

% mvn -version
Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
Maven home: /usr/local/Cellar/maven/3.8.4/libexec
Java version: 11.0.10, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk-11.0.10.jdk/Contents/Home
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"

Can anyone else reproduce this failure?

lewismc

On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> Hi Tim,
> 
> On 2021/12/13 21:37:47 Tim Allison wrote:
> > A candidate for the Tika 2.2.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/tika/
> 
> I downloaded the tika-2.2.0-src.zip artifact
> > 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> 
> .sha512 signature good
> .asc signature is good
> pom.xml versions all match
> good NOTICE.txt
> good CHANGES.txt
> 
> > 
> > In addition, a staged maven repository is available here:
> > https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> > 
> 
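> I added the following to Any23 master pom.xml and ran our unit test suite
> 
> <repositories>
>   <repository>
>     <id>apache-repo-snapshots</id>
>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>     <releases>
>       <enabled>false</enabled>
>     </releases>
>     <snapshots>
>       <enabled>true</enabled>
>     </snapshots>
>   </repository>
> </repositories>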
> 
> Everything passes successfully.
> 
> > 
> > [X] +1 Release this package as Apache Tika 2.2.0
> 
> I did notice that the tika DL module(s) are pulling in the entire Hadoop 
> dependency chain. I wonder if we can cut down on this; that is, however, a 
> concern outside of this release candidate review.
> 
> Thanks for the quick turnaround.
> lewismc
> 


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Lewis John McGibbney
I forgot to mention. I performed a full clean install on the tika-2.2.0-src.zip 
artifact and everything installed and tested successfully.

On 2021/12/13 22:13:41 Lewis John McGibbney wrote:
> Hi Tim,
> 
> On 2021/12/13 21:37:47 Tim Allison wrote:
> > A candidate for the Tika 2.2.0 release is available at:
> > https://dist.apache.org/repos/dist/dev/tika/
> 
> I downloaded the tika-2.2.0-src.zip artifact
> > 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.
> 
> .sha512 signature good
> .asc signature is good
> pom.xml versions all match
> good NOTICE.txt
> good CHANGES.txt
> 
> > 
> > In addition, a staged maven repository is available here:
> > https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> > 
> 
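> I added the following to Any23 master pom.xml and ran our unit test suite
> 
> <repositories>
>   <repository>
>     <id>apache-repo-snapshots</id>
>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>     <releases>
>       <enabled>false</enabled>
>     </releases>
>     <snapshots>
>       <enabled>true</enabled>
>     </snapshots>
>   </repository>
> </repositories>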
> 
> Everything passes successfully.
> 
> > 
> > [X] +1 Release this package as Apache Tika 2.2.0
> 
> I did notice that the tika DL module(s) are pulling in the entire Hadoop 
> dependency chain. I wonder if we can cut down on this; that is, however, a 
> concern outside of this release candidate review.
> 
> Thanks for the quick turnaround.
> lewismc
> 


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Lewis John McGibbney
Hi Tim,

On 2021/12/13 21:37:47 Tim Allison wrote:
> A candidate for the Tika 2.2.0 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/

I downloaded the tika-2.2.0-src.zip artifact
> 9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.

.sha512 signature good
.asc signature is good
pom.xml versions all match
good NOTICE.txt
good CHANGES.txt

> 
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika
> 

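I added the following to Any23 master pom.xml and ran our unit test suite

<repositories>
  <repository>
    <id>apache-repo-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>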

Everything passes successfully.

> 
> [X] +1 Release this package as Apache Tika 2.2.0

I did notice that the tika DL module(s) are pulling in the entire Hadoop 
dependency chain. I wonder if we can cut down on this; that is, however, a 
concern outside of this release candidate review.

Thanks for the quick turnaround.
lewismc


[VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-13 Thread Tim Allison
A candidate for the Tika 2.2.0 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
https://github.com/apache/tika/tree/2.2.0-rc1/

The SHA-512 checksum of the archive is
9083fa1973f7146d2869bbdfa2dbdd493e12ac04235b9a4017a01b0b475684a2bc4377149a5a36b68722525fa3de68c7e06b2f7095af0c1e9f8510fba23e2b8d.

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1073/org/apache/tika

Please vote on releasing this package as Apache Tika 2.2.0.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

Here's my +1

[ ] +1 Release this package as Apache Tika 2.2.0
[ ] -1 Do not release this package because...
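
Reviewers who want to script the checksum comparison could use something like the following minimal Python sketch. It assumes the source archive has already been downloaded locally; the helper names `sha512_of` and `matches`, and the command-line usage shown in the comment, are illustrative and not part of the Tika release tooling.

```python
import hashlib
import sys


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha512_of(path) == expected_hex.strip().lower()


# Hypothetical usage: python verify.py tika-2.2.0-src.zip <published-sha512-hex>
if __name__ == "__main__" and len(sys.argv) == 3:
    archive, published = sys.argv[1], sys.argv[2]
    print("checksum OK" if matches(archive, published) else "checksum MISMATCH")
```

This only checks integrity against the published digest; the `.asc` signature still has to be verified separately with GPG.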


[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458686#comment-17458686
 ] 

Tim Allison commented on TIKA-3164:
---

Oh my goodness, thank you [~bob]!  There's no rush. POI 5.x will go out with an 
upgraded PDFBox early in the new year.  Thank you!

> Upgrade to POI 5.0.0 when available
> ---
>
> Key: TIKA-3164
> URL: https://issues.apache.org/jira/browse/TIKA-3164
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>






Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-13 Thread Tim Allison
Yes.  That was the reasoning behind my -0.  I don't think this will
destroy our resources, but yes, please do migrate to 2.x asap.


On Mon, Dec 13, 2021 at 3:13 PM Eric Pugh
 wrote:
>
> Isn’t the goal of Tika 2 that we no longer work on Tika 1?   Does the 
> Tika community have enough developer bandwidth to continue to maintain Tika 1 
> while also pushing forward on Tika 2?
>
> I worry that we’ll fall into the situation where people just end up using 
> Tika 1 forever, especially if it keeps receiving new updates, which then 
> encourages folks not to move to Tika 2.
>
>
>
>
> > On Dec 13, 2021, at 2:49 PM, Tim Allison  wrote:
> >
> > Sounds like 2 +1 to my -0. :D  I'll start working on this now.
> >
> > On Mon, Dec 13, 2021 at 2:09 PM Nicholas DiPiazza
> >  wrote:
> >>
> >> I prefer upgrade to log4j2
> >>
> >> On Mon, Dec 13, 2021, 12:05 PM Tim Allison  wrote:
> >>
> >>> All,
> >>>  I'm currently in the process of building the rc1 for Tika 2.x. On
> >>> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> >>> log4j2 in the 1.x branch.  I think we avoided that because it would be
> >>> a breaking change(?).  There are security vulns in log4j and it hit
> >>> EOL
> >>> in August 2015.
> >>>  Should we upgrade the Tika 1.x branch for log4j2?
> >>>
> >>>  Best,
> >>>
> >>>   Tim
> >>>
> >>>
> >>> [1]
> >>> https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457595#comment-17457595
> >>>
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
> http://www.opensourceconnections.com  
> | My Free/Busy 
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
> 
> This e-mail and all contents, including attachments, is considered to be 
> Company Confidential unless explicitly stated otherwise, regardless of 
> whether attachments are marked as such.
>


[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-13 Thread Bob Paulin (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458683#comment-17458683
 ] 

Bob Paulin commented on TIKA-3164:
--

Hey [~tallison].  I see the mention but will likely not get to this for a few 
days.  Did a few tests yesterday and I'm able to recreate your results on my 
machine but don't have any specific recommendations yet.







Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-13 Thread Eric Pugh
Isn’t the goal of Tika 2 that we no longer work on Tika 1?   Does the 
Tika community have enough developer bandwidth to continue to maintain Tika 1 
while also pushing forward on Tika 2?

I worry that we’ll fall into the situation where people just end up using Tika 
1 forever, especially if it keeps receiving new updates, which then encourages 
folks not to move to Tika 2.




> On Dec 13, 2021, at 2:49 PM, Tim Allison  wrote:
> 
> Sounds like 2 +1 to my -0. :D  I'll start working on this now.
> 
> On Mon, Dec 13, 2021 at 2:09 PM Nicholas DiPiazza
>  wrote:
>> 
>> I prefer upgrade to log4j2
>> 
>> On Mon, Dec 13, 2021, 12:05 PM Tim Allison  wrote:
>> 
>>> All,
>>>  I'm currently in the process of building the rc1 for Tika 2.x. On
>>> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
>>> log4j2 in the 1.x branch.  I think we avoided that because it would be
>>> a breaking change(?).  There are security vulns in log4j and it hit
>>> EOL
>>> in August 2015.
>>>  Should we upgrade the Tika 1.x branch for log4j2?
>>> 
>>>  Best,
>>> 
>>>   Tim
>>> 
>>> 
>>> [1]
>>> https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457595#comment-17457595
>>> 

___
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy   
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 


This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



[jira] [Created] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-13 Thread Tim Allison (Jira)
Tim Allison created TIKA-3618:
-

 Summary: Upgrade to log4j2 in 1.x branch
 Key: TIKA-3618
 URL: https://issues.apache.org/jira/browse/TIKA-3618
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison








Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-13 Thread Tim Allison
Sounds like 2 +1 to my -0. :D  I'll start working on this now.

On Mon, Dec 13, 2021 at 2:09 PM Nicholas DiPiazza
 wrote:
>
> I prefer upgrade to log4j2
>
> On Mon, Dec 13, 2021, 12:05 PM Tim Allison  wrote:
>
> > All,
> >   I'm currently in the process of building the rc1 for Tika 2.x. On
> > TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> > log4j2 in the 1.x branch.  I think we avoided that because it would be
> > a breaking change(?).  There are security vulns in log4j and it hit
> > EOL
> > in August 2015.
> >   Should we upgrade the Tika 1.x branch for log4j2?
> >
> >   Best,
> >
> >Tim
> >
> >
> > [1]
> > https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457595#comment-17457595
> >


Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-13 Thread Nicholas DiPiazza
I prefer upgrade to log4j2

On Mon, Dec 13, 2021, 12:05 PM Tim Allison  wrote:

> All,
>   I'm currently in the process of building the rc1 for Tika 2.x. On
> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> log4j2 in the 1.x branch.  I think we avoided that because it would be
> a breaking change(?).  There are security vulns in log4j and it hit
> EOL
> in August 2015.
>   Should we upgrade the Tika 1.x branch for log4j2?
>
>   Best,
>
>Tim
>
>
> [1]
> https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457595#comment-17457595
>


[DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-13 Thread Tim Allison
All,
  I'm currently in the process of building the rc1 for Tika 2.x. On
TIKA-3616 [1], Luís Filipe Nassif asked if we could upgrade log4j to
log4j2 in the 1.x branch.  I think we avoided that because it would be
a breaking change(?).  There are security vulns in log4j and it hit
EOL in August 2015.
  Should we upgrade the Tika 1.x branch to log4j2?

  Best,

   Tim


[1] 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457595#comment-17457595


[jira] [Commented] (TIKA-3584) tika async should keep going even if unable to restart a forked process

2021-12-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458542#comment-17458542
 ] 

Hudson commented on TIKA-3584:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #383 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/383/])
TIKA-3584 -- general upgrades for 2.2.0 (tallison: 
[https://github.com/apache/tika/commit/63cdd350b7962b07fd4e5e8b0931991f6494c056])
* (edit) tika-parent/pom.xml
* (edit) CHANGES.txt


> tika async should keep going even if unable to restart a forked process
> ---
>
> Key: TIKA-3584
> URL: https://issues.apache.org/jira/browse/TIKA-3584
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>
> I initially set up async to crash if a client couldn't start/restart a forked 
> process after {{startupTimeoutMillis}}.  I'm seeing now that if I peg a 
> machine (e.g. configure too many forked processes or send evil documents), 
> there are times when the forked process cannot restart even within a fairly 
> large window.
> I propose that we sleep some configurable amount if there's a timeout during 
> the start/restart.
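
The proposal above can be illustrated with a small sketch, written here in Python rather than Tika's Java. Everything in it is hypothetical: `start_with_backoff`, `start_process`, `StartupTimeout`, and `sleep_millis` are illustrative names and do not correspond to Tika's async API. It shows only the general idea of sleeping for a configurable interval after a startup timeout instead of crashing.

```python
import time


class StartupTimeout(Exception):
    """Raised by the (hypothetical) launcher when a forked process fails to start in time."""


def start_with_backoff(start_process, sleep_millis=1000, max_attempts=None, sleep=time.sleep):
    """Keep trying to start a forked process, sleeping after each startup timeout.

    `start_process` is any callable that returns a process handle or raises
    StartupTimeout. `max_attempts=None` means retry indefinitely.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return start_process()
        except StartupTimeout:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # Instead of crashing, pause for a configurable interval and retry.
            sleep(sleep_millis / 1000.0)
```

The `sleep` parameter is injectable so the retry behaviour can be exercised without real delays.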





[jira] [Commented] (TIKA-3417) Running tika-docker as non-root user

2021-12-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458482#comment-17458482
 ] 

ASF GitHub Bot commented on TIKA-3417:
--

wjwilson-ibm edited a comment on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-986775187


   @dameikle Yes, logicalspark/docker-tikaserver is our source, and the update 
is greatly appreciated.  Will it be published as a new version such as 1.28, 
since we aren't allowed to pull latest?









[GitHub] [tika-docker] wjwilson-ibm edited a comment on pull request #4: [TIKA-3417] Running tika-docker as non-root user

2021-12-13 Thread GitBox


wjwilson-ibm edited a comment on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-986775187


   @dameikle Yes, logicalspark/docker-tikaserver is our source, and the update 
is greatly appreciated.  Will it be published as a new version such as 1.28, 
since we aren't allowed to pull latest?






[jira] [Commented] (TIKA-3164) Upgrade to POI 5.0.0 when available

2021-12-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458428#comment-17458428
 ] 

Tim Allison commented on TIKA-3164:
---

I'm still not able to get the bundle to work; see the TIKA-3164-v2 branch.

I'm now getting two exceptions, one in the ForkParser test, one in the poi 
bundle test.

ForkParser test

{noformat}
java.lang.NoClassDefFoundError: 
org/apache/logging/log4j/spi/LoggerContextFactory
[ERROR] at 
org.apache.poi.openxml4j.util.ZipSecureFile.(ZipSecureFile.java:37)
[ERROR] at 
org.apache.tika.parser.microsoft.ooxml.OOXMLParser.(OOXMLParser.java:103)
[ERROR] at sun.misc.Unsafe.ensureClassInitialized(Native Method)
[ERROR] at 
sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
[ERROR] at 
sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:156)
[ERROR] at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1088)
[ERROR] at java.lang.reflect.Field.getFieldAccessor(Field.java:1069)
[ERROR] at java.lang.reflect.Field.getLong(Field.java:611)
{noformat}

and testPoiTikaBundle
{noformat}
java.lang.RuntimeException: XPathFactory#newInstance() failed to create an 
XPathFactory for the default object model: http://java.sun.com/jaxp/xpath/dom 
with the XPathFactoryConfigurationException: 
javax.xml.xpath.XPathFactoryConfigurationException: No XPathFctory 
implementation found for the object model: http://java.sun.com/jaxp/xpath/dom
at org.apache.tika.bundle.BundleIT.testPoiTikaBundle(BundleIT.java:313)
{noformat}








[jira] [Commented] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458365#comment-17458365
 ] 

Tim Allison commented on TIKA-3615:
---

ZipContainerDetector is an interface in 2.x.  Try DefaultZipContainerDetector, 
which in turn loads: 

{noformat}
org.apache.tika.detect.zip.IPADetector
org.apache.tika.detect.zip.JarDetector
org.apache.tika.detect.zip.KMZDetector
org.apache.tika.detect.zip.OpenDocumentDetector
org.apache.tika.detect.zip.StarOfficeDetector
{noformat}

> Missing class file while upgrade to TIka 2.1.0
> --
>
> Key: TIKA-3615
> URL: https://issues.apache.org/jira/browse/TIKA-3615
> Project: Tika
>  Issue Type: Test
>  Components: detector, parser
>Affects Versions: 2.1.0
>Reporter: Vamsi Molli
>Priority: Major
>
> A class-not-found exception occurs for the following:
> 
> 
> 
> 
> 
> 
> 
> 





[jira] [Comment Edited] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-13 Thread Vamsi Molli (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458348#comment-17458348
 ] 

Vamsi Molli edited comment on TIKA-3615 at 12/13/21, 12:09 PM:
---

Added this; still seeing the same error.

group: 'org.apache.tika', name: 'tika-parser-zip-commons', version: '2.1.0'


was (Author: vamsi452):
Added this still seeing same error






[jira] [Commented] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-13 Thread Vamsi Molli (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458348#comment-17458348
 ] 

Vamsi Molli commented on TIKA-3615:
---

Added this; still seeing the same error

> Missing class file while upgrade to TIka 2.1.0
> --
>
> Key: TIKA-3615
> URL: https://issues.apache.org/jira/browse/TIKA-3615
> Project: Tika
>  Issue Type: Test
>  Components: detector, parser
>Affects Versions: 2.1.0
>Reporter: Vamsi Molli
>Priority: Major
>
> Class-not-found exception for the following:
> 
> 
> 
> 
> 
> 
> 
> 





[jira] [Commented] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-13 Thread Vamsi Molli (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458335#comment-17458335
 ] 

Vamsi Molli commented on TIKA-3615:
---

[~tilman] I am getting the following error:

"Message": "Failed parsing Tika config. 
Error:org.apache.tika.exception.TikaConfigException: Unable to find a detector 
class: org.apache.tika.detect.zip.ZipContainerDetector",
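
A quick way to narrow down an error like the one above is to ask the JVM directly whether the named class is visible on the runtime classpath. The following is only a minimal sketch, not part of Tika: the class name ClasspathCheck and its helper isOnClasspath are hypothetical, and it assumes it is run with the same classpath as the failing application. It does not say which artifact should supply the class.

```java
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika.
public class ClasspathCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is absent or one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message; pass other names as arguments.
        List<String> names = args.length > 0
                ? List.of(args)
                : List.of("org.apache.tika.detect.zip.ZipContainerDetector");
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "NOT found"));
        }
    }
}
```

If the class is reported as not found, the next step would be to inspect the resolved dependency tree of the build to see which Tika modules actually end up on the classpath.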

> Missing class file while upgrade to TIka 2.1.0
> --
>
> Key: TIKA-3615
> URL: https://issues.apache.org/jira/browse/TIKA-3615
> Project: Tika
>  Issue Type: Test
>  Components: detector, parser
>Affects Versions: 2.1.0
>Reporter: Vamsi Molli
>Priority: Major
>
> Class-not-found exception for the following:
> 
> 
> 
> 
> 
> 
> 
> 





[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458328#comment-17458328
 ] 

Tim Allison commented on TIKA-3616:
---

I haven't heard any objections from fellow devs, so I'll roll the release 
candidate in the next few hours.  Voting will remain open for 72 hours, and 
then I can make the release.  This assumes no surprises.

> Upgrade log4j2
> --
>
> Key: TIKA-3616
> URL: https://issues.apache.org/jira/browse/TIKA-3616
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>
> RCE...might be difficult to trigger in Tika, but why ask for a PoC...
> This only affects 2.x.  We were still using the old log4j in 1.x





Re: log4j2 rce and next 2.x release?

2021-12-13 Thread Tim Allison
Or, I should have said: yes if you're using pipes and/or async.

On Sun, Dec 12, 2021 at 4:59 AM Cristian Zamfir wrote:
>
> Thanks Tim,
> Sounds good. Just checking, I suppose this option needs to be added
> explicitly to ,  and  to override the default
> settings, even if these are not specified at all in tikaConfig.xml, is that
> right?
>
>
>
> On Sat, Dec 11, 2021 at 2:05 PM Tim Allison wrote:
>
> > Cristian,
> >   Until the next release, you can add: -Dlog4j2.formatMsgNoLookups=true.
> >
> > If you're running Tika server in 1.x with spawnChild mode, add
> > -JDlog4j2.formatMsgNoLookups=true
> > In 2.x add -Dlog4j2.formatMsgNoLookups=true to the forkedJvmArgs
> > element in the ,  and  elements in
> > tikaConfig.xml
> >
> > On Sat, Dec 11, 2021 at 3:42 AM Cristian Zamfir wrote:
> > >
> > > It would be great to also update the Docker containers, it is a critical
> > > vulnerability IMO. Thanks!
> > >
> > >
> > > On Fri, Dec 10, 2021 at 5:41 PM Tim Allison wrote:
> > >
> > > > All,
> > > >   As you've probably heard, a dire RCE was recently announced in
> > > > log4j2.  I suspect it would be fairly easy to develop a PoC to show
> > > > that we're vulnerable.  It isn't as straightforward as webapps that
> > > > are logging direct user input, but I don't think it would take much.
> > > >   Should we push for a 2.x release in the next few days?
> > > >
> > > >   Best,
> > > >
> > > >  Tim
> > > >
> > > --
> > > Cristian Zamfir
> >
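
To confirm that the -Dlog4j2.formatMsgNoLookups=true workaround discussed in this thread actually reached a given JVM, one option is to inspect the system property and the JVM input arguments from inside that JVM. The following is a minimal sketch under stated assumptions: the class Log4jLookupFlagCheck and its helper methods are hypothetical, not part of Tika or log4j, and the check only shows that the flag was passed, not that the log4j version in use honors it.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika or log4j.
public class Log4jLookupFlagCheck {

    static final String FLAG = "log4j2.formatMsgNoLookups";

    /** True if the property value is the string "true" (case-insensitive); null yields false. */
    static boolean flagEnabled(String value) {
        return Boolean.parseBoolean(value);
    }

    /** True if the exact -D option appears among the JVM input arguments. */
    static boolean passedOnCommandLine(List<String> jvmArgs) {
        return jvmArgs.contains("-D" + FLAG + "=true");
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (excluding arguments to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("system property set to true: " + flagEnabled(System.getProperty(FLAG)));
        System.out.println("passed as JVM argument:      " + passedOnCommandLine(jvmArgs));
    }
}
```

On Java 11 or later this can be tried directly with `java -Dlog4j2.formatMsgNoLookups=true Log4jLookupFlagCheck.java`. For a forked Tika process the same check would have to run inside the forked JVM, since arguments given to the parent JVM are not necessarily passed on to the child.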


[jira] [Commented] (TIKA-3616) Upgrade log4j2

2021-12-13 Thread Abhijit Rajwade (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17458246#comment-17458246
 ] 

Abhijit Rajwade commented on TIKA-3616:
---

What is the release date of version 2.1.1?

> Upgrade log4j2
> --
>
> Key: TIKA-3616
> URL: https://issues.apache.org/jira/browse/TIKA-3616
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.1.1
>
>
> RCE...might be difficult to trigger in Tika, but why ask for a PoC...
> This only affects 2.x.  We were still using the old log4j in 1.x


