[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-14 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826996#comment-17826996
 ] 

Tilman Hausherr commented on TIKA-4199:
---

The error you originally reported wasn't really a bug in commons-compress. 
Rather, a change there caused more bytes to be read than Tika expected; see my 
first comment in COMPRESS-661. It resulted in several fixes in Tika.

> commons-compress 1.26.0 breaks Apache Tika 2.9.1
> 
>
> Key: TIKA-4199
> URL: https://issues.apache.org/jira/browse/TIKA-4199
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.9.1
> Reporter: Alexander Veit
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.9.2, 3.0.0
>
>
> An update to commons-compress 1.26.0 to fix CVE-2024-25710 and CVE-2024-26308 
> breaks Tika.
>  
> For more information see https://issues.apache.org/jira/browse/COMPRESS-661.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-14 Thread Alexander Veit (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826992#comment-17826992
 ] 

Alexander Veit commented on TIKA-4199:
--

The same error also occurs with Tika 2.9.1 and commons-compress 1.26.1.



[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824933#comment-17824933
 ] 

Hudson commented on TIKA-4199:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1544 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1544/])
TIKA-4199: revert "complete delegate class", field "in" is a dummy; remove 
workaround for commons-compress 1.26 (tilman: 
[https://github.com/apache/tika/commit/8b398201a969b952bfee3166cec1395ae409071b])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
TIKA-4199: adjust test results now that commons compress bug has been fixed 
(tilman: 
[https://github.com/apache/tika/commit/5b259d60a490699252ea582aaec02a3575e4f7ff])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/TruncatedOOXMLTest.java
TIKA-4199: update commons-compress (tilman: 
[https://github.com/apache/tika/commit/4d6acfc109f842421030e05c33794bc8090caebb])
* (edit) tika-parent/pom.xml




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823577#comment-17823577
 ] 

Hudson commented on TIKA-4199:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1540 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1540/])
TIKA-4199: add comment, print to stderr (tilman: 
[https://github.com/apache/tika/commit/32ef34ff49ccd6a8a7e595861216e6fdeded])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/pkg/Seven7ParserTest.java




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819093#comment-17819093
 ] 

Hudson commented on TIKA-4199:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1520 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1520/])
TIKA-4199: replace deprecated (tilman: 
[https://github.com/apache/tika/commit/a305ab772277db6cdcbd60653e6cf1eb147a1df7])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818937#comment-17818937
 ] 

Tilman Hausherr commented on TIKA-4199:
---

I tried another solution
{code:java}
if (archive.markSupported())
{
    // hide mark support from downstream code
    archive = new ArchiveInputStreamWrapper(archive);
}
{code}
which also works. The wrapper delegates everything except markSupported. I'll 
wait a few days to see whether the commons-compress people fix this. If not, 
I'll commit that solution.
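
For readers who want to see the idea in isolation, here is a minimal sketch of 
such a wrapper. It is not the actual ArchiveInputStreamWrapper from Tika; the 
names HideMarkDemo, NoMarkInputStream and hideMarkSupport are hypothetical, and 
a plain java.io.FilterInputStream stands in for the archive stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

final class HideMarkDemo {

    /** Hypothetical wrapper: forwards every call to the delegate except markSupported(). */
    static final class NoMarkInputStream extends FilterInputStream {
        NoMarkInputStream(InputStream delegate) {
            super(delegate);
        }

        @Override
        public boolean markSupported() {
            // Callers that check this will fall back to their own buffering.
            return false;
        }
    }

    /** Wraps the stream only if it advertises mark support, mirroring the snippet above. */
    static InputStream hideMarkSupport(InputStream in) {
        return in.markSupported() ? new NoMarkInputStream(in) : in;
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[] {1, 2, 3});
        InputStream wrapped = hideMarkSupport(raw);
        System.out.println("raw.markSupported()     = " + raw.markSupported());
        System.out.println("wrapped.markSupported() = " + wrapped.markSupported());
    }
}
```

Because only markSupported() is overridden, mark() and reset() still reach the 
delegate; the sketch relies on callers checking markSupported() first.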



[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818905#comment-17818905
 ] 

Hudson commented on TIKA-4199:
--

FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1517 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1517/])
TIKA-4199: complete delegate class (tilman: 
[https://github.com/apache/tika/commit/a3a830359f088f216ffaca31bf640e296d72531a])
* (edit) tika-core/src/main/java/org/apache/tika/io/BoundedInputStream.java




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818877#comment-17818877
 ] 

Hudson commented on TIKA-4199:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1516 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1516/])
TIKA-4199: complete delegate class (tilman: 
[https://github.com/apache/tika/commit/e5d57528d92fe41bd2c7ba4545323e8b9cae4883])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818871#comment-17818871
 ] 

Tim Allison commented on TIKA-4199:
---

I opened TIKA-4201 to add a hard limit to the read in the IWorksParser.



[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867
 ] 

Tilman Hausherr commented on TIKA-4199:
---

{quote}I'm not declaring this a problem with commons-compress!
{quote}
My bet was 51% that the problem is in Tika, but judging from the latest test 
code you inspired me to write in COMPRESS-661, it might be in commons-compress 
or in BufferedInputStream itself.

I also found another incomplete delegate class (BoundedInputStream), I'll 
complete that one too.



[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818853#comment-17818853
 ] 

Tim Allison commented on TIKA-4199:
---

As I look at the IWorkPackageParser and the detectType(), I think we should 
rework the mark/reset there. There's currently no hard limit on the number of 
bytes read when trying to extract the root element. So, it is entirely possible 
that more than the mark() value is read. I think we happened to get lucky 
earlier, and we're relying on the same luck by doubling the mark value.
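
To illustrate the concern, here is a small self-contained sketch using only 
java.io. It is not Tika code: MarkLimitDemo, resetWorksAfter and peek are 
hypothetical names, and the 4-byte buffer is chosen only to make the effect 
easy to trigger. The first helper reads past the mark limit on a 
BufferedInputStream and reports whether reset() still works; the second shows 
one way to put a hard limit on the read so that reset() stays within the 
mark() contract.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class MarkLimitDemo {

    /** Marks with readLimit, reads bytesToRead single bytes, then reports whether reset() succeeds. */
    static boolean resetWorksAfter(int bytesToRead, int readLimit) {
        // Deliberately tiny 4-byte internal buffer (an assumption made for the demo).
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(new byte[64]), 4)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            try {
                in.reset();
                return true;
            } catch (IOException e) {
                return false; // the mark was invalidated
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads at most limit bytes after mark(limit), then resets; never exceeds the mark limit. */
    static byte[] peek(InputStream in, int limit) {
        try {
            in.mark(limit);
            byte[] buf = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n < 0) {
                    break; // end of stream before the limit
                }
                total += n;
            }
            in.reset();
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("within limit: " + resetWorksAfter(2, 2));
        System.out.println("past limit:   " + resetWorksAfter(5, 2));
        System.out.println("peeked bytes: "
                + peek(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}), 3).length);
    }
}
```

Whether reset() still succeeds after reading past the limit depends on the 
stream's internal buffering, which is why an unbounded read can appear to work 
until an upstream library changes how much it reads.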



[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818846#comment-17818846
 ] 

Tim Allison commented on TIKA-4199:
---

Thank you [~tilman] for working on this! I'm sorry I opened a duplicate ticket.

To confirm, the current workaround is to write each embedded file to disk 
instead of handling it in memory --> {{tis.getPath()}}

If I have any time, I'll see if I can create a small reproducer for the 
commons-compress team that uses mark/reset on a wrapped ArchiveInputStream. To 
be clear, without looking further, I'm not declaring this a problem with 
commons-compress! :D




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818823#comment-17818823
 ] 

Tilman Hausherr commented on TIKA-4199:
---

After merging I discovered that the SevenZWrapper class is incomplete 
(markSupported / mark / reset were missing, and many more). I tested reverting 
my one-line change, and some of the previously failing tests (e.g. the 7z 
tests) now succeeded. So this kind of suggests that the cause is related to 
markSupported / mark / reset. If we ever find that cause, the one-line change 
in {{PackageParser}} can be removed, because it makes things slower.
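
As a generic illustration of why an incomplete delegate matters (this is not 
the SevenZWrapper code; DelegateDemo, IncompleteDelegate and CompleteDelegate 
are hypothetical names): a class that extends java.io.InputStream and forwards 
only read() inherits the base-class defaults, so markSupported() returns false 
even when the wrapped stream supports it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class DelegateDemo {

    /** Forwards only read(); mark-related methods fall back to InputStream defaults. */
    static final class IncompleteDelegate extends InputStream {
        private final InputStream delegate;

        IncompleteDelegate(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
    }

    /** Forwards the mark-related methods as well, so callers see the delegate's real behaviour. */
    static final class CompleteDelegate extends InputStream {
        private final InputStream delegate;

        CompleteDelegate(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate.read(b, off, len);
        }

        @Override
        public boolean markSupported() {
            return delegate.markSupported();
        }

        @Override
        public void mark(int readlimit) {
            delegate.mark(readlimit);
        }

        @Override
        public void reset() throws IOException {
            delegate.reset();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    public static void main(String[] args) {
        InputStream a = new IncompleteDelegate(new ByteArrayInputStream(new byte[] {1, 2}));
        InputStream b = new CompleteDelegate(new ByteArrayInputStream(new byte[] {1, 2}));
        System.out.println("incomplete.markSupported() = " + a.markSupported());
        System.out.println("complete.markSupported()   = " + b.markSupported());
    }
}
```

Code that branches on markSupported() therefore takes a different path 
depending on which kind of wrapper sits in between.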



[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774
 ] 

Tilman Hausherr commented on TIKA-4199:
---

I'm working on it

https://github.com/apache/pdfbox/pull/180
