[jira] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4166 ]


Tilman Hausherr deleted comment on TIKA-4166:
---

was (Author: tilman):
I've reverted it and will investigate / fix this later. Seems to be a problem 
with angus-activation.

> dependency updates for Tika 3.0
> ---
>
> Key: TIKA-4166
> URL: https://issues.apache.org/jira/browse/TIKA-4166
> Project: Tika
> Issue Type: Task
> Components: build
> Reporter: Tilman Hausherr
> Priority: Minor
> Fix For: 3.0.0-BETA
>
>
> Separate ticket for updates for 3.0, especially those not found by dependabot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Hudson (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824975#comment-17824975 ]

Hudson commented on TIKA-4166:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1548 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1548/])
TIKA-4166: update jaxb and prevent convergence problem (tilman: 
[https://github.com/apache/tika/commit/0f077da2ac33d9fdd1320b339a097e9af51de983])
* (edit) tika-parent/pom.xml




[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Hudson (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824970#comment-17824970 ]

Hudson commented on TIKA-4166:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1547 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1547/])
TIKA-4166: revert jaxb update (tilman: 
[https://github.com/apache/tika/commit/d477bfd3b69cea5119ba6257bef796b92b81b70a])
* (edit) tika-parent/pom.xml




[jira] [Commented] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-09 Thread Nick Burch (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824965#comment-17824965 ]

Nick Burch commented on TIKA-4208:
--

I would expect the JSON output version to need a bit more memory, as 
we'll have to hold all the content in memory before outputting instead of just 
streaming the text/html out as we go along. I wouldn't expect it to be 4 GB vs 
32 GB though!

Any ideas, anyone? Is it possible we've got an extra layer (or 2?) of buffering 
above and beyond what we need for the {{-J}} option?
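
To make the buffering concern concrete, here is a minimal sketch in plain Java. It is not Tika code: BufferingSketch, BoundedWriter and emit are hypothetical names invented for illustration. It only assumes what the trace in the report shows, namely repeated characters() calls ending up in an in-memory StringWriter. Capping what the writer keeps bounds memory no matter how much text the parser emits.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class BufferingSketch {

    /** Hypothetical writer that keeps at most 'limit' characters in memory. */
    static final class BoundedWriter extends Writer {
        private final StringWriter delegate = new StringWriter();
        private final int limit;
        private boolean truncated;

        BoundedWriter(int limit) {
            this.limit = limit;
        }

        @Override
        public void write(char[] cbuf, int off, int len) {
            // Remaining capacity before the cap is reached.
            int room = limit - delegate.getBuffer().length();
            if (len > room) {
                truncated = true; // some characters will be dropped
            }
            if (room > 0) {
                delegate.write(cbuf, off, Math.min(len, room));
            }
        }

        @Override
        public void flush() {
            // nothing to flush, everything is held in memory
        }

        @Override
        public void close() {
            // nothing to release
        }

        boolean isTruncated() {
            return truncated;
        }

        @Override
        public String toString() {
            return delegate.toString();
        }
    }

    /** Writes 'chunk' to 'out' 'chunks' times, mimicking repeated characters() calls. */
    static void emit(Writer out, String chunk, int chunks) {
        try {
            for (int i = 0; i < chunks; i++) {
                out.write(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BoundedWriter w = new BoundedWriter(1_000);
        emit(w, "0123456789", 10_000); // 100,000 characters offered
        System.out.println(w.toString().length() + " chars kept, truncated=" + w.isTruncated());
    }
}
```

With a plain StringWriter in place of BoundedWriter, the buffer would keep growing with every chunk, which is the growth pattern the Arrays.copyOf frames in the trace point at. Whether a cap, or removing a redundant buffering layer, is the right fix here is an open question.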

> OOM error in SAS7BDATParser
> ---
>
> Key: TIKA-4208
> URL: https://issues.apache.org/jira/browse/TIKA-4208
> Project: Tika
> Issue Type: Bug
> Affects Versions: 3.0.0-BETA
> Reporter: Gregory Lepore
> Priority: Minor
>
> For this ARC file:
> [https://eotarchive.s3.amazonaws.com/crawl-data/EOT-2004/segments/NARA-000/warc/NARA-PEOT-2004-20041019023240-02598-crawling008-c_NARA-PEOT-2004-20041019053819-01693-crawling007.archive.org.arc.gz]
> I'm getting an OOM error:
> Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>    at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
>    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:228)
>    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:740)
>    at java.base/java.lang.StringBuffer.append(StringBuffer.java:410)
>    at java.base/java.io.StringWriter.write(StringWriter.java:99)
>    at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:96)
>    at org.apache.tika.sax.ToXMLContentHandler.writeEscaped(ToXMLContentHandler.java:229)
>    at org.apache.tika.sax.ToXMLContentHandler.characters(ToXMLContentHandler.java:154)
>    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>    at org.apache.tika.parser.RecursiveParserWrapper$RecursivelySecureContentHandler.characters(RecursiveParserWrapper.java:370)
>    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:253)
>    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:143)
>    at org.apache.tika.sax.SafeContentHandler.access$101(SafeContentHandler.java:47)
>    at org.apache.tika.sax.SafeContentHandler.lambda$new$0(SafeContentHandler.java:57)
>    at org.apache.tika.sax.SafeContentHandler$$Lambda$327/0x7f94a022d1a8.write(Unknown Source)
>    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:106)
>    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:250)
>    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:270)
>    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:295)
>    at org.apache.tika.parser.sas.SAS7BDATParser.parse(SAS7BDATParser.java:146)
>    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)
>    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203)
>    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:153)
>    at org.apache.tika.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:259)
>    at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:71)
>    at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
>    at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:455)
> when extracting JSON with both the app and server version of 3.0.0 BETA.





[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824953#comment-17824953 ]

Tilman Hausherr commented on TIKA-4166:
---

I've reverted it and will investigate / fix this later. Seems to be a problem 
with angus-activation.



[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Hudson (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824951#comment-17824951 ]

Hudson commented on TIKA-4166:
--

FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1546 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1546/])
TIKA-4166: update jaxb (tilman: 
[https://github.com/apache/tika/commit/5f4e380ffc53cb7788df54e6ec875f9de7008b21])
* (edit) tika-parent/pom.xml




[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Hudson (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824946#comment-17824946 ]

Hudson commented on TIKA-4166:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1545 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1545/])
TIKA-4166: update aws (tilman: 
[https://github.com/apache/tika/commit/1dd99bf4516244b7b2f9ef0df18c4cf4f2135bd3])
* (edit) tika-parent/pom.xml




[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Hudson (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824933#comment-17824933 ]

Hudson commented on TIKA-4199:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1544 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1544/])
TIKA-4199: revert "complete delegate class", field "in" is a dummy; remove 
workaround for commons-compress 1.26 (tilman: 
[https://github.com/apache/tika/commit/8b398201a969b952bfee3166cec1395ae409071b])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pkg-module/src/main/java/org/apache/tika/parser/pkg/PackageParser.java
TIKA-4199: adjust test results now that commons compress bug has been fixed 
(tilman: 
[https://github.com/apache/tika/commit/5b259d60a490699252ea582aaec02a3575e4f7ff])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/ooxml/TruncatedOOXMLTest.java
TIKA-4199: update commons-compress (tilman: 
[https://github.com/apache/tika/commit/4d6acfc109f842421030e05c33794bc8090caebb])
* (edit) tika-parent/pom.xml


> commons-compress 1.26.0 breaks Apache Tika 2.9.1
> 
>
> Key: TIKA-4199
> URL: https://issues.apache.org/jira/browse/TIKA-4199
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.9.1
> Reporter: Alexander Veit
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.9.2, 3.0.0
>
>
> An update to commons-compress 1.26.0 to fix CVE-2024-25710 and CVE-2024-26308 
> breaks Tika.
>  
> For more information see https://issues.apache.org/jira/browse/COMPRESS-661.





[jira] [Resolved] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved TIKA-4199.
---
Resolution: Fixed

Commons-Compress has been updated to 1.26.1. I have reverted the workaround and 
a change that wasn't helpful.



[jira] [Assigned] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr reassigned TIKA-4199:
-

Assignee: Tilman Hausherr
