[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Roberto Franchini (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846672#comment-17846672
 ] 

Roberto Franchini commented on TIKA-4137:
-

Could you please backport this small fix to 2.9.x?

We are on 2.9.x and are moving from Java 17 to Java 21.

> Building current Tika main branch fails under Java 20/21
> 
>
> Key: TIKA-4137
> URL: https://issues.apache.org/jira/browse/TIKA-4137
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0-BETA
> Reporter: Thorsten Heit
> Priority: Major
> Fix For: 3.0.0-BETA
>
> Attachments: org.apache.tika.server.core.StackTraceOffTest.txt, 
> org.apache.tika.server.core.StackTraceTest.txt, 
> org.apache.tika.server.core.TikaResourceFetcherTest.txt, 
> org.apache.tika.server.core.TikaResourceTest.txt
>
>
> When I execute "mvn verify" on the current main branch using Java 11 or Java 
> 17, the build completes. With Java 20 and 21 the same command fails because 
> several JUnit tests in tika-server-core now fail:
> {noformat}
> (...)
> [INFO] Running org.apache.tika.server.core.StackTraceTest
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.034 s <<< FAILURE! -- in org.apache.tika.server.core.StackTraceTest
> [ERROR] org.apache.tika.server.core.StackTraceTest.testEmptyParser -- Time elapsed: 0.007 s <<< FAILURE!
> org.opentest4j.AssertionFailedError: bad type: /tika ==> expected: <200> but was: <500>
>   at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>   at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>   at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
>   at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
>   at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:559)
>   at org.apache.tika.server.core.StackTraceTest.testEmptyParser(StackTraceTest.java:132)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>   at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
>   at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
> WARN  [main] 21:28:26,651 org.apache.tika.pipes.PipesServer received -1 from client; shutting down
> ERROR [main] 21:28:26,652 org.apache.tika.pipes.PipesServer exiting: 1
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   StackTraceOffTest.testEmptyParser:137 bad type: /tika ==> expected: <200> but was: <500>
> [ERROR]   StackTraceTest.testEmptyParser:132 bad type: /tika ==> expected: <200> but was: <500>
> [ERROR]   TikaResourceFetcherTest.testHeader:101->CXFTestBase.assertContains:66 hello world not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> 
> 
> 
> 
> 
>  content="org.apache.tika.parser.DefaultParser"/>
> 
>  content="org.apache.tika.parser.mock.MockParser"/>
> 
> 
> 
> 
> ==> expected: <true> but was: <false>
> [ERROR]   TikaResourceFetcherTest.testQueryPart:109->CXFTestBase.assertContains:66 hello world not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> 
> 
> 
> 
> 
>  content="org.apache.tika.parser.DefaultParser"/>
> 
>  content="org.apache.tika.parser.mock.MockParser"/>
> 
> 
> 
> 
> ==> expected: <true> but was: <false>
> [ERROR]   TikaResourceTest.testHeaders:91->CXFTestBase.assertContains:66 
>  not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> 
> 
> 
> 
> 
>  content="org.apache.tika.parser.DefaultParser"/>
> 
>  content="org.apache.tika.parser.mock.MockParser"/>
> 
> 
> 
>  content="R5FG5V2U44YXOZTMKGVNTTSPGLF2JH ==> expected: <true> but was: <false>
> [ERROR]   TikaResourceTest.testNoWriteLimitOnStreamingWrite:187->CXFTestBase.assertContains:66 separation. not found in:
> <html xmlns="http://www.w3.org/1999/xhtml">
> 
> 
>  content="org.apache.tika.parser.DefaultParser"/>
>  content="org.apache.tika.parser.mock.MockParser"/>
> 
>  content="AQWEMUMSJVFZWYGM4TKXRTQ5Q436X4DN"/>
> 
> 
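
For anyone who wants to check where their own build JVM stands relative to the report quoted above, here is a minimal sketch. It only encodes what the reporter stated for the unfixed main branch: Java 11 and 17 complete, Java 20 and 21 fail. The class name `JavaVersionCheck` and the method `classify` are hypothetical and not part of Tika; releases the report does not mention are deliberately left unclassified.

```java
public class JavaVersionCheck {

    // Maps a Java feature release to the outcome stated in the TIKA-4137 report.
    // Hypothetical helper: only 11, 17, 20 and 21 are mentioned in the report.
    static String classify(int feature) {
        switch (feature) {
            case 11:
            case 17:
                return "reported passing";
            case 20:
            case 21:
                return "reported failing";
            default:
                return "not covered by the report";
        }
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is available since Java 10.
        int feature = Runtime.version().feature();
        System.out.println("Java " + feature + ": " + classify(feature));
    }
}
```

Running it with the same JDK that Maven uses shows which case applies; it says nothing about builds that already include the fix.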

[jira] [Updated] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-4137:
--
Fix Version/s: 2.9.3

> Building current Tika main branch fails under Java 20/21
> 
>
> Key: TIKA-4137
> URL: https://issues.apache.org/jira/browse/TIKA-4137
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0-BETA
> Reporter: Thorsten Heit
> Priority: Major
> Fix For: 3.0.0-BETA, 2.9.3
>
> Attachments: org.apache.tika.server.core.StackTraceOffTest.txt, 
> org.apache.tika.server.core.StackTraceTest.txt, 
> org.apache.tika.server.core.TikaResourceFetcherTest.txt, 
> org.apache.tika.server.core.TikaResourceTest.txt
>
>

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846697#comment-17846697
 ] 

Tim Allison commented on TIKA-4137:
---

Yes, done just now.

> Building current Tika main branch fails under Java 20/21
> 
>
> Key: TIKA-4137
> URL: https://issues.apache.org/jira/browse/TIKA-4137
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0-BETA
> Reporter: Thorsten Heit
> Priority: Major
> Fix For: 3.0.0-BETA, 2.9.3
>
> Attachments: org.apache.tika.server.core.StackTraceOffTest.txt, 
> org.apache.tika.server.core.StackTraceTest.txt, 
> org.apache.tika.server.core.TikaResourceFetcherTest.txt, 
> org.apache.tika.server.core.TikaResourceTest.txt
>
>

[jira] [Updated] (TIKA-1907) Big Pdf parsing to text - Out of memory

2024-05-15 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated TIKA-1907:
--
Fix Version/s: 3.0.0

> Big Pdf parsing to text - Out of memory
> ---
>
> Key: TIKA-1907
> URL: https://issues.apache.org/jira/browse/TIKA-1907
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.12
> Reporter: Nicolas Daniels
> Priority: Major
> Fix For: 3.0.0
>
>
> Linked to PDFBox issue: [https://issues.apache.org/jira/browse/PDFBOX-3284]
> I'm duplicating it here to make sure it will be fixed in Tika as well. Maybe 
> PDFBox is not the appropriate library to use in such a case.
> Trying to read the same PDF with Tika leads to the same problem:
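> {code:title=Test.java|borderStyle=solid}
> // Imports the snippet relies on (JUnit 4 and Tika 1.x package names assumed).
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileWriter;
> import java.io.InputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.pdf.PDFParser;
> import org.apache.tika.sax.BodyContentHandler;
> import org.junit.Test;
>
> @Test
> public void testParsePdf_Content_Memory() throws Exception {
>     // Write the extracted text straight to a file; both streams are closed automatically.
>     try (InputStream inputStream = new FileInputStream("c:/tmp/sr2015_mx_clearing_3dot0_mdr2_solution.pdf");
>          FileWriter fileWriter = new FileWriter(new File("c:/tmp/test.txt"))) {
>         BodyContentHandler handler = new BodyContentHandler(fileWriter);
>         Metadata metadata = new Metadata();
>         new PDFParser().parse(inputStream, handler, metadata, new ParseContext());
>     }
> }
> {code}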



--
This message was sent by Atlassian Jira
(v8.20.10#820010)