[
https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated TIKA-4494:
----------------------------------
Description:
Hi.
On my Apache Tika 3.2.3 server, I get many errors like:
{code:java}
ERROR [qtp131037934-61] 10:44:03,903 org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index '1356 '
java.lang.NumberFormatException: For input string: "1356 "
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
{code}
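The message suggests that the shared-string (SST) index read from the sheet carries a trailing space, which a strict integer parse rejects. A minimal Java sketch of that behaviour follows. It is not POI's actual code: the class `SstIndexDemo` and the helper `parseSstIndex` are hypothetical names used only for illustration, and trimming is shown as one possible tolerant approach, not as the fix the library applies.

```java
final class SstIndexDemo {

    // Hypothetical helper (not part of POI): strip surrounding whitespace
    // before parsing, so that "1356 " is read as 1356.
    static int parseSstIndex(String raw) {
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // Value taken from the log above, trailing space included.
        String raw = "1356 ";

        try {
            // Strict parse: Integer.parseInt does not accept whitespace.
            Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }

        // Tolerant parse: succeeds once the value is trimmed.
        System.out.println("trimmed parse: " + parseSstIndex(raw));
    }
}
```

The strict call throws the same kind of NumberFormatException as in the log, while the trimmed call returns the index.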
These errors cause the child process to restart.
This is my tika-config.xml:
{code:xml}
<properties>
<server>
<taskTimeoutMillis>120000</taskTimeoutMillis>
<minimumTimeoutMillis>10</minimumTimeoutMillis>
<port>9998</port>
<maxFiles>20000</maxFiles>
<forkedJvmArgs>
<arg>-Xmx2g</arg>
</forkedJvmArgs>
<!-- commented out on 17012025 because of errors: WARNING:
jakarta.ws.rs.ClientErrorException: HTTP 406 Not Acceptable and
TIKASERVERERROR
<endpoints>
<endpoint>rmeta</endpoint>
<endpoint>status</endpoint>
</endpoints>
-->
</server>
<!-- Enable dynamic service loading -->
<service-loader dynamic="true"/>
<!-- Load error handling settings -->
<service-loader loadErrorHandler="WARN"/>
<parsers>
<!-- Default Parser Configuration -->
<parser class="org.apache.tika.parser.DefaultParser">
<!-- Exclude parsers that are not needed to reduce processing time -->
<parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
<parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/> <!-- Example exclusion -->
<params>
<!-- byteArrayMaxOverride added on 17012025 -->
<param name="byteArrayMaxOverride" type="int">30000000</param>
<param name="suppressExceptions" type="bool">true</param>
<param name="ignoreTikaErrors" type="bool">true</param>
</params>
</parser>
<!-- Specific Parser Configurations -->
<parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
<params>
<param name="includeShapeBasedContent" type="bool">false</param>
<param name="suppressExceptions" type="bool">true</param>
</params>
</parser>
<parser class="org.apache.tika.parser.pdf.PDFParser">
<params>
<param name="pdfbox.enableAutoSpace" type="bool">true</param>
<param name="suppressExceptions" type="bool">true</param>
</params>
</parser>
<!-- Additional parsers can be added here as needed -->
</parsers>
<!-- Detectors configuration (if needed) -->
<detectors>
<detector class="org.apache.tika.detect.DefaultDetector"/>
<!-- Customize detectors if necessary -->
</detectors>
</properties>
{code}
The restart interrupts my ManifoldCF job, because the job was using
that child process, so the job ends with:
{code:java}
// Error: Repeated service interruptions - failure processing document: The target server failed to respond
{code}
The target server is Tika.
Is there a workaround for this?
Thanks a lot
Mario Bisonti
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index
> ---------------------------------------------------------------------------------
>
> Key: TIKA-4494
> URL: https://issues.apache.org/jira/browse/TIKA-4494
> Project: Tika
> Issue Type: Wish
> Components: tika-server
> Affects Versions: 3.2.3
> Reporter: mbiso
> Priority: Major
>