[ 
https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mbiso updated TIKA-4494:
------------------------
    Description: 
Hi.
On my Apache Tika 3.2.3 server, I get many errors like:
{code:java}
ERROR [qtp131037934-61] 10:44:03,903 org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index '1356 '
java.lang.NumberFormatException: For input string: "1356 "
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
{code}
This causes the child process to restart.
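
For reference, the exception itself can be reproduced in plain Java, because Integer.parseInt does not accept trailing whitespace. The sketch below is only an illustration of that behavior with the value from the log. The class and method names (SstIndexDemo, parseStrict, parseLenient) are hypothetical, and this is not the actual POI code.

```java
public class SstIndexDemo {

    // Hypothetical strict parse: passes the raw cell value straight to
    // Integer.parseInt, which rejects surrounding whitespace.
    public static int parseStrict(String raw) {
        return Integer.parseInt(raw);
    }

    // Hypothetical lenient parse: trims whitespace before parsing.
    public static int parseLenient(String raw) {
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String raw = "1356 "; // value taken from the log above, with trailing space

        try {
            parseStrict(raw);
        } catch (NumberFormatException e) {
            // The strict parse fails on the trailing space.
            System.out.println("strict parse failed: " + e.getMessage());
        }

        // Trimming first lets the same value parse as an int.
        System.out.println("lenient parse: " + parseLenient(raw));
    }
}
```

So the affected spreadsheets appear to contain shared-string indexes with trailing spaces, which a trimmed parse would tolerate.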

This is my tika-config.xml:
{code:xml}
<properties>
  <server>
    <taskTimeoutMillis>120000</taskTimeoutMillis>
    <minimumTimeoutMillis>10</minimumTimeoutMillis>
    <port>9998</port>
    <maxFiles>20000</maxFiles>
    <forkedJvmArgs>
      <arg>-Xmx2g</arg>
    </forkedJvmArgs>
<!-- commented out on 17012025 because of errors: WARNING:
jakarta.ws.rs.ClientErrorException: HTTP 406 Not Acceptable and
 TIKASERVERERROR
    <endpoints>
      <endpoint>rmeta</endpoint>
      <endpoint>status</endpoint>
    </endpoints>
-->
  </server>
    <!-- Enable dynamic service loading -->
    <service-loader dynamic="true"/>
    <!-- Load error handling settings -->
    <service-loader loadErrorHandler="WARN"/>
    <parsers>
        <!-- Default Parser Configuration -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <!-- Exclude parsers that are not needed to reduce processing time -->
            <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
            <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/> <!-- Example exclusion -->
            <params>
                <!-- added on 17012025: byteArrayMaxOverride -->
                <param name="byteArrayMaxOverride" type="int">30000000</param>
                <param name="suppressExceptions" type="bool">true</param>
                <param name="ignoreTikaErrors" type="bool">true</param>
            </params>
        </parser>
        <!-- Specific Parser Configurations -->
        <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
            <params>
                <param name="includeShapeBasedContent" type="bool">false</param>
                <param name="suppressExceptions" type="bool">true</param>
            </params>
        </parser>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
            <params>
                <param name="pdfbox.enableAutoSpace" type="bool">true</param>
                <param name="suppressExceptions" type="bool">true</param>
            </params>
        </parser>
        <!-- Additional parsers can be added here as needed -->
    </parsers>
    <!-- Detectors configuration (if needed) -->
    <detectors>
        <detector class="org.apache.tika.detect.DefaultDetector"/>
        <!-- Customize detectors if necessary -->
    </detectors>
</properties>
{code}
This interrupts my ManifoldCF job, because the job was working with that child when it restarted, so the job ends with:
{code:java}
Error: Repeated service interruptions - failure processing document: The target server failed to respond
{code}
The target server is Tika.

Is there a workaround for this?

Thanks a lot

Mario Bisonti


> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index
> ---------------------------------------------------------------------------------
>
>                 Key: TIKA-4494
>                 URL: https://issues.apache.org/jira/browse/TIKA-4494
>             Project: Tika
>          Issue Type: Wish
>          Components: tika-server
>    Affects Versions: 3.2.3
>            Reporter: mbiso
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)