[jira] [Comment Edited] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575883#comment-17575883
 ] 

Tim Allison edited comment on TIKA-3829 at 8/5/22 2:47 PM:
---

You can exclude entire parsers, or exclude specific mime types from a parser, 
via tika-config.  See: https://tika.apache.org/2.4.1/configuring.html
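
A minimal sketch of such a tika-config file, following the mime-exclude and 
parser-exclude elements described on the configuring page linked above. The 
mime type and parser class are illustrative assumptions chosen to match the 
stack trace in this issue, not a recommended setup:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Use the default parser set, minus the exclusions listed below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Assumption: skip files detected as legacy Excel documents. -->
      <mime-exclude>application/vnd.ms-excel</mime-exclude>
      <!-- Assumption: alternatively, drop a whole parser by class name. -->
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
    </parser>
  </parsers>
</properties>
```

Either exclusion can be used on its own; excluding OfficeParser removes 
handling of all the legacy Office formats that parser covers, not only XLS.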

I'm not sure how that would help you.

You can also turn off this logging via configuration of log4j2.xml.
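
A minimal log4j2.xml sketch that would suppress the WARN message quoted in 
this issue. It assumes the message is emitted by a logger named under the 
org.apache.tika.parser.microsoft package (the package shown in the stack 
trace); the exact logger name is not confirmed here, and the appender layout 
is only an example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_ERR">
      <PatternLayout pattern="%-5p %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- Assumption: raise the threshold for the Microsoft parsers only,
         so WARN-level messages from that package are no longer printed. -->
    <Logger name="org.apache.tika.parser.microsoft" level="error"/>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```

This hides the warning but does not change how the file is parsed.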


was (Author: talli...@mitre.org):
You can exclude parsers and exclude specific mime types from parsers via 
tika-config.  See: https://tika.apache.org/2.4.1/configuring.html

> java.lang.IllegalArgumentException: The document is really a XLS file 
> exception while parsing doc file
> --
>
> Key: TIKA-3829
> URL: https://issues.apache.org/jira/browse/TIKA-3829
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.23
> Reporter: John
> Priority: Major
>
> Getting the following exception while parsing a doc file:
> WARN  Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> java.lang.IllegalArgumentException: The document is really a XLS file
>     at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:322)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:82)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:155)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  
> What does this exception mean, and when is it thrown?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread John (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575607#comment-17575607
 ] 

John edited comment on TIKA-3829 at 8/5/22 7:01 AM:


OK. We will check and get back to you if we face this problem again. 

Is there any way in Tika to exclude some file types from content extraction? 
They should also be excluded when they appear as embedded files inside other 
documents.


was (Author: JIRAUSER292452):
OK. We will check and get back to you if we face this problem again. 

Is there any way in Tika to exclude some file types from scanning? They 
should also be excluded when they appear as embedded files inside other 
documents.



