[jira] [Comment Edited] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread John (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575607#comment-17575607
 ] 

John edited comment on TIKA-3829 at 8/5/22 7:01 AM:


OK. I will check and get back to you if we face this problem again.

Is there any way in Tika to exclude some file types from content extraction? 
Files of those types should also be excluded when they appear inside embedded 
files.


was (Author: JIRAUSER292452):
OK. I will check and get back to you if we face this problem again.

Is there any way in Tika to exclude some file types from scanning? Files of 
those types should also be excluded when they appear inside embedded files.

> java.lang.IllegalArgumentException: The document is really a XLS file 
> exception while parsing doc file
> --
>
> Key: TIKA-3829
> URL: https://issues.apache.org/jira/browse/TIKA-3829
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.23
> Reporter: John
> Priority: Major
>
> Getting the following exception while parsing a .doc file:
> WARN  Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> java.lang.IllegalArgumentException: The document is really a XLS file
>     at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:322)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:82)
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:74)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:155)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>
> What is the meaning of this exception? When is it thrown?
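
The trace above shows the exception coming out of DirectoryNode.getEntry while SummaryExtractor looks up the DocumentSummaryInformation entry, and the WARN line says it is ignored. One plausible reading, sketched below, is that the requested entry was missing and the container held a stream that marks it as a spreadsheet. This is a simplified, hypothetical model and not POI's actual source: the class EntryLookupSketch, its fields, and the assumption that a stream named "Workbook" is the XLS marker are all illustrative.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified model of an OLE2 directory lookup; not POI's real code. */
public class EntryLookupSketch {

    /** Names of the streams stored in the (pretend) OLE2 container. */
    private final Set<String> entryNames;

    public EntryLookupSketch(String... names) {
        this.entryNames = new LinkedHashSet<>(Arrays.asList(names));
    }

    /** Returns the entry name if present; otherwise fails with a descriptive exception. */
    public String getEntry(String name) throws FileNotFoundException {
        if (name != null && entryNames.contains(name)) {
            return name;
        }
        // The requested entry is missing. Assumption: a "Workbook" stream
        // identifies spreadsheet content, so a more specific error is raised.
        if (entryNames.contains("Workbook")) {
            throw new IllegalArgumentException("The document is really a XLS file");
        }
        throw new FileNotFoundException("no such entry: \"" + name + "\"");
    }

    public static void main(String[] args) throws Exception {
        // A container that holds spreadsheet data but no DocumentSummaryInformation.
        EntryLookupSketch xlsLike = new EntryLookupSketch("Workbook", "SummaryInformation");
        try {
            xlsLike.getEntry("DocumentSummaryInformation");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading the message is a hint about the container's contents rather than a parse failure, which would be consistent with the log reporting it at WARN level and continuing.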



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread John (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575607#comment-17575607
 ] 

John commented on TIKA-3829:


OK. I will check and get back to you if we face this problem again.

Is there any way in Tika to exclude some file types from scanning? Files of 
those types should also be excluded when they appear inside embedded files.



