[ 
https://issues.apache.org/jira/browse/SOLR-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620437#comment-14620437
 ] 

Tim Allison edited comment on SOLR-7764 at 7/9/15 12:41 PM:
------------------------------------------------------------

1) Right.  That's the problem with how Tika is currently being used within DIH. 
 If it hangs, you'll never get an exception.  If the xlsx file is causing the 
hang, then given the vintage of Tika you're using, the culprit might be a 
custom fraction format (TIKA-1132).

2) The pdcidfont issue sounds like a PDF problem, not an Excel problem.  
Does the Excel file have an embedded PDF?  Will send email privately.

3) I can't quite tell what behavior you'd like.  Please give more info.

As a side note, for debugging purposes, you might try grabbing the relevant 
version of tika-app, and dropping potential problem files into that. If it 
hangs, you've found your problem.
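
The one-file-at-a-time check above can be scripted.  What follows is only a 
rough sketch, not anything that ships with Tika or Solr: the function name 
find_hanging_files, the 60-second default timeout, and the script's two 
arguments (path to the tika-app jar, input directory) are all assumptions made 
for illustration.  It runs a command once per file and reports the files that 
do not finish in time.

```python
import subprocess
import sys
from pathlib import Path


def find_hanging_files(paths, command_prefix, timeout_seconds=60):
    """Run command_prefix + [path] for each path; return the paths that time out.

    find_hanging_files is a hypothetical helper written for this sketch.
    """
    hung = []
    for path in paths:
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(
                command_prefix + [str(path)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            hung.append(path)
    return hung


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed invocation: python find_hangs.py <tika-app jar> <input dir>
    jar, input_dir = sys.argv[1], sys.argv[2]
    files = sorted(p for p in Path(input_dir).rglob("*") if p.is_file())
    # "-t" asks tika-app for plain-text output; any extraction mode would do here.
    for path in find_hanging_files(files, ["java", "-jar", jar, "-t"]):
        print("possible hang:", path)
```

A file that merely parses slowly will also be flagged, so pick a timeout that 
is generous for your largest documents.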

Another option is to run tika-app (>= 1.8) in batch mode against an input 
directory.  If your logging is set up correctly, you'll be able to tell which 
file caused the hang.  The command line for that is: java -jar tika-app-xx.jar 
-i <inputdir> -o <outputdir>, but see the tika-batch 
[wiki|http://wiki.apache.org/tika/TikaBatchUsage] for advanced usage and 
logging configuration.  (Well, see it in about 10 minutes, after I update it. ;) )


> Solr indexing hangs if encounters an certain XML parse error
> ------------------------------------------------------------
>
>                 Key: SOLR-7764
>                 URL: https://issues.apache.org/jira/browse/SOLR-7764
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.7.2
>         Environment: Ubuntu 12.04.5 LTS
>            Reporter: Sorin Gheorghiu
>              Labels: indexing
>         Attachments: Solr_XML_parse_error_080715.txt
>
>
> BlueSpice (http://bluespice.com/) uses Solr to index documents for the 
> 'Extended search' feature.
> Solr hangs if a certain error occurs during indexing:
> 8.7.2015 15:34:26
> ERROR
> SolrCore
> org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: XML parse error
> 8.7.2015 15:34:26
> ERROR
> SolrDispatchFilter
> null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: XML parse error


