[ https://issues.apache.org/jira/browse/SOLR-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620437#comment-14620437 ]
Tim Allison edited comment on SOLR-7764 at 7/9/15 12:41 PM:
------------------------------------------------------------

1) Right. That's the problem with how Tika is currently being used within DIH: if it hangs, you'll never get an exception. If the xlsx file is causing the hang, then given the vintage of Tika you're using, it might be a custom fraction format (TIKA-1132).

2) The pdcidfont issue sounds like a PDF problem, not an Excel problem. Does the Excel file have an embedded PDF? Will send email privately.

3) I can't quite tell what behavior you'd like. Please give more info.

As a side note, for debugging purposes, you might try grabbing the relevant version of tika-app and dropping potential problem files into it. If it hangs, you've found your problem.

Another option is to run tika-app (>= 1.8) in batch mode against an input directory. If your logging is set up correctly, you'll be able to tell which file caused the hang. The command line for that is:

    java -jar tika-app-xx.jar -i <inputdir> -o <outputdir>

but see the tika-batch [wiki|http://wiki.apache.org/tika/TikaBatchUsage] for advanced usage on configuring logging. (Well, you'll see it in about 10 minutes, after I update it. ;) )
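For Tika versions older than 1.8, where batch mode is not available, the "drop each file into tika-app and see whether it hangs" check can be scripted. The following is only a rough sketch, not part of Tika or Solr: the function name `find_hanging_files`, the timeout values, and the script layout are all assumptions made for illustration. It assumes a `java` binary on the PATH and that the tika-app jar accepts `--text` followed by a file name.

```python
import subprocess
import sys
from pathlib import Path


def find_hanging_files(files, command_prefix, timeout_s=60.0):
    """Run command_prefix + [file] once per file and return, as strings,
    the files whose process was still running after timeout_s seconds.

    Hypothetical helper for illustration; not part of Tika or Solr.
    """
    hung = []
    for path in files:
        try:
            # subprocess.run kills the child before raising TimeoutExpired,
            # so a stuck parser does not linger in the background.
            subprocess.run(
                list(command_prefix) + [str(path)],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
                check=False,  # a parse error is not a hang; ignore exit codes
            )
        except subprocess.TimeoutExpired:
            hung.append(str(path))
    return hung


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python find_hangs.py <path to tika-app jar> <inputdir>
    jar, input_dir = sys.argv[1], sys.argv[2]
    candidates = sorted(p for p in Path(input_dir).rglob("*") if p.is_file())
    command = ["java", "-jar", jar, "--text"]
    for name in find_hanging_files(candidates, command, timeout_s=120.0):
        print("possible hang:", name)
```

A file that merely takes longer than the timeout will also be reported, so treat the output as a list of suspects to retry by hand rather than as proof of a hang.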
> Solr indexing hangs if encounters an certain XML parse error
> ------------------------------------------------------------
>
>                 Key: SOLR-7764
>                 URL: https://issues.apache.org/jira/browse/SOLR-7764
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>   Affects Versions: 4.7.2
>         Environment: Ubuntu 12.04.5 LTS
>            Reporter: Sorin Gheorghiu
>              Labels: indexing
>         Attachments: Solr_XML_parse_error_080715.txt
>
>
> BlueSpice (http://bluespice.com/) uses Solr to index documents for the
> 'Extended search' feature.
> Solr hangs if during indexing certain error occurs:
> 8.7.2015 15:34:26
> ERROR
> SolrCore
> org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: XML parse error
> 8.7.2015 15:34:26
> ERROR
> SolrDispatchFilter
> null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: XML parse error

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)