[ 
https://issues.apache.org/jira/browse/SOLR-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch closed SOLR-1847.
---------------------------------------
    Resolution: Cannot Reproduce

All components involved in this issue have been updated multiple times. If the 
problem still happens, the case can be reopened with new details, or a new case 
can be created.

> Solrj doesn't know if PDF was actually parsed by Tika
> -----------------------------------------------------
>
>                 Key: SOLR-1847
>                 URL: https://issues.apache.org/jira/browse/SOLR-1847
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.5
>         Environment: TOMCAT 6.0.24, SOLR 1.5Dev, Solrj1.5Dev Tika
>            Reporter: elsadek
>              Labels: Solr, Solrj, Tika, Tomcat6
>
> When posting PDF files using Solrj, the only response we get from Solr is the 
> server response status, so we never know whether the
> PDF was actually parsed. Checking the log, I found that Tika failed on
> some PDF files, either because of the nature of their content (text in images 
> only) or because they are corrupted:
>     
>      25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine processOperator
>      INFO: unsupported/disabled operation: EI
>    
>      25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
>      GRAVE: Stop reading corrupt stream
> The question is: how can I catch these kinds of exceptions through Solrj?
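
One possible client-side workaround, offered only as a sketch: Solr Cell's 
extracting request handler accepts an extractOnly=true parameter, which returns 
the extracted XHTML to the caller instead of indexing it. A client could run 
that request first and check whether any visible text came back. The helper 
below shows only that check. The class name ExtractionCheck, its methods, and 
the minChars threshold are hypothetical, and fetching the XHTML string from the 
Solrj response is not shown. It will not report the PDFBox log messages 
themselves; it only flags documents for which little or no text was extracted.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Solrj or Tika): decides whether the XHTML
 * returned by an extract-only request contains any visible text.
 */
final class ExtractionCheck {
    // Drops the <head> section so that the document title is not counted as content.
    private static final Pattern HEAD = Pattern.compile("(?is)<head\\b.*?</head>");
    // Matches any remaining markup, including the XML declaration.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    private ExtractionCheck() {}

    /** Returns the text of the XHTML with markup removed and whitespace collapsed. */
    static String visibleText(String xhtml) {
        if (xhtml == null) {
            return "";
        }
        String noHead = HEAD.matcher(xhtml).replaceAll(" ");
        String noTags = TAGS.matcher(noHead).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    /** True if at least minChars characters of visible text were extracted. */
    static boolean hasText(String xhtml, int minChars) {
        return visibleText(xhtml).length() >= minChars;
    }
}
```

A caller might skip or flag any file for which hasText(xhtml, 1) returns false, 
for example image-only PDFs. A higher threshold is a judgment call that depends 
on the documents being indexed.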



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
