[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

Subasini Rath (JIRA) Tue, 15 Jan 2019 03:41:22 -0800


    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742999#comment-16742999
 ]


Subasini Rath commented on CONNECTORS-1563:
-------------------------------------------

Hi Karl,
   I hope you have seen the document with screenshots that I sent you by email.
Yes, I did the following steps:

1. In the output connection, I clicked the [Reset All associated Records] 
button and started indexing from the beginning.
2. In the path tab, I changed the update handler to /update (the default was 
/update/extract).
3. In the schema tab, I unchecked the [Use the Extract Update Handler] check 
box. It then required me to provide a document length and a content field name.
4. Copied the existing job and created a new one with the same configuration.
5. Ran the new job.
6. In ManifoldCF, I can see that the website is being crawled and documents are 
being processed, but nothing appears in the Solr index.
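
One way to confirm whether any documents reached Solr after such a run is to 
issue an explicit commit and then ask the core for its document count. The 
following is only a minimal sketch, not part of the original report: the host 
and port (localhost:8983) and the core name "mycore" are assumptions, and the 
helper names are hypothetical. It uses Solr's standard /select and /update 
request handlers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: adjust to the actual Solr host and core/collection name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"  # hypothetical core name


def select_url(base, core, query="*:*"):
    """Build a /select URL that returns only the match count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def commit_url(base, core):
    """Build an /update URL that asks Solr to perform a hard commit."""
    return f"{base}/{core}/update?{urlencode({'commit': 'true'})}"


def num_found(response_text):
    """Extract the number of matching documents from a JSON /select response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch(url):
    """Perform a GET request and return the response body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Commit first so that pending, uncommitted documents become visible.
    fetch(commit_url(SOLR_BASE, CORE))
    print("documents in index:", num_found(fetch(select_url(SOLR_BASE, CORE))))
```

If the count stays at zero after the commit, the documents are probably not 
reaching the /update handler at all, and the Solr log and the ManifoldCF 
Simple History report are the next places to look.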

Please guide.






Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> -----------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1563
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Lucene/SOLR connector
>            Reporter: Sneha
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
