subject:"\[jira\] \[Commented\] \(CONNECTORS\-1563\) SolrException\: org.apache.tika.exception.ZeroByteFileException\: InputStream must have

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-28 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780373#comment-16780373
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Hi [~Subasini],

The "excluded mime types" that you set are meant to exclude documents 
*entirely*, so changing that setting has no effect on *how* documents are 
indexed.  You can look at the Simple History report to verify that this is 
taking place as you desire, because most connectors create a record when they 
reject a document for any reason.  The Web Connector is no exception.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-28 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780317#comment-16780317
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
   After analysing I found that the below setting works fine if no MIMe type 
excluded in output connection.
I have to exclude MIME types like image/gif, image/jpg,text/css. I have 
specified them in the output connection-->documents tab-->Excluded MIME types.
When I removed, it worked fine.

Could you please suggest how it can work with excluded MIME types.





Thanks & Regards,
Subasini Rath
App Team Lead | TCS,Offshore
Mobile : +91-983-1234-341
M: subasini.r...@endeavourenergy.com.au | : www.endeavourenergy.com.au 
Planned Leave: 11th March – 13th March  
  



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-20 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772995#comment-16772995
 ] 

Karl Wright commented on CONNECTORS-1563:
-

[~Subasini], we are trying to debug your setup.  The first principle of 
debugging is to identify where exactly the problem is occurring.  It eliminates 
one variable.  The file system connector is quite simple and has few 
configuration options, so it should be easy to set something up we can use to 
evaluate your solr connection.

Thanks,
Karl

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-20 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772955#comment-16772955
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
It didn't clear my doubt why should I use File System Connector when my 
requirement is website crawling.



Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-20 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772912#comment-16772912
 ] 

Karl Wright commented on CONNECTORS-1563:
-

[~Subasini], the "error" is because it does not recognize a specific 
translation bundle for your language, so it defaults to English.  It is 
harmless.

I asked you to *try* working with a File System connection initially to narrow 
down where your problems were coming from.  Please do so.  [~shinichiro abe] 
and myself both tried a configuration similar to the one you report end of last 
year when we were debugging the 2.11 release of ManifoldCF.

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-20 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772827#comment-16772827
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Shinichiro,
   My requirement is not to crawl a file system.
My requirement is to crawl a website. That is the reason I am using Web 
repository.

** - Also could you please let me know when manifold interacts with Solr , 
which field does it write the actual content of document without any metadata.


Didn't get your point on Simple history.
My solr log doesnot show any error.
Manifold log is as follows : 

==
ERROR 2019-02-18T21:19:25,484 (qtp1619356001-411) - Missing resource bundle 
'org.apache.manifoldcf.agents.output.solr.common' for locale 'en': Can't find 
bundle for base name org.apache.manifoldcf.agents.output.solr.common, locale 
en; trying en_US
java.util.MissingResourceException: Can't find bundle for base name 
org.apache.manifoldcf.agents.output.solr.common, locale en
at 
java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1573)
 ~[?:1.8.0_181]
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396) 
~[?:1.8.0_181]
at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091) 
~[?:1.8.0_181]
at 
org.apache.manifoldcf.core.i18n.Messages.getResourceBundle(Messages.java:142) 
[mcf-core.jar:?]
at 
org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:178) 
[mcf-core.jar:?]
at 
org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:216) 
[mcf-core.jar:?]
at 
org.apache.manifoldcf.agents.output.solr.Messages.getString(Messages.java:91) 
[mcf-solr-connector.jar:?]
at 
org.apache.manifoldcf.agents.output.solr.Messages.getString(Messages.java:39) 
[mcf-solr-connector.jar:?]
at 
org.apache.manifoldcf.agents.output.solr.SolrConnector.outputConfigurationHeader(SolrConnector.java:637)
 [mcf-solr-connector.jar:?]
at 
org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationHeader(ConnectorFactory.java:71)
 [mcf-core.jar:?]
at 
org.apache.manifoldcf.agents.interfaces.OutputConnectorFactory.outputConfigurationHeader(OutputConnectorFactory.java:98)
 [mcf-agents.jar:?]
at org.apache.jsp.editoutput_jsp._jspService(editoutput_jsp.java:423) 
[jsp/:?]
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) 
[jasper-6.0.35.jar:6.0.35]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
[javax.servlet-api-3.1.0.jar:3.1.0]
at 
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) 
[jasper-6.0.35.jar:6.0.35]
at 
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) 
[jasper-6.0.35.jar:6.0.35]
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) 
[jasper-6.0.35.jar:6.0.35]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
[javax.servlet-api-3.1.0.jar:3.1.0]
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) 
[jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) 
[jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) 
[jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) 
[jetty-security-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
 [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) 
[jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
 [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
[jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) 
[jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) 
[jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905]
at org.eclipse.jetty.server.

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-20 Thread Shinichiro Abe (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772774#comment-16772774
 ] 

Shinichiro Abe commented on CONNECTORS-1563:


Hi Subasini,
Did you try to use file system repository instead of web repository?
I did work with the configuration that is close to settings above though.
Also what did simple history complain to not work? 

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-19 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772729#comment-16772729
 ] 

Karl Wright commented on CONNECTORS-1563:
-

In general in cases like this I recommend that people start with the simplest 
possible working configuration and then modify it until they achieve their 
goals.  In this case that would mean starting with a file system job and a 
freshly-installed Solr instance, with no other changes whatsoever.

[~shinichiro abe], can you help Mr. Rath by trying MCF 2.12 with a fresh 
single-process Solr instance, using the "/update" handler?  He claims that this 
does not work and I do not have any time to work with him for the next few 
weeks.  If it works for you please provide detailed steps describing what you 
did.  Thanks in advance!


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-19 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771861#comment-16771861
 ] 

Karl Wright commented on CONNECTORS-1563:
-

In addition to unchecking the "use extracting update handler", you also need to 
change the "Server update handler" value from "/update/extract" to just plain 
"/update".


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, Manifold and Solr 
> settings_CustomField.docx, managed-schema, manifold settings.docx, 
> manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-18 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771663#comment-16771663
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Hi Subasini,

Are you now Tika-extracting in ManifoldCF, or in Solr?
The text field looks like it contains properly extracted content, along with 
other stuff you do not want.  Is this correct?

If the extraction is happening in Solr, then I have no idea what this is coming 
from.  If the extraction is happening in ManifoldCF, then if you have placed a 
Metadata Adjuster transformer in the pipeline between the Tika Extractor and 
the Solr Output Connector, I'd say you had set it up to concatenate many fields 
together into a text field.  The Metadata Adjuster has that ability.

The choice of how metadata (or content) fields get mapped to Solr schema is set 
up in your Solr output connection configuration.  The Tika extraction basically 
replaces a binary input document with a character-sequence output document plus 
metadata fields.  The character-sequence output document then must be sent to 
Solr not using the exracting update handler, but just the standard handler, so 
the handler should be changed from /update/extract to just /update, and the 
"Use extracting update handler" should be turned off.  The actual field name 
used for the extracted content body can also be changed, if desired, in the 
"Schema" part of the configuration.  But what is there by default works with 
Solr as it's set up by default.





> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-02-18 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771607#comment-16771607
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
Could you please guide me - to which field manifold writes the actual 
textual content of the document.

Currently I am using the _text_ field but it has been found that _text_ does 
not contain the actual data. Rather it added some extra values to the actual 
content.

In my managed-schema : 



After my indexing in Solr, the value looks like : (The first 4 lines are 
appended before the content of file)

"title":["NETWORK PLANNING\u"],
"_text_":[" \n \n stream_size 34070  \n X-Parsed-By 
org.apache.tika.parser.DefaultParser  \n X-Parsed-By 
org.apache.tika.parser.txt.TXTParser  \n stream_content_type application/pdf  
\n stream_name cs.exe?bmsdocid=9.2.1&func=eebms.docdownload  \n 
stream_source_info cs.exe?bmsdocid=9.2.1&func=eebms.docdownload  \n 
Content-Encoding UTF-8  \n resourceName 
cs.exe?bmsdocid=9.2.1&func=eebms.docdownload  \n Content-Type text/plain; 
charset=UTF-8  \n  \n \n  9.2.1 UNCONTROLLED IF PRINTED Page 1 of 13\nCompany 
Policy\nNETWORK\nDocument No Amendment No Approved By Approval Date Review 
Date\n: : : : :\n9.2.1 9 CEO 23/05/2016 23/05/2019\n9.2.1 NETWORK PLANNING\n1.0 
POLICY STATEMENT\nThe company will plan the expansion and augmentation of its 
electrical network to achieve levels of safety, reliability and quality of 
supply commensurate with community, regulator, customer and shareholder 
expectations.\nThe company will coordinate its planning with the NSW 
transmission utility Transgrid and neighbouring distribution utilities to 
develop effective solutions to satisfy load growth within the company’s supply 
area and in adjacent franchise areas where the company’s network has 
influence.\n2.0 PURPOSE\nTo provide principles for planning network



Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-21 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747859#comment-16747859
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
   I just removed one of the transformer connection and it worked fine for me.
So there may be some issue was happening by using same type of transformation 
connection.



Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743033#comment-16743033
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please also see this discussion:

https://issues.apache.org/jira/browse/CONNECTORS-1533



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743028#comment-16743028
 ] 

Karl Wright commented on CONNECTORS-1563:
-

First, I asked for the Simple History, not the manifoldcf logs.  What does the 
simple history say about document ingestions for the connection in question 
with the new configuration?

But, from your solr log:

{code}
2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [   x:eesolr_webcrawler] 
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
{code}

Note that the stack trace is from the ExtractingDocumentLoader, which is Tika.  
You did not manage to actually change the output handler to the non-extracting 
one, possibly because you have configured your Solr in a non-default way.  I 
cannot debug that for you, sorry.

Can you do the following:  Download the current 7.x version of Solr, fresh, and 
extract it.  Start it using the standard provided simple scripts.  Point 
ManifoldCF at it and crawl some documents, using the setup for the connection I 
have described.  Does that work?  If it does, and I expect it to because that 
is what works for me here, then it is your job to figure out what you did to 
Solr to make that not work.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743006#comment-16743006
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please include [INFO] messages from the Solr log for example indexing requests, 
and also include records from the Simple History for documents indexed with the 
new configuration.  Thanks.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742999#comment-16742999
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
   Hope you have seen the document I had sent you in email with screenshots. 
Yes, I did the following steps : 

1.In output connection, I clicked the button [Reset All associated Records] and 
started indexing from the beginning.
2. In path tab, I changed update handler to /update (default was 
(/update/extract)
3. In schema tab, unchecked the check box [Use the Extract Update Handler]. It 
forced me to give document length and content field name
4. copied the existing job and created a new one with same configuration
5. Run the new job.
6. In manifold,I can see website is getting crawled and documents are getting 
processed but nothing was appearing in Solr index.

Please guide.






Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889 
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742949#comment-16742949
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please view the Solr connection and click the button that tells it to forget 
about everything it has indexed.  That will force reindexing.  That's standard 
step when you change configuration like this and you want all documents to be 
reindexed.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742883#comment-16742883
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,

  Tried your suggestions in the below email but no luck.

Please find attached the screenshots for my manifold settings.

Could you please revisit once and let me know if I am missing something.

 

Also as per your suggestion - In the Solr output connection :

 tab [Paths] --- > I changed [update handler to /update instead of 
/update/extract .

 In [Schema] tab ---> deselect [Use the Extract Update Handler:].

What I observe is no indexing happened in Solr.

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741004#comment-16741004
 ] 

Karl Wright commented on CONNECTORS-1563:
-

{quote}
I need to pass from manifold one custom field and value which I want to see in 
Solr index. That is the reason why I used metadata transformer where I can pass 
the custom field in job - tab metadata adjuster.
{quote}

Yes, people do that all the time.  Just add the Metadata Adjuster any place in 
your pipeline and have it inject the field value you want.  It will be 
faithfully transmitted to Solr.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Subasini Rath (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740992#comment-16740992
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Thanks Karl.  Just need to get clear one more doubt.  I need to pass from 
manifold one custom field and value which I want to see in Solr index.  That is 
the reason why I used metadata transformer where I can pass the custom field in 
job - tab metadata adjuster.
If I will use only tika extractor,  is there any way to pass custom field which 
we will get indexed in Solr.

On 11-Jan-2019 11:17 PM, "Karl Wright (JIRA)"  wrote:

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587
 ]

Karl Wright commented on CONNECTORS-1563:
-

The metadata extractor can go anywhere in your pipeline, after Tika extraction. 
 There is absolutely no point in having *two* Tika extractions though -- and 
that's what you're trying to do with the setup you've got.

What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and 
inject content into Solr using the /update handler, not the /update/extract 
handler.  There's also a checkbox you'd need to uncheck in the Solr connection 
configuration. It's all covered in the ManifoldCF end user documentation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587
 ] 

Karl Wright commented on CONNECTORS-1563:
-

The metadata extractor can go anywhere in your pipeline, after Tika extraction. 
 There is absolutely no point in having *two* Tika extractions though -- and 
that's what you're trying to do with the setup you've got.

What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and 
inject content into Solr using the /update handler, not the /update/extract 
handler.  There's also a checkbox you'd need to uncheck in the Solr connection 
configuration. It's all covered in the ManifoldCF end user documentation.



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740529#comment-16740529
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
  In that case how I should proceed.  I need to use both tika content extractor 
and metadata adjuster.  How shall I update the handler.
Do I need to do any modification in the solrconfig. Xml and managedschema. Xml
If yes, please guide me what I need to do.


On 11-Jan-2019 7:59 PM, "Karl Wright (JIRA)"  wrote:

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740435#comment-16740435
 ]

Karl Wright commented on CONNECTORS-1563:
-

{quote}
Solr cell with standard update handler...
{quote}

This is not Option 2; it's a combination of (1) and (2) and is not a model that 
we support.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740435#comment-16740435
 ] 

Karl Wright commented on CONNECTORS-1563:
-

{quote}
Solr cell with standard update handler...
{quote}

This is not Option 2; it's a combination of (1) and (2) and is not a model that 
we support.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740347#comment-16740347
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

Hi Karl,
   Thanks for your response.  I am using the second option. Solr cell with 
standard update handler + tika content extractor and metadata adjuster.

On 11-Jan-2019 5:33 PM, "Karl Wright (JIRA)"  wrote:

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740330#comment-16740330
 ]

Karl Wright commented on CONNECTORS-1563:
-

Can you tell me which configuration you are attempting:

(1) Solr Cell + extract update handler + no Tika content extraction in MCF, or
(2) NO Solr Cell + standard update handler + Tika content extraction in MCF

Which is it?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740330#comment-16740330
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Can you tell me which configuration you are attempting:

(1) Solr Cell + extract update handler + no Tika content extraction in MCF, or
(2) NO Solr Cell + standard update handler + Tika content extraction in MCF

Which is it?


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-10 Thread Subasini Rath (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740040#comment-16740040
 ] 

Subasini Rath commented on CONNECTORS-1563:
---

I am using apache solr 7.6.0 and manifoldcf 2.12.

I am also getting the same error.

Hi Karl,

  Could you please let me know how can we check that our Solr configuration is 
correct or not as we are using the default set up.

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2018-12-13 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720171#comment-16720171
 ] 

Karl Wright commented on CONNECTORS-1563:
-

If you need me to debug your solr setup, you're going to need to wait a couple 
of weeks, I'm afraid.  I'm extremely behind and I honestly have been working 20 
hour days.  You're on your own for now.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2018-12-13 Thread Sneha (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720085#comment-16720085
 ] 

Sneha commented on CONNECTORS-1563:
---

I am not getting the missing configuration in solr. I have attached the 
managed-schema and solr-config.xml.

Created core using command: solr create -c corename 

Please help me with the configuration. 

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2018-12-13 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720061#comment-16720061
 ] 

Karl Wright commented on CONNECTORS-1563:
-

That argues that your solr configuration is not correct, because this was 
tested thoroughly during the last release cycle.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2018-12-13 Thread Sneha (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720018#comment-16720018
 ] 

Sneha commented on CONNECTORS-1563:
---

Upgraded the ManifoldCF with version 2.11 but still same error is showing at 
solr server

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2018-12-13 Thread Karl Wright (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719952#comment-16719952
 ] 

Karl Wright commented on CONNECTORS-1563:
-

SolrJ recently had a significant internal revision and you may need to upgrade 
to the latest ManifoldCF (2.11) where changes were made to work with it.  

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Priority: Major
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2018-12-12 Thread Furkan KAMACI (JIRA)



[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719860#comment-16719860
 ] 

Furkan KAMACI commented on CONNECTORS-1563:
---

Please check here: issues.apache.org/jira/browse/TIKA-2637

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Solr 7.x component
>Reporter: Sneha
>Priority: Major
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore tika exception, my documents get indexed but dont have content 
> field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

32 matches

Site Navigation

Mail list logo

Footer information