[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780373#comment-16780373 ] Karl Wright commented on CONNECTORS-1563: - Hi [~Subasini], The "excluded mime types" that you set are meant to exclude documents *entirely*, so changing that setting has no effect on *how* documents are indexed. You can look at the Simple History report to verify that this is taking place as you desire, because most connectors create a record when they reject a document for any reason. The Web Connector is no exception. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780317#comment-16780317 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, After analysing I found that the below setting works fine if no MIMe type excluded in output connection. I have to exclude MIME types like image/gif, image/jpg,text/css. I have specified them in the output connection-->documents tab-->Excluded MIME types. When I removed, it worked fine. Could you please suggest how it can work with excluded MIME types. Thanks & Regards, Subasini Rath App Team Lead | TCS,Offshore Mobile : +91-983-1234-341 M: subasini.r...@endeavourenergy.com.au | : www.endeavourenergy.com.au Planned Leave: 11th March – 13th March > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772995#comment-16772995 ] Karl Wright commented on CONNECTORS-1563: - [~Subasini], we are trying to debug your setup. The first principle of debugging is to identify where exactly the problem is occurring. It eliminates one variable. The file system connector is quite simple and has few configuration options, so it should be easy to set something up we can use to evaluate your solr connection. Thanks, Karl > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772955#comment-16772955 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, It didn't clear my doubt why should I use File System Connector when my requirement is website crawling. Thanks & Regards, Subasini Rath O: +91-33 6636-8889 M: +91 983-1234-341 Email: subasini.r...@endeavourenergy.com.au > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772912#comment-16772912 ] Karl Wright commented on CONNECTORS-1563: - [~Subasini], the "error" is because it does not recognize a specific translation bundle for your language, so it defaults to English. It is harmless. I asked you to *try* working with a File System connection initially to narrow down where your problems were coming from. Please do so. [~shinichiro abe] and myself both tried a configuration similar to the one you report end of last year when we were debugging the 2.11 release of ManifoldCF. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772827#comment-16772827 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Shinichiro, My requirement is not to crawl a file system. My requirement is to crawl a website. That is the reason I am using Web repository. ** - Also could you please let me know when manifold interacts with Solr , which field does it write the actual content of document without any metadata. Didn't get your point on Simple history. My solr log doesnot show any error. Manifold log is as follows : == ERROR 2019-02-18T21:19:25,484 (qtp1619356001-411) - Missing resource bundle 'org.apache.manifoldcf.agents.output.solr.common' for locale 'en': Can't find bundle for base name org.apache.manifoldcf.agents.output.solr.common, locale en; trying en_US java.util.MissingResourceException: Can't find bundle for base name org.apache.manifoldcf.agents.output.solr.common, locale en at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1573) ~[?:1.8.0_181] at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396) ~[?:1.8.0_181] at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091) ~[?:1.8.0_181] at org.apache.manifoldcf.core.i18n.Messages.getResourceBundle(Messages.java:142) [mcf-core.jar:?] at org.apache.manifoldcf.core.i18n.Messages.getMessage(Messages.java:178) [mcf-core.jar:?] at org.apache.manifoldcf.core.i18n.Messages.getString(Messages.java:216) [mcf-core.jar:?] at org.apache.manifoldcf.agents.output.solr.Messages.getString(Messages.java:91) [mcf-solr-connector.jar:?] at org.apache.manifoldcf.agents.output.solr.Messages.getString(Messages.java:39) [mcf-solr-connector.jar:?] at org.apache.manifoldcf.agents.output.solr.SolrConnector.outputConfigurationHeader(SolrConnector.java:637) [mcf-solr-connector.jar:?] at org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationHeader(ConnectorFactory.java:71) [mcf-core.jar:?] at org.apache.manifoldcf.agents.interfaces.OutputConnectorFactory.outputConfigurationHeader(OutputConnectorFactory.java:98) [mcf-agents.jar:?] at org.apache.jsp.editoutput_jsp._jspService(editoutput_jsp.java:423) [jsp/:?] at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) [jasper-6.0.35.jar:6.0.35] at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0] at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:388) [jasper-6.0.35.jar:6.0.35] at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313) [jasper-6.0.35.jar:6.0.35] at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260) [jasper-6.0.35.jar:6.0.35] at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0] at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577) [jetty-security-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.3.v20140905.jar:9.2.3.v20140905] at org.eclipse.jetty.server.
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772774#comment-16772774 ] Shinichiro Abe commented on CONNECTORS-1563: Hi Subasini, Did you try to use file system repository instead of web repository? I did work with the configuration that is close to settings above though. Also what did simple history complain to not work? > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772729#comment-16772729 ] Karl Wright commented on CONNECTORS-1563: - In general in cases like this I recommend that people start with the simplest possible working configuration and then modify it until they achieve their goals. In this case that would mean starting with a file system job and a freshly-installed Solr instance, with no other changes whatsoever. [~shinichiro abe], can you help Mr. Rath by trying MCF 2.12 with a fresh single-process Solr instance, using the "/update" handler? He claims that this does not work and I do not have any time to work with him for the next few weeks. If it works for you please provide detailed steps describing what you did. Thanks in advance! > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, path.png, schema.png, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771861#comment-16771861 ] Karl Wright commented on CONNECTORS-1563: - In addition to unchecking the "use extracting update handler", you also need to change the "Server update handler" value from "/update/extract" to just plain "/update". > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, Manifold and Solr > settings_CustomField.docx, managed-schema, manifold settings.docx, > manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771663#comment-16771663 ] Karl Wright commented on CONNECTORS-1563: - Hi Subasini, Are you now Tika-extracting in ManifoldCF, or in Solr? The text field looks like it contains properly extracted content, along with other stuff you do not want. Is this correct? If the extraction is happening in Solr, then I have no idea what this is coming from. If the extraction is happening in ManifoldCF, then if you have placed a Metadata Adjuster transformer in the pipeline between the Tika Extractor and the Solr Output Connector, I'd say you had set it up to concatenate many fields together into a text field. The Metadata Adjuster has that ability. The choice of how metadata (or content) fields get mapped to Solr schema is set up in your Solr output connection configuration. The Tika extraction basically replaces a binary input document with a character-sequence output document plus metadata fields. The character-sequence output document then must be sent to Solr not using the exracting update handler, but just the standard handler, so the handler should be changed from /update/extract to just /update, and the "Use extracting update handler" should be turned off. The actual field name used for the extracted content body can also be changed, if desired, in the "Schema" part of the configuration. But what is there by default works with Solr as it's set up by default. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771607#comment-16771607 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, Could you please guide me - to which field manifold writes the actual textual content of the document. Currently I am using the _text_ field but it has been found that _text_ does not contain the actual data. Rather it added some extra values to the actual content. In my managed-schema : After my indexing in Solr, the value looks like : (The first 4 lines are appended before the content of file) "title":["NETWORK PLANNING\u"], "_text_":[" \n \n stream_size 34070 \n X-Parsed-By org.apache.tika.parser.DefaultParser \n X-Parsed-By org.apache.tika.parser.txt.TXTParser \n stream_content_type application/pdf \n stream_name cs.exe?bmsdocid=9.2.1&func=eebms.docdownload \n stream_source_info cs.exe?bmsdocid=9.2.1&func=eebms.docdownload \n Content-Encoding UTF-8 \n resourceName cs.exe?bmsdocid=9.2.1&func=eebms.docdownload \n Content-Type text/plain; charset=UTF-8 \n \n \n 9.2.1 UNCONTROLLED IF PRINTED Page 1 of 13\nCompany Policy\nNETWORK\nDocument No Amendment No Approved By Approval Date Review Date\n: : : : :\n9.2.1 9 CEO 23/05/2016 23/05/2019\n9.2.1 NETWORK PLANNING\n1.0 POLICY STATEMENT\nThe company will plan the expansion and augmentation of its electrical network to achieve levels of safety, reliability and quality of supply commensurate with community, regulator, customer and shareholder expectations.\nThe company will coordinate its planning with the NSW transmission utility Transgrid and neighbouring distribution utilities to develop effective solutions to satisfy load growth within the company’s supply area and in adjacent franchise areas where the company’s network has influence.\n2.0 PURPOSE\nTo provide principles for planning network Thanks & Regards, Subasini Rath O: +91-33 6636-8889 M: +91 983-1234-341 Email: subasini.r...@endeavourenergy.com.au > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747859#comment-16747859 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, I just removed one of the transformer connection and it worked fine for me. So there may be some issue was happening by using same type of transformation connection. Thanks & Regards, Subasini Rath O: +91-33 6636-8889 M: +91 983-1234-341 Email: subasini.r...@endeavourenergy.com.au > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743033#comment-16743033 ] Karl Wright commented on CONNECTORS-1563: - Please also see this discussion: https://issues.apache.org/jira/browse/CONNECTORS-1533 > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743028#comment-16743028 ] Karl Wright commented on CONNECTORS-1563: - First, I asked for the Simple History, not the manifoldcf logs. What does the simple history say about document ingestions for the connection in question with the new configuration? But, from your solr log: {code} 2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [ x:eesolr_webcrawler] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234) {code} Note that the stack trace is from the ExtractingDocumentLoader, which is Tika. You did not manage to actually change the output handler to the non-extracting one, possibly because you have configured your Solr in a non-default way. I cannot debug that for you, sorry. Can you do the following: Download the current 7.x version of Solr, fresh, and extract it. Start it using the standard provided simple scripts. Point ManifoldCF at it and crawl some documents, using the setup for the connection I have described. Does that work? If it does, and I expect it to because that is what works for me here, then it is your job to figure out what you did to Solr to make that not work. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: Document simple history.docx, managed-schema, manifold > settings.docx, manifoldcf.log, solr.log, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743006#comment-16743006 ] Karl Wright commented on CONNECTORS-1563: - Please include [INFO] messages from the Solr log for example indexing requests, and also include records from the Simple History for documents indexed with the new configuration. Thanks. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, manifold settings.docx, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742999#comment-16742999 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, Hope you have seen the document I had sent you in email with screenshots. Yes, I did the following steps : 1.In output connection, I clicked the button [Reset All associated Records] and started indexing from the beginning. 2. In path tab, I changed update handler to /update (default was (/update/extract) 3. In schema tab, unchecked the check box [Use the Extract Update Handler]. It forced me to give document length and content field name 4. copied the existing job and created a new one with same configuration 5. Run the new job. 6. In manifold,I can see website is getting crawled and documents are getting processed but nothing was appearing in Solr index. Please guide. Thanks & Regards, Subasini Rath O: +91-33 6636-8889 M: +91 983-1234-341 Email: subasini.r...@endeavourenergy.com.au > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, manifold settings.docx, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742949#comment-16742949 ] Karl Wright commented on CONNECTORS-1563: - Please view the Solr connection and click the button that tells it to forget about everything it has indexed. That will force reindexing. That's standard step when you change configuration like this and you want all documents to be reindexed. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, manifold settings.docx, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742883#comment-16742883 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, Tried your suggestions in the below email but no luck. Please find attached the screenshots for my manifold settings. Could you please revisit once and let me know if I am missing something. Also as per your suggestion - In the Solr output connection : tab [Paths] --- > I changed [update handler to /update instead of /update/extract . In [Schema] tab ---> deselect [Use the Extract Update Handler:]. What I observe is no indexing happened in Solr. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741004#comment-16741004 ] Karl Wright commented on CONNECTORS-1563: - {quote} I need to pass from manifold one custom field and value which I want to see in Solr index. That is the reason why I used metadata transformer where I can pass the custom field in job - tab metadata adjuster. {quote} Yes, people do that all the time. Just add the Metadata Adjuster any place in your pipeline and have it inject the field value you want. It will be faithfully transmitted to Solr. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740992#comment-16740992 ] Subasini Rath commented on CONNECTORS-1563: --- Thanks Karl. Just need to get clear one more doubt. I need to pass from manifold one custom field and value which I want to see in Solr index. That is the reason why I used metadata transformer where I can pass the custom field in job - tab metadata adjuster. If I will use only tika extractor, is there any way to pass custom field which we will get indexed in Solr. On 11-Jan-2019 11:17 PM, "Karl Wright (JIRA)" wrote: [ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587 ] Karl Wright commented on CONNECTORS-1563: - The metadata extractor can go anywhere in your pipeline, after Tika extraction. There is absolutely no point in having *two* Tika extractions though -- and that's what you're trying to do with the setup you've got. What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and inject content into Solr using the /update handler, not the /update/extract handler. There's also a checkbox you'd need to uncheck in the Solr connection configuration. It's all covered in the ManifoldCF end user documentation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587 ] Karl Wright commented on CONNECTORS-1563: - The metadata extractor can go anywhere in your pipeline, after Tika extraction. There is absolutely no point in having *two* Tika extractions though -- and that's what you're trying to do with the setup you've got. What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and inject content into Solr using the /update handler, not the /update/extract handler. There's also a checkbox you'd need to uncheck in the Solr connection configuration. It's all covered in the ManifoldCF end user documentation. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740529#comment-16740529 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, In that case how I should proceed. I need to use both tika content extractor and metadata adjuster. How shall I update the handler. Do I need to do any modification in the solrconfig. Xml and managedschema. Xml If yes, please guide me what I need to do. On 11-Jan-2019 7:59 PM, "Karl Wright (JIRA)" wrote: [ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740435#comment-16740435 ] Karl Wright commented on CONNECTORS-1563: - {quote} Solr cell with standard update handler... {quote} This is not Option 2; it's a combination of (1) and (2) and is not a model that we support. -- This message was sent by Atlassian JIRA (v7.6.3#76005) > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740435#comment-16740435 ] Karl Wright commented on CONNECTORS-1563: - {quote} Solr cell with standard update handler... {quote} This is not Option 2; it's a combination of (1) and (2) and is not a model that we support. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740347#comment-16740347 ] Subasini Rath commented on CONNECTORS-1563: --- Hi Karl, Thanks for your response. I am using the second option. Solr cell with standard update handler + tika content extractor and metadata adjuster. On 11-Jan-2019 5:33 PM, "Karl Wright (JIRA)" wrote: [ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740330#comment-16740330 ] Karl Wright commented on CONNECTORS-1563: - Can you tell me which configuration you are attempting: (1) Solr Cell + extract update handler + no Tika content extraction in MCF, or (2) NO Solr Cell + standard update handler + Tika content extraction in MCF Which is it? -- This message was sent by Atlassian JIRA (v7.6.3#76005) > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740330#comment-16740330 ] Karl Wright commented on CONNECTORS-1563: - Can you tell me which configuration you are attempting: (1) Solr Cell + extract update handler + no Tika content extraction in MCF, or (2) NO Solr Cell + standard update handler + Tika content extraction in MCF Which is it? > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740040#comment-16740040 ] Subasini Rath commented on CONNECTORS-1563: --- I am using apache solr 7.6.0 and manifoldcf 2.12. I am also getting the same error. Hi Karl, Could you please let me know how can we check that our Solr configuration is correct or not as we are using the default set up. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720171#comment-16720171 ] Karl Wright commented on CONNECTORS-1563: - If you need me to debug your solr setup, you're going to need to wait a couple of weeks, I'm afraid. I'm extremely behind and I honestly have been working 20 hour days. You're on your own for now. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720085#comment-16720085 ] Sneha commented on CONNECTORS-1563: --- I am not getting the missing configuration in solr. I have attached the managed-schema and solr-config.xml. Created core using command: solr create -c corename Please help me with the configuration. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > Attachments: managed-schema, solrconfig.xml > > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720061#comment-16720061 ] Karl Wright commented on CONNECTORS-1563: - That argues that your solr configuration is not correct, because this was tested thoroughly during the last release cycle. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720018#comment-16720018 ] Sneha commented on CONNECTORS-1563: --- Upgraded the ManifoldCF with version 2.11 but still same error is showing at solr server > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Assignee: Karl Wright >Priority: Major > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719952#comment-16719952 ] Karl Wright commented on CONNECTORS-1563: - SolrJ recently had a significant internal revision and you may need to upgrade to the latest ManifoldCF (2.11) where changes were made to work with it. > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Lucene/SOLR connector >Reporter: Sneha >Priority: Major > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719860#comment-16719860 ] Furkan KAMACI commented on CONNECTORS-1563: --- Please check here: issues.apache.org/jira/browse/TIKA-2637 > SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream > must have > 0 bytes > --- > > Key: CONNECTORS-1563 > URL: https://issues.apache.org/jira/browse/CONNECTORS-1563 > Project: ManifoldCF > Issue Type: Task > Components: Solr 7.x component >Reporter: Sneha >Priority: Major > > I am encountering this problem: > I have checked "Use the Extract Update Handler:" param then I am getting an > error on Solr i.e. null:org.apache.solr.common.SolrException: > org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 > bytes > If I ignore tika exception, my documents get indexed but dont have content > field on Solr. > I am using Solr 7.3.1 and manifoldCF 2.8.1 > I am using solr cell and hence not configured external tika extractor in > manifoldCF pipeline > Please help me with this problem > Thanks in advance -- This message was sent by Atlassian JIRA (v7.6.3#76005)