[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743287#comment-16743287 ]

Michael Osipov commented on CONNECTORS-1564:

Hi [~kwri...@metacarta.com],
I was on a business trip for a couple of days. I will -- hopefully -- pick this up on Thu.

> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Reporter: Erlend Garåsen
> Assignee: Karl Wright
> Priority: Major
> Attachments: CONNECTORS-1564.patch
>
> We should post preemptively in case the Solr server requires basic
> authentication. This will make the communication between ManifoldCF and Solr
> much more efficient than the following exchange:
> * Send an HTTP POST request to Solr
> * Solr sends a 401 response
> * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
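The three-step exchange in the issue description above, and the preemptive alternative, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the attached ManifoldCF patch: the helper names `basic_auth_header` and `build_preemptive_request`, the Solr URL, the request body, and the credentials are all hypothetical. It only shows that the "Authorization: Basic" header can be attached before the first send, so no 401 round trip is needed.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_preemptive_request(
    url: str, body: bytes, username: str, password: str
) -> urllib.request.Request:
    """Build a POST request that already carries the credentials.

    The header is attached before the first send, so the server does not
    have to answer 401 first and the body is transmitted only once.
    """
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/xml")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    # The request is built but never sent.
    req = build_preemptive_request(
        "http://localhost:8983/solr/example/update", b"<add/>", "user", "secret"
    )
    print(req.get_header("Authorization"))
```

The trade-off is that credentials go out on every request, including ones the server might have accepted without them, so this approach is usually only appropriate over a trusted or encrypted connection.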
[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743186#comment-16743186 ]

Karl Wright commented on CONNECTORS-1564:

[~michael-o], any updates?
[ANNOUNCE] Apache Roadshow Chicago, Call for Presentations
Hello Devs!

You're receiving this email because you are subscribed to one or more Apache developer email lists.

I’m writing to let you know about an exciting event coming to the Chicago area: The Apache Roadshow Chicago. It will be held May 13th and 14th at three bars in the Logan Square neighborhood (Revolution Brewing, The Native, and the Radler).

There will be six tracks:

- Apache in Adtech: Tell us how Apache works in your advertising stack
- Apache in Fintech: Tell us how Apache works in your finance/insurance business
- Apache in Startups: Tell us how you’re using Apache in your startup
- Diversity in Apache: How do we increase and encourage diversity in Apache and tech fields overall?
- Made in Chicago: Apache related things made by people in Chicago that don’t fall into other buckets
- Project Shark Tank: Do you want more developers or users for your Apache project? Come here and pitch it!

This is an exciting chance to learn about how Apache Projects are in use in production around Chicago, how business users make the decision to use Apache projects, to learn about exciting new projects that want help from developers like you, and how/why to increase diversity in tech and IT.

If you have any use cases of Apache products in Adtech, Fintech, or Startups; if you represent a minority working in tech and have perspectives to share, if you live in the Chicagoland area and want to highlight some work you’ve done on an Apache project, or if you want to get other people excited to come work on your project, then please submit a CFP before the deadline on February 15th!

Tickets to the Apache Roadshow Chicago are $100; speakers will get a complimentary ticket.

We’re looking forward to reading your submissions and seeing you there on May 13-14!

Sincerely,
Trevor Grant

https://www.apachecon.com/chiroadshow19/cfp.html
https://www.apachecon.com/chiroadshow19/register.html
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743033#comment-16743033 ]

Karl Wright commented on CONNECTORS-1563:

Please also see this discussion: https://issues.apache.org/jira/browse/CONNECTORS-1533

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
> Issue Type: Task
> Components: Lucene/SOLR connector
> Reporter: Sneha
> Assignee: Karl Wright
> Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler:" parameter, and I am then
> getting an error on Solr, i.e. null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a
> content field on Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743028#comment-16743028 ]

Karl Wright commented on CONNECTORS-1563:

First, I asked for the Simple History, not the manifoldcf logs. What does the simple history say about document ingestions for the connection in question with the new configuration?

But, from your solr log:

{code}
2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [ x:eesolr_webcrawler] o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
{code}

Note that the stack trace is from the ExtractingDocumentLoader, which is Tika. You did not manage to actually change the output handler to the non-extracting one, possibly because you have configured your Solr in a non-default way. I cannot debug that for you, sorry.

Can you do the following:

Download the current 7.x version of Solr, fresh, and extract it. Start it using the standard provided simple scripts. Point ManifoldCF at it and crawl some documents, using the setup for the connection I have described. Does that work? If it does, and I expect it to because that is what works for me here, then it is your job to figure out what you did to Solr to make that not work.
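The check made in the comment above (the stack frame names ExtractingDocumentLoader, so the request still reached the extracting handler) can be sketched as a small log scan. This is only an illustration: the function name `hit_extracting_handler` is hypothetical, and the sample text is the log excerpt quoted in the comment, not output produced by this code.

```python
# Fully qualified class name that appears in the quoted stack trace.
EXTRACTING_LOADER = "org.apache.solr.handler.extraction.ExtractingDocumentLoader"


def hit_extracting_handler(log_text: str) -> bool:
    """Return True if the log text mentions the extracting document loader.

    A frame from this class in an indexing error suggests the request was
    still processed by the extracting handler rather than the plain one.
    """
    return EXTRACTING_LOADER in log_text


if __name__ == "__main__":
    # Excerpt copied from the Solr log quoted in the comment.
    sample = (
        "null:org.apache.solr.common.SolrException: "
        "org.apache.tika.exception.ZeroByteFileException: "
        "InputStream must have > 0 bytes\n"
        "    at org.apache.solr.handler.extraction."
        "ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)"
    )
    print(hit_extracting_handler(sample))
```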
[jira] [Updated] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subasini Rath updated CONNECTORS-1563:

Attachment: Document simple history.docx
            manifoldcf.log
            solr.log

Please find the log files and document simple history.

Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743006#comment-16743006 ]

Karl Wright commented on CONNECTORS-1563:

Please include [INFO] messages from the Solr log for example indexing requests, and also include records from the Simple History for documents indexed with the new configuration. Thanks.
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742949#comment-16742949 ]

Karl Wright commented on CONNECTORS-1563:

Please view the Solr connection and click the button that tells it to forget about everything it has indexed. That will force reindexing. That's a standard step when you change configuration like this and you want all documents to be reindexed.
[jira] [Updated] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Subasini Rath updated CONNECTORS-1563:

Attachment: manifold settings.docx

Hi Karl,

I tried the suggestions in your email below, but no luck. Please find attached the screenshots of my ManifoldCF settings. Could you please take another look and let me know if I am missing something?

Also, as per your suggestion: in the Solr output connection, on the [Paths] tab, I changed the update handler to /update instead of /update/extract. On the [Schema] tab, I deselected [Use the Extract Update Handler:].

What I observe is that no indexing happened in Solr.

Thanks & Regards,
Subasini Rath
O: +91-33 6636-8889
M: +91 983-1234-341
Email: subasini.r...@endeavourenergy.com.au
[jira] [Comment Edited] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742883#comment-16742883 ]

Subasini Rath edited comment on CONNECTORS-1563 at 1/15/19 9:12 AM:

Hi Karl,

I tried the suggestions in your email below, but no luck. Please find attached the screenshots of my ManifoldCF settings. Could you please take another look and let me know if I am missing something?

Also, as per your suggestion: in the Solr output connection, on the [Paths] tab, I changed the update handler to /update instead of /update/extract. On the [Schema] tab, I deselected [Use the Extract Update Handler:].

What I observe is that no indexing happened in Solr.
[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742883#comment-16742883 ]

Subasini Rath commented on CONNECTORS-1563:

Hi Karl,

I tried the suggestions in your email below, but no luck. Please find attached the screenshots of my ManifoldCF settings. Could you please take another look and let me know if I am missing something?

Also, as per your suggestion: in the Solr output connection, on the [Paths] tab, I changed the update handler to /update instead of /update/extract. On the [Schema] tab, I deselected [Use the Extract Update Handler:].

What I observe is that no indexing happened in Solr.