[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627214#comment-16627214 ]

Karl Wright commented on CONNECTORS-1533:

The better fix is not actually better so I had to revisit it. The fix now does take into account the situation where URLs get too long for normal PUT-style operations, and the code attempts to use multipart in that situation. I will need [~julienFL] to check out trunk and try it, though, and check to be sure his document with lots of metadata gets properly indexed.

I'm not worried about deletes now because they all go through the default pathway, and I've confirmed that extracting update handler requests all do the right thing and use multipart. Standard update requests use PUT except when the URL is too long.

r1841918
r1841919 (release branch)

> Solr Connector is unable to ingest documents
>
>                 Key: CONNECTORS-1533
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1533
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 2.11
>            Reporter: Julien Massiera
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.11
>
>         Attachments: 2018-09-23-012800.png, CONNECTORS-1533.patch
>
> The "r69acbd9 - Fix solr connector content deletion bug" has introduced another bug:
> It is now impossible to ingest documents into Solr 7.4.0; we obtain the following error:
> Error from server at http://localhost:8983/solr/FileShare: missing content stream
> The fact is, the requestWriter.getContentWriter(request) object is equal to null only on commit requests. So the new lines of code introduced by the fix, which are based on the test of this object, result in a null Collection streams object and so the update request is failing.
> Concerned class: org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
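A minimal sketch of the decision described in the comment above, in Java. It is not the actual ModifiedHttpSolrClient code: the class name UploadMethodChooser, the Method enum, and the MAX_URL_LENGTH threshold are all hypothetical, chosen only to illustrate "extracting requests always use multipart; standard updates use PUT unless the URL is too long".

```java
/** Illustrative only; not the logic committed in r1841918. */
final class UploadMethodChooser {
    enum Method { PUT, MULTIPART_POST }

    // Hypothetical threshold. The real limit depends on the server and any proxies.
    static final int MAX_URL_LENGTH = 8192;

    /**
     * @param isExtractingRequest true for extracting update handler requests
     * @param url full request URL, including metadata passed as query parameters
     */
    static Method choose(boolean isExtractingRequest, String url) {
        if (isExtractingRequest) {
            // Extracting update handler requests always go out as multipart.
            return Method.MULTIPART_POST;
        }
        // Standard updates use PUT unless the URL has grown too long.
        return url.length() > MAX_URL_LENGTH ? Method.MULTIPART_POST : Method.PUT;
    }

    public static void main(String[] args) {
        String shortUrl = "http://localhost:8983/solr/collection1/update?commit=true";
        // Simulate a document with lots of metadata carried in the URL.
        StringBuilder longUrl = new StringBuilder(shortUrl).append("&literal.meta=");
        for (int i = 0; i < MAX_URL_LENGTH; i++) {
            longUrl.append('x');
        }
        System.out.println(choose(false, shortUrl));
        System.out.println(choose(false, longUrl.toString()));
        System.out.println(choose(true, shortUrl));
    }
}
```

Keeping the choice in one small function makes the "document with lots of metadata" case easy to exercise on its own, without a running Solr instance.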
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625883#comment-16625883 ]

Karl Wright commented on CONNECTORS-1533:

I found a somewhat better fix, actually, that lets all documents use multipart post.

r1841853
r1841854 (release branch)
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625822#comment-16625822 ]

Julien Massiera commented on CONNECTORS-1533:

[~kwri...@metacarta.com], just tested and yes it works for me
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625794#comment-16625794 ]

Karl Wright commented on CONNECTORS-1533:

I've committed a fix on top of the new codebase from 7.4's HttpSolrClient. This implements the previous hack (requiring multipart only for ContentStreamUpdateRequest) but it does fix the deletion problem.

r1841840
r1841843 (release branch)

I still do not understand the issues with stream_size described above. [~julienFL], can you try out the latest trunk in unmodified form and let me know whether it works for you?
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625705#comment-16625705 ]

Karl Wright commented on CONNECTORS-1533:

The rejections I see have the following form:

{code}
Error from server at http://localhost:8394/solr/collection1: ERROR: [doc=file:/C:/wip/mcf-release-scripts/release-scripts/.svn/pristine/db/db0ad7d88277f4e7ab862049c928c40b797be9de.svn-base] Error adding field 'stream_size'='null' msg=For input string: "null"
{code}

Some documents succeed, and some fail with this. The failures have response code 400, which means they don't get indexed. I can see no reason why some succeed and some fail.
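One observation that may help when reading this error: the text For input string: "null" is the message Java's numeric parsers produce when handed the literal string "null". The sketch below reproduces that message with Long.parseLong. It assumes, without confirmation from this thread, that stream_size is a numeric field on the Solr side; the class name StreamSizeCheck and the shouldSend guard are hypothetical and are not part of ManifoldCF.

```java
/** Illustrative only; reproduces the error text, not the Solr code path. */
final class StreamSizeCheck {
    /** Returns the parse error message, or null if the value parses as a long. */
    static String parseError(String value) {
        try {
            Long.parseLong(value);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    /** Hypothetical guard: skip metadata values that carry no usable number. */
    static boolean shouldSend(String value) {
        return value != null && !value.isEmpty() && !value.equals("null");
    }

    public static void main(String[] args) {
        // The literal string "null", not a null reference.
        System.out.println(parseError("null"));
        System.out.println(parseError("1024"));
        System.out.println(shouldSend("null"));
    }
}
```

If the assumption holds, the documents that fail would be the ones for which some stage sends the string "null" as the size rather than omitting the field, which would be consistent with only some documents being rejected.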
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625678#comment-16625678 ]

Karl Wright commented on CONNECTORS-1533:

Hmm. The unmodified HttpSolrClient works for deletions, so clearly I've missed something important and will have to look at that. But also, many documents are rejected for mysterious reasons. More on that later.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625656#comment-16625656 ]

Karl Wright commented on CONNECTORS-1533:

I've looked into this more. Standalone Solr also fails to delete documents, with the same underlying error. The reason it wasn't noted before is that it fails silently with a 400 response. We note this in the history as a REMOTESOLREXCEPTION.

I'm trying this now with the unmodified SolrJ 7.4.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625612#comment-16625612 ]

Julien Massiera commented on CONNECTORS-1533:

[~kwri...@metacarta.com], I tested what you wanted, based on trunk:

HttpPoster.java line 291 replaced by this one:
solrServer = new HttpSolrClient.Builder(httpSolrServerUrl).withHttpClient(localClient).withResponseParser(new XMLResponseParser()).allowCompression(allowCompression).build();

HttpPoster.java line 173 replaced by this one:
final CloudSolrClient cloudSolrServer = new CloudSolrClient.Builder().withZkHost(zookeeperHosts).withLBHttpSolrClient(new LBHttpSolrClient.Builder().withHttpClient(HttpClientUtil.createClient(null)).build()).build();

Result: document ingestions and deletions are working, Result Code OK for all documents and they are present in the Solr index after the crawl and removed after the job deletion.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625582#comment-16625582 ]

Karl Wright commented on CONNECTORS-1533:

Oh, and line 173 needs to be changed as well, to instantiate LBHttpSolrClient rather than ModifiedLBHttpSolrClient. Sorry for overlooking that.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625553#comment-16625553 ]

Karl Wright commented on CONNECTORS-1533:

[~julienFL], thanks for the update. I believe this to be a bug in SolrJ that is not due to changes we've made. But we should confirm that. Can you:

- Check out trunk
- Change line 291 of HttpPoster.java to do "new HttpSolrClient" instead of "new ModifiedHttpSolrClient"
- Build and test

If that also doesn't delete properly under Solr Cloud, I will open another SolrJ ticket for that.

Thanks, and hope you have time to test this out soon.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625526#comment-16625526 ]

Julien Massiera commented on CONNECTORS-1533:

Hi [~kwri...@metacarta.com],

I just tested the last commit r713ba5f (release branch) and it does not work on deletion. On a job deletion, the cleaning phase ends, but in the Simple History I see, for each document, a ROUTEEXCEPTION result code with the following description:

Error from server at http://localhost:8983/solr/FileShare_shard1_replica_n1: missing content stream: Error from server at http://localhost:8983/solr/FileShare_shard1_replica_n1: missing content stream

Furthermore, after checking the Solr index, none of the documents have been removed.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624816#comment-16624816 ]

Karl Wright commented on CONNECTORS-1533:

I ran as suggested, and everything still succeeded. The only odd thing is that I see log messages about empty content streams, that get retried five times. They seemingly return 400 errors. They do not cause errors in the running (or cleaning-up) ManifoldCF job though. I wonder if these are really errors, or are effectively warnings? I guess the only way to know is to examine the index to see how many documents are in it after a job run, and again after it's been deleted.

At any rate, I'm done with this for the weekend -- I have some Geo3D bugs to look at now. If [~julienFL] would like to try out the release artifact and verify that it is working for him (being sure to check the index contents after indexing and then again after the job has been deleted) I'd be very appreciative.
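The index check suggested above could be sketched as follows in Java, assuming Solr's standard /select handler and a JSON response that carries a numFound count. IndexCountCheck, countUrl, and numFound are hypothetical helper names, and the sample response in main is made up for illustration. Fetching the URL is left to whatever HTTP client is at hand.

```java
/** Illustrative only; helper names are hypothetical. */
final class IndexCountCheck {
    /** Builds a match-all query URL that returns only the hit count (rows=0). */
    static String countUrl(String collectionUrl) {
        String base = collectionUrl.endsWith("/")
                ? collectionUrl.substring(0, collectionUrl.length() - 1)
                : collectionUrl;
        return base + "/select?q=*:*&rows=0&wt=json";
    }

    /** Extracts numFound from a JSON response body; returns -1 if it is absent. */
    static long numFound(String json) {
        java.util.regex.Matcher m = java.util.regex.Pattern
                .compile("\"numFound\"\\s*:\\s*(\\d+)")
                .matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        System.out.println(countUrl("http://localhost:8983/solr/collection1"));
        // Made-up sample body, shaped like a Solr JSON select response.
        String sample = "{\"response\":{\"numFound\":42,\"start\":0,\"docs\":[]}}";
        System.out.println(numFound(sample));
    }
}
```

Comparing the count after the crawl with the count after the job has been deleted shows whether the 400 responses were real failures: a count that does not drop after deletion means the delete requests never took effect.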
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624771#comment-16624771 ]

Shinichiro Abe commented on CONNECTORS-1533:

I could see that exception with zero-length documents, too, but I don't know how to turn it off.

Please change to useExtractHandler=false, change the path from "/update/extract" to "/update", and add the Tika extractor to the pipeline. With these changes, you can see the "missing content stream" exception when posting documents.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624763#comment-16624763 ]

Karl Wright commented on CONNECTORS-1533:

If I delete the job under Solr Cloud, and all the documents in it are removed from the index, it still works fine for me. :-( So I don't know what the difference is.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624761#comment-16624761 ]

Karl Wright commented on CONNECTORS-1533:

[~shinichiro abe] I get this particular error from Solr when I try to index a zero-length file:

{code}
WARN 2018-09-22T13:38:09,581 (Worker thread '32') - Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[?:?]
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:967) ~[?:?]
WARN 2018-09-22T13:38:09,595 (Worker thread '32') - Service interruption reported for job 1537637859471 connection 'files': Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
WARN 2018-09-22T13:39:09,959 (Worker thread '46') - Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[?:?]
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:967) ~[?:?]
WARN 2018-09-22T13:39:09,968 (Worker thread '46') - Service interruption reported for job 1537637859471 connection 'files': Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
WARN 2018-09-22T13:40:10,352 (Worker thread '40') - Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624757#comment-16624757 ]

Shinichiro Abe commented on CONNECTORS-1533:

I set Solr Cloud up in the standard way:

{noformat}
cd solr-7.4.0
./bin/solr -c -f
./bin/solr create_collection -c collection1
./bin/solr delete -c collection1
{noformat}
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624753#comment-16624753 ]

Karl Wright commented on CONNECTORS-1533:

Did you set it up using instructions as described here?

https://lucene.apache.org/solr/guide/7_4/getting-started-with-solrcloud.html#getting-started-with-solrcloud

If that's how you set it up, I can try the same thing. Just let me know.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624751#comment-16624751 ]
Karl Wright commented on CONNECTORS-1533:

Hi [~shinichiro abe], you have it set up to use Solr Cloud. I haven't tried that here. I'm not even sure it is supposed to work with a singleton Solr instance.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624728#comment-16624728 ]
Shinichiro Abe commented on CONNECTORS-1533:

[~kwri...@metacarta.com], I did a fresh checkout again 10 minutes ago, built it, and ran it against a new Solr collection. Document deletion still did not succeed.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624700#comment-16624700 ]
Julien Massiera commented on CONNECTORS-1533:

Hi [~kwri...@metacarta.com], [~shinichiro abe], I am currently unable to test r713ba5f (and I won't be until Monday) as I don't have my computer with me, but the last test I did yesterday, based on rbb6dae8, was a success: document ingestion and deletion both worked! Being able to use ManifoldCF with Solr 7.4.0 is pretty important for us, but if you decide to roll back the Solr connector I guess we will wait until a clean solution is found.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624689#comment-16624689 ]
Karl Wright commented on CONNECTORS-1533:

[~shinichiro abe], [~julienFL], I just did a test using current trunk code and an actual installation of Solr 7.4.0, and it succeeded without any problems, including deleting files from the index. I suspect Abe-san may have been using an older version of trunk. Anyhow, there are still risks, but at least it's not entirely broken.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624633#comment-16624633 ]
Karl Wright commented on CONNECTORS-1533:

If this is true, then I think the only possibility is shipping SolrJ 7.3 instead, and rolling back the changes made to support 7.4, since SolrJ 7.4 doesn't work even in unmodified form. [~julienFL], are you folks dead set on using Solr 7.4? Because it doesn't look like SolrJ for 7.4 is actually functional. If you disagree with this approach, please be prepared to demonstrate that we're not leaving our other Solr users stuck by this upgrade. It seems to me that we are, and I'm not willing at this time to have multiple independent versions of the Solr connector.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624481#comment-16624481 ]
Karl Wright commented on CONNECTORS-1533:

Hi [~shinichiro abe], what version of Solr are you using? The version of SolrJ was updated to 7.4. I think it may be the case that SolrJ 7.4 is incompatible with previous versions of Solr. I have already modified ModifiedHttpClient to work with SolrJ 7.4.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624419#comment-16624419 ]
Shinichiro Abe commented on CONNECTORS-1533:

Hi, I tested the RC artifact a few days ago and saw the "missing content stream" error in the Solr instance when posting documents. I pulled the latest trunk today and I still see the "missing content stream" error, this time when documents are deleted as part of job deletion in crawler-ui. ModifiedHttpClient was introduced in CONNECTORS-623. As far as I know, there may still be some backward-compatibility impact; that function will need to be changed or removed to work with the latest Solr.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623781#comment-16623781 ]
Karl Wright commented on CONNECTORS-1533:

I posted to d...@lucene.apache.org describing the problem, and committed a fix that allowed the integration test to pass. Basically, if the URL required is more than 4000 characters, it will use multipart post. Otherwise it will do whatever SolrJ wants. I am still very concerned that there are a number of fixes we needed to add to SolrJ to make it work with our setup. One case I know that will not work is the multipart form's name field, which cannot be transmitted to Solr Cell through a standard URL. Thus my hack is going to break this functionality. I expect it will impact folks like [~shinichiro abe], because they have relied on this in the past. Unfortunately I know of no other workaround at this time, so the release will be postponed further until we find one.
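The URL-length rule described in this comment can be sketched roughly as follows. This is only an illustration, not the committed ManifoldCF code: the class `TransportChooser`, its methods, and the `Transport` enum are hypothetical names, and the only value taken from the thread is the 4000-character threshold.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of ManifoldCF or SolrJ.
final class TransportChooser {
    // Threshold quoted in the comment ("more than 4000 characters").
    static final int MAX_URL_LENGTH = 4000;

    enum Transport { URL_PARAMS, MULTIPART_POST }

    // URL-encodes one key or value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds "base?k1=v1&k2=v2" from the metadata map.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    // Falls back to multipart only when the full URL would exceed the threshold.
    static Transport choose(String base, Map<String, String> params) {
        return buildUrl(base, params).length() > MAX_URL_LENGTH
            ? Transport.MULTIPART_POST
            : Transport.URL_PARAMS;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("literal.title", "hello world");
        System.out.println(choose("http://localhost:8983/solr/FileShare/update/extract", metadata));
    }
}
```

A document with a large amount of metadata pushes the encoded URL past the threshold, which is the case where the multipart path would be taken.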
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623626#comment-16623626 ]
Karl Wright commented on CONNECTORS-1533:

The issue is that they changed the flow through this method to do something different if it's a PUT or a POST. In that case, the metadata is converted to URL parameters, and the stream is sent via the contentWriter. I overlooked this code before. I've added it in. But now I'm concerned that this is how all requests ManifoldCF makes to Solr will be made, and multipart forms will not be used. That's fatal because a URL has a length limitation.

r 1841587 (trunk)
r 1841588 (release branch)
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623608#comment-16623608 ]
Karl Wright commented on CONNECTORS-1533:

Ok, then I wonder how this works in Solr. The logic for getting the streams would seem equally flawed there as well. I don't know what the solution is.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623583#comment-16623583 ]
Karl Wright commented on CONNECTORS-1533:

I verified that the integration test in question confirms the following: (a) that the right number of documents were processed, and that (b) there were no errors reported during the processing. So unless the failure is indeed a silent one, and documents are simply not getting transmitted to Solr at all, that test should be valid. Can you describe the actual failure that you are seeing please?
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623550#comment-16623550 ]
Julien Massiera commented on CONNECTORS-1533:

Hi [~kwri...@metacarta.com], the following exception is triggered on the Solr side:

RequestHandlerBase org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:531)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
at java.lang.Thread.run(Thread.java:748)

So the Solr connector gets a 400 response code with the "missing content stream" status message.

After debugging the code, I found that after the following lines:

SolrParams params = request.getParams();
RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;

the 'streams' collection object, which should NOT be null and should contain the input stream of the incoming file, IS null. This is due to the fact that the contentWriter object is not null.
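The effect of the ternary quoted in this comment can be reproduced in isolation. The sketch below uses hypothetical stand-in interfaces (`ContentWriter`, `ContentStream`) rather than the real SolrJ types, and the class name `StreamSelectionDemo` is an assumption made purely for illustration: whenever a content writer is present, the streams collection is deliberately left null, so any code path that only looks at `streams` finds nothing to send.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical demo class; the nested interfaces stand in for SolrJ's
// RequestWriter.ContentWriter and ContentStream and carry no behavior.
final class StreamSelectionDemo {
    interface ContentWriter {}
    interface ContentStream {}

    // Same shape as the quoted line: streams are fetched only when
    // there is no content writer, otherwise the result is null.
    static Collection<ContentStream> selectStreams(ContentWriter contentWriter,
                                                   Collection<ContentStream> available) {
        return contentWriter == null ? available : null;
    }

    // A request body exists if either mechanism can supply one.
    static boolean hasBody(ContentWriter contentWriter, Collection<ContentStream> streams) {
        return contentWriter != null || (streams != null && !streams.isEmpty());
    }

    public static void main(String[] args) {
        Collection<ContentStream> available =
            Collections.<ContentStream>singletonList(new ContentStream() {});
        ContentWriter writer = new ContentWriter() {};

        Collection<ContentStream> streams = selectStreams(writer, available);
        // streams is null here, yet the request still has a body via the writer.
        System.out.println("streams == null: " + (streams == null));
        System.out.println("hasBody: " + hasBody(writer, streams));
    }
}
```

Under this reading, code that overrides the method has to handle the content-writer branch explicitly instead of relying on `streams` alone.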
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623544#comment-16623544 ]
Karl Wright commented on CONNECTORS-1533:

For your reference, here is the Lucene master version of the HttpSolrClient method we needed to update in ModifiedHttpSolrClient to make compatible with SolrJ 7.4:
{code}
protected HttpRequestBase createMethod(SolrRequest request, String collection) throws IOException, SolrServerException {
  if (request instanceof V2RequestSupport) {
    request = ((V2RequestSupport) request).getV2Request();
  }
  SolrParams params = request.getParams();
  RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
  Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;
  String path = requestWriter.getPath(request);
  if (path == null || !path.startsWith("/")) {
    path = DEFAULT_PATH;
  }

  ResponseParser parser = request.getResponseParser();
  if (parser == null) {
    parser = this.parser;
  }

  // The parser 'wt=' and 'version=' params are used instead of the original
  // params
  ModifiableSolrParams wparams = new ModifiableSolrParams(params);
  if (parser != null) {
    wparams.set(CommonParams.WT, parser.getWriterType());
    wparams.set(CommonParams.VERSION, parser.getVersion());
  }
  if (invariantParams != null) {
    wparams.add(invariantParams);
  }

  String basePath = baseUrl;
  if (collection != null)
    basePath += "/" + collection;

  if (request instanceof V2Request) {
    if (System.getProperty("solr.v2RealPath") == null) {
      basePath = baseUrl.replace("/solr", "/api");
    } else {
      basePath = baseUrl + "/v2";
    }
  }

  if (SolrRequest.METHOD.GET == request.getMethod()) {
    if (streams != null || contentWriter != null) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "GET can't send streams!");
    }
    return new HttpGet(basePath + path + wparams.toQueryString());
  }

  if (SolrRequest.METHOD.DELETE == request.getMethod()) {
    return new HttpDelete(basePath + path + wparams.toQueryString());
  }

  if (SolrRequest.METHOD.POST == request.getMethod() || SolrRequest.METHOD.PUT == request.getMethod()) {
    String url = basePath + path;
    boolean hasNullStreamName = false;
    if (streams != null) {
      for (ContentStream cs : streams) {
        if (cs.getName() == null) {
          hasNullStreamName = true;
          break;
        }
      }
    }
    boolean isMultipart = ((this.useMultiPartPost && SolrRequest.METHOD.POST == request.getMethod())
        || (streams != null && streams.size() > 1)) && !hasNullStreamName;

    LinkedList<NameValuePair> postOrPutParams = new LinkedList<>();

    if(contentWriter != null) {
      String fullQueryUrl = url + wparams.toQueryString();
      HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
          new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
      postOrPut.addHeader("Content-Type", contentWriter.getContentType());
      postOrPut.setEntity(new BasicHttpEntity(){
        @Override
        public boolean isStreaming() {
          return true;
        }

        @Override
        public void writeTo(OutputStream outstream) throws IOException {
          contentWriter.write(outstream);
        }
      });
      return postOrPut;

    } else if (streams == null || isMultipart) {
      // send server list and request list as query string params
      ModifiableSolrParams queryParams = calculateQueryParams(this.queryParams, wparams);
      queryParams.add(calculateQueryParams(request.getQueryParams(), wparams));
      String fullQueryUrl = url + queryParams.toQueryString();
      HttpEntityEnclosingRequestBase postOrPut = fillContentStream(request, streams, wparams, isMultipart, postOrPutParams, fullQueryUrl);
      return postOrPut;
    }
    // It is has one stream, it is the post body, put the params in the URL
    else {
      String fullQueryUrl = url + wparams.toQueryString();
      HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
          new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
      fillSingleContentStream(streams, postOrPut);

      return postOrPut;
    }
  }

  throw new SolrServerException("Unsupported method: " + request.getMethod());
}
{code}
Please compare and contrast with what is committed to the ManifoldCF code base.
[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents
[ https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623541#comment-16623541 ]
Karl Wright commented on CONNECTORS-1533:

Hi [~julienFL], the IT tests I ran do indeed index documents via SolrJ successfully, so I am not sure what the disconnect is. The changes I committed include lines from the HttpSolrClient class method corresponding to the one that was overridden. The only difference is that HTTP 2.0 support was not included. If you are seeing a specific exception please include it.