[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-25 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627214#comment-16627214
 ] 

Karl Wright commented on CONNECTORS-1533:
-

The better fix is not actually better so I had to revisit it.

The fix now does take into account the situation where URLs get too long for 
normal PUT-style operations, and the code attempts to use multipart in that 
situation.  I will need [~julienFL] to check out trunk and try it, though, and 
check to be sure his document with lots of metadata gets properly indexed.  I'm 
not worried about deletes now because they all go through the default pathway, 
and I've confirmed that extracting update handler requests all do the right 
thing and use multipart.  Standard update requests use PUT except when the URL 
is too long.

r1841918
r1841919 (release branch)
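
The routing rule described in this comment can be sketched roughly as follows. This is only an illustration, not the code committed in r1841918: the class name, the enum, and the URL length threshold are all hypothetical, and deletes (which go through the default pathway) are not modelled.

```java
// Illustrative sketch only; names and the threshold are assumptions,
// not taken from ModifiedHttpSolrClient.
class TransportChooser {
    enum Transport { PUT, MULTIPART_POST }

    // Hypothetical limit; the real cutoff used by the connector is not stated in this thread.
    static final int MAX_URL_LENGTH = 2048;

    // Extracting update handler requests always use multipart.
    // Standard update requests use PUT unless the URL would be too long.
    static Transport choose(boolean isExtractingRequest, String url) {
        if (isExtractingRequest) {
            return Transport.MULTIPART_POST;
        }
        return url.length() > MAX_URL_LENGTH ? Transport.MULTIPART_POST : Transport.PUT;
    }
}
```

A document with lots of metadata is the case that pushes a standard update request over the limit, since in this sketch the metadata is assumed to travel as URL parameters.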



> Solr Connector is unable to ingest documents
> 
>
> Key: CONNECTORS-1533
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1533
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 2.11
> Reporter: Julien Massiera
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.11
>
> Attachments: 2018-09-23-012800.png, CONNECTORS-1533.patch
>
>
> Commit "r69acbd9 - Fix solr connector content deletion bug" has introduced 
> another bug: 
> It is now impossible to ingest documents into Solr 7.4.0; we obtain the 
> following error: Error from server at http://localhost:8983/solr/FileShare: 
> missing content stream
> In fact, the requestWriter.getContentWriter(request) object is equal to 
> null only on commit requests. So the new lines of code introduced by the fix, 
> which are based on a test of this object, result in a null 
> Collection streams object, and so the update request fails.
> Affected class: 
> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625883#comment-16625883
 ] 

Karl Wright commented on CONNECTORS-1533:
-

I found a somewhat better fix, actually, that lets all documents use multipart 
post.

r1841853
r1841854 (release branch)





[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625822#comment-16625822
 ] 

Julien Massiera commented on CONNECTORS-1533:
-

[~kwri...@metacarta.com], just tested and yes it works for me



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625794#comment-16625794
 ] 

Karl Wright commented on CONNECTORS-1533:
-

I've committed a fix on top of the new codebase from 7.4's HttpSolrClient.  
This implements the previous hack (requiring multipart only for 
ContentStreamUpdateRequest)  but it does fix the deletion problem.

r1841840
r1841843 (release branch)

I still do not understand the issues with stream_size described above.

[~julienFL], can you try out the latest trunk in unmodified form and let me 
know whether it works for you?





[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625705#comment-16625705
 ] 

Karl Wright commented on CONNECTORS-1533:
-

The rejections I see have the following form:

{code}
Error from server at http://localhost:8394/solr/collection1: ERROR: 
[doc=file:/C:/wip/mcf-release-scripts/release-scripts/.svn/pristine/db/db0ad7d88277f4e7ab862049c928c40b797be9de.svn-base]
 Error adding field 'stream_size'='null' msg=For input string: "null"
{code}

Some documents succeed, and some fail with this.  The failures have response 
code 400, which means they don't get indexed.  I can see no reason why some 
succeed and some fail.
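
One observation about the message itself: 'For input string: "null"' is the text Java produces when the literal string "null" is parsed as a number. The sketch below only demonstrates that, plus one hypothetical way a client could avoid sending such a value. It assumes stream_size is parsed as a long on the Solr side, and it is not a diagnosis of why some documents fail here; the class and method names are made up for illustration.

```java
import java.util.Optional;

// Illustrative sketch only; not taken from the connector or from SolrJ.
class StreamSizeDemo {
    // Returns the NumberFormatException message for a value, or null if it parses cleanly.
    static String parseFailure(String value) {
        try {
            Long.parseLong(value);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    // Converting an unknown (null) length with String.valueOf yields the text "null".
    static String naiveParam(Long length) {
        return String.valueOf(length);
    }

    // Hypothetical guard: omit the parameter entirely when the length is unknown.
    static Optional<String> streamSizeParam(Long length) {
        return length == null ? Optional.empty() : Optional.of(Long.toString(length));
    }
}
```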




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625678#comment-16625678
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Hmm.

The unmodified HttpSolrClient works for deletions, so clearly I've missed 
something important and will have to look at that.

But also, many documents are rejected for mysterious reasons.  More on that 
later.





[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625656#comment-16625656
 ] 

Karl Wright commented on CONNECTORS-1533:
-

I've looked into this more.

Standalone Solr also fails to delete documents, with the same underlying error. 
The reason it wasn't noticed before is that it fails silently with a 400 
response.  We note this in the history as a REMOTESOLREXCEPTION.

I'm trying this now with the unmodified solrj 7.4.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625612#comment-16625612
 ] 

Julien Massiera commented on CONNECTORS-1533:
-

[~kwri...@metacarta.com],

I tested what you wanted, based on trunk:

HttpPoster.java line 291 replaced by this one:
solrServer = new HttpSolrClient.Builder(httpSolrServerUrl).withHttpClient(localClient).withResponseParser(new XMLResponseParser()).allowCompression(allowCompression).build();

HttpPoster.java line 173 replaced by this one:
final CloudSolrClient cloudSolrServer = new CloudSolrClient.Builder().withZkHost(zookeeperHosts).withLBHttpSolrClient(new LBHttpSolrClient.Builder().withHttpClient(HttpClientUtil.createClient(null)).build()).build();

Result: document ingestions and deletions are working. The result code is OK for 
all documents; they are present in the Solr index after the crawl and removed 
after the job deletion.



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625582#comment-16625582
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Oh, and line 173 needs to be changed as well, to instantiate LBHttpSolrClient 
rather than ModifiedLBHttpSolrClient.

Sorry for overlooking that.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625553#comment-16625553
 ] 

Karl Wright commented on CONNECTORS-1533:
-

[~julienFL], thanks for the update.

I believe this to be a bug in SolrJ that is not due to changes we've made.  But 
we should confirm that.  Can you:

- Check out trunk
- change line 291 of HttpPoster.java to do "new HttpSolrClient" instead of "new 
ModifiedHttpSolrClient"
- build and test

If that also doesn't delete properly under Solr Cloud, I will open another 
SolrJ ticket for that.

Thanks, and hope you have time to test this out soon.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-24 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625526#comment-16625526
 ] 

Julien Massiera commented on CONNECTORS-1533:
-

Hi [~kwri...@metacarta.com],

I just tested the latest commit, r713ba5f (release branch), and deletion does 
not work. On job deletion, the cleanup phase finishes, but in the Simple 
History I see, for each document, a ROUTEEXCEPTION result code with the 
following description: Error from server at 
http://localhost:8983/solr/FileShare_shard1_replica_n1: missing content stream: 
Error from server at http://localhost:8983/solr/FileShare_shard1_replica_n1: 
missing content stream
Furthermore, after checking the Solr index, none of the documents have been 
removed.



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624816#comment-16624816
 ] 

Karl Wright commented on CONNECTORS-1533:
-

I ran as suggested, and everything still succeeded.

The only odd thing is that I see log messages about empty content streams, 
which get retried five times.  They seemingly return 400 errors.  They do not 
cause errors in the running (or cleaning-up) ManifoldCF job, though.

I wonder if these are really errors, or are effectively warnings?  I guess the 
only way to know is to examine the index to see how many documents are in it 
after a job run, and again after it's been deleted.

At any rate, I'm done with this for the weekend -- I have some Geo3D bugs to 
look at now.  If [~julienFL] would like to try out the release artifact and 
verify that it is working for him (being sure to check the index contents after 
indexing and then again after the job has been deleted) I'd be very 
appreciative.
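
One way to do the index check suggested above is to ask Solr for the document count with a match-all query and rows=0, once after the job run and again after the job has been deleted. The sketch below is a minimal, standalone illustration using only the JDK (Java 11 or later), not SolrJ and not anything from the connector. The class name is hypothetical, and the collection URL you pass in (for example http://localhost:8983/solr/collection1) depends on your setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; a real check could equally be done from the Solr admin UI.
class SolrDocCount {
    // Pulls the numFound value out of a Solr JSON select response.
    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    static long parseNumFound(String json) {
        Matcher m = NUM_FOUND.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound in response");
        }
        return Long.parseLong(m.group(1));
    }

    // Runs q=*:* with rows=0 against the given collection URL and returns the match count.
    static long countDocuments(String collectionUrl) throws Exception {
        URI uri = URI.create(collectionUrl + "/select?q=*%3A*&rows=0&wt=json");
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return parseNumFound(response.body());
    }
}
```

If the count after the cleanup phase is the same as the count after the crawl, the deletions did not reach the index, whatever the job status says.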




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Shinichiro Abe (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624771#comment-16624771
 ] 

Shinichiro Abe commented on CONNECTORS-1533:


I could see that exception with zero-length documents, too, but I don't know 
how to turn it off.
Please change to useExtractHandler=false, change the path from "/update/extract" 
to "/update", and add the Tika extractor to the pipeline. With these changes, 
you can see the "missing content stream" exception when posting documents.



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624763#comment-16624763
 ] 

Karl Wright commented on CONNECTORS-1533:
-

When I delete the job under Solr Cloud, all the documents in it are removed 
from the index, so it still works fine for me.  :-(  So I don't know what the 
difference is.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624761#comment-16624761
 ] 

Karl Wright commented on CONNECTORS-1533:
-

[~shinichiro abe] I get this particular error from Solr when I try to index a 
zero-length file:

{code}
 WARN 2018-09-22T13:38:09,581 (Worker thread '32') - Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[?:?]
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:967) ~[?:?]
 WARN 2018-09-22T13:38:09,595 (Worker thread '32') - Service interruption reported for job 1537637859471 connection 'files': Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
 WARN 2018-09-22T13:39:09,959 (Worker thread '46') - Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[?:?]
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886) ~[?:?]
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[?:?]
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[?:?]
    at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:967) ~[?:?]
 WARN 2018-09-22T13:39:09,968 (Worker thread '46') - Service interruption reported for job 1537637859471 connection 'files': Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
 WARN 2018-09-22T13:40:10,352 (Worker thread '40') - Solr exception during indexing file:/C:/wip/mcf-release-scripts/release-scripts/.svn/wc.db-journal (500): Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.143:8983/solr/collection1: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 

[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Shinichiro Abe (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624757#comment-16624757
 ] 

Shinichiro Abe commented on CONNECTORS-1533:


I set Solr Cloud up in the standard way:
{noformat}
cd solr-7.4.0
./bin/solr -c -f

./bin/solr create_collection -c collection1
./bin/solr delete -c collection1
{noformat}



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624753#comment-16624753
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Did you set it up using instructions as described here?

https://lucene.apache.org/solr/guide/7_4/getting-started-with-solrcloud.html#getting-started-with-solrcloud

If that's how you set it up, I can try the same thing.  Just let me know.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624751#comment-16624751
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Hi [~shinichiro abe], you have it set up to use Solr Cloud.  I haven't tried 
that here.  I'm not even sure it is supposed to work with a singleton Solr 
instance.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Shinichiro Abe (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624728#comment-16624728
 ] 

Shinichiro Abe commented on CONNECTORS-1533:


[~kwri...@metacarta.com], I did a fresh checkout again 10 minutes ago, built it, 
and ran it with a new Solr collection; document deletion still did not succeed.



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624700#comment-16624700
 ] 

Julien Massiera commented on CONNECTORS-1533:
-

Hi [~kwri...@metacarta.com], [~shinichiro abe]

I am currently unable to test r713ba5f (and won't be able to until Monday) as I 
don't have my computer with me, but the last test I did yesterday, based on 
rbb6dae8, was a success: document ingestion and deletion both worked! 
Being able to use ManifoldCF with Solr 7.4.0 is pretty important for us, but if 
you decide to roll back the Solr connector I guess we will wait until a clean 
solution is found.

 

 



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624689#comment-16624689
 ] 

Karl Wright commented on CONNECTORS-1533:
-

[~shinichiro abe], [~julienFL], I just did a test using current trunk code and 
an actual installation of Solr 7.4.0, and it succeeded without any problems, 
including deleting files from the index.

I suspect Abe-san may have been using an older version of trunk.

Anyhow, there are still risks, but at least it's not entirely broken.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-22 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624633#comment-16624633
 ] 

Karl Wright commented on CONNECTORS-1533:
-

If this is true, then I think the only possibility is shipping SolrJ 7.3 
instead, and rolling back out the changes made to support 7.4, since SolrJ 7.4 
doesn't work even in unmodified form.

[~julienFL], are you folks dead set on using Solr 7.4?  Because it doesn't look 
like SolrJ for 7.4 is actually functional.  If you disagree with this approach 
please be prepared to demonstrate that we're not leaving our other Solr users 
stuck by this upgrade.  It seems to me that we are, and I'm not willing at this 
time to have multiple independent versions of the Solr connector.






[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624481#comment-16624481
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Hi [~shinichiro abe], what version of Solr are you using?
The version of SolrJ was updated to 4.6.  I think it may be the case that SolrJ 
4.6 is incompatible with previous versions of Solr.  I have already modified 
ModifiedHttpSolrClient to work with SolrJ 4.6.






[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Shinichiro Abe (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624419#comment-16624419
 ] 

Shinichiro Abe commented on CONNECTORS-1533:


Hi,
I tested the RC artifact a few days ago and saw the "missing content stream" 
error in the Solr instance when posting documents. I pulled the latest trunk 
today and still see the "missing content stream" error when documents are 
deleted as part of job deletion in crawler-ui.
ModifiedHttpSolrClient was introduced in CONNECTORS-623. As far as I know, it 
may still have some backward-compatibility impact; that function will need to 
be changed or removed to work with the latest Solr.



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623781#comment-16623781
 ] 

Karl Wright commented on CONNECTORS-1533:
-

I posted to d...@lucene.apache.org describing the problem, and committed a fix 
that allowed the integration test to pass.  Basically, if the URL required is 
more than 4000 characters, it will use multipart post.  Otherwise it will do 
whatever SolrJ wants.

I am still very concerned that there are a number of fixes we needed to add to 
SolrJ to make it work with our setup.  One case I know that will not work is 
the multipart form's name field, which cannot be transmitted to Solr Cell 
through a standard URL.  Thus my hack is going to break this functionality.  I 
expect it will impact folks like [~shinichiro abe], because they have relied on 
this in the past.  Unfortunately I know of no other workaround at this time, so 
the release will be postponed further until we find one.
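
For illustration only, the length-based fallback described above could look 
roughly like the sketch below.  This is not the committed ManifoldCF code: the 
class, enum, and helper names are hypothetical, and only the 4000-character 
threshold comes from the comment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: choose between URL parameters and a multipart POST
// based on how long the resulting request URL would be.
class TransportChooser {
  // Threshold taken from the comment ("more than 4000 characters").
  static final int MAX_URL_LENGTH = 4000;

  enum Transport { URL_PARAMETERS, MULTIPART_POST }

  // Illustrative helper: renders metadata as "?k1=v1&k2=v2".
  static String toQueryString(Map<String, String> params) {
    StringBuilder sb = new StringBuilder();
    try {
      for (Map.Entry<String, String> e : params.entrySet()) {
        sb.append(sb.length() == 0 ? '?' : '&');
        sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
        sb.append('=');
        sb.append(URLEncoder.encode(e.getValue(), "UTF-8"));
      }
    } catch (UnsupportedEncodingException ex) {
      throw new IllegalStateException("UTF-8 is always supported", ex);
    }
    return sb.toString();
  }

  // Fall back to multipart when the full URL would exceed the threshold.
  static Transport choose(String baseUrl, Map<String, String> params) {
    String fullUrl = baseUrl + toQueryString(params);
    return fullUrl.length() > MAX_URL_LENGTH
        ? Transport.MULTIPART_POST
        : Transport.URL_PARAMETERS;
  }

  public static void main(String[] args) {
    String url = "http://localhost:8983/solr/collection1/update/extract";

    Map<String, String> small = new LinkedHashMap<>();
    small.put("literal.id", "doc1");
    System.out.println(choose(url, small));

    // A document with a very large metadata value pushes the URL over the limit.
    StringBuilder big = new StringBuilder();
    for (int i = 0; i < 5000; i++) {
      big.append('x');
    }
    Map<String, String> large = new LinkedHashMap<>(small);
    large.put("literal.description", big.toString());
    System.out.println(choose(url, large));
  }
}
```

The point of the sketch is only that the decision is made per request, after 
the parameters are known, so documents with little metadata keep whatever 
transport SolrJ would normally use.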




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623626#comment-16623626
 ] 

Karl Wright commented on CONNECTORS-1533:
-

The issue is that they changed the flow through this method to do something 
different if it's a PUT or a POST.  In that case, the metadata is converted to 
URL parameters, and the stream is sent via the contentWriter.

I overlooked this code before.  I've added it in.  But now I'm concerned that 
this is how all requests ManifoldCF makes to Solr will be made, and multipart 
forms will not be used.  That's fatal because it has a length limitation.

r 1841587 (trunk)
r 1841588 (release branch)




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623608#comment-16623608
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Ok, then I wonder how this works in Solr.  The logic for getting the streams 
would seem equally flawed there as well.

I don't know what the solution is.




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623583#comment-16623583
 ] 

Karl Wright commented on CONNECTORS-1533:
-

I verified that the integration test in question confirms the following: (a) 
that the right number of documents were processed, and that (b) there were no 
errors reported during the processing.  So unless the failure is indeed a 
silent one, and documents are simply not getting transmitted to Solr at all, 
that test should be valid.

Can you describe the actual failure that you are seeing please?




[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Julien Massiera (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623550#comment-16623550
 ] 

Julien Massiera commented on CONNECTORS-1533:
-

Hi [~kwri...@metacarta.com],

The following exception is thrown on the Solr side:
RequestHandlerBase org.apache.solr.common.SolrException: missing content stream
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
 at org.eclipse.jetty.server.Server.handle(Server.java:531)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
 at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
 at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
 at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
 at java.lang.Thread.run(Thread.java:748)



So the Solr connector gets a 400 response code with the "missing content 
stream" status message. After debugging the code, I found that after the 
following lines:

 SolrParams params = request.getParams();
 RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
 Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;

the 'streams' collection, which should NOT be null and should contain the 
input stream of the incoming file, IS null. This is because the contentWriter 
object is not null.
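
The effect of that ternary can be reproduced in isolation with the small 
sketch below. It does not use SolrJ: the ContentWriter interface and both 
helper methods are hypothetical stand-ins, written on the assumption that the 
surrounding code only looks at 'streams' and never at the content writer.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-alone reproduction of the null-streams situation.
class StreamSelectionDemo {
  // Stand-in for the object returned by requestWriter.getContentWriter(request);
  // this is NOT SolrJ's RequestWriter.ContentWriter.
  interface ContentWriter {}

  // Mirrors the quoted ternary: streams are only fetched when no writer exists.
  static Collection<String> selectStreams(ContentWriter contentWriter,
                                          Collection<String> available) {
    return contentWriter == null ? available : null;
  }

  // Assumed behaviour of a code path that only knows about streams.
  static String sendUsingStreamsOnly(Collection<String> streams) {
    if (streams == null || streams.isEmpty()) {
      return "missing content stream";
    }
    return "sent " + streams.size() + " stream(s)";
  }

  public static void main(String[] args) {
    Collection<String> fileStreams = Collections.singletonList("file body");

    // No content writer: the streams are kept and the request can be sent.
    System.out.println(sendUsingStreamsOnly(selectStreams(null, fileStreams)));

    // Content writer present: streams become null and the body is lost
    // unless the caller also handles the writer.
    ContentWriter writer = new ContentWriter() {};
    System.out.println(sendUsingStreamsOnly(selectStreams(writer, fileStreams)));
  }
}
```

Under those assumptions, any request for which a content writer is returned 
ends up with no body unless the calling code also sends the writer's output.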

 


[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623544#comment-16623544
 ] 

Karl Wright commented on CONNECTORS-1533:
-

For your reference, here is the Lucene master version of the HttpSolrClient 
method we needed to update in ModifiedHttpSolrClient to make it compatible with 
SolrJ 7.4:

{code}
  protected HttpRequestBase createMethod(SolrRequest request, String collection) throws IOException, SolrServerException {
    if (request instanceof V2RequestSupport) {
      request = ((V2RequestSupport) request).getV2Request();
    }
    SolrParams params = request.getParams();
    RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
    Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;
    String path = requestWriter.getPath(request);
    if (path == null || !path.startsWith("/")) {
      path = DEFAULT_PATH;
    }

    ResponseParser parser = request.getResponseParser();
    if (parser == null) {
      parser = this.parser;
    }

    // The parser 'wt=' and 'version=' params are used instead of the original
    // params
    ModifiableSolrParams wparams = new ModifiableSolrParams(params);
    if (parser != null) {
      wparams.set(CommonParams.WT, parser.getWriterType());
      wparams.set(CommonParams.VERSION, parser.getVersion());
    }
    if (invariantParams != null) {
      wparams.add(invariantParams);
    }

    String basePath = baseUrl;
    if (collection != null)
      basePath += "/" + collection;

    if (request instanceof V2Request) {
      if (System.getProperty("solr.v2RealPath") == null) {
        basePath = baseUrl.replace("/solr", "/api");
      } else {
        basePath = baseUrl + "/v2";
      }
    }

    if (SolrRequest.METHOD.GET == request.getMethod()) {
      if (streams != null || contentWriter != null) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "GET can't send streams!");
      }

      return new HttpGet(basePath + path + wparams.toQueryString());
    }

    if (SolrRequest.METHOD.DELETE == request.getMethod()) {
      return new HttpDelete(basePath + path + wparams.toQueryString());
    }

    if (SolrRequest.METHOD.POST == request.getMethod() || SolrRequest.METHOD.PUT == request.getMethod()) {

      String url = basePath + path;
      boolean hasNullStreamName = false;
      if (streams != null) {
        for (ContentStream cs : streams) {
          if (cs.getName() == null) {
            hasNullStreamName = true;
            break;
          }
        }
      }
      boolean isMultipart = ((this.useMultiPartPost && SolrRequest.METHOD.POST == request.getMethod())
          || (streams != null && streams.size() > 1)) && !hasNullStreamName;

      LinkedList<NameValuePair> postOrPutParams = new LinkedList<>();

      if(contentWriter != null) {
        String fullQueryUrl = url + wparams.toQueryString();
        HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
            new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
        postOrPut.addHeader("Content-Type",
            contentWriter.getContentType());
        postOrPut.setEntity(new BasicHttpEntity(){
          @Override
          public boolean isStreaming() {
            return true;
          }

          @Override
          public void writeTo(OutputStream outstream) throws IOException {
            contentWriter.write(outstream);
          }
        });
        return postOrPut;

      } else if (streams == null || isMultipart) {
        // send server list and request list as query string params
        ModifiableSolrParams queryParams = calculateQueryParams(this.queryParams, wparams);
        queryParams.add(calculateQueryParams(request.getQueryParams(), wparams));
        String fullQueryUrl = url + queryParams.toQueryString();
        HttpEntityEnclosingRequestBase postOrPut = fillContentStream(request, streams, wparams, isMultipart, postOrPutParams, fullQueryUrl);
        return postOrPut;
      }
      // It is has one stream, it is the post body, put the params in the URL
      else {
        String fullQueryUrl = url + wparams.toQueryString();
        HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
            new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
        fillSingleContentStream(streams, postOrPut);

        return postOrPut;
      }
    }

    throw new SolrServerException("Unsupported method: " + request.getMethod());

  }
{code}

Please compare and contrast with what is committed to the ManifoldCF code base.



[jira] [Commented] (CONNECTORS-1533) Solr Connector is unable to ingest documents

2018-09-21 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16623541#comment-16623541
 ] 

Karl Wright commented on CONNECTORS-1533:
-

Hi [~julienFL], the IT tests I ran do indeed index documents via SolrJ 
successfully, so I am not sure what the disconnect is.

The changes I committed include lines from the HttpSolrClient class method 
corresponding to the one that was overridden.  The only difference is that HTTP 
2.0 support was not included.

If you are seeing a specific exception please include it.

