[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630851#comment-16630851
 ] 

Karl Wright commented on SOLR-12798:
------------------------------------

Please examine the following code from master HttpSolrClient.java:

{code}
      if(contentWriter != null) {
        String fullQueryUrl = url + wparams.toQueryString();
        HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
            new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
        postOrPut.addHeader("Content-Type",
            contentWriter.getContentType());
        postOrPut.setEntity(new BasicHttpEntity(){
          @Override
          public boolean isStreaming() {
            return true;
          }

          @Override
          public void writeTo(OutputStream outstream) throws IOException {
            contentWriter.write(outstream);
          }
        });
        return postOrPut;

      } else if (streams == null || isMultipart) {
{code}

The request is formed by taking all the parameters in wparams (which include 
the metadata fields AFAICT) and putting them into the URL:

{code}
        HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod() ?
            new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
{code}

There is no other way in the SolrJ request handling code for PUT and POST 
requests to transmit metadata to Solr.  

Indeed, right now, documents added to an UpdateRequest and documents
specified via ContentStreamUpdateRequest both go by this route.
We verified this with the 7.5.0 version of SolrJ and all ManifoldCF custom
code removed: documents whose metadata was long enough still produced
requests that exceeded the maximum URL length.
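
To make the failure mode concrete, here is a minimal, self-contained sketch of
what happens when every parameter is appended to the URL. It does not use
SolrJ: toQueryString() below is only a stand-in for wparams.toQueryString(),
and the class name, the "literal.*" parameter names, the example URL, and the
4096-character limit are all assumptions chosen for illustration. Real servers
and proxies enforce their own limits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of SolrJ or ManifoldCF.
class UrlLengthSketch {
    // Assumed limit for illustration only; actual limits vary by server and proxy.
    static final int ASSUMED_MAX_URL_LENGTH = 4096;

    // URL-encode one component as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for wparams.toQueryString(): builds "?k1=v1&k2=v2".
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sb.length() == 0 ? '?' : '&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // True if the URL with all parameters appended exceeds the assumed limit.
    static boolean exceedsLimit(String baseUrl, Map<String, String> params) {
        return (baseUrl + toQueryString(params)).length() > ASSUMED_MAX_URL_LENGTH;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc-1");
        // Simulate a long metadata value (5000 characters).
        params.put("literal.description", new String(new char[5000]).replace('\0', 'x'));

        String base = "http://localhost:8983/solr/collection1/update/extract";
        System.out.println("URL length: " + (base + toQueryString(params)).length());
        System.out.println("Exceeds assumed limit: " + exceedsLimit(base, params));
    }
}
```

Because the metadata travels in the URL rather than in the request body, the
URL grows with every metadata field, which is why a multipart body is needed
for documents like ours.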


> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-12798
>                 URL: https://issues.apache.org/jira/browse/SOLR-12798
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.4
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: HOT Balloon Trip_Ultra HD.jpg, 
> SOLR-12798-approach.patch, solr-update-request.txt
>
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because, when we do, we 
> get "pfountz Should not get here!" errors on the Solr side, which generate 
> HTTP error code 500 responses.  That should not happen either, in my 
> opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
