[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

JIRA Fri, 28 Sep 2018 01:27:49 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631520#comment-16631520
 ]


Jan Høydahl commented on SOLR-12798:
------------------------------------

{quote}How we can avoid multipart when we have one big chunk as a content 
stream and one chunk with huge params?
{quote}
I have still not seen the usecase for this. Why would there be huge params when 
you post a binary content stream to SolrCell? The params would come from the 
metadata inside the binary docs, which are unpacked on the Solr server side? 
You could of course have large metadata about a PDF sitting in a database on 
the client and want to post that with the binary doc but as I understand the 
usecase for MCF, the huge metadata is parsed from the binary doc by Tika on the 
server side?

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-12798
>                 URL: https://issues.apache.org/jira/browse/SOLR-12798
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 7.4
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: HOT Balloon Trip_Ultra HD.jpg, 
> SOLR-12798-approach.patch, SOLR-12798-reproducer.patch, 
> SOLR-12798-workaround.patch, no params in url.png, solr-update-request.txt
>
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because on the Solr side we 
> get "pfountz Should not get here!" errors on the Solr side when we do, which 
> generate HTTP error code 500 responses.  That should not happen either, in my 
> opinion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

Reply via email to