Raymond: Right, it looks like SolrJ 4.4 and later include the fix
necessary for us to ditch my local hack.  But we still need a hacked
version of CloudSolrServer and a new SOLR ticket to fix the SolrCloud
version of this problem.
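
For reference, the decision Alessandro traced in the quoted message below
comes down to a single boolean. Here is a minimal stand-alone sketch of it,
not the actual SolrJ code: the class MultipartDecisionSketch and both of its
methods are made-up names for illustration, and the query-string builder only
approximates what ClientUtils.toQueryString produces. It is meant to show that
with useMultiPartPost=false and a single content stream, every literal.*
parameter ends up in the URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of SolrJ or ManifoldCF.
final class MultipartDecisionSketch {

  // Mirrors the boolean expression quoted from HttpSolrServer.request():
  // multipart is used only if it is forced on, or there is more than one stream,
  // and no stream has a null name.
  static boolean isMultipart(boolean useMultiPartPost, int streamCount,
                             boolean hasNullStreamName) {
    return (useMultiPartPost || streamCount > 1) && !hasNullStreamName;
  }

  // Assumption: a rough stand-in for the query string that gets appended to
  // the URL in the non-multipart, single-stream case.
  static String toQueryString(Map<String, String[]> params) {
    StringBuilder sb = new StringBuilder();
    try {
      for (Map.Entry<String, String[]> e : params.entrySet()) {
        for (String v : e.getValue()) {
          sb.append(sb.length() == 0 ? '?' : '&');
          sb.append(URLEncoder.encode(e.getKey(), "UTF-8"));
          sb.append('=');
          sb.append(URLEncoder.encode(v, "UTF-8"));
        }
      }
    } catch (UnsupportedEncodingException ex) {
      // UTF-8 is always available on a conforming JVM.
      throw new IllegalStateException(ex);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String[]> params = new LinkedHashMap<String, String[]>();
    params.put("literal.id", new String[] {"C Movies:1025"});
    params.put("resource.name", new String[] {"Tom Cruise"});

    // The case seen through CloudSolrServer: flag off, one content stream.
    boolean multipart = isMultipart(false, 1, false);
    System.out.println("multipart = " + multipart);
    if (!multipart) {
      // Every parameter lengthens the request line.
      System.out.println("query string = " + toQueryString(params));
    }
  }
}
```

With one stream and the flag off, isMultipart is false, so each metadata
field added to the request makes the request line longer.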

I suggest we continue the discussion thread on the CONNECTORS-839 ticket.

Karl



On Mon, Dec 16, 2013 at 7:36 AM, Raymond Wiker <[email protected]> wrote:

> See also:
>
> https://issues.apache.org/jira/browse/SOLR-4358
>
> https://issues.apache.org/jira/browse/CONNECTORS-674
>
>
> On Mon, Dec 16, 2013 at 1:34 PM, Karl Wright <[email protected]> wrote:
>
> > Hi Alessandro,
> >
> > ManifoldCF wound up including a hacked version of HttpSolrServer because
> > the Solr version's support for multipart post was broken.  I did send a
> > patch to Solr/Lucene but I lost track of whether that got committed or
> > not, and whether it has been released yet.  But that is immaterial; it
> > appears that the SolrCloud implementation turns off multipart too - and
> > that could well be because of the breakage I was describing earlier.
> >
> > ManifoldCF needs to use multipart post for more reasons than just that:
> > Solr actually treats multipart post fields differently in some respects
> > than URL fields.  So we need to find a solution to this problem.
> >
> > I've created a ticket: CONNECTORS-839.
> >
> > Karl
> >
> >
> >
> > On Mon, Dec 16, 2013 at 7:18 AM, Alessandro Benedetti <
> > [email protected]> wrote:
> >
> > > > I have more details now, after some deep debugging:
> > >
> > > The CloudSolrServer  triggers the LBHttpSolrServer
> > > lbServer.request(lbRequest).getResponse().
> > >
> > > The LBHttpSolrServer triggers the HttpSolrServer request(request).
> > >
> > > It's here that we build the httpPOST in this way :
> > >
> > > > boolean isMultipart = (this.useMultiPartPost || ( streams != null &&
> > > >     streams.size() > 1 )) && !hasNullStreamName;
> > > >
> > > >             LinkedList<NameValuePair> postParams = new LinkedList<NameValuePair>();
> > > >        ...
> > > >               List<FormBodyPart> parts = new LinkedList<FormBodyPart>();
> > > >               Iterator<String> iter = params.getParameterNamesIterator();
> > > >               while (iter.hasNext()) {
> > > >                 String p = iter.next();
> > > >                 String[] vals = params.getParams(p);
> > > >                 if (vals != null) {
> > > >                   for (String v : vals) {
> > > >                     if (isMultipart) { // IMPORTANT
> > > >                       parts.add(new FormBodyPart(p, new StringBody(v, Charset.forName("UTF-8"))));
> > > >                     } else {
> > > >                       postParams.add(new BasicNameValuePair(p, v));
> > > >                     }
> > > >                   }
> > > >                 }
> > > >               }
> > > >             ...
> > > >             }
> > > >             // It is has one stream, it is the post body, put the params in the URL
> > > >             else { // we finish in this case
> > > >               String pstr = ClientUtils.toQueryString(params, false);
> > > >               HttpPost post = new HttpPost(url + pstr);
> > >
> > > > While debugging ManifoldCF I confirmed that CloudSolrServer calls an
> > > > LBHttpSolrServer, which calls an HttpSolrServer with
> > > > useMultiPartPost=false. That is where the problem lies.
> > > > So at the moment we have evidence that the metadata field values are
> > > > placed in the request URL, and therefore in the HTTP request header.
> > > >
> > > > Now, what's behind that? A bug? A decision not to use multiPartPost?
> > > > Any advice?
> > >
> > >
> > >
> > > 2013/12/16 Raymond Wiker <[email protected]>
> > >
> > > > > That looks distinctly odd: you have an HTTP POST request, but the
> > > > > parameters are attached to the URL, GET-style. It really makes no
> > > > > sense to add parameters to the URL when you have to use POST to carry
> > > > > the file content, but in the "simple post tool", that is exactly what
> > > > > they do. My best guess is that they do it this way to avoid having to
> > > > > deal with the complexities of multipart/form-data, and this might be
> > > > > acceptable in a scenario where the number of parameters is so small
> > > > > that you run no risk of overrunning the header size limit.
> > > > >
> > > > > It's possible that the SolrJ developers make the assumption that this
> > > > > is safe; alternatively (and hopefully) there is a way of instructing
> > > > > SolrJ to place all the parameters in the request body. If the first is
> > > > > the case, you'll have to find a workaround (for example, increasing
> > > > > the maximum header size in Jetty); in the second case, I guess that
> > > > > ManifoldCF needs to set up SolrJ appropriately.
> > > >
> > > >
> > > >
> > > > On Mon, Dec 16, 2013 at 11:53 AM, Alessandro Benedetti <
> > > > [email protected]> wrote:
> > > >
> > > > > > There was an error in the previous mail, and some of the content
> > > > > > was quoted and may not be clear at first glance, so I repeat the
> > > > > > most important part of the mail here:
> > > > > >
> > > > > > You can see that all the params are appended to the URL, so they
> > > > > > go in the headers of the HTTP POST request:
> > > > >
> > > > > > POST /solr/collection1/update/extract?literal.id=C+Movies%3A1025&literal.field2=value2&....&literal.fieldN=valueN&resource.name=Tom+Cruise&wt=javabin&version=2
> > > > > >
> > > > > > User-Agent Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
> > > > > > Transfer-Encoding chunked
> > > > > > Content-Type text/plain
> > > > > > Host 10.0.1.16:8983
> > > > > > Request Header Size : 5.99 KB (6133 bytes)
> > > > >
> > > > > > Remember that this is not my code, but ManifoldCF 1.4.1 out of the box:
> > > > > >
> > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > > >
> > > > > > writeField(out,LITERAL+newFieldName,values);
> > > > > > // Write the commitWithin parameter
> > > > > > if (commitWithin != null)
> > > > > >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > > > contentStreamUpdateRequest.setParams(out);
> > > > > > contentStreamUpdateRequest.addContentStream(new
> > > > > >     RepositoryDocumentStream(is,length,contentType,contentName));
> > > > > > contentStreamUpdateRequest.process(solrServer);
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > 2013/12/16 Alessandro Benedetti <[email protected]>
> > > > >
> > > > > > 2013/12/16 Raymond Wiker <[email protected]>
> > > > > >
> > > > > >> On Mon, Dec 16, 2013 at 9:42 AM, Alessandro Benedetti <
> > > > > >> [email protected]> wrote:
> > > > > >>
> > > > > >
> > > > > > >> > > Do you have any means of capturing the entire http (POST)
> > > > > > >> > > request? It could be that SolrJ is adding things to the header.
> > > > > >> >
> > > > > > >> > I used Fiddler and Charles (two tools for monitoring HTTP
> > > > > > >> > requests). All the params added to the
> > > > > > >> > ContentStreamUpdateRequest appear to be in the header.
> > > > > > >> > Nothing else added by SolrJ.
> > > > > >> >
> > > > > >>
> > > > > > >> Ok. Would it be possible for you to generate a set of captures
> > > > > > >> that could be shared? I'd be happy to take a look.
> > > > > >>
> > > > > >
> > > > > > > Absolutely yes, you can see that all the params are appended to
> > > > > > > the URL, so they go in the headers of the HTTP POST request:
> > > > > > >
> > > > > > > POST /solr/collection1/update/extract?literal.id=C+Movies%3A1025&literal.field2=value2&....&literal.fieldN=valueN&resource.name=Tom+Cruise&wt=javabin&version=2
> > > > > > >
> > > > > > > User-Agent Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
> > > > > > > Transfer-Encoding chunked
> > > > > > > Content-Type text/plain
> > > > > > > Host 10.0.1.16:8983
> > > > > > > Request Header Size : 5.99 KB (6133 bytes)
> > > > > > >
> > > > > > > Remember that this is not my code, but ManifoldCF 1.4.1 out of the box:
> > > > > > >
> > > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > > > >
> > > > > > > writeField(out,LITERAL+newFieldName,values);
> > > > > > > // Write the commitWithin parameter
> > > > > > > if (commitWithin != null)
> > > > > > >     writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > > > > contentStreamUpdateRequest.setParams(out);
> > > > > > > contentStreamUpdateRequest.addContentStream(new
> > > > > > >     RepositoryDocumentStream(is,length,contentType,contentName));
> > > > > > > contentStreamUpdateRequest.process(solrServer);
> > > > > >
> > > > > >
> > > > > >
> > > > > >>
> > > > > >> > >
> > > > > > >> > > What container are you running Solr under? Are you accessing
> > > > > > >> > > Solr directly, or via a proxy?
> > > > > >> >
> > > > > > >> > Direct access through a CloudSolrServer configured on a
> > > > > > >> > ZooKeeper ensemble of 3 nodes.
> > > > > > >> > Solr is running on Jetty.
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > --------------------------
> > > > > >
> > > > > > Benedetti Alessandro
> > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > >
> > > > > > "Tyger, tyger burning bright
> > > > > > In the forests of the night,
> > > > > > What immortal hand or eye
> > > > > > Could frame thy fearful symmetry?"
> > > > > >
> > > > > > William Blake - Songs of Experience -1794 England
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > --------------------------
> > > > > >
> > > > > > Benedetti Alessandro
> > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > >
> > > > > > "Tyger, tyger burning bright
> > > > > > In the forests of the night,
> > > > > > What immortal hand or eye
> > > > > > Could frame thy fearful symmetry?"
> > > > > >
> > > > > > William Blake - Songs of Experience -1794 England
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
