Raymond:

Right, it looks like SolrJ 4.4 and later include the fix we need to ditch my local hack. But we still need a hacked version of CloudSolrServer, and a new SOLR ticket to fix the SolrCloud version of this problem.
I suggest we continue the discussion thread on the CONNECTORS-839 ticket.

Karl

On Mon, Dec 16, 2013 at 7:36 AM, Raymond Wiker <[email protected]> wrote:

> See also:
>
> https://issues.apache.org/jira/browse/SOLR-4358
>
> https://issues.apache.org/jira/browse/CONNECTORS-674
>
>
> On Mon, Dec 16, 2013 at 1:34 PM, Karl Wright <[email protected]> wrote:
>
> > Hi Alessandro,
> >
> > ManifoldCF wound up including a hacked version of HttpSolrServer because
> > the Solr version's support for multipart post was broken. I did send a
> > patch to Solr/Lucene but I lost track of whether that got committed or not,
> > and whether it has been released yet. But that is immaterial; it appears
> > that the SolrCloud implementation turns off multipart too - and that could
> > well be because of the breakage I was describing earlier.
> >
> > ManifoldCF needs to use multipart post for more reasons than just that:
> > Solr actually treats multipart post fields differently in some respects
> > than url fields. So we need to find a solution to this problem.
> >
> > I've created a ticket: CONNECTORS-839.
> >
> > Karl
> >
> >
> > On Mon, Dec 16, 2013 at 7:18 AM, Alessandro Benedetti <
> > [email protected]> wrote:
> >
> > > I have more details now, after some deep debugging:
> > >
> > > The CloudSolrServer triggers the LBHttpSolrServer
> > > lbServer.request(lbRequest).getResponse().
> > >
> > > The LBHttpSolrServer triggers the HttpSolrServer request(request).
> > >
> > > It's here that we build the httpPOST in this way:
> > >
> > >     boolean isMultipart = (this.useMultiPartPost || ( streams != null &&
> > >         streams.size() > 1 )) && !hasNullStreamName;
> > >
> > >     LinkedList<NameValuePair> postParams = new LinkedList<NameValuePair>();
> > >     ...
> > >       List<FormBodyPart> parts = new LinkedList<FormBodyPart>();
> > >       Iterator<String> iter = params.getParameterNamesIterator();
> > >       while (iter.hasNext()) {
> > >         String p = iter.next();
> > >         String[] vals = params.getParams(p);
> > >         if (vals != null) {
> > >           for (String v : vals) {
> > >             if (isMultipart) { // IMPORTANT
> > >               parts.add(new FormBodyPart(p, new StringBody(v,
> > >                   Charset.forName("UTF-8"))));
> > >             } else {
> > >               postParams.add(new BasicNameValuePair(p, v));
> > >             }
> > >           }
> > >         }
> > >       }
> > >       ...
> > >     }
> > >     // It is has one stream, it is the post body, put the params in the URL
> > >     else { // we finish in this case
> > >       String pstr = ClientUtils.toQueryString(params, false);
> > >       HttpPost post = new HttpPost(url + pstr);
> > >
> > > While debugging Manifold I checked that the CloudSolrServer calls a
> > > LBHttpSolrServer that calls a HttpSolrServer with useMultiPartPost=false.
> > > That is where the problem is.
> > > So at the moment we have evidence that the metadata field values are
> > > placed in the http header.
> > >
> > > Now, what's behind that? A bug? A decision not to use multiPartPost?
> > > Any advice?
> > >
> > >
> > > 2013/12/16 Raymond Wiker <[email protected]>
> > >
> > > > That looks distinctly odd: you have an HTTP POST request, but the
> > > > parameters are attached to the url, GET-style. It really makes no sense
> > > > to add parameters to the url when you have to use POST to carry the file
> > > > content --- but in the "simple post tool", that is exactly what they do.
> > > > My best guess is that they do it this way to avoid having to deal with
> > > > the complexities of multipart/form-data, and this might be acceptable in
> > > > a scenario where the number of parameters is so small that you run no
> > > > risk of overrunning the header size limit.
> > > >
> > > > It's possible that the SolrJ developers make the assumption that this is
> > > > safe; alternatively (and hopefully) there is a way of instructing SolrJ
> > > > to place all the parameters in the request body. If the first is the
> > > > case, you'll have to find a workaround (for example, increasing the
> > > > maximum header size in Jetty); in the second case, I guess that
> > > > ManifoldCF needs to set up SolrJ appropriately.
> > > >
> > > >
> > > > On Mon, Dec 16, 2013 at 11:53 AM, Alessandro Benedetti <
> > > > [email protected]> wrote:
> > > >
> > > > > There was an error in the previous mail, and some of the content is
> > > > > quoted and maybe not clear at first glance, so I report the most
> > > > > important part of the mail here:
> > > > >
> > > > > You can see that all the params are appended to the URL, so they will
> > > > > go in the headers of the HTTP POST request. Here you are:
> > > > >
> > > > > POST /solr/collection1/update/extract?literal.id=C+Movies%3A1025&literal.field2=value2&....&literal.fieldN=valueN&resource.name=Tom+Cruise&wt=javabin&version=2
> > > > >
> > > > > User-Agent Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
> > > > > Transfer-Encoding chunked
> > > > > Content-Type text/plain
> > > > > Host 10.0.1.16:8983
> > > > > Request Header Size : 5.99 KB (6133 bytes)
> > > > >
> > > > > Remember that this is not my code, but Manifold 1.4.1 out of the box:
> > > > >
> > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > >
> > > > >     writeField(out,LITERAL+newFieldName,values);
> > > > >     // Write the commitWithin parameter
> > > > >     if (commitWithin != null)
> > > > >       writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > >     contentStreamUpdateRequest.setParams(out);
> > > > >     contentStreamUpdateRequest.addContentStream(new
> > > > >         RepositoryDocumentStream(is,length,contentType,contentName));
> > > > >     contentStreamUpdateRequest.process(solrServer);
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > 2013/12/16 Alessandro Benedetti <[email protected]>
> > > > >
> > > > > > 2013/12/16 Raymond Wiker <[email protected]>
> > > > > >
> > > > > >> On Mon, Dec 16, 2013 at 9:42 AM, Alessandro Benedetti <
> > > > > >> [email protected]> wrote:
> > > > > >>
> > > > > >> > > Do you have any means of capturing the entire http (POST)
> > > > > >> > > request? It could be that SolrJ is adding things to the header.
> > > > > >> >
> > > > > >> > I used Fiddler and Charles (two tools for monitoring http
> > > > > >> > requests). All the params added to the ContentStreamUpdateRequest
> > > > > >> > appear to be in the header.
> > > > > >> > Nothing else added by SolrJ.
> > > > > >>
> > > > > >> Ok. Would it be possible for you to generate a set of captures that
> > > > > >> could be shared? I'd be happy to take a look.
> > > > > >>
> > > > > >
> > > > > > Absolutely yes, you can see that all the params are appended to the
> > > > > > URL, so they will go in the headers of the HTTP POST request. Here
> > > > > > you are:
> > > > > >
> > > > > > POST /solr/collection1/update/extract?literal.id=C+Movies%3A1025&literal.field2=value2&....&literal.fieldN=valueN&resource.name=Tom+Cruise&wt=javabin&version=2
> > > > > >
> > > > > > User-Agent Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0
> > > > > > Transfer-Encoding chunked
> > > > > > Content-Type text/plain
> > > > > > Host 10.0.1.16:8983
> > > > > > Request Header Size : 5.99 KB (6133 bytes)
> > > > > >
> > > > > > Remember that this is not my code, but Manifold 1.4.1 out of the box:
> > > > > >
> > > > > > org.apache.manifoldcf.agents.output.solr.HttpPoster
> > > > > >
> > > > > >     writeField(out,LITERAL+newFieldName,values);
> > > > > >     // Write the commitWithin parameter
> > > > > >     if (commitWithin != null)
> > > > > >       writeField(out,COMMITWITHIN_METADATA,commitWithin);
> > > > > >     contentStreamUpdateRequest.setParams(out);
> > > > > >     contentStreamUpdateRequest.addContentStream(new
> > > > > >         RepositoryDocumentStream(is,length,contentType,contentName));
> > > > > >     contentStreamUpdateRequest.process(solrServer);
> > > > > >
> > > > > >
> > > > > >> > > What container are you running Solr under? Are you accessing
> > > > > >> > > Solr directly, or via a proxy?
> > > > > >> >
> > > > > >> > Direct access through a SolrCloudServer configured on a ZooKeeper
> > > > > >> > ensemble of 3 zk.
> > > > > >> > Solr is running on Jetty.
> > > > > >
> > > > > > --
> > > > > > --------------------------
> > > > > >
> > > > > > Benedetti Alessandro
> > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > >
> > > > > > "Tyger, tyger burning bright
> > > > > > In the forests of the night,
> > > > > > What immortal hand or eye
> > > > > > Could frame thy fearful symmetry?"
> > > > > >
> > > > > > William Blake - Songs of Experience -1794 England
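
For anyone following the thread, the branch Alessandro traced can be reduced to a small stand-alone sketch. This is not SolrJ code and it does not call SolrJ. The class, enum, and method names are made up for illustration. Only the boolean expression is copied from the quoted HttpSolrServer line. The condition for the form-body branch is an assumption, because that part of the quoted source is elided. The query-string helper is a plain-JDK stand-in, not ClientUtils.toQueryString.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in SolrJ. */
public class ParamPlacementSketch {

    /** Where the request parameters end up for a given configuration. */
    enum Placement { MULTIPART_BODY, FORM_BODY, URL_QUERY }

    /** Same boolean expression as the quoted line; streamCount 0 stands for "streams == null". */
    static boolean isMultipart(boolean useMultiPartPost, int streamCount, boolean hasNullStreamName) {
        return (useMultiPartPost || streamCount > 1) && !hasNullStreamName;
    }

    static Placement placement(boolean useMultiPartPost, int streamCount, boolean hasNullStreamName) {
        if (isMultipart(useMultiPartPost, streamCount, hasNullStreamName)) {
            // Each parameter becomes its own part of a multipart/form-data body.
            return Placement.MULTIPART_BODY;
        }
        if (streamCount == 0) {
            // ASSUMPTION: with no content stream the parameters travel as a
            // url-encoded form body (the quoted source elides this condition).
            return Placement.FORM_BODY;
        }
        // One stream and no multipart: the stream is the POST body, so every
        // parameter is appended to the URL. This is the case from the capture.
        return Placement.URL_QUERY;
    }

    /** Plain-JDK stand-in for building the query string. */
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sb.length() == 0 ? '?' : '&')
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.id", "C Movies:1025");
        params.put("resource.name", "Tom Cruise");

        // useMultiPartPost=false with a single content stream, as observed in the debugger.
        Placement p = placement(false, 1, false);
        String target = "/solr/collection1/update/extract" + toQueryString(params);
        System.out.println(p + ": request target is " + target.length() + " characters");
    }
}
```

With useMultiPartPost=true and the same single stream, the sketch returns MULTIPART_BODY. The request target then no longer grows with the number of literal.* fields, which is the reason for wanting multipart here.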
