[jira] Updated: (SOLR-443) POST queries don't declare its charset

2008-07-02, Lars Kotthoff (JIRA)

[https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Lars Kotthoff updated SOLR-443:
---

Attachment: SOLR-443-multipart.patch

Attaching a new patch that makes it configurable, through a constructor parameter, 
whether to use single-part POSTs (with the content type set to 
application/x-www-form-urlencoded; charset=UTF-8) or multi-part POSTs. 
Single-part is the default.

Note that this patch changes the current behaviour for requests with streams. 
When content streams are present in the request, multi-part requests are 
*always* used, because the request has to have multiple parts and we 
therefore cannot specify the content type ourselves. A multi-part POST request 
must declare the boundary between its parts in the Content-Type header, but the 
boundary is not known when the request is assembled, so the Content-Type header 
cannot be set.
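A minimal Java sketch of the two modes, assuming only the JDK (java.net.URLEncoder, Java 10+) rather than commons-httpclient. The class PostHeaderSketch, its methods and the useMultiPart flag are hypothetical stand-ins for the constructor parameter in the patch, not its actual code. It shows why the single-part header can be fixed up front, while the multi-part header depends on a boundary that only exists once the body is being assembled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the two POST modes; not the code in SOLR-443-multipart.patch.
class PostHeaderSketch {

    // Single-part mode: the header declares the charset used to url-encode the body.
    static final String FORM_CONTENT_TYPE =
            "application/x-www-form-urlencoded; charset=UTF-8";

    // Builds a single-part body, percent-encoding every name and value as UTF-8.
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Multi-part mode: a fresh boundary is generated while the body is assembled...
    static String newBoundary() {
        return "solr-" + UUID.randomUUID().toString().replace("-", "");
    }

    // ...and the Content-Type header has to name that same boundary.
    static String multipartContentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Hypothetical stand-in for the patch's constructor parameter.
    static String contentTypeFor(boolean useMultiPart, String boundary) {
        return useMultiPart ? multipartContentType(boundary) : FORM_CONTENT_TYPE;
    }
}
```

For example, the parameters q=café and rows=10 become the body `q=caf%C3%A9&rows=10` in single-part mode.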

 POST queries don't declare its charset
 --

 Key: SOLR-443
 URL: https://issues.apache.org/jira/browse/SOLR-443
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.2
 Environment: Tomcat 6.0.14
Reporter: Andrew Schurman
Priority: Minor
 Attachments: SOLR-443-multipart.patch, solr-443.patch, 
 solr-443.patch, SolrDispatchFilter.patch


 When sending a query via POST, the Content-Type header is not set. The content 
 charset for the POST parameters is set, but it only appears to be used for 
 computing the Content-Length header in the commons library. Since the query is 
 encoded in UTF-8, the HTTP headers should also declare that charset in the 
 Content-Type. On Tomcat, this causes problems when the query string contains 
 non-ASCII characters (accented characters and such), because Tomcat parses the 
 POST body in its default encoding, ISO-8859-1. There appears to be no way to 
 set or change the default encoding for a message body on Tomcat.
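To make the failure concrete, here is a small JDK-only Java sketch (CharsetMismatchDemo and its methods are hypothetical names) that url-encodes a query as UTF-8, as the client does, and then decodes it the way the report describes Tomcat doing when no charset is declared, i.e. as ISO-8859-1.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: a UTF-8 encoded POST body decoded with the wrong charset.
class CharsetMismatchDemo {

    // Client side: the query is url-encoded as UTF-8.
    static String encodeAsClient(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Server side: without a declared charset, the body is decoded as ISO-8859-1.
    static String decodeAsServer(String body, boolean charsetDeclared) {
        return URLDecoder.decode(body,
                charsetDeclared ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1);
    }
}
```

With the query "café", the wire form is `caf%C3%A9`; decoding it as ISO-8859-1 yields "cafÃ©", while decoding it as UTF-8 restores "café".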

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-443) POST queries don't declare its charset

2008-07-02, Lars Kotthoff (JIRA)

[https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Lars Kotthoff updated SOLR-443:
---

Attachment: (was: SOLR-443-multipart.patch)




[jira] Updated: (SOLR-443) POST queries don't declare its charset

2008-06-18, Lars Kotthoff (JIRA)

[https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Lars Kotthoff updated SOLR-443:
---

Attachment: SOLR-443-multipart.patch

After reading 
[http://www.w3.org/TR/html401/interact/forms.html#form-content-type], it seems 
to me that the only reliable way to ensure that the data is encoded and decoded 
properly is to send the request parameters as parts of a multi-part request. 
The charset of each part can be set to UTF-8, the Content-Type header is 
generated by httpclient, and nothing needs to be url-encoded.

The downside is that requests become larger, since putting each parameter into 
a separate part adds quite a lot of overhead.

Attached is SOLR-443-multipart.patch, which makes the necessary changes to 
CommonsHttpSolrServer. Verified to work with the Jetty version used in the 
tests and with Tomcat 5.5.

A possible optimisation would be to check each parameter for non-ASCII 
characters and only put it into a separate part if it contains any; otherwise 
just include it as a regular parameter.
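The suggested optimisation could look roughly like the following JDK-only sketch. PartSplitter and its methods are hypothetical; it only decides which parameters would need their own part and shows what the text of one such part might look like, leaving the actual request assembly to httpclient.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the suggested optimisation; not part of any attached patch.
class PartSplitter {

    // True if the text is pure ASCII and could stay an ordinary url-encoded parameter.
    static boolean isAscii(String text) {
        // newEncoder() on each call, since CharsetEncoder instances are not thread-safe.
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    // Names of the parameters that would be moved into their own UTF-8 part.
    static List<String> namesNeedingParts(Map<String, String> params) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!isAscii(e.getKey()) || !isAscii(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Text of one multipart/form-data part; the caller must write it out as UTF-8 bytes.
    static String partFor(String boundary, String name, String value) {
        return "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"\r\n"
                + "Content-Type: text/plain; charset=UTF-8\r\n"
                + "\r\n"
                + value + "\r\n";
    }
}
```

With q=café and rows=10, only q would get its own part; rows stays a plain parameter, which keeps the size overhead limited to the parameters that actually need it.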




[jira] Updated: (SOLR-443) POST queries don't declare its charset

2008-03-12, Hiroaki Kawai (JIRA)

[https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Hiroaki Kawai updated SOLR-443:
---

Attachment: SolrDispatchFilter.patch

This patch will fix the issue.

As of Servlet Spec 2.5, we can specify the expected encoding of the incoming 
request rather than having it decoded as an ISO-8859-1 string.
http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletRequest.html#setCharacterEncoding(java.lang.String)

The patch will only work with servlet engines implementing Servlet 2.5 (e.g. 
Tomcat 6 or similar), but I think this is the most desirable approach.
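SolrDispatchFilter.patch presumably calls request.setCharacterEncoding("UTF-8") before any parameters are read. Since the servlet API is not part of the JDK, the sketch below uses a hypothetical FakeRequest class to model the relevant behaviour: parameters are decoded with the declared encoding, fall back to ISO-8859-1 otherwise, and a setCharacterEncoding call made after the parameters have been parsed has no effect, as the ServletRequest javadoc requires.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Stdlib-only model of what ServletRequest.setCharacterEncoding changes.
// FakeRequest is hypothetical; a real container parses the body itself.
class FakeRequest {
    private final String body;           // raw application/x-www-form-urlencoded body
    private Charset encoding;            // null means "no encoding declared"
    private Map<String, String> params;  // parsed lazily on first access

    FakeRequest(String body) {
        this.body = body;
    }

    // Mirrors the servlet contract: only effective before parameters are read.
    void setCharacterEncoding(String name) {
        if (params == null) {
            encoding = Charset.forName(name);
        }
    }

    String getParameter(String name) {
        if (params == null) {
            Charset cs = (encoding != null) ? encoding : StandardCharsets.ISO_8859_1;
            params = new LinkedHashMap<>();
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                String key = (eq < 0) ? pair : pair.substring(0, eq);
                String value = (eq < 0) ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(key, cs), URLDecoder.decode(value, cs));
            }
        }
        return params.get(name);
    }
}
```

This is why such a call has to sit early in the filter: once anything has read a parameter, the ISO-8859-1 decoding is already fixed for that request.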

 Attachments: solr-443.patch, solr-443.patch, SolrDispatchFilter.patch





[jira] Updated: (SOLR-443) POST queries don't declare its charset

2007-12-21, Andrew Schurman (JIRA)

[https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Andrew Schurman updated SOLR-443:
-

Attachment: solr-443.patch

A simple patch that fixes the issue for this case. I don't believe it will cause 
problems elsewhere in the Java client.

 Attachments: solr-443.patch





[jira] Updated: (SOLR-443) POST queries don't declare its charset

2007-12-21, Ryan McKinley (JIRA)

[https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Ryan McKinley updated SOLR-443:
---

Attachment: solr-443.patch

Andrew, does this patch work for you?

Rather than specifying the contentType for all POST requests, it only adds it 
for requests that don't specify it within a ContentStream.
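A rough sketch of that rule, assuming the request exposes the content types declared by its ContentStreams as a list (null where a stream declares none). ContentTypeChooser and explicitContentType are hypothetical names, not taken from solr-443.patch.

```java
import java.util.List;

// Hypothetical sketch of the rule described above; names are illustrative only.
class ContentTypeChooser {

    static final String FORM_UTF8 = "application/x-www-form-urlencoded; charset=UTF-8";

    // Returns the Content-Type to set explicitly, or null to leave it to the ContentStream.
    static String explicitContentType(List<String> streamContentTypes) {
        for (String ct : streamContentTypes) {
            if (ct != null) {
                return null; // a ContentStream already declares its own type; keep it
            }
        }
        // No ContentStream specifies a type: declare the form encoding and its charset.
        return FORM_UTF8;
    }
}
```

The design choice is to avoid overriding a type a caller deliberately set on a stream (for example an XML update body), while still fixing the charset for plain query POSTs.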

 Attachments: solr-443.patch, solr-443.patch

