[jira] Updated: (SOLR-443) POST queries don't declare its charset
[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-443:
---
Attachment: SOLR-443-multipart.patch

Attaching a new patch that adds a constructor parameter to choose between single-part POSTs, sent with the content type application/x-www-form-urlencoded; charset=UTF-8, and multi-part POSTs. Single-part is the default.

Note that this patch changes the current behaviour for requests with streams. When content streams are present in the request, multi-part requests are *always* used, because the request has to have multiple parts. In that case we cannot specify the content type ourselves: a multi-part POST has to name the boundary between its parts in the Content-Type header, and the boundary is not yet known when the request is assembled, so the Content-Type header cannot be set.

POST queries don't declare its charset
--
Key: SOLR-443
URL: https://issues.apache.org/jira/browse/SOLR-443
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 1.2
Environment: Tomcat 6.0.14
Reporter: Andrew Schurman
Priority: Minor
Attachments: SOLR-443-multipart.patch, solr-443.patch, solr-443.patch, SolrDispatchFilter.patch

When sending a query via POST, the content-type is not set. The content charset for the POST parameters is set, but this only appears to be used for creating the Content-Length header in the commons library. Since a query is encoded in UTF-8, the HTTP headers should also specify the content type charset.

On Tomcat, this causes problems when the query string contains non-ASCII characters (characters with accents and such), as it tries to parse the POST body in its default ISO-8859-1. There appears to be no way to set/change the default encoding for a message body on Tomcat.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
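The single-part variant described in this update can be sketched with the standard library alone. This is a minimal illustration, not the code from SOLR-443-multipart.patch: the class name `FormPostSketch` and the method `encodeBody` are hypothetical, and only the content type string is taken from the message above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the attached patch.
class FormPostSketch {
    // Content type named in the update: declares the charset of the body.
    static final String CONTENT_TYPE =
            "application/x-www-form-urlencoded; charset=UTF-8";

    // URL-encodes each name/value pair as UTF-8 and joins the pairs with '&'.
    static String encodeBody(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "caf\u00e9"); // "café", a non-ASCII query
        params.put("rows", "10");
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.println(encodeBody(params)); // prints q=caf%C3%A9&rows=10
    }
}
```

For the multi-part case the header cannot be a fixed string like this, since a multipart Content-Type must also carry the boundary that separates the parts.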
[jira] Updated: (SOLR-443) POST queries don't declare its charset
[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-443:
---
Attachment: (was: SOLR-443-multipart.patch)

Attachments: SOLR-443-multipart.patch, solr-443.patch, solr-443.patch, SolrDispatchFilter.patch
[jira] Updated: (SOLR-443) POST queries don't declare its charset
[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Kotthoff updated SOLR-443:
---
Attachment: SOLR-443-multipart.patch

After reading [http://www.w3.org/TR/html401/interact/forms.html#form-content-type] it seems to me that the only reliable way to ensure that the data is encoded/decoded properly is to send the request parameters as parts of a multi-part request. The charset of each part can be set to UTF-8, the Content-Type header is generated by httpclient, and nothing needs to be URL-encoded. The downside is that requests become larger, as there is quite a lot of overhead in putting each parameter into a separate part.

Attached the patch SOLR-443-multipart.patch, which makes the necessary changes to CommonsHttpSolrServer. Verified to work with the Jetty version used in the tests and with Tomcat 5.5.

A possible optimisation would be to check each parameter for non-ASCII characters and make it a separate part only if it contains any; otherwise just include it as a regular parameter.

Attachments: SOLR-443-multipart.patch, solr-443.patch, solr-443.patch, SolrDispatchFilter.patch
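The optimisation suggested at the end of this update, checking each parameter for non-ASCII characters, might look roughly like the sketch below. It is not taken from the patch: `AsciiPartitionSketch`, `isAscii` and `namesNeedingOwnPart` are hypothetical names, and the decision to also inspect parameter names (not just values) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the attached patch.
class AsciiPartitionSketch {
    // True when every char of s lies in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // Names of the parameters that would be sent as their own UTF-8 part;
    // all remaining parameters could stay ordinary request parameters.
    static List<String> namesNeedingOwnPart(Map<String, String> params) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Assumption: a non-ASCII name is treated like a non-ASCII value.
            if (!isAscii(e.getKey()) || !isAscii(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "caf\u00e9"); // contains U+00E9
        params.put("rows", "10");     // pure ASCII
        System.out.println(namesNeedingOwnPart(params)); // prints [q]
    }
}
```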
[jira] Updated: (SOLR-443) POST queries don't declare its charset
[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiroaki Kawai updated SOLR-443:
---
Attachment: SolrDispatchFilter.patch

This patch will fix the issue. New in Servlet Spec 2.5, we can specify the expected incoming encoding rather than having the body decoded as an ISO-8859-1 string.

http://java.sun.com/javaee/5/docs/api/javax/servlet/ServletRequest.html#setCharacterEncoding(java.lang.String)

The patch will only work with a servlet engine implementing Servlet 2.5 (i.e., Tomcat 6 or similar), but I think this is the most desirable way.

Attachments: solr-443.patch, solr-443.patch, SolrDispatchFilter.patch
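Why the expected encoding matters on the server side can be shown without a servlet container. The sketch below is not the code from SolrDispatchFilter.patch; it only uses `java.net.URLDecoder` to decode the same URL-encoded UTF-8 value once as UTF-8 and once as ISO-8859-1. The class and method names are hypothetical. In a servlet, the corresponding step would be calling `setCharacterEncoding("UTF-8")` on the request before any parameter is read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demonstration class; it does not use the servlet API.
class RequestDecodingSketch {
    // Decodes a URL-encoded value using the given charset name.
    static String decodeValue(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "café" as a UTF-8 client puts it on the wire.
        String onTheWire = "caf%C3%A9";

        // Decoded with the charset the client actually used.
        System.out.println(decodeValue(onTheWire, "UTF-8"));      // café

        // Decoded with the container default: each UTF-8 byte
        // becomes its own Latin-1 character.
        System.out.println(decodeValue(onTheWire, "ISO-8859-1")); // cafÃ©
    }
}
```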
[jira] Updated: (SOLR-443) POST queries don't declare its charset
[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Schurman updated SOLR-443:
---
Attachment: solr-443.patch

A simple patch that fixes the issue for this case. I don't believe it will cause issues elsewhere within the Java client.

Attachments: solr-443.patch
[jira] Updated: (SOLR-443) POST queries don't declare its charset
[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-443:
---
Attachment: solr-443.patch

Andrew, does this patch work for you? Rather than specifying the contentType for all POST requests, it only adds it for the ones that don't specify it within a ContentStream.

Attachments: solr-443.patch, solr-443.patch
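The rule described in this update, adding a content type only when the request does not already carry one, can be sketched as below. This is not the code from the patch: `ContentTypeChoiceSketch` and `chooseContentType` are hypothetical, and the default value is an assumption borrowed from the later SOLR-443-multipart.patch update, since this message does not name the content type it adds.

```java
// Hypothetical helper for illustration; not part of the attached patch.
class ContentTypeChoiceSketch {
    // Assumed default; this update does not state which value is added.
    static final String DEFAULT_FORM_TYPE =
            "application/x-www-form-urlencoded; charset=UTF-8";

    // streamContentType is the type declared by a content stream,
    // or null when the request has no stream or the stream declares none.
    static String chooseContentType(String streamContentType) {
        if (streamContentType != null && !streamContentType.trim().isEmpty()) {
            return streamContentType; // keep what the stream specified
        }
        return DEFAULT_FORM_TYPE;     // otherwise declare the charset ourselves
    }

    public static void main(String[] args) {
        System.out.println(chooseContentType("text/xml; charset=UTF-8"));
        System.out.println(chooseContentType(null));
    }
}
```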