[ https://issues.apache.org/jira/browse/SHINDIG-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Kohn updated SHINDIG-1981:
----------------------------------
Summary: Wrong encoding of non-file form items in RPC requests with
multipart/form-data (was: Wrong encoding )
> Wrong encoding of non-file form items in RPC requests with multipart/form-data
> ------------------------------------------------------------------------------
>
> Key: SHINDIG-1981
> URL: https://issues.apache.org/jira/browse/SHINDIG-1981
> Project: Shindig
> Issue Type: Bug
> Components: Java
> Affects Versions: 2.5.1
> Reporter: Andreas Kohn
>
> We're using RPC requests with multipart/form-data encoding when uploading
> files. All encoding settings on both frontend and backend are configured to
> UTF-8, to handle non-ASCII content.
> Even so, non-ASCII content inside the 'request' object still arrived
> garbled.
> Debugging showed that when JsonRpcServlet parses the request body, it
> assumes that the encoding of a non-file item is either defined in the
> Content-Type header on that item or, failing that, ISO-8859-1.
> In HTML 5, neither assumption is correct any longer, as per
> http://dev.w3.org/html5/spec-preview/constraints.html#multipart-form-data
> {quote}
> If the algorithm was invoked with an explicit character encoding, let the
> selected character encoding be that encoding. (This algorithm is used by
> other specifications, which provide an explicit character encoding to avoid
> the dependency on the form element described in the next paragraph.)
> Otherwise, if the form element has an accept-charset attribute, then, taking
> into account the characters found in the form data set's names and values,
> and the character encodings supported by the user agent, select a character
> encoding from the list given in the form's accept-charset attribute that is
> an ASCII-compatible character encoding. If none of the encodings are
> supported, or if none are listed, then let the selected character encoding be
> UTF-8.
> Otherwise, if the document's character encoding is an ASCII-compatible
> character encoding, then that is the selected character encoding.
> Otherwise, let the selected character encoding be UTF-8.
> {quote}
> and
> {quote}
> The parts of the generated multipart/form-data resource that correspond to
> non-file fields must not have a Content-Type header specified. Their names
> and values must be encoded using the character encoding selected above (field
> names in particular do not get converted to a 7-bit safe encoding as
> suggested in RFC 2388).
> {quote}
> The patch in the review https://reviews.apache.org/r/24449/ fixes the problem
> for us, by using the request encoding as a default when the content-type
> header does not specify any other encoding.
> I've tested this with Firefox on Linux, and am currently checking that it
> still works as expected with IE and Chrome.
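
For illustration, the fallback described above can be sketched roughly as follows. This is not the patch from the review and not Shindig code: the class FormFieldCharset and its methods are hypothetical names, and the final fallback to ISO-8859-1 when the request carries no encoding is an assumption based on the previous default. The sketch picks the charset from the part's own Content-Type header if one is given, otherwise uses the request encoding, and shows that UTF-8 bytes only survive decoding when that fallback is applied.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Shindig.
final class FormFieldCharset {

    // Returns the named charset, or null if the name is missing, illegal, or unsupported.
    static Charset lookup(String name) {
        if (name == null || name.isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
    }

    // Extracts the charset parameter from a Content-Type value, e.g. "text/plain; charset=UTF-8".
    static Charset charsetParam(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return lookup(p.substring("charset=".length()).trim().replace("\"", ""));
            }
        }
        return null;
    }

    // Order: the part's own charset, then the request encoding, then ISO-8859-1 (assumed last resort).
    static Charset select(String partContentType, String requestEncoding) {
        Charset fromPart = charsetParam(partContentType);
        if (fromPart != null) {
            return fromPart;
        }
        Charset fromRequest = lookup(requestEncoding);
        return fromRequest != null ? fromRequest : StandardCharsets.ISO_8859_1;
    }

    // Decodes the raw bytes of a non-file form item using the selected charset.
    static String decode(byte[] raw, String partContentType, String requestEncoding) {
        return new String(raw, select(partContentType, requestEncoding));
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // non-ASCII sample value
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Non-file part without a Content-Type header, request declared as UTF-8.
        System.out.println(original.equals(decode(raw, null, "UTF-8")));
        // Same bytes with no request encoding: decoded as ISO-8859-1, so the value is garbled.
        System.out.println(original.equals(decode(raw, null, null)));
    }
}
```

In a servlet, the requestEncoding argument would presumably come from HttpServletRequest.getCharacterEncoding(), which returns null when the client sent no charset; the sketch takes it as a plain string so it runs without a servlet container.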