Confusing handling of multipart/form-data elements
--------------------------------------------------
Key: WEB-171
URL: http://jira.nuxeo.org/browse/WEB-171
Project: Nuxeo Web Engine
Issue Type: Bug
Environment: all
Reporter: Arnaud Bailly
Assignee: Bogdan Stefanescu
multipart/form-data is the normal encoding type for HTML forms that upload a
file. In Nuxeo Web Engine, it is handled by the FormData class and ultimately
by the commons-fileupload component. When such a form also contains <input>
elements and the user enters non-ASCII characters in them, encoding issues may
arise.
What we have observed is the following:
- content is sent from the client (Firefox 3, IE6) to the server using the page
encoding (e.g. UTF-8)
- parts are read as raw bytes by FileUploadServlet and other commons
components
- form input data can be extracted by the application using the FormData API,
e.g. getString() and others. These methods retrieve the FileItem object and
extract its content using getString() *without* specifying an encoding.
commons-fileupload decodes the input data using either a passed charset, the
charset specified for the part in the POST data, or a default charset, which
is ISO-8859-1.
- form data is therefore retrieved garbled because the wrong encoding is
applied (Latin-1 instead of UTF-8)
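To make the failure mode concrete, here is a small Java sketch. The `resolve` method restates the fallback order described above (explicitly passed charset, then the charset declared for the part, then ISO-8859-1); it is an illustration written for this ticket, not commons-fileupload's own code, and the class and method names are hypothetical. It assumes, as in the report, that no charset is passed and none is declared for the part.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not commons-fileupload's implementation.
class CharsetFallbackDemo {

    // Fallback order as described in the ticket: an explicitly passed charset
    // wins, then the charset declared for the part, then ISO-8859-1.
    static Charset resolve(String passed, String declared) {
        if (passed != null) {
            return Charset.forName(passed);
        }
        if (declared != null) {
            return Charset.forName(declared);
        }
        return StandardCharsets.ISO_8859_1;
    }

    static String decode(byte[] raw, String passed, String declared) {
        return new String(raw, resolve(passed, declared));
    }

    public static void main(String[] args) {
        // The browser encodes the field value with the page encoding (UTF-8):
        // "café" becomes the bytes 63 61 66 C3 A9.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // No charset passed, none declared: ISO-8859-1 maps C3 A9 to two
        // characters, giving "cafÃ©".
        String garbled = decode(raw, null, null);
        // Passing the page encoding explicitly recovers the original text.
        String correct = decode(raw, "UTF-8", null);

        System.out.println(garbled.length());              // 5
        System.out.println(correct.equals("caf\u00e9"));   // true
    }
}
```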
It is not clear whether this is the responsibility of the backend (i.e.
commons-fileupload) or of the client of FormData. On the client side, it does
not seem possible to specify charsets for individual input fields (at least in
standard HTML). Since fileupload offers methods to specify the encoding, it
seems simpler to allow specifying the encoding in the FormData object.
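One possible shape for the proposed change, sketched under stated assumptions: a hypothetical `EncodingAwareFormData` class (not Nuxeo's actual FormData API) keeps each field's raw bytes, standing in for a commons-fileupload FileItem, and decodes them with a charset the application can set. The UTF-8 default is an assumption about the page encoding. In the real class, the decoding step would presumably delegate to `FileItem.getString(String encoding)`, which commons-fileupload already provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Nuxeo's FormData API.
class EncodingAwareFormData {

    // Raw field bytes as read from the multipart body (stands in for FileItem).
    private final Map<String, byte[]> fields = new HashMap<>();

    // Assumed default: UTF-8, i.e. the encoding of the page holding the form.
    private Charset charset = StandardCharsets.UTF_8;

    public void setCharset(Charset charset) {
        if (charset == null) {
            throw new IllegalArgumentException("charset must not be null");
        }
        this.charset = charset;
    }

    public Charset getCharset() {
        return charset;
    }

    // Stands in for the upload parser storing a form field's raw bytes.
    public void putRawField(String name, byte[] value) {
        fields.put(name, value.clone());
    }

    // Decodes with the configured charset instead of relying on the
    // ISO-8859-1 fallback. With commons-fileupload this would amount to
    // item.getString(charset.name()). Returns null for unknown fields.
    public String getString(String name) {
        byte[] raw = fields.get(name);
        return raw == null ? null : new String(raw, charset);
    }

    public static void main(String[] args) {
        EncodingAwareFormData form = new EncodingAwareFormData();
        form.putRawField("title", "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(form.getString("title").equals("caf\u00e9")); // true

        // Switching back to Latin-1 reproduces the garbled value from the report.
        form.setCharset(StandardCharsets.ISO_8859_1);
        System.out.println(form.getString("title").length()); // 5
    }
}
```

Keeping the charset on the FormData object matches the observation that HTML offers no per-field charset: one setting covers the whole form. A natural default, also an assumption here, would be the request's character encoding (`ServletRequest.getCharacterEncoding()`) when it is set.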
--
This message is automatically generated by JIRA.