Hi Michael,

The Stack Overflow link concerns file download rather than multipart form file upload, and I'm already aware of these issues for download.
On the server side, I'm reading the multipart form data, including the files, with Apache Commons FileUpload. It can read the filenames supplied by both Firefox and Chrome: Firefox sends characters outside the ISO-8859-1 charset (a combining grave accent), while Chrome sends a precomposed "è" (so with Chrome the filename stays within ISO-8859-1). With HTTP client, however, the charset used for the filename seems to be US-ASCII, not ISO-8859-1.

With the Firefox developer tools (Network tab, "Edit and Resend"), I can see that the filename is encoded in the Content-Disposition field using the Unicode character.

Thanks anyway,
Christopher

On 12 February 2015 at 11:41, Michael Osipov <[email protected]> wrote:

> > Hello,
> >
> > I'm writing a unit test to simulate behavior of different browsers
> > regarding multipart file upload where the filenames may contain letters
> > with accents. The server has noticed that some browsers (such as Chrome)
> > use composed accents (a single character code point with the character
> > and accent, for example "é") and others (such as Firefox) use combining
> > diacriticals ("e" + \u0301).
> >
> > I'm having difficulty simulating this with HTTP client, because when the
> > actual HTTP multipart entity is sent to the server, the filenames of each
> > part are encoded in ASCII. It seems likely that I need to set the charset
> > somewhere, but I don't see where. Obviously, I set the content-type for
> > the entity (such as "application/octet-stream", which has no character
> > set, or perhaps a user-supplied text file in some arbitrary encoding,
> > which may differ from one part to another), but surely, the character
> > encoding (for text files) is unrelated to the character encoding of
> > filenames?
> >
> > Below is the relevant part of the test code (I hope it's readable, it's
> > the ByteArrayBody constructor which seems to be the issue), followed by
> > the HTTP wire log.
>
> Hi Christopher,
>
> maybe this will help you: http://stackoverflow.com/q/93551/696632 and
> http://greenbytes.de/tech/tc2231/.
>
> Always keep in mind that HTTP headers are always encoded with ISO-8859-1
> as per RFC.
>
> Michael
>
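
P.S. To make the two filename forms concrete, here is a minimal JDK-only sketch of what each charset does to them. It does not use HTTP client or Commons FileUpload, so it only illustrates the charset behaviour and not how either library actually writes or parses the Content-Disposition header. The class name, method names, and the sample filename are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class FilenameEncodingSketch {

    // Hypothetical sample filenames (not taken from the real test).
    // Chrome-style: precomposed U+00E8 (e with grave accent).
    static final String COMPOSED = "r\u00E8gle.txt";
    // Firefox-style: plain "e" followed by U+0300 (combining grave accent).
    static final String COMBINING = "re\u0300gle.txt";

    // True if every character of the name is representable in the charset.
    static boolean fitsIn(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    // Encode and decode with the same charset. String.getBytes(Charset)
    // replaces unmappable characters with the charset's replacement byte,
    // which is '?' for US-ASCII and ISO-8859-1.
    static String roundTrip(String name, Charset charset) {
        return new String(name.getBytes(charset), charset);
    }

    // Normalize to NFC so both browser forms compare equal on the server.
    static String toComposed(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": composed fits=" + fitsIn(COMPOSED, cs)
                    + ", combining fits=" + fitsIn(COMBINING, cs));
            System.out.println("  composed  -> " + roundTrip(COMPOSED, cs));
            System.out.println("  combining -> " + roundTrip(COMBINING, cs));
        }
        System.out.println("equal after NFC: "
                + toComposed(COMBINING).equals(COMPOSED));
    }
}
```

The precomposed form survives ISO-8859-1 but not US-ASCII, and the combining form survives neither, which matches what I see: only a Unicode-capable encoding such as UTF-8 keeps both forms intact.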
