On Thu, 2015-02-12 at 12:00 +0100, Christopher BROWN wrote:
> Hi Michael,
>
> The Stack Overflow link refers to file download more than multipart form
> file upload, and I'm already aware of these issues for download.
>
> On the server side, I'm reading in the multipart form data with files using
> Apache Commons FileUpload, which can read the filenames supplied by both
> Firefox and Chrome, containing characters outside the ISO-8859-1 charset in
> the case of Firefox (combining grave accent) and a precomposed "è" in the
> case of Chrome (with Chrome it's therefore within the ISO-8859-1 charset).
> However, with HTTP client, the charset seems to be US-ASCII for the
> filename, and not ISO-8859-1.
>
> With the Firefox developer tools, Network tab, and "edit and resend", I can
> see that the filename is encoded in the Content-Disposition field using the
> unicode character.
>
> Thanks anyway,
> Christopher
>
Christopher

Have you tried using MultipartEntityBuilder in RFC6532 mode? RFC6532
permits non-ASCII chars to be present in multipart headers.

You may also consider using Apache Mime4J which offers massively more
flexibility with regards to MIME content composition compared to
HttpMime.

Oleg

>
> On 12 February 2015 at 11:41, Michael Osipov <[email protected]> wrote:
>
> > > Hello,
> > >
> > > I'm writing a unit test to simulate behavior of different browsers
> > > regarding multipart file upload where the filenames may contain letters
> > > with accents. The server has noticed that some browsers (such as Chrome)
> > > use composed accents (a single character code point with the character
> > > and accent, for example "é") and others (such as Firefox) use combining
> > > diacriticals ("e" + \u0301).
> > >
> > > I'm having difficulty simulating this with HTTP client, because when the
> > > actual HTTP multipart entity is sent to the server, the filenames of each
> > > part are encoded in ASCII. It seems likely that I need to set the
> > > charset somewhere, but I don't see where. Obviously, I set the
> > > content-type for the entity (such as "application/octet-stream", which
> > > has no character set, or perhaps a user-supplied text file in some
> > > arbitrary encoding, which may differ from one part to another), but
> > > surely, the character encoding (for text files) is unrelated to the
> > > character encoding of filenames?
> > >
> > > Below is the relevant part of the test code (I hope it's readable, it's
> > > the ByteArrayBody constructor which seems to be the issue), followed by
> > > the HTTP wire log.
> >
> > Hi Christopher,
> >
> > maybe this will help you: http://stackoverflow.com/q/93551/696632 and
> > http://greenbytes.de/tech/tc2231/.
> >
> > Always keep in mind that HTTP headers are always encoded with ISO-8859-1
> > as per RFC.
> >
> > Michael
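
For anyone reproducing the two filename forms discussed in this thread, the
following is a minimal sketch using only the JDK, not code from the original
test. The class name FilenameForms, its helper methods, and the sample
filename are hypothetical. It builds the precomposed form (as described for
Chrome) and the combining form (as described for Firefox) of the same name,
then checks whether each survives encoding in US-ASCII, ISO-8859-1 and UTF-8.
It assumes the filename bytes are produced with String.getBytes(Charset),
which substitutes "?" for unmappable characters; whether HttpMime encodes
header values this way is not confirmed here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class FilenameForms {

    // Hypothetical sample filename using the precomposed "é" (U+00E9).
    public static final String COMPOSED = "r\u00e9sum\u00e9.txt";

    // Convert to the combining form: "e" followed by U+0301.
    public static String decompose(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD);
    }

    // Encode and decode with the same charset. String.getBytes(Charset)
    // replaces unmappable characters with the charset's replacement byte.
    public static String roundTrip(String name, Charset charset) {
        return new String(name.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String decomposed = decompose(COMPOSED);
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            // Report only whether the name survives, to stay independent
            // of the console encoding.
            System.out.println(cs.name()
                + " precomposed survives: " + roundTrip(COMPOSED, cs).equals(COMPOSED)
                + ", combining survives: " + roundTrip(decomposed, cs).equals(decomposed));
        }
    }
}
```

Under these assumptions only UTF-8 preserves both forms. ISO-8859-1 keeps the
precomposed character but loses the combining accent, and US-ASCII loses
both, which matches the symptoms described above.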
