Oleg,

Using RFC6532 mode solves the problem, so thanks for that.
I've never looked at Mime4J before, interesting, so thanks for that too.

--
Christopher

On 12 February 2015 at 12:07, Oleg Kalnichevski <[email protected]> wrote:

> On Thu, 2015-02-12 at 12:00 +0100, Christopher BROWN wrote:
> > Hi Michael,
> >
> > The Stack Overflow link refers to file download more than multipart
> > form file upload, and I'm already aware of these issues for download.
> >
> > On the server side, I'm reading in the multipart form data with files
> > using Apache Commons FileUpload, which can read the filenames supplied
> > by both Firefox and Chrome, containing characters outside the
> > ISO-8859-1 charset in the case of Firefox (combining grave accent) and
> > a precomposed "è" in the case of Chrome (with Chrome it's therefore
> > within the ISO-8859-1 charset). However, with HTTP client, the charset
> > seems to be US-ASCII for the filename, and not ISO-8859-1.
> >
> > With the Firefox developer tools, Network tab, and "edit and resend",
> > I can see that the filename is encoded in the Content-Disposition
> > field using the unicode character.
> >
> > Thanks anyway,
> > Christopher
> >
>
> Christopher
>
> Have you tried using MultipartEntityBuilder in RFC6532 mode? RFC6532
> permits non-ASCII chars to be present in multipart headers.
>
> You may also consider using Apache Mime4J which offers massively more
> flexibility with regards to MIME content composition compared to
> HttpMime.
>
> Oleg
>
> >
> > On 12 February 2015 at 11:41, Michael Osipov <[email protected]> wrote:
> >
> > > > Hello,
> > > >
> > > > I'm writing a unit test to simulate behavior of different browsers
> > > > regarding multipart file upload where the filenames may contain
> > > > letters with accents. The server has noticed that some browsers
> > > > (such as Chrome) use composed accents (a single character code
> > > > point with the character and accent, for example "é") and others
> > > > (such as Firefox) use combining diacriticals ("e" + \u0301).
> > > >
> > > > I'm having difficulty simulating this with HTTP client, because
> > > > when the actual HTTP multipart entity is sent to the server, the
> > > > filenames of each part are encoded in ASCII. It seems likely that
> > > > I need to set the charset somewhere, but I don't see where.
> > > > Obviously, I set the content-type for the entity (such as
> > > > "application/octet-stream", which has no character set, or perhaps
> > > > a user-supplied text file in some arbitrary encoding, which may
> > > > differ from one part to another), but surely, the character
> > > > encoding (for text files) is unrelated to the character encoding
> > > > of filenames?
> > > >
> > > > Below is the relevant part of the test code (I hope it's readable,
> > > > it's the ByteArrayBody constructor which seems to be the issue),
> > > > followed by the HTTP wire log.
> > >
> > > Hi Christopher,
> > >
> > > maybe this will help you: http://stackoverflow.com/q/93551/696632 and
> > > http://greenbytes.de/tech/tc2231/.
> > >
> > > Always keep in mind that HTTP headers are always encoded with
> > > ISO-8859-1 as per RFC.
> > >
> > > Michael
> > >
> >
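
For anyone finding this thread later, here is a minimal sketch of the two filename forms discussed above and of what happens to each under the three charsets mentioned (US-ASCII, ISO-8859-1, UTF-8). It uses only the JDK, not HttpClient or HttpMime. The class name, the helper names and the sample filename are hypothetical, chosen for illustration; the original test code and wire log are not reproduced here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class FilenameEncodingDemo {

    /** Precomposed form (NFC), as described for Chrome: "è" is the single code point U+00E8. */
    static String composed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Decomposed form (NFD), as described for Firefox: "e" followed by combining U+0300. */
    static String decomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /**
     * Encodes and decodes with the same charset. String.getBytes(Charset)
     * substitutes the charset's replacement byte for unmappable characters,
     * so anything the charset cannot represent comes back as '?'.
     */
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    /** True if the string survives an encode/decode cycle unchanged. */
    static boolean survives(String s, Charset cs) {
        return roundTrip(s, cs).equals(s);
    }

    public static void main(String[] args) {
        // Hypothetical sample filename containing "è".
        String chromeStyle = composed("r\u00e8gle.txt");
        String firefoxStyle = decomposed("r\u00e8gle.txt");

        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name()
                + " composed survives: " + survives(chromeStyle, cs)
                + ", decomposed survives: " + survives(firefoxStyle, cs));
        }
    }
}
```

Under these assumptions, US-ASCII loses the accent in both forms, ISO-8859-1 keeps only the precomposed form, and UTF-8 keeps both. That matches the observation that a header restricted to ASCII or ISO-8859-1 cannot carry the Firefox-style name. With HttpMime, the mode Oleg refers to is selected with MultipartEntityBuilder.create().setMode(HttpMultipartMode.RFC6532); how the part headers are then written on the wire is best checked against the wire log for the HttpMime version in use.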
