Hi Michael,

The Stack Overflow link is about file download rather than multipart form
file upload, and I'm already aware of those issues for download.

On the server side, I'm reading the multipart form data, including the
files, with Apache Commons FileUpload. It can read the filenames supplied by
both Firefox and Chrome. Firefox sends a combining grave accent, which is
outside the ISO-8859-1 charset; Chrome sends a precomposed "è", which is
within it. With HTTP client, however, the charset used for the filename
seems to be US-ASCII, not ISO-8859-1.
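
To make the charset point concrete, here is a minimal sketch using only the
JDK (it does not call HTTP client or Commons FileUpload). The class name,
method names and sample filenames are made up for illustration; it only
shows which of the two filename forms fits in which charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of HttpClient or Commons FileUpload.
final class FilenameCharsetCheck {

    /** Precomposed form (NFC): "e" + U+0300 becomes the single code point U+00E8. */
    static String composed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Decomposed form (NFD): U+00E8 becomes "e" + U+0300 (combining grave accent). */
    static String decomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** True if every character of s can be represented in the given charset. */
    static boolean fitsIn(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Sample filenames (assumed for illustration only).
        String chromeStyle = "caf\u00e8.txt";   // precomposed "è"
        String firefoxStyle = "cafe\u0300.txt"; // "e" + combining grave accent

        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name()
                + " precomposed=" + fitsIn(chromeStyle, cs)
                + " combining=" + fitsIn(firefoxStyle, cs));
        }
        // The two forms are different strings until normalized.
        System.out.println(chromeStyle.equals(firefoxStyle));
        System.out.println(chromeStyle.equals(composed(firefoxStyle)));
    }
}
```

Neither form fits in US-ASCII, only the precomposed form fits in
ISO-8859-1, and both fit in UTF-8, which is why the charset applied to the
filename matters for this test.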

Using the Firefox developer tools (Network tab, "Edit and Resend"), I can
see that the filename appears in the Content-Disposition field as the
Unicode character itself.

Thanks anyway,
Christopher


On 12 February 2015 at 11:41, Michael Osipov <[email protected]> wrote:

> > Hello,
> >
> > I'm writing a unit test to simulate behavior of different browsers
> > regarding multipart file upload where the filenames may contain letters
> > with accents.  The server has noticed that some browsers (such as Chrome)
> > use composed accents (a single character code point with the character
> > and accent, for example "é") and others (such as Firefox) use combining
> > diacriticals ("e" + \u0301).
> >
> > I'm having difficulty simulating this with HTTP client, because when the
> > actual HTTP multipart entity is sent to the server, the filenames of each
> > part are encoded in ASCII.  It seems likely that I need to set the
> > charset somewhere, but I don't see where.  Obviously, I set the
> > content-type for the entity (such as "application/octet-stream", which
> > has no character set, or perhaps a user-supplied text file in some
> > arbitrary encoding, which may differ from one part to another), but
> > surely, the character encoding (for text files) is unrelated to the
> > character encoding of filenames?
> >
> > Below is the relevant part of the test code (I hope it's readable, it's
> > the ByteArrayBody constructor which seems to be the issue), followed by
> > the HTTP wire log.
>
> Hi Christopher,
>
> maybe this will help you: http://stackoverflow.com/q/93551/696632 and
> http://greenbytes.de/tech/tc2231/.
>
> Always keep in mind that HTTP headers are always encoded with ISO-8859-1
> as per RFC.
>
> Michael
>
