Re: Problem with form upload and filenames containing accents

Oleg Kalnichevski Thu, 12 Feb 2015 03:08:14 -0800

On Thu, 2015-02-12 at 12:00 +0100, Christopher BROWN wrote:
> Hi Michael,
> 
> The Stack Overflow link refers to file download more than multipart form
> file upload, and I'm already aware of these issues for download.
> 
> On the server side, I'm reading in the multipart form data with files using
> Apache Commons FileUpload, which can read the filenames supplied by both
> Firefox and Chrome, containing characters outside the ISO-8859-1 charset in
> the case of Firefox (combining grave accent) and a precomposed "è" in the
> case of Chrome (with Chrome it's therefore within the ISO-8859-1 charset).
> However, with HTTP client, the charset seems to be US-ASCII for the
> filename, and not ISO-8859-1.
> 
> With the Firefox developer tools, Network tab, and "edit and resend", I can
> see that the filename is encoded in the Content-Disposition field using the
> Unicode character.
> 
> Thanks anyway,
> Christopher
>


Christopher

Have you tried using MultipartEntityBuilder in RFC6532 mode? RFC 6532
permits non-ASCII characters in multipart headers.
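
A minimal sketch of that suggestion, assuming HttpMime 4.3 or later on the
classpath. The class name, the part name "file" and the dummy payload are
made up for illustration; only the builder calls come from HttpMime. It
serializes the entity to memory so the Content-Disposition header can be
inspected without a server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.HttpMultipartMode;
import org.apache.http.entity.mime.MultipartEntityBuilder;

public class Rfc6532UploadSketch {

    // Builds a one-part multipart entity in RFC6532 mode and returns the
    // bytes that would go on the wire, decoded as UTF-8.
    static String wireFormat(String filename) {
        HttpEntity entity = MultipartEntityBuilder.create()
                .setMode(HttpMultipartMode.RFC6532)
                // "file" and the payload are placeholders, not from the thread.
                .addBinaryBody("file",
                        "hello".getBytes(StandardCharsets.UTF_8),
                        ContentType.APPLICATION_OCTET_STREAM,
                        filename)
                .build();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            entity.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301, the combining form attributed to Firefox.
        System.out.println(wireFormat("e\u0301.txt"));
    }
}
```

In a test, the same entity can be passed to HttpPost.setEntity(...) instead
of being written to a buffer.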

You may also consider using Apache Mime4J, which offers considerably more
flexibility with regard to MIME content composition than HttpMime.

Oleg

> 
> On 12 February 2015 at 11:41, Michael Osipov <[email protected]> wrote:
> 
> > > Hello,
> > >
> > > I'm writing a unit test to simulate behavior of different browsers
> > > regarding multipart file upload where the filenames may contain letters
> > > with accents.  The server has noticed that some browsers (such as Chrome)
> > > use composed accents (a single character code point with the character
> > > and accent, for example "é") and others (such as Firefox) use combining
> > > diacriticals ("e" + \u0301).
> > >
> > > I'm having difficulty simulating this with HTTP client, because when the
> > > actual HTTP multipart entity is sent to the server, the filenames of each
> > > part are encoded in ASCII.  It seems likely that I need to set the
> > > charset somewhere, but I don't see where.  Obviously, I set the
> > > content-type for the entity (such as "application/octet-stream", which
> > > has no character set, or perhaps a user-supplied text file in some
> > > arbitrary encoding, which may differ from one part to another), but
> > > surely, the character encoding (for text files) is unrelated to the
> > > character encoding of filenames?
> > >
> > > Below is the relevant part of the test code (I hope it's readable, it's
> > > the ByteArrayBody constructor which seems to be the issue), followed by
> > > the HTTP wire log.
> >
> > Hi Christopher,
> >
> > maybe this will help you: http://stackoverflow.com/q/93551/696632 and
> > http://greenbytes.de/tech/tc2231/.
> >
> > Keep in mind that HTTP headers are encoded with ISO-8859-1 as per the
> > RFC.
> >
> > Michael
> >
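
The composed and combining forms described in the quoted message can be
reproduced with the JDK alone. The following is a small sketch, not part of
the original test code; the class and method names are hypothetical. It uses
java.text.Normalizer to produce each form and shows which charsets can carry
them: the precomposed "é" survives ISO-8859-1, the combining accent does not,
and US-ASCII loses both.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

public class FilenameForms {

    // Precomposed form (NFC): "é" becomes the single code point U+00E9.
    static String composed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Decomposed form (NFD): "é" becomes "e" followed by U+0301.
    static String decomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Encodes and decodes with the named charset. String.getBytes replaces
    // characters the charset cannot represent, so losses show up as '?'.
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String nfc = composed("\u00e9.txt");
        String nfd = decomposed("\u00e9.txt");
        for (String cs : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            System.out.println(cs + ": NFC -> " + roundTrip(nfc, cs)
                    + ", NFD -> " + roundTrip(nfd, cs));
        }
    }
}
```

A unit test can use composed() and decomposed() to generate both filename
variants from one literal and then compare what the server receives.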



