Re: Problem with form upload and filenames containing accents

Christopher BROWN Thu, 12 Feb 2015 04:33:09 -0800

Oleg,

Using RFC6532 mode solves the problem, so thanks for that.
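
For anyone finding this thread later, here is a minimal sketch of why the two filename forms behaved differently. It uses only the JDK, not HttpClient. The class name FilenameForms and the helper fits() are made up for illustration. It assumes the multipart headers end up as UTF-8 in RFC6532 mode, which is my understanding of that RFC rather than something I checked in the HttpMime source.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical demo class: compares the two forms of an accented letter
// against the charsets discussed in this thread.
public class FilenameForms {

    // Precomposed form (what Chrome sent): the single code point U+00E8.
    static final String COMPOSED =
            Normalizer.normalize("e\u0300", Normalizer.Form.NFC);

    // Decomposed form (what Firefox sent): "e" followed by U+0300.
    static final String DECOMPOSED =
            Normalizer.normalize("\u00e8", Normalizer.Form.NFD);

    // True if every character of s can be represented in charset cs.
    static boolean fits(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name()
                    + " composed=" + fits(COMPOSED, cs)
                    + " decomposed=" + fits(DECOMPOSED, cs));
        }
    }
}
```

Only UTF-8 can carry both forms. ISO-8859-1 can carry the precomposed one only, and US-ASCII neither, which matches what I saw on the wire.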


I've never looked at Mime4J before, interesting, so thanks for that too.

--
Christopher


On 12 February 2015 at 12:07, Oleg Kalnichevski <[email protected]> wrote:

> On Thu, 2015-02-12 at 12:00 +0100, Christopher BROWN wrote:
> > Hi Michael,
> >
> > The Stack Overflow link refers to file download more than multipart form
> > file upload, and I'm already aware of these issues for download.
> >
> > On the server side, I'm reading in the multipart form data with files using
> > Apache Commons FileUpload, which can read the filenames supplied by both
> > Firefox and Chrome, containing characters outside the ISO-8859-1 charset in
> > the case of Firefox (combining grave accent) and a precomposed "è" in the
> > case of Chrome (with Chrome it's therefore within the ISO-8859-1 charset).
> > However, with HTTP client, the charset seems to be US-ASCII for the
> > filename, and not ISO-8859-1.
> >
> > With the Firefox developer tools, Network tab, and "edit and resend", I can
> > see that the filename is encoded in the Content-Disposition field using the
> > Unicode character.
> >
> > Thanks anyway,
> > Christopher
> >
>
> Christopher
>
> Have you tried using MultipartEntityBuilder in RFC6532 mode? RFC6532
> permits non-ASCII chars to be present in multipart headers.
>
> You may also consider using Apache Mime4J which offers massively more
> flexibility with regards to MIME content composition compared to
> HttpMime.
>
> Oleg
>
> >
> > On 12 February 2015 at 11:41, Michael Osipov <[email protected]> wrote:
> >
> > > > Hello,
> > > >
> > > > I'm writing a unit test to simulate behavior of different browsers
> > > > regarding multipart file upload where the filenames may contain letters
> > > > with accents.  The server has noticed that some browsers (such as Chrome)
> > > > use composed accents (a single character code point with the character and
> > > > accent, for example "é") and others (such as Firefox) use combining
> > > > diacriticals ("e" + \u0301).
> > > >
> > > > I'm having difficulty simulating this with HTTP client, because when the
> > > > actual HTTP multipart entity is sent to the server, the filenames of each
> > > > part are encoded in ASCII.  It seems likely that I need to set the charset
> > > > somewhere, but I don't see where.  Obviously, I set the content-type for
> > > > the entity (such as "application/octet-stream", which has no character set,
> > > > or perhaps a user-supplied text file in some arbitrary encoding, which may
> > > > differ from one part to another), but surely, the character encoding (for
> > > > text files) is unrelated to the character encoding of filenames?
> > > >
> > > > Below is the relevant part of the test code (I hope it's readable, it's the
> > > > ByteArrayBody constructor which seems to be the issue), followed by the
> > > > HTTP wire log.
> > >
> > > Hi Christopher,
> > >
> > > maybe this will help you: http://stackoverflow.com/q/93551/696632 and
> > > http://greenbytes.de/tech/tc2231/.
> > >
> > > Always keep in mind that HTTP headers are always encoded with ISO-8859-1
> > > as per RFC.
> > >
> > > Michael
> > >
