Thanks Paul. When I get a chance, I will try your code. I am swamped with other work that is taking precedence right now.
-----Original Message-----
From: Paul Libbrecht [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 5:07 PM
To: Jakarta Commons Users List
Subject: Re: [FileUpload] Unicode Encoding for a Form

Hi Robert,

I presume you're a victim of the same syndrome as we had. I have written this somewhere already but I can't find it anymore, so here is the issue:

- The content-type header only allows a value like multipart/form-data; nothing in it says which encoding the characters use, in particular how to convert the Unicode character ò to some %xx value (making it %F2 would mean using ISO-8859-1).
- What can browsers do? Either ask the user (some browsers have this in preferences) or just use the same encoding as the page they received; this is generally the wise choice.
- What can server containers do? Well, they don't know; they have no clue which browser page all this was coming from. So they just convert the bytes to a string, matching %F2 to ò, hence giving very weird results if UTF-8 is used.

We do everything in UTF-8; Russian, French, and math characters were our interest.

Our solution came as follows, once we had guessed that this is what Tomcat does: write a little converter that contains an InputStreamReader(pig, "UTF-8") and read from there, with pig defined to be something like a ByteArrayInputStream(request.getParameter("xx").getBytes()).

Since then, we're happy. But one day, someone should file a bug on the HTML specification...

Hope that helps.

Paul

Robert Priest wrote:
> and the following does not help:
>
> try
> {
>     fileName = new String(cd.substring(start + 10, end).trim().getBytes("UTF-8"));
> }
> catch (java.io.UnsupportedEncodingException uee)
> {
> }
>
> -----Original Message-----
> From: Robert Priest [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 17, 2003 11:19 AM
> To: '[EMAIL PROTECTED]'
> Subject: [FileUpload] Unicode Encoding for a Form
>
> Hello all,
>
> I have a simple html form which has an <INPUT TYPE="FILE"/> field in it.
>
> Now when I select a file that contains Scandinavian characters (such as
> umlauts) it is not being URL encoded properly before being sent. As a
> result, my jsp page which accepts posts of files via the FileUpload package
> is not interpreting the file name correctly.
>
> Has anyone seen this problem before? And does anyone have a solution for
> this issue?
>
> For example, if I select a file, say:
>
> filename="C:\Documents and Settings\Robert.Priest\Desktop\���.txt"
>
> what is sent in the request is:
>
> C:\Documents and Settings\Robert.Priest\Desktop\???.txt"
>
> and what is seen if you do a FileItem.getName() is:
>
> C:\Documents and Settings\Robert.Priest\Desktop\???.txt
>
> So the method FileUploadBase.getFileName(Map /* String, String */ headers)
> does not see the correct filename when it executes:
>
> if (start != -1 && end != -1)
> {
>     fileName = cd.substring(start + 10, end).trim();
> }
>
> The following is the multipart request that IE sends using such a file
> (with umlauts) in the name:
> ------------------------------
>
> POST /jsp/upload.jsp HTTP/1.1
> Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
> Referer: http://localhost:8080/roberttest/rptest.html
> Accept-Language: en-us
> Content-Type: multipart/form-data; boundary=---------------------------7d39eb58029a
> Accept-Encoding: gzip, deflate
> User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
> Host: localhost:2000
> Content-Length: 349
> Connection: Keep-Alive
> Cache-Control: no-cache

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
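
For anyone reading this thread later: the converter Paul describes (an InputStreamReader over a ByteArrayInputStream of the parameter's bytes) might look roughly like the sketch below. This is not Paul's actual code. The class name ParamRecoder and the method recodeAsUtf8 are made up for illustration, and the sketch assumes the container decoded the incoming bytes as ISO-8859-1, which is why it asks for the bytes in that charset explicitly instead of relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons FileUpload or the Servlet API.
public class ParamRecoder {

    // Re-reads as UTF-8 a string that was (by assumption) decoded as ISO-8859-1.
    public static String recodeAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // ISO-8859-1 maps chars 0..255 one-to-one onto bytes, so this
        // recovers the raw bytes the browser originally sent.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String o = "\u00f2"; // the character ò from the discussion

        // The same character yields different %xx values per encoding.
        System.out.println(URLEncoder.encode(o, "ISO-8859-1")); // %F2
        System.out.println(URLEncoder.encode(o, "UTF-8"));      // %C3%B2

        // What a container produces when it reads the UTF-8 bytes C3 B2
        // as ISO-8859-1: two characters instead of one.
        String fromContainer = "\u00c3\u00b2";
        System.out.println(recodeAsUtf8(fromContainer).equals(o)); // true
    }
}
```

In a servlet this would be called as something like recodeAsUtf8(request.getParameter("xx")). It only helps when the original bytes survived the trip: if the request already contains literal ? characters, as in the file name reported above, there is nothing left to recover on the server side.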
