Hi Robert,


I presume you're a victim of the same syndrome as we were.
I have written this somewhere already but I can't find it anymore, so here is the issue:


- the form's encoding type (sent as the Content-Type header) only allows values such as multipart/form-data; nothing in it states the character encoding used, in particular how to convert the Unicode character ò to some %xx value (making it %F2 means using ISO-8859-1).
- what can browsers do? Either ask the user (some browsers have this in their preferences) or just use the same encoding as the page they received, which is generally the wise choice.
- what can server containers do? Well, they don't know: they have no clue which browser page all this came from, so they just convert the bytes to a string, mapping %F2 to ò, which gives very weird results if UTF-8 is used.
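
To make the mismatch above concrete, here is a small sketch in plain Java. The class and method names (PercentEncodingDemo, percentEncode, decodeUtf8BytesAsLatin1) are made up for illustration, and the helper escapes every byte, which is simpler than what a real browser does. It only shows that the same character gives different %xx sequences depending on the charset, and what a container sees if it reads UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {

    // Hypothetical helper: writes every byte of the text as %XX in the given charset.
    static String percentEncode(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    // Simulates a container that receives UTF-8 bytes but decodes them as ISO-8859-1.
    static String decodeUtf8BytesAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String oGrave = "\u00F2"; // the character ò

        System.out.println(percentEncode(oGrave, StandardCharsets.ISO_8859_1)); // %F2
        System.out.println(percentEncode(oGrave, StandardCharsets.UTF_8));      // %C3%B2

        // One character sent, two characters (U+00C3 and U+00B2) seen by the server.
        System.out.println(decodeUtf8BytesAsLatin1(oGrave).length()); // 2
    }
}
```

So a page served in UTF-8 would normally send %C3%B2, and a container that assumes ISO-8859-1 turns that into two unrelated characters.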


We do everything in UTF-8; Russian, French, and math characters were our interest.
Our solution, once we had worked out what Tomcat was doing, was as follows: write a little converter that wraps an InputStreamReader(pig, "UTF-8") and reads from it, with pig defined as something like a ByteArrayInputStream(request.getParameter("xx").getBytes("ISO-8859-1")). The charset is best named explicitly there, since a bare getBytes() uses the platform default.
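
A minimal sketch of such a converter, assuming the container decoded the parameter as ISO-8859-1 while the browser actually sent UTF-8. The class and method names (ParameterRecoder, recodeAsUtf8) are hypothetical; this is not our original code, just the same idea written out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ParameterRecoder {

    // Assumption: 'misdecoded' was produced by reading UTF-8 bytes as ISO-8859-1.
    static String recodeAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        // ISO-8859-1 maps each char 0..255 back to exactly one byte,
        // so this recovers the bytes the browser originally sent.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);

        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00C3 U+00B2 is what ò looks like after the wrong decoding.
        String fixed = recodeAsUtf8("\u00C3\u00B2");
        System.out.println(fixed.equals("\u00F2")); // true
    }
}
```

In a servlet or JSP this would be called as recodeAsUtf8(request.getParameter("xx")). It only helps when the original bytes survived the first decoding; if the browser already replaced the characters with '?', nothing can be recovered on the server side.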


Since then, we're happy.
But one day, one should file a bug on the HTML specification...

Hope that helps.

Paul



Robert Priest wrote:
and the following does not help:

 try
  {
  fileName = new String(cd.substring(start + 10, end).trim().getBytes("UTF-8"));
  }
 catch (java.io.UnsupportedEncodingException uee)
  {
  }

-----Original Message-----
From: Robert Priest [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 11:19 AM
To: '[EMAIL PROTECTED]'
Subject: [FileUpload] Unicode Encoding for a Form


Hello all,


I have a simple html form which has an <INPUT TYPE="FILE"/> field in it.

Now when I select a file whose name contains Scandinavian characters (such as
umlauts), the name is not being URL-encoded properly before being sent. As a
result, my JSP page, which accepts posts of files via the FileUpload package,
is not interpreting the file name correctly.

First, has anyone seen this problem? And does anyone have a solution for
this issue?


For example, if I select a file, say:


filename="C:\Documents and Settings\Robert.Priest\Desktop\���.txt"

what is sent in the request is:

C:\Documents and Settings\Robert.Priest\Desktop\???.txt"


and what you see if you call FileItem.getName() is:


C:\Documents and Settings\Robert.Priest\Desktop\???.txt


So the method FileUploadBase.getFileName(Map /* String, String */ headers)
does not see the correct filename when it executes:


 if (start != -1 && end != -1)
            {
                fileName = cd.substring(start + 10, end).trim();
            }


The following is the multipart request that IE sends for such a file (with umlauts in the name):


POST /jsp/upload.jsp HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-shockwave-flash, */*
Referer: http://localhost:8080/roberttest/rptest.html
Accept-Language: en-us
Content-Type: multipart/form-data; boundary=---------------------------7d39eb58029a
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: localhost:2000
Content-Length: 349
Connection: Keep-Alive
Cache-Control: no-cache


