Re: Odd encoding of servlet parameters
Chris Mannion wrote:

> Hi All
>
> I've recently started having a problem with one of the servlets I'm running on a Tomcat 5.5 system. The code of the servlet hasn't changed at all, so I'm wondering if there are any Tomcat settings that could affect this kind of thing, or if anyone has come across a similar problem before.
>
> The servlet in question accepts XML data that is posted to it as a URL parameter called 'xml'. The code to retrieve the XML as a String (which is then used to build a document object) is simply
>
>     String xmlMessage = req.getParameter("xml");
>
> where req is the HttpServletRequest object. Until recently this has worked fine, with the XML being received properly formatted:
>
>     <?xml version="1.0" encoding="UTF-8"?> <records> <record>... etc.
>
> However, recently something has changed and the XML is now being retrieved from the request object with escape characters in, so the above has become
>
>     &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; &lt;records&gt; &lt;record&gt;
>
> Before sending, the XML is encoded using the java.net.URLEncoder object and the UTF-8 character set, but using a java.net.URLDecoder on receiving it does not get rid of the encoded characters. I did some reading about a possible Tomcat 6.0 bug and so tried explicitly setting the character encoding (req.setCharacterEncoding("UTF-8")) before retrieving the parameter, but that had no effect either. Even if there's something that could explicitly decode the &lt; &gt; etc., I couldn't use it, as the XML data often contains characters like &amp; which have to remain encoded to keep the XML valid.
>
> As I said, this problem started without the servlet code having changed at all, so is there any Tomcat setting that could be responsible for this?

Just a couple of indirect comments on the above.

In your post, you seem to indicate that you also control the client which sends the request to Tomcat. If so, and for that kind of data, might it not be better to send the data in the body of a request, instead of in the URL? That is probably not the root cause of the issue you describe above, but it may avoid similar encoding questions in the future (check the HTTP POST method, and enctype=multipart/form-data). It will also avoid the case where your data gets so long that the request URLs (and thus your data) get cut off at a certain length.

Next, the way you indicate that the data is now received shows an HTML-style encoding, rather than a URL-style encoding. If the data were now URL-encoded, it would not have (for example) &quot; replacing a quotation mark, but it would have some %xy sequence instead (where xy is the iso-8859-1 codepoint of the character, expressed in hexadecimal digits).

What I mean is that it is very unlikely that this encoding just happens automatically due to some protocol layer at the browser or HTTP server level. There must be something that explicitly encodes your original request data in this way, before it even gets put in a URL.

I guess what I am trying to say is that maybe you are looking in the wrong place for your problem by focusing on the receiving Tomcat side first. I believe you should first have a good look at the sending side.

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
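To make the distinction between URL-style and HTML-style encoding concrete, here is a minimal Java sketch. It is not taken from the poster's code: the class name `EncodingStyles` and the helper `entityEscape` are hypothetical, and the sample document is shortened. It shows that `java.net.URLEncoder` produces `%xy` sequences, while entity escaping produces `&lt;` and `&quot;`, and that `java.net.URLDecoder` leaves entity-escaped text unchanged, which matches the behaviour described in the original post.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingStyles {

    // URL-style (percent) encoding, as produced by java.net.URLEncoder.
    // The Charset overloads used here need Java 10 or later.
    static String urlEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String urlDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: HTML/XML-style escaping of the markup characters.
    // The ampersand is replaced first so that the entities added by the
    // later replacements are not escaped a second time.
    static String entityEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Shortened sample document, for illustration only.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><records/>";

        // Percent-encoded form: what a URL or form body should carry.
        System.out.println(urlEncode(xml));

        // Entity-escaped form: what the problem server reportedly sees.
        String escaped = entityEscape(xml);
        System.out.println(escaped);

        // URL-decoding does not touch entity references.
        System.out.println(urlDecode(escaped).equals(escaped));
    }
}
```

Because the two schemes do not overlap, entity references that survive URL decoding must have been present in the data before it was URL-encoded, or added after it was decoded.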
Re: Odd encoding of servlet parameters
André,

Thanks for the comments. I will definitely look into the approach of sending the data in the request body; that is probably something that should have been done originally.

It's true that the program sending the data is ours as well, but I don't suspect it to be the culprit because the problem doesn't occur in a way consistent with that. For example, I can send data from my local client to my local server and it arrives intact, but when I send the same data from the same client to the problem server, it arrives with the HTML encoding. In fact, the sending program has been distributed to several customers who use it with the same results: uploads to a test server arrive well formed, while uploads to the problem server are HTML encoded. It's the fact that both servers are running the exact same code to receive the upload that made me wonder if a Tomcat setting could be causing the problem.

2008/11/27 André Warnier [EMAIL PROTECTED]:

> I guess what I am trying to say, is that maybe you are looking in the wrong place for your problem, by focusing on the receiving Tomcat side first. I believe you should first have a good look at the sending side.

--
Chris Mannion
iCasework and LocalAlert implementation team
0208 144 4416
RE: Odd encoding of servlet parameters
From: Chris Mannion [mailto:[EMAIL PROTECTED]]
Subject: Re: Odd encoding of servlet parameters

> It's true that the program sending the data is ours as well but I don't suspect it to be the culprit because the problem doesn't occur in a way consistent with that.

Get Wireshark captures of both the correct and incorrect message flow, and stop speculating.

 - Chuck
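Once captures exist, one way to read them is to URL-decode the raw `xml` parameter exactly as it appeared on the wire and check whether the entity references are already there. The sketch below is illustrative only: the class `CaptureCheck`, its methods, and the two sample strings are hypothetical and do not come from a real capture. It assumes the parameter arrives as an ordinary `name=value` pair in a query string or form body.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class CaptureCheck {

    // Extracts the 'xml' parameter from a raw query string or form body
    // (as copied from a capture) and URL-decodes it. Returns null if the
    // parameter is absent. Needs Java 10+ for the Charset overload.
    static String decodeXmlParam(String raw) {
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("xml")) {
                return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    // True if the decoded value already starts with an escaped '<',
    // meaning the entity escaping happened before the data reached the wire.
    static boolean escapedOnTheWire(String raw) {
        String value = decodeXmlParam(raw);
        return value != null && value.startsWith("&lt;");
    }

    public static void main(String[] args) {
        // Hypothetical capture of a healthy request: '<' travels as %3C.
        String good = "xml=%3C%3Fxml+version%3D%221.0%22%3F%3E%3Crecords%2F%3E";
        // Hypothetical capture of a broken request: '&lt;' travels as %26lt%3B.
        String bad = "xml=%26lt%3B%3Fxml+version%3D%26quot%3B1.0%26quot%3B%3F%26gt%3B";

        System.out.println(escapedOnTheWire(good));
        System.out.println(escapedOnTheWire(bad));
    }
}
```

If the capture taken at the problem server already contains `%26lt%3B`, the escaping was applied by the sender or by something between sender and server. If it contains `%3C`, the escaping is happening after the request arrives.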
Re: Odd encoding of servlet parameters
Chris Mannion wrote on 27.11.2008 at 10:17:43 (+):

> The servlet in question accepts XML data that is posted to it as a URL parameter called 'xml'.

Posted as a URL parameter? POST or GET? You want a POST here.

> &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; &lt;records&gt; &lt;record&gt;

That's XML as a string, as if intended for inclusion in XML. Markup is parsed as a string, hence the entity references.

http://www.w3.org/TR/REC-xml/#sec-predefined-ent

> Before sending the XML is encoded using the java.net.URLEncoder object and the UTF-8 character set, but using a java.net.URLDecoder on receiving it does not get rid of the encoded characters.

Nothing to do with URL encoding, which looks different. Nothing to do with the character encoding. (Note that all escaped characters are in the ASCII character set.) Some element in the chain gets the parsing wrong. I'd suspect the producer of this XML first.

Michael Ludwig
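The remark about XML being treated as a string for inclusion in XML can be reproduced with the JDK's own DOM and serializer. The following is a sketch under assumptions, not a diagnosis of the actual producer: the class `XmlAsText`, the `wrapper` element name, and the sample payload are all hypothetical. It places an XML document into a DOM text node, serializes it, and shows that the markup comes out as entity references. Reading the text back through a parser restores the original markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlAsText {

    // Stores the payload as character data inside a (hypothetical) wrapper
    // element and serializes the result. The serializer must escape markup
    // characters in text nodes, so '<' is written as the entity '&lt;'.
    static String wrapAsText(String payload) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element wrapper = doc.createElement("wrapper");
        wrapper.setTextContent(payload);
        doc.appendChild(wrapper);

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parses the wrapped form and returns the text content. The parser
    // resolves the entity references, giving back the original payload.
    static String unwrap(String wrapped) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wrapped)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String payload = "<records><record/></records>";
        String wrapped = wrapAsText(payload);
        System.out.println(wrapped);          // markup appears as entity references
        System.out.println(unwrap(wrapped));  // original markup restored by parsing
    }
}
```

A component that passes on the serialized string without parsing it, or that escapes the payload once more than it is later unescaped, would deliver exactly the kind of `&lt;records&gt;` text reported at the problem server.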