Re: Odd encoding of servlet parameters

2008-11-27 Thread André Warnier

Chris Mannion wrote:

Hi All

I've recently started having a problem with one of the servlets I'm
running on a Tomcat 5.5 system.  The code of the servlet hasn't
changed at all so I'm wondering if there are any Tomcat settings that
could affect this kind of thing or if anyone has come across a similar
problem before.

The servlet in question accepts XML data that is posted to it as a URL
parameter called 'xml'.  The code to retrieve the XML as a String
(which is then used to build a document object) is simply -

String xmlMessage = req.getParameter("xml");

- where req is the HttpServletRequest object.  Until recently this has
worked fine with the XML being received properly formatted -
<?xml version="1.0" encoding="UTF-8"?>
  <records>
<record>...
etc.

However, recently something has changed and the XML is now being
retrieved from the request object with escape characters in, so the
above has become -
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
  &lt;records&gt;
&lt;record&gt;
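One quick way to confirm which form actually reaches the servlet is to hand the parameter value straight to a DOM parser. The minimal sketch below uses only the JDK's javax.xml.parsers API. The class and method names (XmlParamCheck, isWellFormed) are hypothetical, and the sample strings are shortened versions of the ones quoted above. The escaped form fails to parse because it begins with '&' rather than '<', so the parser finds no markup at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for checking what getParameter("xml") really returned.
final class XmlParamCheck {

    // Parses the parameter value into a DOM Document; throws if not well-formed.
    static Document parse(String xmlMessage) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xmlMessage)));
    }

    // True when the value parses as a well-formed XML document.
    static boolean isWellFormed(String xmlMessage) {
        try {
            parse(xmlMessage);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<records><record/></records>";
        String escaped = "&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;\n"
                + "&lt;records&gt;&lt;record/&gt;&lt;/records&gt;";
        System.out.println(parse(good).getDocumentElement().getTagName()); // records
        System.out.println(isWellFormed(escaped));                         // false
    }
}
```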

Before sending, the XML is encoded using java.net.URLEncoder with the
UTF-8 character set, but running java.net.URLDecoder over it on receipt
does not get rid of the encoded characters.  I did some reading about a
possible Tomcat 6.0 bug and so tried explicitly setting the character
encoding (req.setCharacterEncoding("UTF-8")) before retrieving the
parameter, but that had no effect either.  And even if there's something
that could explicitly decode the &lt; &gt; etc., I couldn't use it, as
the XML data often contains characters like &amp; which have to remain
encoded to keep the XML valid.
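For comparison, here is what java.net.URLEncoder actually produces, and a check that a URLDecoder round trip restores the original exactly. This is a minimal sketch assuming Java 10 or later for the Charset overloads (on older JDKs, pass "UTF-8" and handle UnsupportedEncodingException); the class name is hypothetical. Note that the servlet container already performs this decoding inside getParameter(), so decoding a second time on the server is normally unnecessary and can corrupt values containing a literal '%' or '+'.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the shape of URL (percent) encoding.
final class UrlEncodingRoundTrip {

    // Percent-encodes a value for use as a query-string or form parameter.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Reverses encode(); a servlet container does this itself in getParameter().
    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("<record>")); // %3Crecord%3E
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><records>&amp;</records>";
        System.out.println(decode(encode(xml)).equals(xml)); // true
    }
}
```

URL encoding never produces &lt; or &quot;: '<' becomes %3C and '"' becomes %22.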

As I said, this problem started without the servlet code having
changed at all so is there any Tomcat setting that could be
responsible for this?


Just a couple of indirect comments on the above.

In your post, you seem to indicate that you also control the client 
which sends the request to Tomcat.
If so, and for that kind of data, might it not be better to send the 
data in the body of a request, instead of in the URL ?
That is probably not the root cause of the issue you describe above,
but it may avoid similar questions of encoding in the future.

(check the HTTP POST method, and enctype="multipart/form-data")
It will also avoid the case where your data gets so long that the 
request URLs (and thus your data) get cut off at a certain length.
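A minimal client-side sketch of sending the XML in a POST body instead of the URL, using only java.net.HttpURLConnection. It deliberately swaps in application/x-www-form-urlencoded rather than the multipart/form-data suggested above: as far as I know, the Servlet 2.4 API that Tomcat 5.5 implements only populates getParameter() from a POST body of that content type, while a multipart body needs a separate parser such as Apache Commons FileUpload. The class name and endpoint are hypothetical. On the servlet side, calling req.setCharacterEncoding("UTF-8") before the first getParameter() call makes the body's charset explicit; for parameters in the URL query string, Tomcat uses the Connector's URIEncoding attribute instead, which would explain why setCharacterEncoding had no visible effect there.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client: sends the XML as a form field in the request body.
final class PostXmlClient {

    // Builds an application/x-www-form-urlencoded body with a single "xml" field.
    static byte[] formBody(String xml) {
        String body = "xml=" + URLEncoder.encode(xml, StandardCharsets.UTF_8);
        // URLEncoder output is pure ASCII, so US-ASCII bytes are exact.
        return body.getBytes(StandardCharsets.US_ASCII);
    }

    // POSTs the XML in the body rather than the URL; returns the HTTP status code.
    static int postXml(String endpoint, String xml) throws IOException {
        byte[] body = formBody(xml);
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No endpoint given: just print the body that would be sent.
            System.out.println(new String(formBody("<records/>"), StandardCharsets.US_ASCII));
            return;
        }
        // args[0] is the servlet URL (hypothetical, e.g. http://host:8080/app/upload).
        System.out.println(postXml(args[0], "<records><record/></records>"));
    }
}
```

Run without arguments, it prints xml=%3Crecords%2F%3E, the body that would be posted.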


Next, the way you indicate that the data is now received shows an
HTML-style encoding, rather than a URL-style encoding.
If the data were URL-encoded, it would not have (for example)
&quot; replacing a quotation mark; it would have a %xy sequence
instead (where xy is the value of each byte of the character in the
chosen charset, e.g. UTF-8, expressed in hexadecimal digits).
What I mean is that it is very unlikely that this encoding just happens 
automatically due to some protocol layer at the browser or HTTP server 
level.  There must be something that explicitly encodes your original 
request data in this way, before it even gets put in a URL.
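The distinction drawn here can be checked mechanically. The hypothetical helper below looks only at how a received value begins, since well-formed XML may legitimately contain &amp; further in. Logging its result for the raw getParameter("xml") value on both the working and the problem server would show at a glance which transformation the data went through. It is a rough heuristic under that assumption, not a general-purpose detector.

```java
import java.util.Locale;

// Hypothetical diagnostic: classifies a received parameter by its first characters.
final class EncodingSniffer {

    static String classify(String value) {
        String head = value.stripLeading();
        if (head.startsWith("<")) {
            return "plain";          // raw markup, what the servlet expects
        }
        if (head.startsWith("&lt;")) {
            return "entity-escaped"; // HTML/XML-style escaping, as reported
        }
        if (head.toUpperCase(Locale.ROOT).startsWith("%3C")) {
            return "url-encoded";    // percent-encoded '<' that was never decoded
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("<?xml version=\"1.0\"?>"));              // plain
        System.out.println(classify("&lt;?xml version=&quot;1.0&quot;?&gt;")); // entity-escaped
        System.out.println(classify("%3C%3Fxml+version%3D%221.0%22%3F%3E"));  // url-encoded
    }
}
```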


I guess what I am trying to say, is that maybe you are looking in the 
wrong place for your problem, by focusing on the receiving Tomcat side 
first. I believe you should first have a good look at the sending side.




-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Odd encoding of servlet parameters

2008-11-27 Thread Chris Mannion
André

Thanks for the comments, I will definitely look into the approach of
sending the data in the request body, probably something that should
have been done originally.

It's true that the program sending the data is ours as well but I
don't suspect it to be the culprit because the problem doesn't occur
in a way consistent with that.  For example, I can send data from my
local client to my local server and it arrives intact but when I send
the same data from the same client to the problem server, it arrives
with the HTML encoding.  And, in fact, the sending program has been
distributed to several customers who use it with the same results:
uploads to a test server arrive well formed, while uploads to the
problem server arrive HTML encoded.  And it's the fact that both
servers are running the exact same code that receives the upload that
made me wonder if it could be a Tomcat setting causing the problem.

-- 
Chris Mannion
iCasework and LocalAlert implementation team
0208 144 4416




RE: Odd encoding of servlet parameters

2008-11-27 Thread Caldarale, Charles R
 From: Chris Mannion [mailto:[EMAIL PROTECTED]
 Subject: Re: Odd encoding of servlet parameters

 It's true that the program sending the data is ours as well but I
 don't suspect it to be the culprit because the problem doesn't occur
 in a way consistent with that.

Get Wireshark captures of both the correct and incorrect message flow, and stop 
speculating.

 - Chuck






Re: Odd encoding of servlet parameters

2008-11-27 Thread Michael Ludwig
Chris Mannion wrote on 27.11.2008 at 10:17:43 (+):
 
 The servlet in question accepts XML data that is posted to it as a URL
 parameter called 'xml'.

Posted as a URL parameter? POST or GET? You want a POST here.

 &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
   &lt;records&gt;
 &lt;record&gt;

That's XML serialized as a string and escaped, as if it were intended
for inclusion as text inside another XML document. The markup has been
treated as character data, hence the entity references.

http://www.w3.org/TR/REC-xml/#sec-predefined-ent
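This is not a fix for whatever introduced the escaping (the producer is still the place to look), but it shows that the worry about &amp; in the original post need not block a stopgap. Assuming the whole document was escaped exactly once, every original &amp; arrives as &amp;amp;, and a single left-to-right pass over the five predefined entities from the W3C link above restores it correctly. The class and method names are hypothetical; numeric character references such as &#60; are deliberately left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: undoes exactly one level of predefined-entity escaping.
final class EntityUnescape {

    // The five predefined XML entities and the characters they stand for.
    private static final Map<String, String> PREDEFINED = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "quot", "\"", "apos", "'");

    private static final Pattern REF = Pattern.compile("&(lt|gt|amp|quot|apos);");

    // Single pass, so "&amp;amp;" becomes "&amp;" (not "&"): the output of one
    // replacement is never re-scanned.
    static String unescapeOnce(String s) {
        Matcher m = REF.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(PREDEFINED.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String received = "&lt;records&gt;a &amp;amp; b&lt;/records&gt;";
        System.out.println(unescapeOnce(received)); // <records>a &amp; b</records>
    }
}
```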

 Before sending the XML is encoded using the java.net.URLEncoder object
 and the UTF-8 character set, but using a java.net.URLDecoder on
 receiving it does not get rid of the encoded characters.

Nothing to do with URL encoding, which looks different.

Nothing to do with the character encoding. (Note that all escaped
characters are in the ASCII character set.)
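That point is easy to verify: for a string made only of ASCII characters, UTF-8 and ISO-8859-1 produce identical bytes, so no charset setting can change how &lt;, &quot; and the rest arrive. A minimal sketch (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check that the escaped text is charset-independent.
final class AsciiCheck {

    // True if every char is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    // True if UTF-8 and ISO-8859-1 encode s to the same bytes.
    static boolean sameBytesInUtf8AndLatin1(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_8),
                             s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        String escaped = "&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;";
        System.out.println(isAscii(escaped));                   // true
        System.out.println(sameBytesInUtf8AndLatin1(escaped));  // true
        System.out.println(sameBytesInUtf8AndLatin1("\u00e9")); // false: é is 2 bytes in UTF-8
    }
}
```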

Some element in the chain gets the parsing wrong. I'd suspect the
producer of this XML first.

Michael Ludwig
