Re: form urlencoding, was Re: URI query escapes

2003-06-22 Thread Sung-Gu
> If not pre-encoded, the URI would look something like
> "http://host/path?param1=value&1&param2=value=2".
>
> Once joined together it is too late to encode.  This is why the
> HttpMethod(String) constructor assumes that all URIs are already
> encoded, as it is not possible to correctly encode them after the fact.
>
> Please let me know if you will be writing some code for this as I will
> take care of it tomorrow otherwise.

When you regard the URI class as the core of URI manipulation in your
whole code (meaning you never manipulate any URI components yourself,
anywhere in commons-httpclient), it doesn't matter, because my URI class
manipulates both escaped and unescaped components correctly. That's the
point of the URI class in real use: it considers URIs on the user side
(escaped and unescaped) and even on the communication side (where escaped
is probably preferred).
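Sung-Gu's point, that a URI class can keep both an escaped view (for the wire) and an unescaped view (for the user) of each component, can be illustrated with the JDK's own java.net.URI. This is only an analogy: java.net.URI is not the commons-httpclient URI class, and the class name EscapedVsUnescapedQuery and the sample host and path are made up for the sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EscapedVsUnescapedQuery {
    // Builds a URI from *unescaped* components. The multi-argument
    // java.net.URI constructor quotes any character that is not legal
    // in a URI (here, the space in the query).
    static URI build(String unescapedQuery) {
        try {
            return new URI("http", "host", "/path", unescapedQuery, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = build("city=Mountain View");
        System.out.println(uri);               // http://host/path?city=Mountain%20View
        System.out.println(uri.getRawQuery()); // escaped view:   city=Mountain%20View
        System.out.println(uri.getQuery());    // unescaped view: city=Mountain View
    }
}
```

As long as callers only pass unescaped components in and read escaped ones out, nothing gets escaped twice; handing an already-escaped string to the multi-argument constructor would, by contrast, quote its '%' again.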

Sung-Gu

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: form urlencoding, was Re: URI query escapes

2003-06-22 Thread Oleg Kalnichevski
Mike, Laura, Adrian

In their pre-Java 1.4 form, the URLEncoder/URLDecoder classes are pretty
much unusable, as they always use the default system charset, which
sometimes is not good enough. For instance, there's no way to properly
encode strings that simultaneously contain Cyrillic letters and Latin
accents, as both KOI8-R (the default Russian encoding on Unix platforms)
and Win1251 (the default Russian encoding on Windows platforms) are 8-bit
charsets that cannot represent both. One would need to use UTF-8;
however, the standard pre-Java 1.4 URLEncoder does not provide a means of
specifying an alternative charset.
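For comparison, Java 1.4 added the two-argument URLEncoder.encode(String, String), which takes the charset explicitly. A minimal sketch, assuming UTF-8 is the charset the server expects (the class name CharsetAwareEncoding is hypothetical), showing a string that mixes Cyrillic and an accented Latin letter:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;

public class CharsetAwareEncoding {
    // Form-urlencodes s using an explicitly named charset (Java 1.4+ API).
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Cyrillic "Mir" (U+041C U+0438 U+0440), a space, then "cafe" with an acute e (U+00E9)
        String mixed = "\u041c\u0438\u0440 caf\u00e9";

        // UTF-8 can represent both scripts in the same string
        System.out.println(encode(mixed, "UTF-8")); // %D0%9C%D0%B8%D1%80+caf%C3%A9

        if (Charset.isSupported("KOI8-R")) {
            // KOI8-R is a single-byte Cyrillic charset with no mapping for the
            // accented e, so that character is lost (typically replaced by '?')
            System.out.println(encode(mixed, "KOI8-R"));
        }
    }
}
```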

We have to live with URIUtil for the 2.0 release. In the future I would
suggest moving the URL encoding logic into Commons-Codec.

Oleg


> My only guess is that URLEncoder may not handle character encodings 
> correctly.  I agree that we might as well stick with the code we 
> already have (once fixed).
> 
> Mike





Re: form urlencoding, was Re: URI query escapes

2003-06-21 Thread Michael Becke
On Saturday, June 21, 2003, at 11:47 PM, Laura Werner wrote:

> I agree, as long as URLEncoder seems to work.
>
> Do you think we need to modify URI so that it uses URLEncoder to
> encode the query part of URIs?  In cases where a client has a URL
> string that may or may not contain query parameters, it would lead to
> a slightly more natural API usage:
>  HttpMethod meth = new GetMethod(new URI(urlString));
> as opposed to
>  String query = null;
>  int index = urlString.indexOf('?');
>  if (index != -1) {
>    query = urlString.substring(index+1);
>    urlString = urlString.substring(0, index);
>  }
>  HttpMethod meth = new GetMethod(new URI(urlString));
>  meth.setQueryString(java.net.URLEncoder.encode(query));
> or something like that, with error checking of course.
I don't think URI should be doing any form urlencoding.  The URI spec 
does not use the concept of query params.  It just treats the entire 
query as a single entity.

Also, when creating a URI containing query params, the params must be
encoded before the URI is generated; otherwise it will not be parsable.
For example, when creating a URI with the following query params:

Name     Value
param1   value&1
param2   value=2

If not pre-encoded, the URI would look something like
"http://host/path?param1=value&1&param2=value=2".

Once joined together it is too late to encode.  This is why the
HttpMethod(String) constructor assumes that all URIs are already
encoded, as it is not possible to correctly encode them after the fact.
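To make the ambiguity concrete, here is a small sketch that parses both forms of the example query. The class and its naive parse helper are hypothetical, and the pre-encoded string was produced by hand by escaping '&' as %26 and '=' as %3D inside the values (UTF-8 assumed for decoding):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

public class JoinedQueryParsing {
    // Hypothetical helper: splits a query on '&', then on the first '=',
    // and form-urldecodes each name and value.
    static List<String> parse(String query) {
        List<String> pairs = new ArrayList<String>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            pairs.add(decode(name) + " -> " + decode(value));
        }
        return pairs;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Joined without encoding: the '&' inside "value&1" splits the pair
        System.out.println(parse("param1=value&1&param2=value=2"));
        // [param1 -> value, 1 -> , param2 -> value=2]

        // Each value encoded before joining: both pairs survive intact
        System.out.println(parse("param1=value%261&param2=value%3D2"));
        // [param1 -> value&1, param2 -> value=2]
    }
}
```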

Please let me know if you will be writing some code for this as I will 
take care of it tomorrow otherwise.

Mike



Re: form urlencoding, was Re: URI query escapes

2003-06-21 Thread Michael Becke
> I know in the product we develop, we switched away from using
> java.net.URLEncoder because it didn't work properly.  Unfortunately
> the decision was made before my time so I'm not entirely sure of the
> details and it could well be that the bugs were only present back in
> JRE 1.1.  I'd say that if we have our own code already I'd continue to
> use it, but if not just test java.net.URLEncoder and I'll see if I can
> find out from some of the old timers at work exactly why we don't use
> it.
>
> I've known us to not use something for legacy reasons or because it
> showed up bugs in our code as well, so don't take this as an
> objection, just a word of caution.
My only guess is that URLEncoder may not handle character encodings 
correctly.  I agree that we might as well stick with the code we 
already have (once fixed).

Mike



Re: form urlencoding, was Re: URI query escapes

2003-06-21 Thread Laura Werner
Michael Becke wrote:

> I propose that we:
>  - form urlencode values passed to
>    HttpMethodBase.setQueryString(NameValuePair[])
>  - use java.net.URLEncoder for form urlencoding
I agree, as long as URLEncoder seems to work.

Do you think we need to modify URI so that it uses URLEncoder to encode 
the query part of URIs?  In cases where a client has a URL string that 
may or may not contain query parameters, it would lead to a slightly 
more natural API usage:
 HttpMethod meth = new GetMethod(new URI(urlString));
as opposed to
 String query = null;
 int index = urlString.indexOf('?');
 if (index != -1) {
   query = urlString.substring(index+1);
   urlString = urlString.substring(0, index);
 }
 HttpMethod meth = new GetMethod(new URI(urlString));
 meth.setQueryString(java.net.URLEncoder.encode(query));
or something like that, with error checking of course.

I'm not sure how much I care, though.  If my fetching code had been 
constructed using the HttpClient code from scratch, I wouldn't even have 
the query parameters in the string in the first place; I'd just add them 
with setQueryString.

I'll see if I can work up a preliminary patch for this stuff later 
tonight or tomorrow morning.   

Adrian Sutton wrote:

> I know in the product we develop, we switched away from using
> java.net.URLEncoder because it didn't work properly.
FWIW, we're using it and haven't seen any problems.  But we've been on 
1.2 or higher since I started at BeVocal.  (We're moving to 1.4 now 
because the server VM performance is *much* better.)

--Laura





Re: form urlencoding, was Re: URI query escapes

2003-06-21 Thread Adrian Sutton
> I am also wondering why we are not using the java.net.URLEncoder for
> this (it also does not encode *-_.).
I know in the product we develop, we switched away from using 
java.net.URLEncoder because it didn't work properly.  Unfortunately the 
decision was made before my time so I'm not entirely sure of the 
details and it could well be that the bugs were only present back in 
JRE 1.1.  I'd say that if we have our own code already I'd continue to 
use it, but if not just test java.net.URLEncoder and I'll see if I can 
find out from some of the old timers at work exactly why we don't use 
it.

I've known us to not use something for legacy reasons or because it 
showed up bugs in our code as well, so don't take this as an objection, 
just a word of caution.

Regards,

Adrian Sutton



form urlencoding, was Re: URI query escapes

2003-06-21 Thread Michael Becke
Laura,

After looking into this some more I agree that query parameters should 
be form urlencoded.  The query param=value convention is an HTML 
specification and has nothing to do with URIs.  Fortunately it turns 
out that form urlencoded values conform to the specification for URI 
queries.

While investigating this question I discovered something else about
form urlencoding that is strange.  It seems that the characters *, -, _
and . are not encoded by browsers (IE, Mozilla and Safari).  Reading the
spec, I am not sure why this is, but it seems fairly consistent.
I am also wondering why we are not using java.net.URLEncoder for
this (it also does not encode *-_.).
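A quick way to check the URLEncoder behaviour described here, assuming UTF-8 (the class and helper names are made up for the sketch):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncoderSafeChars {
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM must support UTF-8
        }
    }

    public static void main(String[] args) {
        // '*', '-', '_' and '.' pass through unchanged, the space becomes '+',
        // and '~' (not in URLEncoder's unencoded set) is percent-escaped.
        System.out.println(formEncode("a*b-c_d.e f~g")); // a*b-c_d.e+f%7Eg
        // A literal '+' must be escaped so it is not mistaken for a space.
        System.out.println(formEncode("x+y"));           // x%2By
    }
}
```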

I propose that we:

 - form urlencode values passed to 
HttpMethodBase.setQueryString(NameValuePair[])
 - use java.net.URLEncoder for form urlencoding
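The proposal amounts to form urlencoding each name and value before they are joined into the query. A minimal sketch, using plain String arrays in place of HttpClient's NameValuePair and assuming UTF-8; toQueryString is a hypothetical name, not the HttpMethodBase implementation:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormQueryBuilder {
    // Hypothetical helper: each element of pairs is a {name, value} array.
    // Names and values are encoded separately, then joined with '=' and '&',
    // so separators inside the data can never be confused with real ones.
    static String toQueryString(String[][] pairs) {
        StringBuilder query = new StringBuilder();
        for (String[] pair : pairs) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(pair[0])).append('=').append(encode(pair[1]));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String[][] params = { {"city", "Mountain View"}, {"q", "a&b=c"} };
        System.out.println(toQueryString(params)); // city=Mountain+View&q=a%26b%3Dc
    }
}
```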

Any thoughts, objections?

Mike

On Friday, June 20, 2003, at 06:05 PM, Laura Werner wrote:

> Michael Becke wrote:
>
>> Yes, but this is for application/x-www-form-urlencoded values.
>> Currently we only assume this content type for post params (this was
>> recently fixed).
> I think we have to assume it for get params too.  In the HTML 4.01
> spec, section 17.13.3.4:
>
> If the method is "get" and the action is an HTTP URI, the user agent
> takes the value of action, appends a `?' to it, then appends the form
> data set, encoded using the "application/x-www-form-urlencoded"
> content type.  The user agent then traverses the link to this URI. In
> this scenario, form data are restricted to ASCII codes.
> So urlencoded seems like the right default for "get" query parameters.
>
> -- Laura
>
> BTW, how do you all feel about newsgroup posts in HTML format?  I left
> this one in HTML because of all the links, but I'll stop if any of you
> have news readers that can't deal with it.








Re: URI query escapes

2003-06-20 Thread Laura Werner
Michael Becke wrote:

> Yes, but this is for application/x-www-form-urlencoded values.
> Currently we only assume this content type for post params (this was
> recently fixed).
I think we have to assume it for get params too.  In the HTML 4.01 spec,
section 17.13.3.4:

If the method is "get" and the action is an HTTP URI, the user agent 
takes the value of action, appends a `?' to it, then appends the form 
data set, encoded using the "application/x-www-form-urlencoded" 
content type.  The user agent then traverses the link to this URI. In 
this scenario, form data are restricted to ASCII codes.
So urlencoded seems like the right default for "get" query parameters.

-- Laura

BTW, how do you all feel about newsgroup posts in HTML format?  I left 
this one in HTML because of all the links, but I'll stop if any of you 
have news readers that can't deal with it.





Re: URI query escapes

2003-06-20 Thread Michael Becke
> I wish I'd done a bit more digging before my original post.  It turns
> out the problem is that parameters were being encoded *twice*.  We were
> encoding them once on our own, then the URI constructor was encoding
> them again.  My "Mountain View" example was being encoded as
> "Mountain%2520View".  If we encode them only once (even with %20 rather
> than +) everything works.
Good, I'm glad that's working now.

> >... are we correctly handling query params by URI encoding them?
>
> Probably not.  HTML 4.01 says:
>
> application/x-www-form-urlencoded
>
> This is the default content type. Forms submitted with this content
> type must be encoded as follows:
>
>    1. Control names and values are escaped. Space characters are
>       replaced by `+', and then reserved characters are escaped as
>       described in [RFC1738], section 2.2: Non-alphanumeric characters
>       are replaced by `%HH', a percent sign and two hexadecimal digits
>       representing the ASCII code of the character. Line breaks are
>       represented as "CR LF" pairs (i.e., `%0D%0A').
> Section 4 says "must" means MUST in the rfc2119 sense, so I think we
> have to do URL encoding rather than URI encoding for the query
> parameters.  A quick replacement of ' ' with '+' followed by the usual
> URI encoding might do it.
Yes, but this is for application/x-www-form-urlencoded values.
Currently we only assume this content type for post params (this was
recently fixed).  Standard query params are using the URI encoding.  The
confusing part is that it's possible to submit a form via a GET, in which
case the params are passed in the query.  Should
application/x-www-form-urlencoded be used in that case?

Mike



Re: URI query escapes

2003-06-20 Thread Laura Werner
Hi Michael,

I wish I'd done a bit more digging before my original post.  It turns 
out the problem is that parameters were being encoded *twice*.  We were 
encoding them once on our own, then the URI constructor was encoding 
them again.  My "Mountain View" example was being encoded as 
"Mountain%2520View".  If we encode them only once (even with %20 rather 
than +) everything works. 
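Double encoding is easy to reproduce: escaping an already-escaped string turns each '%' into %25, and a server that decodes once then hands the application a literal %20, which matches the getParameter symptom described in the original post. A sketch with java.net.URLEncoder/URLDecoder standing in for whichever encoder was applied twice (UTF-8 assumed; the class name is made up):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class DoubleEncoding {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String once = "Mountain%20View";   // already escaped by the application
        String twice = encode(once);       // the '%' is escaped again as %25
        System.out.println(twice);         // Mountain%2520View
        System.out.println(decode(twice)); // one server-side decode leaves Mountain%20View
    }
}
```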

>... are we correctly handling query params by URI encoding them?

Probably not.  HTML 4.01 says:

application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content 
type must be encoded as follows:

   1. Control names and values are escaped. Space characters are
      replaced by `+', and then reserved characters are escaped as
      described in [RFC1738], section 2.2: Non-alphanumeric characters
      are replaced by `%HH', a percent sign and two hexadecimal digits
      representing the ASCII code of the character. Line breaks are
      represented as "CR LF" pairs (i.e., `%0D%0A').
Section 4 says "must" means MUST in the rfc2119 sense, so I think we 
have to do URL encoding rather than URI encoding for the query 
parameters.  A quick replacement of ' ' with '+' followed by the usual 
URI encoding might do it.
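A literal reading of the quoted rule can be sketched as below. Html401FormEncoder is a hypothetical name, and encoding non-ASCII input as UTF-8 bytes is an assumption (the spec restricts GET form data to ASCII). Note that a literal '+' must end up as %2B so a decoder can tell it apart from an encoded space; encoding byte by byte in one pass handles both. Unlike browsers and URLEncoder, this strict reading also escapes *, -, _ and . because they are non-alphanumeric.

```java
import java.nio.charset.StandardCharsets;

public class Html401FormEncoder {
    // Strict reading of the HTML 4.01 application/x-www-form-urlencoded rule:
    // line breaks become CR LF, spaces become '+', ASCII letters and digits
    // are kept, and every other byte becomes %HH.
    static String encode(String s) {
        // Normalize CR LF, lone CR and lone LF to CR LF
        String normalized = s.replace("\r\n", "\n").replace('\r', '\n').replace("\n", "\r\n");
        StringBuilder out = new StringBuilder();
        for (byte b : normalized.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append((char) c);
            } else if (c == ' ') {
                out.append('+');
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Mountain View")); // Mountain+View
        System.out.println(encode("a+b"));           // a%2Bb
        System.out.println(encode("line1\nline2"));  // line1%0D%0Aline2
        System.out.println(encode("x.y"));           // x%2Ey
    }
}
```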

-- Laura



Re: URI query escapes

2003-06-20 Thread Michael Becke
Hi Laura,

This is something that Oleg and I were discussing recently in regard to 
post parameters being form urlencoded.  This case is partially related.

It seems the server in question is assuming that query params are form
urlencoded.  The method you are using is actually URI encoding
everything.  According to the URI spec (RFC 2396), both space and + are
encoded as their hex values (%20 and %2B) in the query.
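On the receiving side, a form-urlencoded decoder such as java.net.URLDecoder maps both '+' and %20 to a space, while only %2B yields a literal '+'. This small sketch (UTF-8 assumed, class name made up) shows why a space escaped as %20 should still decode correctly on a compliant server, and why an unescaped '+' inside a value is ambiguous:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecoding {
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Mountain+View"));   // Mountain View
        System.out.println(decode("Mountain%20View")); // Mountain View
        System.out.println(decode("a+b"));             // a b  (the '+' is read as a space)
        System.out.println(decode("a%2Bb"));           // a+b
    }
}
```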

The quick solution for this is to not encode the URIs using 
httpclient.URI.  This can be accomplished in two ways:

 - use the XXMethod(String) constructor.  It assumes everything is
   encoded already.
 - use XXMethod.setQueryString(String).  It also assumes the query is
   already encoded.

Both of these require you to encode the URI before passing it to the 
HttpMethods.

So the next question is... are we correctly handling query params by URI
encoding them?  I believe that this is generally okay, but as you have
discovered, it may not work in all cases.  Is the server you are hitting
HTTP 1.0?

Mike

Laura Werner wrote:
> Hi all,
>
> I'm having a weird problem with escaped characters in the query part of
> a URI.  In the old, alpha1 version of HttpClient, we used
> URIUtil.encodeAll() to encode our query parameter values, and it escaped
> spaces with "+", resulting in "Mountain+View". With the latest
> HttpClient, we use URIUtil.encodeWithinQuery().  If we encode a query
> like "Mountain View", we get back "Mountain%20View".
> Does anyone know why this behavior changed?  Our app server is barfing
> on it for some reason and giving the JSP the encoded string with the %20
> in it when it calls getParameter.  (This is probably a bug in the
> server, but our applications group is on my case because the behavior of
> our client changed.)
>
> The background here is that for legacy reasons, we build up the query
> part of the URL ourselves, encoding each parameter as we append it to
> the string.  Then we create a java.net.URL object out of it (yuck).
> Finally, at fetch time, we do new URI(url.toString()).  (Double yuck.)
>
> I'm going to play with this a bit more.  It may be that I don't even
> have to encode the query strings myself, and that the new URI
> constructor will do it for me.  But I figured I'd ask here and see if
> anyone knew what had changed or had run into a problem like this before.
>
> Laura Werner
> BeVocal





URI query escapes

2003-06-20 Thread Laura Werner
Hi all,

I'm having a weird problem with escaped characters in the query part of 
a URI.  In the old, alpha1 version of HttpClient, we used 
URIUtil.encodeAll() to encode our query parameter values, and it escaped 
spaces with "+", resulting in "Mountain+View". With the latest 
HttpClient, we use URIUtil.encodeWithinQuery().  If we encode a query 
like "Mountain View", we get back "Mountain%20View". 

Does anyone know why this behavior changed?  Our app server is barfing 
on it for some reason and giving the JSP the encoded string with the %20 
in it when it calls getParameter.  (This is probably a bug in the 
server, but our applications group is on my case because the behavior of 
our client changed.)

The background here is that for legacy reasons, we build up the query 
part of the URL ourselves, encoding each parameter as we append it to 
the string.  Then we create a java.net.URL object out of it (yuck).  
Finally, at fetch time, we do new URI(url.toString()).  (Double yuck.)

I'm going to play with this a bit more.  It may be that I don't even 
have to encode the query strings myself, and that the new URI 
constructor will do it for me.  But I figured I'd ask here and see if 
anyone knew what had changed or had run into a problem like this before.

Laura Werner
BeVocal