Re: normalizing a URI with ..'s in it ?

2003-07-24 Thread Michael Becke
On Thursday, July 24, 2003, at 06:19 AM, Marcus Crafter wrote:

> From a user's perspective it would be great if the method explicitly
> notified the user if this kind of input was invalid - but if that's
> inconsistent with current behaviour across the client I'd be happy
> with it being in the javadocs :)

I will definitely add something to the Javadocs.

> Is this the only boundary case where normalizing relative URIs would
> fail?
> I'm definitely a novice with the httpclient code, but if so, would
> there be other unknown issues if we checked for and handled this case?
> (in or even outside of the httpclient code?)

It is hard to say.  There are a number of test cases, most of which are 
based upon the examples in RFC2396bis.  As with most code it is 
difficult to be sure there are no bugs.  You are welcome to create a 
proof :)

Mike

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: normalizing a URI with ..'s in it ?

2003-07-24 Thread Michael Becke
I am not positive how this process is being worked out, but I would 
suggest sending a message to [EMAIL PROTECTED] or to Roy Fielding.

Mike

On Thursday, July 24, 2003, at 06:16 AM, Mike Moran wrote:

> Michael Becke wrote:
> > On Wednesday, July 23, 2003, at 06:18 PM, Mike Moran wrote:
> > > [Oleg agreed] Right. Out of interest, which set of test cases does
> > > the URI class use, the ones from rfc2396 or rfc2396bis?
> >
> > The tests are from rfc2396bis.
>
> This is verging rapidly off-topic, but I was wondering if you knew
> anywhere I could keep up-to-date on the standards track of
> rfc2396bis? I've written some code to do the normalization we were
> talking about and I am swithering about whether I should enable it.
> The different handling of paths such as "/../../" between rfc2396 and
> rfc2396bis may have knock-on effects in my code, so it would be nice
> if they were *standard* knock-on effects :-)
>
> --
> Mike


Re: normalizing a URI with ..'s in it ?

2003-07-24 Thread Marcus Crafter
Hi All,

Thanks for everyone's feedback on the subject. Much appreciated.

From a user's perspective it would be great if the method explicitly 
notified the user if this kind of input was invalid - but if that's 
inconsistent with current behaviour across the client I'd be happy with it 
being in the javadocs :)

Is this the only boundary case where normalizing relative URI's would fail? 

I'm definitely a novice with the httpclient code, but if so, would there be 
other unknown issues if we checked for and handled this case ? (in or even 
outside of the httpclient code?).

Cheers,

Marcus


On Wed, Jul 23, 2003 at 11:08:58PM -0400, Michael Becke wrote:
> On Wednesday, July 23, 2003, at 06:18 PM, Mike Moran wrote:
> >[Oleg agreed] Right. Out of interest, which set of test cases does the 
> >URI class use, the ones from rfc2396 or rfc2396bis?
> 
> The tests are from rfc2396bis.
> 
> >I would agree. Doesn't this mean that normalize() should throw an 
> >exception if it *is* called on a non-absolute URI?
> 
> I don't know if we want to throw an exception.  It's not really an 
> error case plus it would not be consistent with current behavior.  It 
> should definitely be documented and we should consider adding a test 
> for relative URIs in normalize().  In such a case it would not actually 
> attempt normalization.  Again I do not think it's an error it is just 
> that we do not correctly handle the case currently.
> 
> Mike

-- 
Marcus Crafter
Computer Systems Engineer
ManageSoft GmbH
82-84 Mainzer Landstrasse
60327 Frankfurt Germany




Re: normalizing a URI with ..'s in it ?

2003-07-24 Thread Mike Moran
Michael Becke wrote:
> On Wednesday, July 23, 2003, at 06:18 PM, Mike Moran wrote:
> > [Oleg agreed] Right. Out of interest, which set of test cases does the
> > URI class use, the ones from rfc2396 or rfc2396bis?
>
> The tests are from rfc2396bis.

This is verging rapidly off-topic, but I was wondering if you knew 
anywhere I could keep up-to-date on the standards track of rfc2396bis? 
I've written some code to do the normalization we were talking about and 
I am swithering about whether I should enable it. The different handling 
of paths such as "/../../" between rfc2396 and rfc2396bis may have 
knock-on effects in my code, so it would be nice if they were *standard* 
knock-on effects :-)

--
Mike


Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Michael Becke
On Wednesday, July 23, 2003, at 06:18 PM, Mike Moran wrote:
> [Oleg agreed] Right. Out of interest, which set of test cases does the
> URI class use, the ones from rfc2396 or rfc2396bis?

The tests are from rfc2396bis.

> I would agree. Doesn't this mean that normalize() should throw an
> exception if it *is* called on a non-absolute URI?

I don't know if we want to throw an exception.  It's not really an 
error case, plus it would not be consistent with current behavior.  It 
should definitely be documented and we should consider adding a test 
for relative URIs in normalize().  In such a case it would not actually 
attempt normalization.  Again, I do not think it's an error; it is just 
that we do not correctly handle the case currently.
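
One way to read that suggestion, as a rough sketch and not the actual HttpClient code: check whether the path is absolute first, and hand a relative path back untouched. The class and method names below are hypothetical, and the stack-based segment handling is a simplified stand-in for the draft's remove_dot_segments, not a copy of it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class GuardedNormalizeSketch {

    // Hypothetical helper: normalizes absolute paths only.
    // A relative (or null) path is returned exactly as given.
    static String normalizePath(String path) {
        if (path == null || !path.startsWith("/")) {
            return path; // relative path: do not attempt normalization
        }
        Deque<String> kept = new ArrayDeque<>();
        String[] segments = path.split("/", -1);
        // segments[0] is the empty string in front of the leading "/"
        for (int i = 1; i < segments.length; i++) {
            String seg = segments[i];
            boolean last = (i == segments.length - 1);
            if (seg.equals(".")) {
                if (last) kept.addLast("");   // keep the trailing "/"
            } else if (seg.equals("..")) {
                kept.pollLast();              // drop the previous segment, if any
                if (last) kept.addLast("");   // keep the trailing "/"
            } else {
                kept.addLast(seg);
            }
        }
        return "/" + String.join("/", kept);
    }

    public static void main(String[] args) {
        // relative input comes back unchanged
        System.out.println(normalizePath("my/relative/../../another/relative"));
        // absolute input is normalized: prints /a/c
        System.out.println(normalizePath("/a/b/../c"));
    }
}
```

With a guard like this the caller never sees a relative path turned into an absolute one; whether silently skipping is better than throwing is the open question in this thread.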

Mike



Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Mike Moran
On Wednesday, Jul 23, 2003, at 20:37 Europe/London, Michael Becke wrote:

> Mike Moran wrote:
> > Btw, I presume this is the algorithm given in section 5.2 of
> > http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#absolutize?
> > If so, this is just a draft (draft-fielding-uri-rfc2396bis-03.txt). It
> > does actually differ from rfc2396 in how it handles abnormal URLs
> > (though I think that's irrelevant here).
>
> Yes, this is the algorithm.  We decided to upgrade to ensure that URI
> parsing was consistent across Apache.  I think this was at the request
> of Roy Fielding.  Oleg, is that correct?

[Oleg agreed] Right. Out of interest, which set of test cases does the  
URI class use, the ones from rfc2396 or rfc2396bis?

> > The string "my/relative/../../another/relative" would never be output
> > from merge() or given to remove_dot_segments() in the section 5.2
> > algorithm. If you are just applying remove_dot_segments() to this
> > string then it will get confused and output a weird answer because
> > it's not expecting that input (i.e. a path that doesn't have a "/" at
> > the start).
> >
> > I may be wrong, but I didn't think normalization could be applied to
> > anything but absolute URLs.
>
> I agree that when resolving a path relative to a base URI a relative
> path should never be passed to remove_dot_segments().  However,
> according to section 6.2.2.3 remove_dot_segments() can be used for
> path segment normalization.
>
> I guess what it comes down to is that normalization is meant to
> generate a URI with a valid absolute path.  The value output in this
> case is a little strange but I think it's correct.  Essentially
> normalize should not be used on relative URIs.

I would agree. Doesn't this mean that normalize() should throw an  
exception if it *is* called on a non-absolute URI?

--
Mike


Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Oleg Kalnichevski
> Yes, this is the algorithm.  We decided to upgrade to ensure that URI 
> parsing was consistent across Apache.  I think this was at the request 
> of Roy Fielding.  Oleg, is that correct?
> 

This is correct. 

Oleg





Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Michael Becke
Mike Moran wrote:
> Btw, I presume this is the algorithm given in section 5.2 of
> http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#absolutize?
> If so, this is just a draft (draft-fielding-uri-rfc2396bis-03.txt). It
> does actually differ from rfc2396 in how it handles abnormal URLs
> (though I think that's irrelevant here).

Yes, this is the algorithm.  We decided to upgrade to ensure that URI 
parsing was consistent across Apache.  I think this was at the request 
of Roy Fielding.  Oleg, is that correct?

> The string "my/relative/../../another/relative" would never be output
> from merge() or given to remove_dot_segments() in the section 5.2
> algorithm. If you are just applying remove_dot_segments() to this string
> then it will get confused and output a weird answer because it's not
> expecting that input (i.e. a path that doesn't have a "/" at the start).
>
> I may be wrong, but I didn't think normalization could be applied to
> anything but absolute URLs.

I agree that when resolving a path relative to a base URI a relative 
path should never be passed to remove_dot_segments().  However, 
according to section 6.2.2.3 remove_dot_segments() can be used for path 
segment normalization.

I guess what it comes down to is that normalization is meant to generate 
a URI with a valid absolute path.  The value output in this case is a 
little strange but I think it's correct.  Essentially normalize should 
not be used on relative URIs.

Mike



Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Mike Moran
Michael Becke wrote:
> Though I agree this seems a little strange I am not sure it's a bug.
> URI.normalize() is using the algorithm defined at
> <http://www.apache.org/~fielding/uri/rev-2002/issues.html> which
> corresponds to the latest update (I believe) to rfc2396.

[ removed algorithm ]

Btw, I presume this is the algorithm given in section 5.2 of 
http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#absolutize? 
If so, this is just a draft (draft-fielding-uri-rfc2396bis-03.txt). It 
does actually differ from rfc2396 in how it handles abnormal URLs 
(though I think that's irrelevant here).

> And we're done.  Perhaps we should send an email to [EMAIL PROTECTED] and see
> if they have any input.  Does anyone else have an opinion about this?

The string "my/relative/../../another/relative" would never be output 
from merge() or given to remove_dot_segments() in the section 5.2 
algorithm. If you are just applying remove_dot_segments() to this string 
then it will get confused and output a weird answer because it's not 
expecting that input (i.e. a path that doesn't have a "/" at the start).

I may be wrong, but I didn't think normalization could be applied to 
anything but absolute URLs.
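
To illustrate the merge() point: when a relative reference is resolved against a base, the merged path already starts with "/" by the time dot segments are removed. The sketch below uses the JDK's java.net.URI purely as an assumption for illustration (it is not HttpClient's URI class, and it follows rfc2396 rather than the draft), and the base URI is made up.

```java
import java.net.URI;

final class ResolveThenNormalizeSketch {

    // Resolves a relative reference against a base. java.net.URI merges the
    // reference with the base path and then removes the dot segments.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // "http://example.org/base/" is a made-up base URI.
        System.out.println(resolve("http://example.org/base/",
                "my/relative/../../another/relative"));
        // prints http://example.org/base/another/relative
    }
}
```

The ".." segments only ever consume segments of the relative reference here, so the leading-"/" question never arises in the resolution case.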

--
Mike


Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Michael Becke
Though I agree this seems a little strange I am not sure it's a bug. 
URI.normalize() is using the algorithm defined at 
<http://www.apache.org/~fielding/uri/rev-2002/issues.html> which 
corresponds to the latest update (I believe) to rfc2396.  This algorithm 
is defined below.

1) The buffer is initialized with the unprocessed path component.

2) If the buffer begins with "./" or "../", the "." or ".." segment is 
removed.

3) All occurrences of "/./" in the buffer are replaced with "/".

4) If the buffer ends with "/.", the "." is removed.

5) All occurrences of "/<segment>/../" in the buffer, where ".." and 
<segment> are complete path segments, are iteratively replaced with "/" 
in order from left to right until no matching pattern remains. If the 
buffer ends with "/<segment>/..", that is also replaced with "/". Note 
that <segment> may be empty.

6) All prefixes of "<segment>/../" in the buffer, where ".." and 
<segment> are complete path segments, are iteratively replaced with "/" 
in order from left to right until no matching pattern remains. If the 
buffer ends with "<segment>/..", that is also replaced with "/". Note 
that <segment> may be empty.

7) The remaining buffer is returned as the result of remove_dot_segments.

This algorithm results in the following process:

# at rule 1
buffer = "my/relative/../../another/relative"
# rules 2, 3 and 4 are ignored

# rule 5 is applied once and "/relative/../" is replaced with "/"
buffer = "my/../another/relative"
# rule 6 is applied once and "my/../" is replaced with "/"
buffer = "/another/relative"
And we're done.  Perhaps we should send an email to [EMAIL PROTECTED] and see 
if they have any input.  Does anyone else have an opinion about this?
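
For anyone who wants to step through it, here is a rough Java transcription of rules 1 to 7 as quoted above. It is a sketch written around this one example, not the code in HttpClient's URI class; the class and method names are made up, and nothing beyond the quoted rules is handled.

```java
final class RemoveDotSegmentsSketch {

    // Hypothetical helper: applies rules 1-7 literally to a path string.
    static String removeDotSegments(String path) {
        String buffer = path;                                   // rule 1

        if (buffer.startsWith("./")) {                          // rule 2
            buffer = buffer.substring(1);
        } else if (buffer.startsWith("../")) {
            buffer = buffer.substring(2);
        }

        while (buffer.contains("/./")) {                        // rule 3
            buffer = buffer.replace("/./", "/");
        }

        if (buffer.endsWith("/.")) {                            // rule 4
            buffer = buffer.substring(0, buffer.length() - 1);
        }

        // Rule 5: a segment preceded by "/" and followed by "/../" collapses to "/".
        int from = 0;
        int dots;
        while ((dots = buffer.indexOf("/../", from)) != -1) {
            int slash = buffer.lastIndexOf('/', dots - 1);
            if (slash >= 0) {
                buffer = buffer.substring(0, slash) + buffer.substring(dots + 3);
            } else {
                from = dots + 3; // no "/" before the segment: left for rule 6
            }
        }
        if (buffer.endsWith("/..")) {
            int slash = buffer.lastIndexOf('/', buffer.length() - 4);
            if (slash >= 0) {
                buffer = buffer.substring(0, slash + 1);
            }
        }

        // Rule 6: a leading segment followed by "/../" collapses to "/".
        while ((dots = buffer.indexOf("/../")) != -1
                && buffer.lastIndexOf('/', dots - 1) < 0) {
            buffer = buffer.substring(dots + 3);
        }
        if (buffer.endsWith("/..")
                && buffer.lastIndexOf('/', buffer.length() - 4) < 0) {
            buffer = "/";
        }

        return buffer;                                          // rule 7
    }

    public static void main(String[] args) {
        // prints /another/relative, matching the trace above
        System.out.println(removeDotSegments("my/relative/../../another/relative"));
    }
}
```

Rule 6 is the step that turns the relative input into an absolute-looking result: it replaces the leading "my/../" with "/" rather than with nothing.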

Mike

Marcus Crafter wrote:
> Hi All,
>
> Hope all is well. I have a question to do with normalizing a URI with
> ..'s in it.
>
> If I execute the following code:
>
> URI u = new URI("my/relative/../../another/relative");
> u.normalize();
> u.getURI();
>
> should the URI now be /another/relative or 'another/relative'?
>
> The code returns /another/relative but I was actually expecting
> 'another/relative' since the original URI was without a leading /.
>
> Was my expectation wrong or could this be a defect? Any ideas?
>
> Cheers,
>
> Marcus







Re: normalizing a URI with ..'s in it ?

2003-07-23 Thread Ortwin Glück
Sounds like a bug to me. normalize() should not change the 
absolute/relative type of a URL.

Odi

Marcus Crafter wrote:
> Hi All,
>
> Hope all is well. I have a question to do with normalizing a URI with
> ..'s in it.
>
> If I execute the following code:
>
> URI u = new URI("my/relative/../../another/relative");
> u.normalize();
> u.getURI();
>
> should the URI now be /another/relative or 'another/relative'?
>
> The code returns /another/relative but I was actually expecting
> 'another/relative' since the original URI was without a leading /.
>
> Was my expectation wrong or could this be a defect? Any ideas?
>
> Cheers,
>
> Marcus



--
_
 NOSE applied intelligence ag
   [www]  http://www.nose.ch
 ortwin glück  [email] [EMAIL PROTECTED]
 hardturmstrasse 171   [pgp key]  0x81CF3416
 8005 zurich   [office]  +41-1-277 57 35
 switzerland   [fax] +41-1-277 57 12


normalizing a URI with ..'s in it ?

2003-07-23 Thread Marcus Crafter
Hi All,

Hope all is well. I have a question to do with normalizing a URI with
..'s in it.

If I execute the following code:

URI u = new URI("my/relative/../../another/relative");
u.normalize();
u.getURI();

should the URI now be /another/relative or 'another/relative'?

The code returns /another/relative but I was actually expecting 
'another/relative' since the original URI was without a leading /.

Was my expectation wrong or could this be a defect? Any ideas?

Cheers,

Marcus



-- 
Marcus Crafter
Computer Systems Engineer
ManageSoft GmbH
82-84 Mainzer Landstrasse
60327 Frankfurt Germany
