Re: NetUtils / StringUtils / Tokenizer

2004-04-30 Thread Ugo Cei
Nathaniel Alfred wrote:
The 30 lines are already in an available JAR, only normalize() is
currently private there:
http://cvs.apache.org/viewcvs.cgi/avalon-excalibur/sourceresolve/src/jav
a/org/apache/excalibur/source/SourceUtil.java?rev=1.5&view=auto
Hmmm, interesting ... nowadays you can find an implementation under 
every turned stone, it seems ;-).

We might compare the different implementations, see which is better and 
keep only that.

Just to be picky, the one in Excalibur creates maybe too many temp 
String objects:

path = path.substring(0, i + 1) + path.substring(i + 3);

and scans the input too many times for my taste.

Anyway, all of this will become moot the day we decide not to support 
JDK 1.3 anymore.

	Ugo



Re: NetUtils / StringUtils / Tokenizer

2004-04-30 Thread Ugo Cei
Antonio Gallardo wrote:

There are other places where we can take advantage of this jar.
If so, please go on and change them to make use of the library, but 
remember that current users of our code may rely on some undocumented 
behaviour (like, for instance, never returning null from 
NetUtils.normalize).

It would be fine also, in order to verify that the assumptions still 
hold, to keep the current testcases, even if we would be effectively 
testing others' code.

In the end, I'm not sure it's worth it, particularly after having seen 
of Commons' "peculiarities".

	Ugo




RE: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Nathaniel Alfred
>Re: code reuse. I'm quite eager too reuse code, usually, but I'm -1 on 
>adding another JAR (and another dependency) only for 30 lines of code.
>
>   Ugo

The 30 lines are already in an available JAR, only normalize() is
currently private there:

http://cvs.apache.org/viewcvs.cgi/avalon-excalibur/sourceresolve/src/jav
a/org/apache/excalibur/source/SourceUtil.java?rev=1.5&view=auto

HTH, Alfred.

This message is for the named person's use only. It may contain
confidential, proprietary or legally privileged information. No
confidentiality or privilege is waived or lost by any mistransmission.
If you receive this message in error, please notify the sender urgently
and then immediately delete the message and any copies of it from your
system. Please also immediately destroy any hardcopies of the message.
You must not, directly or indirectly, use, disclose, distribute, print,
or copy any part of this message if you are not the intended recipient.
The sender's company reserves the right to monitor all e-mail
communications through their networks. Any views expressed in this
message are those of the individual sender, except where the message
states otherwise and the sender is authorised to state them to be the
views of the sender's company. 




Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Antonio Gallardo
Ugo Cei dijo:
> Re: code reuse. I'm quite eager too reuse code, usually, but I'm -1 on
> adding another JAR (and another dependency) only for 30 lines of code.

There are other places where we can take advantage of this jar.

Best Regards,

Antonio Gallardo


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Il giorno 30/apr/04, alle 02:11, Joerg Heinicke ha scritto:

Without claiming to read Ugo's mind I think that's exactly the point. 
Too many ".."s means that there are more directories to walk up then 
it is possible. /../ is only the most simple sample. /foo/../../ is 
another one. And FilenameUtils.normalize()'s javadoc says clearly: 
"Returns null if the ..'s went past the root." And that's bizarre.
Exactly. Why put the burden of checking the return value of the 
function on the caller? It's too damn easy to forget and get an NPE.

Re: code reuse. I'm quite eager too reuse code, usually, but I'm -1 on 
adding another JAR (and another dependency) only for 30 lines of code.

	Ugo



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Joerg Heinicke
On 30.04.2004 02:04, Antonio Gallardo wrote:

This method returns null if there are too many ".."s, which strikes me
as rather bizarre.


Can you explain more about that?
/../ --> null

That's what Ugo called 'bizarre'.


That is not the point. Even me, can clearly understand that. 8-|
I did not query it :-P

The above sentence talk about ''too many ".."s.'' That is what I asked
for.
Without claiming to read Ugo's mind I think that's exactly the point. 
Too many ".."s means that there are more directories to walk up then it 
is possible. /../ is only the most simple sample. /foo/../../ is another 
one. And FilenameUtils.normalize()'s javadoc says clearly: "Returns null 
if the ..'s went past the root." And that's bizarre.

Joerg


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Antonio Gallardo
Joerg Heinicke dijo:
> On 30.04.2004 01:19, Antonio Gallardo wrote:
>
Yep. I found it :-D
http://jakarta.apache.org/commons/io/apidocs/org/apache/commons/io/
FilenameUtils.html#normalize(java.lang.String)

>>>
>>>This method returns null if there are too many ".."s, which strikes me
>>>as rather bizarre.
>>
>>
>> Can you explain more about that?
>
> /../ --> null
>
> That's what Ugo called 'bizarre'.

That is not the point. Even me, can clearly understand that. 8-|

The above sentence talk about ''too many ".."s.'' That is what I asked
for. From the javadoc of commons IO the method can manage the simple /../
that is all. If not. would be fine to post on the common list about the
problem. I am +1 on sharing as much code as we can to avoid the
reinvention of the wheel.

Another point is if the function meets the Cocoon's requirements.

Best Regards,

Antonio Gallardo.
>
> Joerg
>



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Joerg Heinicke
On 30.04.2004 01:19, Antonio Gallardo wrote:

Yep. I found it :-D
http://jakarta.apache.org/commons/io/apidocs/org/apache/commons/io/
FilenameUtils.html#normalize(java.lang.String)
This method returns null if there are too many ".."s, which strikes me
as rather bizarre.


Can you explain more about that?
/../ --> null

That's what Ugo called 'bizarre'.

Joerg


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Antonio Gallardo
Ugo Cei dijo:
> Il giorno 29/apr/04, alle 23:53, Antonio Gallardo ha scritto:
>
>> Yep. I found it :-D
>> http://jakarta.apache.org/commons/io/apidocs/org/apache/commons/io/
>> FilenameUtils.html#normalize(java.lang.String)
>>
>
> This method returns null if there are too many ".."s, which strikes me
> as rather bizarre.

Can you explain more about that?

Best Regards,

Antonio Gallardo


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Joerg Heinicke
On 29.04.2004 23:36, Joerg Heinicke wrote:

"/../" results in "/../" instead of "/" and I don't know ...
http://www.ietf.org/rfc/rfc2396.txt

5.2. Resolving Relative References to Absolute Form

6) a) All but the last segment of the base URI's path component is
  copied to the buffer.  In other words, any characters after the
  last (right-most) slash character, if any, are excluded.
   b) The reference's path component is appended to the buffer
  string.
   c) All occurrences of "./", where "." is a complete path segment,
  are removed from the buffer string.
   d) If the buffer string ends with "." as a complete path segment,
  that "." is removed.
   e) All occurrences of "/../", where  is a
  complete path segment not equal to "..", are removed from the
  buffer string.  Removal of these path segments is performed
  iteratively, removing the leftmost matching pattern on each
  iteration, until no matching pattern remains.
   f) If the buffer string ends with "/..", where 
  is a complete path segment not equal to "..", that
  "/.." is removed.
   g) If the resulting buffer string still begins with one or more
  complete path segments of "..", then the reference is
  considered to be in error.  Implementations may handle this
  error by retaining these components in the resolved path (i.e.,
  treating them as part of the final URI), by removing them from
  the resolved path (i.e., discarding relative levels above the
  root), or by avoiding traversal of the reference.
   h) The remaining buffer string is the reference URI's new path
  component.
g) means it's an error, but may be handled as /../ or / - 
jakarta.commons.io.FilenameUtils' null in the sense of notifying an 
error is also correct. Hmm ...

Joerg


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Il giorno 29/apr/04, alle 23:53, Antonio Gallardo ha scritto:

Yep. I found it :-D
http://jakarta.apache.org/commons/io/apidocs/org/apache/commons/io/ 
FilenameUtils.html#normalize(java.lang.String)

This meethod returns null if there are too many ".."s, which strikes me  
as rather bizarre.

	Ugo



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Il giorno 29/apr/04, alle 23:36, Joerg Heinicke ha scritto:

The mentioned class java.net.URI#normalize() in JDK 1.4 behaves 
different for two cases:

"foo/bar1/bar2/bar3/../../.." results in "foo/" instead of "foo" and 
that's absolutely correct.

"/../" results in "/../" instead of "/" and I don't know ...
OK, can you please add some more test cases? I'll try to make the code 
behave just the same. By the way, what does "/../../" normalize to?

	Ugo



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Antonio Gallardo
Antonio Gallardo dijo:
> Joerg Heinicke dijo:
>> On 29.04.2004 22:12, Ugo Cei wrote:
>>
>>> I've just committed a new version that:
>>>
>>> - passes all tests
>>> - uses only JDK core classes
>>> - has less lines of code than the previous one
>>>
>>> Hope you all like it.
>>
>> Very good, but ... first the good news:
>>
>> The build fails now if a test fails.
>> And there is a new target for debugging tests what I find really
>> helpful.
>>
>> Now the bad news:
>>
>> The mentioned class java.net.URI#normalize() in JDK 1.4 behaves
>> different for two cases:
>>
>> "foo/bar1/bar2/bar3/../../.." results in "foo/" instead of "foo" and
>> that's absolutely correct.
>>
>> "/../" results in "/../" instead of "/" and I don't know ...
>
> Hi:
>
> Wonder if there is not already a function for these task in an Apache
> commons jar. Would be fine to look for it.

Yep. I found it :-D
http://jakarta.apache.org/commons/io/apidocs/org/apache/commons/io/FilenameUtils.html#normalize(java.lang.String)

Best Regards,

Antonio Gallardo


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Antonio Gallardo
Joerg Heinicke dijo:
> On 29.04.2004 22:12, Ugo Cei wrote:
>
>> I've just committed a new version that:
>>
>> - passes all tests
>> - uses only JDK core classes
>> - has less lines of code than the previous one
>>
>> Hope you all like it.
>
> Very good, but ... first the good news:
>
> The build fails now if a test fails.
> And there is a new target for debugging tests what I find really helpful.
>
> Now the bad news:
>
> The mentioned class java.net.URI#normalize() in JDK 1.4 behaves
> different for two cases:
>
> "foo/bar1/bar2/bar3/../../.." results in "foo/" instead of "foo" and
> that's absolutely correct.
>
> "/../" results in "/../" instead of "/" and I don't know ...

Hi:

Wonder if there is not already a function for these task in an Apache
commons jar. Would be fine to look for it.

Best Regards,

Antonio Gallardo
>
> Joerg
>



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Joerg Heinicke
On 29.04.2004 22:12, Ugo Cei wrote:

I've just committed a new version that:

- passes all tests
- uses only JDK core classes
- has less lines of code than the previous one
Hope you all like it.
Very good, but ... first the good news:

The build fails now if a test fails.
And there is a new target for debugging tests what I find really helpful.
Now the bad news:

The mentioned class java.net.URI#normalize() in JDK 1.4 behaves 
different for two cases:

"foo/bar1/bar2/bar3/../../.." results in "foo/" instead of "foo" and 
that's absolutely correct.

"/../" results in "/../" instead of "/" and I don't know ...

Joerg


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
I've just committed a new version that:

- passes all tests
- uses only JDK core classes
- has less lines of code than the previous one
	Hope you all like it.

		Ugo



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Upayavira
Ugo Cei wrote:

Joerg Heinicke wrote:

3. while the Tokenizer correctly works on "/../" the 
NetUtils.normalize() method has a problem with it:


What is the correct output for "/../"? I'd say "/", but the testcase 
expects "/../".
I would consider /../ an error.

What does new File("/../foo") do?

Upayavira




Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Joerg Heinicke
Ugo Cei  cbim.it> writes:

> > 3. while the Tokenizer correctly works on "/../" the 
> > NetUtils.normalize() method has a problem with it:
> 
> What is the correct output for "/../"? I'd say "/", but the testcase 
> expects "/../".

Good question. I did not thought much about it when adding it as testcase, but I
think you are right with "/". I guess there are a few RFCs telling the correct
way. Googling I found an URI class in JDK 1.4:
http://java.sun.com/j2se/1.4.2/docs/api/java/net/URI.html. It has almost all our
NetUtils methods and claims RFC compatibility. So the best is probably to
implement the same behaviour, if not even copying the algorithm. Is this allowed
to provide this functionality to older JDKs?

Joerg



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Joerg Heinicke wrote:
3. while the Tokenizer correctly works on "/../" the 
NetUtils.normalize() method has a problem with it:
What is the correct output for "/../"? I'd say "/", but the testcase 
expects "/../".

	Ugo




Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Bruno Dumon wrote:
My advice: in the 2.1 series, leave that code as is, especially as we're
close to a release. It's too easy to brake it.
The current code does not pass all the tests. Since we have a testcase, 
I suggest (and volunteer for) doing the simplest thing that could 
possibly work and pass all the tests. It should be a handful of code 
lines, so there's no point using regexps or something from Excalibur 
(the less we depend on it, the better). JFDI ;-).

	Ugo






Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Bruno Dumon
On Thu, 2004-04-29 at 14:44, Stefano Mazzocchi wrote:
> Joerg Heinicke wrote:
> > After Cheche's bug report I had a look at NetUtils. Though Ugo fixed it 
> > with a quick fix, it does not really solve the problem as a test with 
> > the updated NetUtilsTestCase and the NetUtils before my latest commit 
> > can easily show.
> > 
> > My problem is that there are many problems in different places in the code:
> > 
> > 1. o.a.commons.lang.StringUtils.split() behaves strange in that way that 
> > o.a.com.l.SU.split("", "/") return an empty String[]
> > while
> > o.a.coc.u.SU.split("", "/") returns String[] {""} which is what I would 
> > expect.
> > Also
> > o.a.com.l.SU.split("/", "/") returns empty String[].
> > This means the iteration over the string array can never really work 
> > with o.a.commons.lang.StringUtils.
> > 
> > => reverted that change.
> > 
> > 
> > 2. our Tokenizer behaves strange in one case:
> > 
> > o.a.coc.u.SU.split("/", "/")
> > => new Tokenizer("/", "/", false)
> > => tokenizer.countTokens() = 2
> > => but tokenizer.hasMoreTokens() returns true only once
> > => leads to String[] {"", null} while it should be String[] {"", ""}
> > 
> > 
> > 3. while the Tokenizer correctly works on "/../" the 
> > NetUtils.normalize() method has a problem with it:
> > 
> > result of split/tokenize: String[] {"", "..", ""}
> > result of normalize(): ""
> 
> What about using ORO regexp for these things? When I first wrote I was 
> not really much into regular expressions, but now I do think it makes 
> sense to use them, these guys spent a lot of time making sure those 
> things don't happen.

There are lots of similar utility functions in the class SourceUtil of
the excalibur sourceresolver. There are probably all kinds of subtle
differences though.

I remember changing the code doing the URL parsing there using a regexp,
but this gave stackoverflowexceptions on long URL's (I think it was
using jakarta-regexp), and many people considered using a regexp for
such critical code not a good idea.

My advice: in the 2.1 series, leave that code as is, especially as we're
close to a release. It's too easy to brake it.

-- 
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Stefano Mazzocchi
Joerg Heinicke wrote:
After Cheche's bug report I had a look at NetUtils. Though Ugo fixed it 
with a quick fix, it does not really solve the problem as a test with 
the updated NetUtilsTestCase and the NetUtils before my latest commit 
can easily show.

My problem is that there are many problems in different places in the code:

1. o.a.commons.lang.StringUtils.split() behaves strange in that way that 
o.a.com.l.SU.split("", "/") return an empty String[]
while
o.a.coc.u.SU.split("", "/") returns String[] {""} which is what I would 
expect.
Also
o.a.com.l.SU.split("/", "/") returns empty String[].
This means the iteration over the string array can never really work 
with o.a.commons.lang.StringUtils.

=> reverted that change.

2. our Tokenizer behaves strange in one case:

o.a.coc.u.SU.split("/", "/")
=> new Tokenizer("/", "/", false)
=> tokenizer.countTokens() = 2
=> but tokenizer.hasMoreTokens() returns true only once
=> leads to String[] {"", null} while it should be String[] {"", ""}
3. while the Tokenizer correctly works on "/../" the 
NetUtils.normalize() method has a problem with it:

result of split/tokenize: String[] {"", "..", ""}
result of normalize(): ""
What about using ORO regexp for these things? When I first wrote I was 
not really much into regular expressions, but now I do think it makes 
sense to use them, these guys spent a lot of time making sure those 
things don't happen.

--
Stefano.


smime.p7s
Description: S/MIME Cryptographic Signature


Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Bertrand Delacretaz
Le 29 avr. 04, à 11:52, Upayavira a écrit :

...Doesn't it replace, for example /cocoon/src/java/../webapp with 
/cocoon/src/webapp? i.e. removing .. from paths?
FWIW, I have attached the results of
  find src -type f -name *.java | xargs -n200 grep 'NetUtils.*normalize'
-Bertrand

results of
find src -type f -name *.java | xargs -n200 grep 'NetUtils.*normalize'

src/blocks/portal/java/org/apache/cocoon/environment/portlet/ActionResponse.java:  
  absLoc = NetUtils.normalize(absLoc);
src/blocks/scratchpad/java/org/apache/cocoon/acting/LinkTranslatorMapAction.java:  
  path = NetUtils.normalize(path);
src/blocks/scratchpad/java/org/apache/cocoon/ant/CocoonCrawling.java:
String normalizedUriType = NetUtils.normalize(uriType.getUri());
src/blocks/scratchpad/java/org/apache/cocoon/ant/UriType.java:final String 
normalizedUri = NetUtils.normalize(uri);
src/blocks/scratchpad/java/org/apache/cocoon/ant/UriType.java:this.uri = 
NetUtils.normalize(absolutizedUri);
src/blocks/scratchpad/java/org/apache/cocoon/ant/UriType.java:final String 
normalizedUri = NetUtils.normalize(uri);
src/java/org/apache/cocoon/bean/Target.java:sourceURI = 
NetUtils.normalize(root + sourceURI);
src/java/org/apache/cocoon/bean/Target.java:
NetUtils.normalize(NetUtils.absolutize(this.getPath(), linkURI));
src/test/org/apache/cocoon/util/test/NetUtilsTestCase.java: * A unit test for 
NetUtils.normalize()
src/test/org/apache/cocoon/util/test/NetUtilsTestCase.java:String result = 
NetUtils.normalize(test);





Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Il giorno 29/apr/04, alle 11:52, Upayavira ha scritto:

Frankly, I don't even know what NetUtils.normalize is supposed to do, 
so I cannot be of any help here.
Doesn't it replace, for example /cocoon/src/java/../webapp with 
/cocoon/src/webapp? i.e. removing .. from paths?

Ah, I see. It would have been nice if the Javadoc actually stated that, 
wouldn't it?

OK, I won't promise anything, but I'll try to refactor it into 
something clearer and satisfying the testcases.

	Ugo



Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Upayavira
Ugo Cei wrote:

Il giorno 29/apr/04, alle 02:48, Joerg Heinicke ha scritto:

After Cheche's bug report I had a look at NetUtils. Though Ugo fixed 
it with a quick fix, it does not really solve the problem as a test 
with the updated NetUtilsTestCase and the NetUtils before my latest 
commit can easily show.


Frankly, I don't even know what NetUtils.normalize is supposed to do, 
so I cannot be of any help here.
Doesn't it replace, for example /cocoon/src/java/../webapp with 
/cocoon/src/webapp? i.e. removing .. from paths?

Regards, Upayavira





Re: NetUtils / StringUtils / Tokenizer

2004-04-29 Thread Ugo Cei
Il giorno 29/apr/04, alle 02:48, Joerg Heinicke ha scritto:

After Cheche's bug report I had a look at NetUtils. Though Ugo fixed 
it with a quick fix, it does not really solve the problem as a test 
with the updated NetUtilsTestCase and the NetUtils before my latest 
commit can easily show.
Frankly, I don't even know what NetUtils.normalize is supposed to do, 
so I cannot be of any help here.

	Ugo