Troy Laurin wrote:
Matthew,

The new behaviour more closely follows the documentation, and as you say
this becomes consistent with handling of files... but any build files
relying on the old behaviour will of course break.
The fix is simple: replace "**/*" with "*/**/*". It may be hard to
diagnose, though, if/when upgrading breaks your build script...

Yeah - it's tricky to determine if this is a problem or not. It's such a strange pattern to use (it only appeared in a testcase).


A note on your previous email...

2.  foo/**/*.cs was being converted to "foo/(^\\.*)*/[^\\]*.cs"  This
is now converted to foo/\\?.*[^\\]*.cs, also much faster.

"foo/(^\\.*)*/[^\\]*.cs"

Looking at the code for DirectoryScanner, how is it possible for this to
be produced?  On windows, I get "foo(\\.*)*\\[^\\]*\.cs".  From looking
at the code, on unix/linux I would expect to get "foo(/.*)*/[^/]*\.cs".
The conversion result would then be "foo.*/[^/]*\.cs", which appears to
give exactly the same results.
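
A rough way to check the "same results" claim is to run both unix-form patterns over a few paths. This is only a sketch: it uses Python's re module as a stand-in for the .NET regex engine, matches the whole path, and the sample paths are hypothetical rather than taken from the NAnt tests.

```python
import re

# The two unix/linux patterns quoted in the message above.
GROUPED = r"foo(/.*)*/[^/]*\.cs"    # form with the repeating group
COLLAPSED = r"foo.*/[^/]*\.cs"      # form after collapsing the group

def verdicts(path):
    """Return (grouped_matches, collapsed_matches) for a whole-path match."""
    return (
        re.fullmatch(GROUPED, path) is not None,
        re.fullmatch(COLLAPSED, path) is not None,
    )

# Hypothetical sample paths (assumption, for illustration only).
for sample in ["foo/bar.cs", "foo/a/b/bar.cs", "foo/bar.txt", "foobar/x.cs"]:
    print(sample, verdicts(sample))
```

On these inputs the two patterns agree for paths under foo/, but the collapsed form also accepts foobar/x.cs, because ".*" is no longer forced to begin at a separator. Whether that matters in practice depends on how DirectoryScanner prefixes and anchors the pattern, which this sketch does not model.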

Note that my message here is out of sync with current CVS until I get it working. :)


Also note that all my assertions were backed up by profiling/timing a test application with the given regexes.

It's faster to scan for single characters than for a repeating group. I don't know how this is implemented internally, but profiling confirmed my initial suspicion that it is easier to scan with straight patterns than with a pattern that requires nested pattern scanning (i.e. the repeating group with negation).

I'm curious that taking out the ^$ anchors speeds up the search, since
there's an explicit comment in the DirectoryScanner source noting that
they are added to improve the speed... and it makes sense that including
them would reduce the amount of stringspace that would need to be
examined by the regular expression... then again, I haven't tried
comparing the speed of each regular expression, so I'm happy to accept
that my understanding might be flawed :-)

If you are searching for an entire string, anchoring is faster. For instance, if you are searching for "blah foo bar" in the string "blah foo bar", anchoring will be the fastest way to find it.


It turns out to be slower to search for this pattern:

"^.*blah foo bar.*$"

than for this pattern:

"blah foo bar"

I'm only removing anchors if they conflict with wildcards, since the regex engine can terminate early in the latter pattern, while it needs to scan to the end of the string in the former.
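
The comparison can be reproduced roughly as follows. This is a sketch using Python's re and timeit modules as a stand-in for the .NET engine, so absolute timings will differ. The wrapped pattern is written with "^" at the start and "$" at the end, and the helper names are made up for illustration.

```python
import re
import timeit

# Anchored pattern padded with wildcards, and the bare literal.
WRAPPED = re.compile(r"^.*blah foo bar.*$")
BARE = re.compile(r"blah foo bar")

def same_verdict(text):
    """True when both patterns agree on whether `text` contains the literal.

    Holds for single-line text; "." does not cross newlines by default.
    """
    return (WRAPPED.search(text) is not None) == (BARE.search(text) is not None)

def time_both(text, repeats=10000):
    """Return (wrapped_seconds, bare_seconds) for `repeats` searches each."""
    wrapped = timeit.timeit(lambda: WRAPPED.search(text), number=repeats)
    bare = timeit.timeit(lambda: BARE.search(text), number=repeats)
    return wrapped, bare

if __name__ == "__main__":
    sample = "x" * 200 + "blah foo bar" + "y" * 200
    print(same_verdict(sample), time_both(sample))
```

Both patterns accept the same single-line strings, so dropping the anchors and wildcards together does not change which inputs match; only the amount of scanning differs.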

A couple of extra comments on DirectoryScanner...
Both the ToRegexPattern and ParseSearchDirectoryAndPattern methods
perform slash replacement.  ToRegexPattern is private, and only called
from ParseSearchDirectoryAndPattern, so should be able to assume that
slashes are already replaced.

Sounds good to me. I'll have to take a look to see if I can remove the extra replacement.


Just being picky, but shouldn't 'if (s.Length == 2 && s[1] ==
Path.VolumeSeparatorChar) {' be 'if
(s.EndsWith(Path.VolumeSeparatorChar)) {'?  This allows for platforms
that support named volumes, as well as just drive letters.  Not that
there are any platforms like this around any more :-)
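
The difference between the two checks can be sketched like this. Python stands in for the C# here, VOLUME_SEPARATOR is an assumed stand-in for Path.VolumeSeparatorChar on Windows, and the function names are hypothetical.

```python
# Assumed value of Path.VolumeSeparatorChar on Windows.
VOLUME_SEPARATOR = ":"

def is_two_char_volume(s):
    """Mirror of the current check: exactly one character plus the separator."""
    return len(s) == 2 and s[1] == VOLUME_SEPARATOR

def ends_with_volume_separator(s):
    """Mirror of the suggested check: any name that ends in the separator."""
    return s.endswith(VOLUME_SEPARATOR)

# "C:" passes both checks; a named volume such as "data:" passes only the second.
for candidate in ["C:", "data:", "C:\\foo"]:
    print(candidate, is_two_char_volume(candidate), ends_with_volume_separator(candidate))
```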

Seeing as how Path.VolumeSeparatorChar is a "char", our current comparison will support all possible return values. ;)


Thanks for the note,
Matt.



Regards,

-- Troy


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Matthew Mastracci
Sent: Thursday, 8 July 2004 5:34 AM
To: Nant-Developers (E-mail)
Subject: [nant-dev] Two broken testcases - edge case question


It looks like the regex optimization broke an edge case:

**/* now matches the base directory, as well as any subdirectories on a FileSet.DirectoryNames call. For instance, in the following directory structure, all three will be matched with a base directory of "C:\foo":

C:\foo
C:\foo\bar
C:\foo\baz

The old behaviour would only match the two subdirectories.

Is this behaviour important to anyone? This is actually more consistent, considering that:

file/**/*.cs

matches:

file/bar.cs
file/foo/bar.cs
file/foo/foo/bar.cs

"**/" can basically be considered to be "current directory or subdirectories".
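
One way to picture that reading of "**/" is a small glob-to-regex translator in which "**/" expands to zero or more directory segments. This is a sketch only, not the DirectoryScanner code: glob_to_regex and matches are hypothetical helpers, forward slashes are assumed, and Python's re stands in for the .NET engine.

```python
import re

def glob_to_regex(pattern):
    """Translate a simplified glob into a regex string.

    "**/" becomes zero or more "segment/" groups, i.e. the current
    directory or any depth of subdirectories; "*" matches within a
    single path segment; everything else is taken literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)

def matches(pattern, path):
    """True when the whole path matches the translated glob."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None

# The three paths listed above all match file/**/*.cs.
for path in ["file/bar.cs", "file/foo/bar.cs", "file/foo/foo/bar.cs"]:
    print(path, matches("file/**/*.cs", path))
```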

Matt.


-------------------------------------------------------
This SF.Net email sponsored by Black Hat Briefings & Training.
Attend Black Hat Briefings & Training, Las Vegas July 24-29 - digital self defense, top technical experts, no vendor pitches, unmatched networking opportunities. Visit www.blackhat.com _______________________________________________
nant-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nant-developers











