On Sunday, 15 October 2006 08:24, Philipp Marek wrote:
> Do you have a patch that does all this, maybe :-?

Again it took significantly longer than I had anticipated (partly
because I forgot to take my notebook power supply with me last weekend
and could not work on the train...), but here it is.

I actually split my changes into two small patches, as I added two
features which are pretty much unrelated. Consider both patches to be a
request for comments. :-)

dir_ignore
**********

changes the matching behaviour of fsvs glob-like filename patterns. With
dir_ignore, a glob-like pattern matches the full directory or file name
instead of just a prefix, as it currently does. The exception is a
pattern that ends with a slash: it matches the exact directory or file
name without the slash, as well as everything the pattern is a prefix
of. This is used to exclude directories and their contents.

Examples:

./**/tmp
    will match all files in any subdirectory which are called exactly
    "tmp".

./**/tmp**
    mimics the current semantics of the pattern above: match any file or
    directory whose name starts with "tmp".

./**/tmp/
    will match all files in all directories which are called "tmp", and
    the directory itself.

./**/tmp/**
    will match all files in all directories which are called "tmp" but
    NOT the directory itself; the empty directory "tmp" won't be ignored
    but will be included in the directory

This patch works by anchoring all globbing patterns at the end of the
line, except if they end with a slash. In that case, the PCRE is closed
with '($|/)', which causes an exact match of the directory name to be
ignored, and everything below the directory as well.

My first try was to simply anchor all patterns except those ending in
'/', but that caused all directories I wanted to ignore to be included
(however, without their contents). It would have been necessary to
explicitly exclude the directory as well, so I changed the behaviour to
the one explained above.

This feature has one drawback: ./**/tmp/ will also ignore all FILES
which are called exactly "tmp", not only the directories. :-/ However, I
consider the overall matching behaviour with this patch to be a huge
improvement over the current situation.

escape_mode
***********

adds support for escaping characters with a backslash '\' and for
bracket expressions (character classes). This implementation requires
the resulting RE to be interpreted as a PCRE; it is not correct if the
RE is interpreted as a POSIX RE.

You can now write stuff like

    ./**/\[is[_.-]this[_.-]an_intereres*ting\*filename\?[]!]?

and it should work as expected.

I implemented this because, although any pattern can of course be
written directly as a PCRE, a globbing pattern is simpler to read if you
e.g. just want to use plain character classes. Additionally, many more
people know how to use globbing patterns than PCREs. While the basics of
PCREs are also simple and straightforward, most people do not seem to
know that and appear to be frightened by them.

I'd love to hear your opinion about and your experiences with these
small patches! :-)

Greetings,

Gunter

PS: The one major headache still left with regard to globbing patterns
is that it is still not possible to write a single pattern which matches
the file "tmp" in the top-level directory and in any subdirectory.
'./**/tmp' won't match './tmp', while './**tmp' will match much more
(any file ending with "tmp"). However, as fsvs relies on "./" as the
start of a pattern, I had no good idea of how to fix it...

-- 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The person on the other side was a young woman. Very obviously a young
woman. There was no possible way that she could have been mistaken for a
young man in any language, especially Braille.
  -- The goddess with the nice earrings (Terry Pratchett, Maskerade)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ PGP-encrypted mails preferred!                                        +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

--- src/ignore.c 2006-09-28 10:11:29.000000000 +0200
+++ src_escapemode/ignore.c 2006-10-26 20:49:09.000000000 +0200
@@ -98,6 +98,8 @@
char *buffer;
char *src, *dest;
int status;
+ int pos_in_bracket_expr, in_bracket_expr;
+ int backslashed;
status=0;
@@ -112,8 +114,42 @@
buffer=malloc(len);
STOPIF_ENOMEM(!buffer);
dest=buffer;
+ pos_in_bracket_expr = -1; // zero-based, -1 == outside
+ in_bracket_expr = backslashed = 0;
+
do
{
+ if (backslashed)
+ {
+ // escaped mode
+ *(dest++) = *(src++);
+ backslashed = 0;
+ }
+ else if (in_bracket_expr)
+ {
+ if (*src == '!' && pos_in_bracket_expr < 0)
+ {
+ *(dest++) = '^';
+ ++src;
+ }
+ else
+ {
+ /* a "!" at the start of a bracket expression is the glob
+ negation marker (emitted as "^"), not a content element. */
+ ++pos_in_bracket_expr;
+
+ if (*src == ']' && pos_in_bracket_expr > 0) {
+ // Bracket expression ends.
+ in_bracket_expr = 0;
+ pos_in_bracket_expr = -1;
+ }
+
+ backslashed = (*src == '\\'); // possibly enter escaped mode
+ *(dest++) = *(src++);
+ }
+ }
+ else
+ {
switch(*src)
{
case '*':
@@ -140,20 +176,29 @@
*(dest++) = '.';
src++;
break;
+ case '[':
+ in_bracket_expr = 1;
+ /* fall through */
case '0' ... '9':
case 'a' ... 'z':
case 'A' ... 'Z':
case '/':
+ case '-':
+ *(dest++) = *(src++);
+ break;
+ case '\\':
+ backslashed = 1; // enter escaped mode
*(dest++) = *(src++);
break;
/* . and all other special characters { ( [ ] ) } + # " \ $
- * get escaped. */
+ * get escaped if we're not within a bracket expression. */
case '.':
default:
*(dest++) = '\\';
*(dest++) = *(src++);
break;
}
+ }
STOPIF_CODE_ERR( buffer+len - dest < 5, ENOSPC,
"not enough space in buffer");
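
As a standalone illustration of the state machine in the escape_mode
patch above, here is a minimal sketch; it is not the fsvs code.
glob_to_re() is a hypothetical name, and translating "?" to "." and "*"
to "[^/]*" is an assumption, since those cases are not visible in the
hunks. The sketch maps a leading "!" in a bracket expression to "^",
copies backslash-escaped characters through verbatim, and (unlike the
patch) rejects a trailing backslash or an unterminated bracket
expression.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of fsvs: translate a glob into a regex
 * body meant to be compiled as a PCRE. Returns 0 on success, -1 if the
 * output buffer is too small or the glob ends inside an escape or an
 * open bracket expression. */
int glob_to_re(const char *glob, char *out, size_t outlen)
{
    size_t n = 0;
    int escaped = 0;     /* previous character was an unescaped backslash */
    int in_bracket = 0;  /* currently inside [...] */
    int members = 0;     /* members seen so far in the current bracket */
    int negated = 0;     /* leading '!' already consumed */

    assert(glob != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    for (const char *s = glob; *s != '\0'; ++s) {
        /* One step writes at most 5 characters; keep room for '\0'. */
        if (outlen < n + 6)
            return -1;

        if (escaped) {
            out[n++] = *s;               /* escaped mode: copy verbatim */
            escaped = 0;
        } else if (in_bracket) {
            if (*s == '!' && members == 0 && !negated) {
                out[n++] = '^';          /* glob negation -> regex negation */
                negated = 1;
            } else {
                if (*s == ']' && members > 0)
                    in_bracket = 0;      /* a leading ']' is a literal member */
                else
                    ++members;
                escaped = (*s == '\\');  /* possibly enter escaped mode */
                out[n++] = *s;
            }
        } else {
            switch (*s) {
            case '\\':
                escaped = 1;
                out[n++] = '\\';
                break;
            case '[':
                in_bracket = 1;
                members = 0;
                negated = 0;
                out[n++] = '[';
                break;
            case '?':                    /* assumption: any one character */
                out[n++] = '.';
                break;
            case '*':                    /* assumption: stay within one path level */
                memcpy(out + n, "[^/]*", 5);
                n += 5;
                break;
            case '/':
            case '-':
                out[n++] = *s;
                break;
            default:                     /* escape everything non-alphanumeric */
                if (!isalnum((unsigned char)*s))
                    out[n++] = '\\';
                out[n++] = *s;
                break;
            }
        }
    }

    if (escaped || in_bracket)
        return -1;
    out[n] = '\0';
    return 0;
}
```

With this sketch, the glob "[!ab]?" comes out as "[^ab].", and "[]!]"
is passed through unchanged because neither "]" in first position nor a
later "!" is special.
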
--- src/ignore.c 2006-09-28 10:11:29.000000000 +0200
+++ src_dirignore/ignore.c 2006-10-26 20:46:52.000000000 +0200
@@ -155,10 +155,30 @@
break;
}
- STOPIF_CODE_ERR( buffer+len - dest < 5, ENOSPC,
+ /* Ensure that there is sufficient space in the buffer to
+ process the next character. A "*" might create up to 5
+ characters in dest, the directory matching patterns
+ appended last will add up to five, and we have a
+ terminating '\0'. */
+ STOPIF_CODE_ERR( buffer+len - dest < 11, ENOSPC,
"not enough space in buffer");
} while (*src);
+ if (src != ignore->compare_string) // src has moved at least one char
+ {
+ *(dest++) = '$'; // anchor regexp
+
+ if(*(src-1) == '/')
+ {
+ /* Ok, the glob pattern ends in "/", so our special
+ "ignore directory" handling kicks in.
+ This results in "PATTERN($|/)". */
+ *(dest-2) = '(';
+ *(dest++) = '|';
+ *(dest++) = '/';
+ *(dest++) = ')';
+ }
+ }
*dest=0;
/* return unused space */
buffer=realloc(buffer, dest-buffer+2);
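
The anchoring rule from the dir_ignore patch can be tried in isolation
with the following sketch. anchor_re() and re_matches() are hypothetical
helpers, not fsvs functions. POSIX regex.h in extended mode stands in
for PCRE here, which should behave the same for these simple inputs, and
the hand-written body "^\./.*/tmp" is only an assumed translation of
"./**/tmp".

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: close an already translated regex body.
 * "BODY"  becomes "BODY$"; "BODY/" becomes "BODY($|/)".
 * Returns 0 on success, -1 if body is empty or out is too small. */
int anchor_re(const char *body, char *out, size_t outlen)
{
    size_t len;

    assert(body != NULL && out != NULL);
    len = strlen(body);
    if (len == 0 || len + 5 > outlen)
        return -1;

    memcpy(out, body, len);
    if (body[len - 1] == '/')
        memcpy(out + len - 1, "($|/)", 6);  /* overwrite '/', copy '\0' too */
    else
        memcpy(out + len, "$", 2);          /* plain end anchor plus '\0' */
    return 0;
}

/* Returns 1 if path matches re, 0 if it does not, -1 if re is invalid. */
int re_matches(const char *re, const char *path)
{
    regex_t rx;
    int rc;

    if (regcomp(&rx, re, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&rx, path, 0, NULL, 0);
    regfree(&rx);
    return rc == 0;
}
```

Anchoring "^\./.*/tmp/" this way yields "^\./.*/tmp($|/)", which accepts
both "./a/tmp" and "./a/tmp/x" but not "./a/tmpfile". It also shows the
drawback mentioned in the mail: the regex sees only the path string, so
a regular file "./a/tmp" is matched just like a directory of that name.
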
