On Sunday, 15 October 2006 08:24, Philipp Marek wrote:
> Do you have a patch that does all this, maybe :-?

Again it took significantly longer than I had anticipated (partly because 
I forgot to take my notebook power supply with me last weekend and could 
not work on the train...), but here it is.

I actually split my changes into two small patches, as I added two 
features which are pretty much unrelated.

Consider both patches to be a request for comments. :-)

dir_ignore
**********
changes the matching behaviour of fsvs glob-like filename patterns. With 
dir_ignore, a glob-like pattern has to match the full directory or file 
name instead of just a prefix of it, as it currently does.
The exception is a pattern which ends with a slash: it matches the exact 
directory or file name without the slash, as well as every path the 
pattern is a prefix of. This is used to exclude directories and their 
contents.

Examples:

./**/tmp
  will match all files in any subdirectory which are exactly called "tmp".
./**/tmp**
  mimics the above pattern's current semantics: match any file or
  directory whose name starts with "tmp".
./**/tmp/
  will match all files in all directories which are called "tmp" and the
  directory itself.
./**/tmp/**
  will match all files in all directories which are called "tmp" but NOT
  the directory itself; the then-empty directory "tmp" won't be ignored
  and will still be included.

This patch works by anchoring every globbing pattern at the end of the 
string, unless it ends with a slash. In that case, the PCRE is closed 
with '($|/)', so that both an exact match of the directory name and 
everything below the directory are ignored.
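
The closing step described above can be sketched as a small stand-alone
helper. This is only an illustration, not the code from the patch: the
name close_pattern() and its buffer interface are hypothetical, and it
assumes the glob has already been translated so that a trailing '/' of
the glob is still the last character of the regexp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of fsvs: closes an already translated
 * pattern held in buf (NUL-terminated, capacity cap bytes).
 *   "PATTERN"  becomes "PATTERN$"
 *   "PATTERN/" becomes "PATTERN($|/)"
 * An empty pattern is left untouched.
 * Returns 0 on success, -1 if buf is too small. */
static int close_pattern(char *buf, size_t cap)
{
	size_t n;

	assert(buf != NULL);
	n = strlen(buf);
	if (n == 0)
		return 0;

	if (buf[n - 1] == '/') {
		/* The trailing '/' is overwritten by '(', so the result is
		 * 4 characters longer, plus the terminating NUL. */
		if (n + 5 > cap)
			return -1;
		memcpy(buf + n - 1, "($|/)", 6);
	} else {
		/* Plain anchor: one more character plus the NUL. */
		if (n + 2 > cap)
			return -1;
		memcpy(buf + n, "$", 2);
	}
	return 0;
}
```

Called on a buffer holding "x/tmp" this yields "x/tmp$", and on "x/tmp/"
it yields "x/tmp($|/)"; a buffer that cannot hold the longer result is
reported with -1 instead of being overrun.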

My first try was to simply anchor all patterns except those ending 
in '/', but that caused all directories I wanted to ignore to be 
included (without their contents, however). It would have been necessary 
to explicitly exclude the directory as well, so I changed the behaviour 
to the one explained above.

This feature has one drawback: ./**/tmp/ will also ignore all FILES which 
are exactly called "tmp", not only the dirs. :-/ However, I consider the 
overall matching behaviour with this patch to be a huge improvement over 
the current situation.
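
The matching rules of this section, including the drawback with plain
files called "tmp", can be tried out with a few lines of C. This is a
sketch under assumptions: the regexps used with it are written by hand
(assuming "**" ends up as ".*"), they are not output of fsvs, and POSIX
<regex.h> extended regexps stand in for PCRE, which is sufficient for
'$' and '($|/)'. The helper name matches() is made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the extended regexp ere matches
 * somewhere in path, 0 if it does not, -1 if ere does not compile. */
static int matches(const char *ere, const char *path)
{
	regex_t re;
	int rc;

	assert(ere != NULL && path != NULL);
	if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, path, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With the hand-written regexp "^\./.*/tmp$" (for ./**/tmp), the path
"./a/tmp" matches while "./a/tmpfile" and "./a/tmp/x" do not. With
"^\./.*/tmp($|/)" (for ./**/tmp/), both "./a/tmp" and "./a/tmp/x" match.
As the regexp only sees the path string, "./a/tmp" matches no matter
whether it is a directory or a plain file, which is the drawback
mentioned above.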


escape_mode
***********
adds support for escaping characters with a backslash '\' and for bracket 
expressions (character classes). This implementation requires the RE to 
be interpreted as a PCRE; it is not correct if the resulting RE is 
interpreted as a POSIX RE.

You can now write stuff like


  ./**/\[is[_.-]this[_.-]an_intereres*ting\*filename\?[]!]?

and it should work as expected. I implemented this because, although any 
pattern can of course be written directly as a PCRE, a globbing pattern is 
simpler to read if you e.g. just want to use plain character classes. 
Additionally, many more people know how to use globbing patterns than 
PCREs. While the basics of PCREs are also simple and straightforward, 
most people do not seem to know that and appear to be frightened by them.
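
To illustrate the idea of the escape handling, here is a simplified,
stand-alone translator. It is a sketch and not the code from the patch:
all function names are hypothetical, bracket expressions are copied
verbatim (so negation is written as in a regexp), and the translations
"**" -> ".*", "*" -> "[^/]*" and "?" -> "[^/]" are assumptions about the
surrounding code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Appends c to out, always keeping one byte free for the final NUL. */
static int put(char *out, size_t cap, size_t *n, char c)
{
	if (*n + 1 >= cap)
		return -1;
	out[(*n)++] = c;
	return 0;
}

static int put_str(char *out, size_t cap, size_t *n, const char *str)
{
	while (*str)
		if (put(out, cap, n, *str++) != 0)
			return -1;
	return 0;
}

/* Copies a bracket expression "[...]" verbatim and advances *sp past it.
 * A ']' directly after '[' (or after a leading '^') is a literal member. */
static int copy_bracket(const char **sp, char *out, size_t cap, size_t *n)
{
	const char *s = *sp;
	int first = 1;

	if (put(out, cap, n, *s++) != 0)          /* the opening '[' */
		return -1;
	if (*s == '^') {
		if (put(out, cap, n, *s++) != 0)
			return -1;
	}
	while (*s && (first || *s != ']')) {
		if (*s == '\\' && s[1] != '\0') {     /* escape inside the class */
			if (put(out, cap, n, *s++) != 0)
				return -1;
		}
		if (put(out, cap, n, *s++) != 0)
			return -1;
		first = 0;
	}
	if (*s != ']')
		return -1;                            /* unterminated expression */
	*sp = s + 1;
	return put(out, cap, n, ']');
}

/* Hypothetical, simplified glob-to-regexp translation.
 * Returns 0 on success, -1 on a malformed glob or a too small buffer. */
static int glob_to_re(const char *s, char *out, size_t cap)
{
	size_t n = 0;
	int rc = 0;

	assert(s != NULL && out != NULL && cap > 0);
	while (*s && rc == 0) {
		if (*s == '\\') {
			/* Escaped: keep the backslash, copy the next char verbatim. */
			if (s[1] == '\0')
				return -1;
			rc = put(out, cap, &n, *s++);
			if (rc == 0)
				rc = put(out, cap, &n, *s++);
		} else if (*s == '[') {
			rc = copy_bracket(&s, out, cap, &n);
		} else if (s[0] == '*' && s[1] == '*') {
			rc = put_str(out, cap, &n, ".*");
			s += 2;
		} else if (*s == '*') {
			rc = put_str(out, cap, &n, "[^/]*");
			s++;
		} else if (*s == '?') {
			rc = put_str(out, cap, &n, "[^/]");
			s++;
		} else if (isalnum((unsigned char)*s) || *s == '/' || *s == '-') {
			rc = put(out, cap, &n, *s++);
		} else {
			/* Everything else is escaped to be taken literally. */
			rc = put(out, cap, &n, '\\');
			if (rc == 0)
				rc = put(out, cap, &n, *s++);
		}
	}
	if (rc != 0)
		return -1;
	out[n] = '\0';
	return 0;
}
```

Under these assumptions the glob "./**/\[a[_.-]b\*c[]!]?" is turned into
"\./.*/\[a[_.-]b\*c[]!][^/]": the escaped '[' and '*' stay literal and
both bracket expressions pass through unchanged. A glob that ends inside
an escape or a bracket expression is rejected with -1.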



I'd love to hear your opinion about and your experiences with these small 
patches! :-)

Greetings,

  Gunter

PS: The one major headache still left with regard to globbing patterns is 
that it's still not possible to write a single pattern which matches the 
file "tmp" in the top-level directory and in any subdirectory...
'./**/tmp' won't match './tmp', while './**tmp' will match much more 
(any file ending with "tmp"). However, as fsvs relies on "./" as the 
start of a pattern, I had no good idea of how to fix it...

-- 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The person on the other side was a young woman. Very obviously a young 
woman. There was no possible way that she could have been mistaken for a 
young man in any language, especially Braille.        -- The goddess 
with the nice earrings            (Terry Pratchett, Maskerade)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                     PGP-encrypted mail preferred!                     +
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
--- src/ignore.c	2006-09-28 10:11:29.000000000 +0200
+++ src_escapemode/ignore.c	2006-10-26 20:49:09.000000000 +0200
@@ -98,6 +98,8 @@
 	char *buffer;
 	char *src, *dest;
 	int status;
+	int pos_in_bracket_expr, in_bracket_expr;
+	int backslashed;
 
 
 	status=0;
@@ -112,8 +114,42 @@
 		buffer=malloc(len);
 		STOPIF_ENOMEM(!buffer);
 		dest=buffer;
+		pos_in_bracket_expr = -1; // zero-based, -1 == outside
+		in_bracket_expr = backslashed = 0;
+
 		do
 		{
+			if (backslashed)
+			{
+				// escaped mode
+				*(dest++) = *(src++);
+				backslashed = 0;
+			}
+			else if (in_bracket_expr)
+			{
+				if (*src == '^' && pos_in_bracket_expr < 0)
+				{
+					*(dest++) = '!';
+					++src;
+				}
+				else
+				{
+					/* a "^" at the start of a bracket expression does not
+						 count as a regular content element. */
+					++pos_in_bracket_expr;
+
+					if (*src == ']' && pos_in_bracket_expr > 0) {
+						// Bracket expression ends.
+						in_bracket_expr = 0;
+						pos_in_bracket_expr = -1;
+					}
+
+					backslashed = (*src == '\\'); // possibly enter escaped mode
+					*(dest++) = *(src++);
+				}
+			}
+			else
+			{
 			switch(*src)
 			{
 				case '*':
@@ -140,20 +176,29 @@
 					*(dest++) = '.';
 					src++;
 					break;
+					case '[':
+						in_bracket_expr = 1;
+						/* fall through */
 				case '0' ... '9':
 				case 'a' ... 'z':
 				case 'A' ... 'Z':
 				case '/':
+					case '-':
+						*(dest++) = *(src++);
+						break;
+					case '\\':
+						backslashed = 1; // enter escaped mode
 					*(dest++) = *(src++);
 					break;
 					/* . and all other special characters { ( [ ] ) } + # " \ $
-					 * get escaped. */
+						 * get escaped if we're not within a bracket expression. */
 				case '.':
 				default:
 					*(dest++) = '\\';
 					*(dest++) = *(src++);
 					break;
 			}
+			}
 
 			STOPIF_CODE_ERR( buffer+len - dest < 5, ENOSPC,
 					"not enough space in buffer");
--- src/ignore.c	2006-09-28 10:11:29.000000000 +0200
+++ src_dirignore/ignore.c	2006-10-26 20:46:52.000000000 +0200
@@ -155,10 +155,30 @@
 					break;
 			}
 
-			STOPIF_CODE_ERR( buffer+len - dest < 5, ENOSPC,
+			/* Ensure that there is sufficient space in the buffer to
+				 process the next character. A "*" might create up to 5
+				 characters in dest, the directory matching patterns
+				 appended last will add up to five, and we have a
+				 terminating '\0'. */
+			STOPIF_CODE_ERR( buffer+len - dest < 11, ENOSPC,
 					"not enough space in buffer");
 		} while (*src);
 
+		if (src != ignore->compare_string) // src has moved at least one char
+		{
+			*(dest++) = '$'; // anchor regexp
+
+			if(*(src-1) == '/')
+			{
+				/* Ok, the glob pattern ends in "/", so our special
+					 "ignore directory" handling kicks in.
+					 This results in "PATTERN($|/)". */
+				*(dest-2) = '(';
+				*(dest++) = '|';
+				*(dest++) = '/';
+				*(dest++) = ')';
+			}
+		}
 		*dest=0;
 		/* return unused space */
 		buffer=realloc(buffer, dest-buffer+2);
