Re: regular expressions in bash
On Sat, 2002-04-20 at 19:46, Harry Putnam wrote:
> > > In the context of the original post, the comparison was to perl
> > > regex.
> >
> > Perl searches for a regex in a string, rather than matching a
> > pattern on a string.
>
> I disagree, and I think this is the hub of the matter. Regex always
> match a pattern. That is what regex do. One may search with a regex,
> true enough but the regex always matches a pattern.

Some applications differentiate between search and match operations,
and nothing about the regex spec prevents them from doing so. Arguing
that a regex always matches a pattern doesn't change the intended
behavior of those applications.

In the case of find, the limitation is simple to understand:
consistency. The -name argument doesn't match substrings, and neither
does -regex. The regex supplied must match the entire path, beginning
to end, just as the argument to -name has to match the whole filename.

> > That's sorta up to the tool that provides the regex match/search.
> > It's not uncommon to differentiate between a search and a match.
>
> Since you've pointed to man 7 regex as the authority on regex, can
> you find this distinction explained there? The distinction was
> invented in this thread. I don't think you will find mention of it
> in the documentation you point to:
> ...
> One might conclude from the above that if it doesn't act like egrep
> or at least ed. It isn't posix.

It *is* posix. The specification for regex describes only the format
of the regular expression, and not what they must match, or how they
are used by applications. find matching a regex on an entire path,
rather than a substring, is not in any way contrary to the posix regex
specification, because such use is *outside* of the spec. All of the
regex specification is available for use as arguments to -regex.
Re: regular expressions in bash
On Fri, 2002-04-19 at 14:16, Harry Putnam wrote:
> I'm not really sure what constitutes a posix legal regex but I don't
> think it includes trick riders like having to match a specific part
> of a string, unless put into the regex itself with anchors or the
> like.

A regex is a regex, but a regex search is not a regex match. I don't
know that Perl provides both, and if it does I don't recall how
they're differentiated. Other applications do. Python, for instance
differentiates them thusly:

http://www.python.org/doc/current/lib/matching-searching.html

Find requires a match like Python's, rather than a search like perl or
grep.
Re: regular expressions in bash
Gordon Messmer [EMAIL PROTECTED] writes:

> On Fri, 2002-04-19 at 14:16, Harry Putnam wrote:
> > I'm not really sure what constitutes a posix legal regex but I
> > don't think it includes trick riders like having to match a
> > specific part of a string, unless put into the regex itself with
> > anchors or the like.
>
> A regex is a regex, but a regex search is not a regex match. I don't

Not exactly. There are several common sets of regex rules. The one in
find is not as powerful as what I called the `POSIX' set.

> know that Perl provides both, and if it does I don't recall how
> they're differentiated. Other applications do. Python, for instance
> differentiates them thusly:
>
> http://www.python.org/doc/current/lib/matching-searching.html
>
> Find requires a match like Python's, rather than a search like perl
> or grep.

Now, I may have used the wrong term (POSIX) and still do not really
know what constitutes a posix legal regex. However the notation used
with find is weaker in several ways (as I mentioned in my 1st post in
this thread) than what I referred to as POSIX.

In the context of the original post, the comparison was to perl regex.

The usage in find would better be described as regex-like, since it is
weaker in several ways and lacks some of the more powerful syntax. It
is a nice addition none-the-less. I only said it isn't the real McCoy.

Far as I know there is no stipulation on a regex to match in any
special way. Making that stipulation has already weakened the regex
engine involved. egrep, awk and perl all would give a different (more
versatile) result than that used in find. Limiting the match in some
way only being the first.

For example:

  touch aardvark
  find . -regex 'a+ardvark'
    nothing

Whereas

  ls|egrep 'a+rdvark'
    aardvark

works, or

  find . -regex 'a*rdvark'
    nothing

whereas

  ls |egrep 'a*rdvark'
    aardvark

Using a*rdvark with find -regex fails, but with posix regex it is
another way to find something like aardvark.

Or

  find . -regex '\(a\)\1rdvark'
    nothing

or

  find . -regex '(a)\1rdvark'
    find: Invalid back reference

Whereas

  ls |egrep '(a)\1rdvark'
    aardvark

There are more examples. But my only point here was that full regex is
more powerful because it is more versatile. Not that the usage in find
is a bad thing.

The perl script I submitted (barring any scripting errors) would be
more versatile as a result.

___
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list
Re: regular expressions in bash
On Sat, 2002-04-20 at 17:12, Harry Putnam wrote:
> Not exactly. There are several common sets of regex rules. The one
> in find is not as powerful as what I called the `POSIX' set.

Find uses the POSIX regex functions in the C library, not some
special, weak code.

> > know that Perl provides both, and if it does I don't recall how
> > they're differentiated. Other applications do. Python, for
> > instance differentiates them thusly:
> >
> > http://www.python.org/doc/current/lib/matching-searching.html
> >
> > Find requires a match like Python's, rather than a search like
> > perl or grep.
>
> Now, I may have used the wrong term (POSIX) and still do not really
> know what constitutes a posix legal regex.

man 7 regex

> However the notation used with find is weaker in several ways (as I
> mentioned in my 1st post in this thread) than what I referred to as
> POSIX.

Your misunderstanding of a regex match does not constitute a weakness
in find. :)

> In the context of the original post, the comparison was to perl
> regex.

Perl searches for a regex in a string, rather than matching a pattern
on a string.

> The usage in find would better be described as regex-like, since it
> is weaker in several ways and lacks some of the more powerful
> syntax. It is a nice addition none-the-less. I only said it isn't
> the real McCoy.

It doesn't lack any syntax. It *is* the real McCoy.

> Far as I know there is no stipulation on a regex to match in any
> special way.

That's sorta up to the tool that provides the regex match/search. It's
not uncommon to differentiate between a search and a match.

> For example:
>
>   touch aardvark
>   find . -regex 'a+ardvark'
>     nothing

Do it right:

  $ find . -regex './a+ardvark'
  ./aardvark

'a+ardvark' doesn't match the full path to ./aardvark which is clearly
required, as noted by the man page.

> or
>
>   find . -regex 'a*rdvark'
>     nothing

  $ find . -regex './a*rdvark'
  ./aardvark

> Or
>
>   find . -regex '\(a\)\1rdvark'
>     nothing

  $ find . -regex './\(a\)\1rdvark'
  ./aardvark
Re: regular expressions in bash
Gordon Messmer [EMAIL PROTECTED] writes:

[...]

> > However the notation used with find is weaker in several ways (as
> > I mentioned in my 1st post in this thread) than what I referred to
> > as POSIX.
>
> Your misunderstanding of a regex match does not constitute a
> weakness in find. :)

Yikes... looks like there's been plenty of that.. (my misunderstanding)

> > In the context of the original post, the comparison was to perl
> > regex.
>
> Perl searches for a regex in a string, rather than matching a
> pattern on a string.

I disagree, and I think this is the hub of the matter. Regex always
match a pattern. That is what regex do. One may search with a regex,
true enough but the regex always matches a pattern.

What is happening with find is that part of the pattern is predefined,
thereby removing some of the versatility of regex matching. That is,
not allowing the operator to choose the match, but predefining it to
some degree. The operator can choose within what's left. That is, by
definition, less versatile. Which is really all I meant by `weaker'.

> 'a+ardvark' doesn't match the full path to ./aardvark which is
> clearly required, as noted by the man page.

Noted by man page or not, it is none the less a limitation on regex
usage.

> > Far as I know there is no stipulation on a regex to match in any
> > special way.
>
> That's sorta up to the tool that provides the regex match/search.
> It's not uncommon to differentiate between a search and a match.

Since you've pointed to man 7 regex as the authority on regex, can you
find this distinction explained there? The distinction was invented in
this thread. I don't think you will find mention of it in the
documentation you point to:

Quoting here, but not to support the above comments:

  DESCRIPTION
     Regular expressions (``RE''s), as defined in POSIX 1003.2, come
     in two forms: modern REs (roughly those of egrep; 1003.2 calls
     these ``extended'' REs) and obsolete REs (roughly those of ed(1);
     1003.2 ``basic'' REs). Obsolete REs mostly exist for backward
     compatibility in some old programs; they will be discussed at the
     end. 1003.2 leaves some aspects of RE syntax and semantics open;
     `(*)' marks decisions on these aspects that may not be fully
     portable to other 1003.2 implementations.

One might conclude from the above that if it doesn't act like egrep or
at least ed, it isn't posix.

Of course any tool can use regex in any way it sees fit. The
discussion here was about regex themselves inside find or perl. My
contention has been that the usage in perl, awk and egrep is more
powerful and versatile than that in `find'.

It's not a distinction of search versus match as you claim. In all
cases matching is how it works. It's just that the match is partially
defined (i.e. limited) with the `find' usage.

And again, I'm not complaining about this -regex addition to find; I
think it is a very nice addition. I also think it is really only
regex-like. The real McCoy has no such limitations.

Full regex would have matched in either of our examples. Not searched
but matched. :-)
Re: regular expressions in bash
On 18 Apr 2002, Gordon Messmer wrote:
> On Thu, 2002-04-18 at 15:26, daniel wrote:
> > i'm a perlgeek so i'm familiar with its style of regular
> > expressions but when i'm trying to use one of those regular
> > expressions in a find command,
> >
> > find /home/ -name "(.Apple(.*))|(Network Trash Folder)|(TheVolumeSettingsFolder)" -print0 | rm -rf
>
> Then tell find to use a regex search. 'man find' would tell you that
> that -regex argument is available, and what you want instead of
> -name. :)

Oops. I completely missed that one ... how long's that been there? I'm
guessing it's probably always been there, like Kosh. I'm sooo
embarrassed now :o)
Re: regular expressions in bash
Bill Crawford [EMAIL PROTECTED] writes:

> Oops. I completely missed that one ... how long's that been there?
> I'm guessing it's probably always been there, like Kosh. I'm sooo
> embarrassed now :o)

It is new within a year or so, I believe, but if you look close you'll
also notice it isn't posix regex. The example given shows it.

  `b.*r3' does not match ./fubar3

Whereas with grep, egrep, awk or sed it would.

There are other peculiarities too, making it less useful as a regex
search, but still quite a good development over globbing only.
Re: regular expressions in bash
On Fri, 2002-04-19 at 07:26, Harry Putnam wrote:
> It is new within a year or so, I believe but if you look close
> you'll also notice it isn't posix regex. The example given shows it.
>
>   `b.*r3' does not match ./fubar3

Sure it's a POSIX regex. However, the man page points out that the
pattern must *match* the *entire path*. Match vs. search maybe isn't
clear if you're accustomed to always searching...

A regex search will match if any part of the string fits the regex. A
match is only good if the pattern fits the string from the very
beginning.
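The difference Gordon describes can be reproduced with the man page's own file name. The following is only a sketch, assuming GNU find and a throwaway scratch directory created with mktemp; the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: whole-path matching in find -regex versus substring search in grep.
# Assumes GNU find; the scratch directory and variable names are illustrative.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch fubar3

# find -regex has to match the entire path "./fubar3", so this finds nothing.
whole=$(find . -regex 'b.*r3')

# Padding the pattern with a leading .* lets it cover the whole path.
padded=$(find . -regex '.*b.*r3')

# grep searches for the pattern anywhere in each line it reads.
searched=$(find . | grep 'b.*r3')

printf 'whole=[%s] padded=[%s] searched=[%s]\n' "$whole" "$padded" "$searched"

cd / && rm -rf "$tmp"
```

The same regex syntax is used in all three commands; only the way the tool applies it to the string differs.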
Re: regular expressions in bash
Gordon Messmer [EMAIL PROTECTED] writes:

> On Fri, 2002-04-19 at 07:26, Harry Putnam wrote:
> > It is new within a year or so, I believe but if you look close
> > you'll also notice it isn't posix regex. The example given shows
> > it.
> >
> >   `b.*r3' does not match ./fubar3
>
> Sure it's a POSIX regex. However, the man page points out that the
> pattern must *match* the *entire path*. Match vs. search maybe isn't
> clear if you're accustomed to always searching...
>
> A regex search will match if any part of the string fits the regex.
> A match is only good if the pattern fits the string from the very
> beginning.

I'm not really sure what constitutes a posix legal regex but I don't
think it includes trick riders like having to match a specific part of
a string, unless put into the regex itself with anchors or the like.
Re: regular expressions in bash
On Fri, 19 Apr 2002, Harry Putnam wrote:
> Gordon Messmer [EMAIL PROTECTED] writes:
> > On Fri, 2002-04-19 at 07:26, Harry Putnam wrote:
> > > It is new within a year or so, I believe but if you look close
> > > you'll also notice it isn't posix regex. The example given shows
> > > it.
> > >
> > >   `b.*r3' does not match ./fubar3
> >
> > Sure it's a POSIX regex. However, the man page points out that the
> > pattern must *match* the *entire path*. Match vs. search maybe
> > isn't clear if you're accustomed to always searching...
> >
> > A regex search will match if any part of the string fits the
> > regex. A match is only good if the pattern fits the string from
> > the very beginning.
>
> I'm not really sure what constitutes a posix legal regex but I don't
> think it includes trick riders like having to match a specific part
> of a string, unless put into the regex itself with anchors or the
> like.

No, but the find man page actually specifies that it has to match the
whole path, i.e. there is an implicit ^ at the start, as if it were
^(your_regexp_here)
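Because the whole path has to match, the pattern behaves as if it were anchored at the end as well as at the start. A rough way to see this, assuming GNU find and GNU grep (whose -x option requires a whole-line match), is to compare the two on a pair of made-up file names:

```shell
#!/bin/sh
# Sketch: find -regex acts like a pattern anchored at both ends.
# Assumes GNU find and grep; file and variable names are illustrative.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch aardvark aardvarks

# Whole-path match: "./aardvarks" is rejected because of the trailing "s".
by_find=$(find . -regex './a+ardvark')

# grep -x requires the whole line to match, which gives the same answer.
by_grep_x=$(find . | grep -xE './a+ardvark')

# Anchoring only the start lets the longer name through as well.
start_only=$(find . | grep -E '^(./a+ardvark)' | LC_ALL=C sort)

printf '%s\n' "find:  $by_find" "grep -x: $by_grep_x" "start only:" "$start_only"

cd / && rm -rf "$tmp"
```

The unescaped dot in the patterns is copied from the examples in this thread; it matches any character, including the literal "." of the path.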
Re: regular expressions in bash
On Thu, 18 Apr 2002, daniel wrote:
> i'm a perlgeek so i'm familiar with its style of regular expressions
> but when i'm trying to use one of those regular expressions in a
> find command, i'm not having much luck
>
> here's what i want to do:
>
> find /home/ -name "(.Apple(.*))|(Network Trash Folder)|(TheVolumeSettingsFolder)" -print0 | rm -rf

You need shell glob style expressions, not full regexps. You also need
to use multiple -name options joined with -o for "or" between them,
something like this:

  find /home \( -name ".Apple*" -o -name "Network Trash Folder" \
       -o -name "TheVolumeSettingsFolder" \) -print0 | xargs -0 rm -rf
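Before pointing a command like this at /home, it can be tried as a dry run. The sketch below assumes a POSIX shell, find and xargs with -print0/-0 support, and a scratch directory whose contents are invented for the test; it prints the matches instead of removing them.

```shell
#!/bin/sh
# Sketch: dry run of the -name/-o approach on a scratch tree.
# The directory names mimic the netatalk leftovers; "keepme" is made up.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir .AppleDouble "Network Trash Folder" TheVolumeSettingsFolder keepme

# Quote every glob so the shell passes it to find untouched, and hand the
# NUL-separated names to xargs so the spaces in the names survive.
glob_hits=$(find . \( -name '.Apple*' -o -name 'Network Trash Folder' \
    -o -name 'TheVolumeSettingsFolder' \) -print0 |
    xargs -0 printf '%s\n' | LC_ALL=C sort)

printf '%s\n' "$glob_hits"

cd / && rm -rf "$tmp"
```

Once the printed list looks right, swapping printf for rm -rf gives the destructive version.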
Re: regular expressions in bash
On Thu, 2002-04-18 at 15:26, daniel wrote:
> i'm a perlgeek so i'm familiar with its style of regular expressions
> but when i'm trying to use one of those regular expressions in a
> find command,
>
> find /home/ -name "(.Apple(.*))|(Network Trash Folder)|(TheVolumeSettingsFolder)" -print0 | rm -rf

Then tell find to use a regex search. 'man find' would tell you that
that -regex argument is available, and what you want instead of -name.
:)

> so it looks like i'm not understanding bash's use of regexps

Also, bash has nothing to do with find. Find isn't built-in.
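One possible -regex form of daniel's command is sketched below. It assumes GNU find with its default Emacs-style regex syntax, in which grouping is written \( \) and alternation \|; other find implementations may need different syntax. The scratch tree and variable names are invented for the example.

```shell
#!/bin/sh
# Sketch only: the three alternatives as a single find -regex pattern.
# Assumes GNU find's default (Emacs-style) regex syntax.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p home/user/.AppleDouble "home/user/Network Trash Folder" \
         home/user/TheVolumeSettingsFolder home/user/docs

# The leading .*/ absorbs the directory part, because the pattern has to
# cover the whole path and not just the last component. [^/]* keeps the
# .Apple alternative from running on into deeper path components.
pattern='.*/\(\.Apple[^/]*\|Network Trash Folder\|TheVolumeSettingsFolder\)'

# Dry run: list the matches instead of deleting them.
regex_hits=$(find home -regex "$pattern" | LC_ALL=C sort)
printf '%s\n' "$regex_hits"

cd / && rm -rf "$tmp"
```

Note that rm does not read file names from standard input, so a pipeline ending in "| rm -rf" would not delete the matches; passing them through xargs -0 after -print0 is one way to do that.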
Re: regular expressions in bash
daniel [EMAIL PROTECTED] writes:

> i'm a perlgeek so i'm familiar with its style of regular expressions
> but when i'm trying to use one of those regular expressions in a
> find command, i'm not having much luck
>
> here's what i want to do:
>
> find /home/ -name "(.Apple(.*))|(Network Trash Folder)|(TheVolumeSettingsFolder)" -print0 | rm -rf
>
> to get rid of all the stuff placed by netatalk before i back it up
>
> now this works:
>
> find /home/ -name .Apple* -print0 | rm -rf
>
> so it looks like i'm not understanding bash's use of regexps
>
> anyone care to help out here?

If you want to use perl regex then use perl, otherwise you'll need old
fashioned file globbing, which is much more limited than regular
expressions, and has different rules.

Since you already know perl, I'd suggest writing a small script like
the one below, that will be useful in many places. I didn't bother
with lots of tests and safeguards so you will want to doctor it up as
needed.

This script expects you to feed it a regex and a directory to search.
This script only prints the file names it finds.

To make it do what you want you'll need to look up the perl module
File::Find and the perl function `unlink':

  perldoc -f unlink
  perldoc File::Find

(If it's installed this will bring up the info on how to use either.)

Follow the example script below, but when you perldoc File::Find
scroll down to a line that begins like this:

  `The wanted() function does whatever verifications you'

That section explains what the builtin variables will contain. For
instance, $File::Find::dir contains the current directory name, so
you'll need to add that in your unlink (in place of print below) so
the pathname is available for the unlinking to be accurate.

NOTE: This script only prints the filenames it finds... It's left to
the reader to make it `unlink' them.

In one of my directories, ~/no_bak, running it like this:

  ./find.pl '(file|9)$' /home/reader/no_bak

You can see the results:

  file
  nmap_file
  19
  9
  refile

And how they match the regex in cmdline arg $1, '(file|9)$'.

  cat my_find.pl

  #!/usr/bin/perl -w

  ## get this script name
  ($myscript = $0) =~ s:^.*/::;

  ## declare variables
  $regex = '';
  $directory = '';

  use File::Find;

  ## Both arguments are required
  if (!$ARGV[1]){
     print "Whoa, two cmdline args are required\n";
     print "Example: \`$myscript REGEX DIRECTORY'\n";
     exit;
  }
  if (! -d $ARGV[1]){
     print "Sorry .. no such directory as $ARGV[1]\n";
     exit;
  }

  ## Shift the cmdline args into variables
  $regex = shift;
  $directory = shift;

  ## Walk the tree, calling wanted() once per entry
  File::Find::find({wanted => \&wanted}, $directory);
  exit;

  sub wanted {
    ## Look for files matching our REGEX ($_ is the bare file name)
    if ($_ =~ /$regex/){
      print "$_\n";
    }
  }