Re: regular expressions in bash

2002-04-27 Thread Gordon Messmer

On Sat, 2002-04-20 at 19:46, Harry Putnam wrote:
  In the context of the original post, the comparison was to perl regex.
 
  Perl searches for a regex in a string, rather than matching a pattern on
  a string.
 
 I disagree, and I think this is the hub of the matter.  Regex always
 match a pattern.  That is what regex do.  One may search with a regex,
 true enough but the regex always matches a pattern.

Some applications differentiate between search and match operations,
and nothing about the regex spec prevents them from doing so.  Arguing
that a regex always matches a pattern doesn't change the intended
behavior of those applications.

In the case of find, the limitation is simple to understand:
consistency.  The -name argument doesn't match substrings, and neither
does -regex.  The regex supplied must match the entire path, beginning
to end, just as the argument to -name has to match the whole filename.
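
A minimal sketch of that consistency, assuming GNU find and a throwaway
directory made with mktemp (the directory and the file name aardvark are
only for illustration): -name has to cover the whole file name, and
-regex has to cover the whole path.

```shell
# Scratch directory so nothing outside it is touched (illustrative setup).
tmp=$(mktemp -d) || exit 1
touch "$tmp/aardvark"
cd "$tmp" || exit 1

# -name is compared against the whole file name, not a substring of it.
name_partial=$(find . -name 'ardvark')      # pattern covers only part of the name
name_whole=$(find . -name '*ardvark')       # glob now covers the whole name

# -regex is compared against the whole path, which here is ./aardvark.
regex_partial=$(find . -regex 'a*rdvark')   # the leading ./ is not covered
regex_whole=$(find . -regex '.*/a*rdvark')  # .*/ covers the directory part

printf 'name:  [%s] [%s]\n' "$name_partial" "$name_whole"
printf 'regex: [%s] [%s]\n' "$regex_partial" "$regex_whole"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

In both pairs only the second form should print ./aardvark; the first
form leaves part of the name or path unmatched and prints nothing.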

  That's sorta up to the tool that provides the regex match/search.  It's
  not uncommon to differentiate between a search and a match.
 
 Since you've pointed to man 7 regex as the authority on regex, can you
 find this distinction explained there?
 
 The distinction was invented in this thread.  I don't think you will
 find mention of it in the documentation you point to:
 
...
 
 One might conclude from the above that if it doesn't act like egrep or
 at least ed, it isn't posix.

It *is* posix.  The specification for regex describes only the format of
regular expressions, and not what they must match, or how they are
used by applications.  find matching a regex on an entire path, rather
than a substring, is not in any way contrary to the posix regex
specification, because such use is *outside* of the spec.

All of the regex specification is available for use as arguments to
-regex.




signature.asc
Description: This is a digitally signed message part


Re: regular expressions in bash

2002-04-20 Thread Gordon Messmer

On Fri, 2002-04-19 at 14:16, Harry Putnam wrote:
 
 I'm not really sure what constitutes a posix legal regex but I don't 
 think it includes trick riders like having to match a specific part
 of a string, unless put into the regex itself with anchors or the
 like.

A regex is a regex, but a regex search is not a regex match.  I don't
know that Perl provides both, and if it does I don't recall how they're
differentiated.  Other applications do.  Python, for instance,
differentiates them thusly:
http://www.python.org/doc/current/lib/matching-searching.html

Find requires a match like Python's, rather than a search like perl
or grep.
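
The same distinction can be sketched with grep alone, assuming any POSIX
grep: without options it searches for the pattern anywhere in the line,
while -x accepts only a pattern that covers the entire line.  The path
string is just an illustration, and -x is used here as a stand-in for
find's whole-path rule, not as a description of how find is implemented.

```shell
path='./aardvark'

# Search: plain grep succeeds if the pattern matches anywhere in the line.
searched=$(printf '%s\n' "$path" | grep 'a*rdvark' || true)

# Whole-line match: -x rejects a pattern that leaves any of the line uncovered.
matched_short=$(printf '%s\n' "$path" | grep -x 'a*rdvark' || true)
matched_full=$(printf '%s\n' "$path" | grep -x '.*/a*rdvark' || true)

printf 'search:        [%s]\n' "$searched"
printf 'match (short): [%s]\n' "$matched_short"
printf 'match (full):  [%s]\n' "$matched_full"
```

Only the middle case comes back empty: the regex itself is fine, but it
does not account for the leading ./ of the string.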






Re: regular expressions in bash

2002-04-20 Thread Harry Putnam

Gordon Messmer [EMAIL PROTECTED] writes:

 On Fri, 2002-04-19 at 14:16, Harry Putnam wrote:
 
 I'm not really sure what constitutes a posix legal regex but I don't 
 think it includes trick riders like having to match a specific part
 of a string, unless put into the regex itself with anchors or the
 like.

 A regex is a regex, but a regex search is not a regex match.  I don't

Not exactly.  There are several common sets of regex rules.  The one
in find is not as powerful as what I called the `POSIX' set.

 know that Perl provides both, and if it does I don't recall how they're
 differentiated.  Other applications do.  Python, for instance
 differentiates them thusly:
 http://www.python.org/doc/current/lib/matching-searching.html

 Find requires a match like Python's, rather than a search like perl
 or grep.

Now, I may have used the wrong term (POSIX) and still do not really
know what constitutes a posix legal regex.  However the notation used
with find is weaker in several ways (As I mentioned in my 1st post in
this thread) than what I referred to as POSIX.

In the context of the original post, the comparison was to perl regex.
The usage in find would better be described as regex-like, since it
is weaker in several ways and lacks some of the more powerful
syntax.  It is a nice addition nonetheless.  I only said it isn't
the real McCoy.

As far as I know there is no stipulation on a regex to match in any
special way.  Making that stipulation has already weakened the regex
engine involved.

egrep, awk, and perl would all give a different (more versatile) result
than the usage in find, limiting the match in some way being only the
first difference.

For example:
touch aardvark
find . -regex 'a+ardvark'
nothing

Whereas
ls|egrep 'a+rdvark'
aardvark  
works 

or
find . -regex 'a*rdvark'
nothing

whereas
 
ls |egrep 'a*rdvark'
  aardvark   

Using a*rdvark with find -regex fails, but with posix regex it is another way
to find something like aardvark.

Or
find . -regex '\(a\)\1rdvark'
nothing
or
find . -regex '(a)\1rdvark'
find: Invalid back reference

Whereas 

ls |egrep '(a)\1rdvark'
aardvark

There are more examples.  But my only point here was that full regex
is more powerful because it is more versatile.  Not that the usage in
find is a bad thing.

The perl script I submitted (barring any scripting errors) would be
more versatile as a result.



___
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list



Re: regular expressions in bash

2002-04-20 Thread Gordon Messmer

On Sat, 2002-04-20 at 17:12, Harry Putnam wrote:
 
 Not exactly.  There are several common sets of regex rules.  The one
 in find is not as powerful as what I called the `POSIX' set.

Find uses the POSIX regex functions in the C library, not some special,
weak code.

  know that Perl provides both, and if it does I don't recall how they're
  differentiated.  Other applications do.  Python, for instance
  differentiates them thusly:
  http://www.python.org/doc/current/lib/matching-searching.html
 
  Find requires a match like Python's, rather than a search like perl
  or grep.
 
 Now, I may have used the wrong term (POSIX) and still do not really
 know what constitutes a posix legal regex.

man 7 regex

 However the notation used
 with find is weaker in several ways (As I mentioned in my 1st post in
 this thread) than what I referred to as POSIX.

Your misunderstanding of a regex match does not constitute a weakness in
find. :)

 In the context of the original post, the comparison was to perl regex.

Perl searches for a regex in a string, rather than matching a pattern on
a string.

 The usage in find would better be described as regex-like, since it
 is weaker in several ways and lacks some of the more powerful
 syntax.  It is a nice addition nonetheless.  I only said it isn't
 the real McCoy.

It doesn't lack any syntax.  It *is* the real McCoy.

 Far as I know there is no stipulation on a regex to match in any
 special way.

That's sorta up to the tool that provides the regex match/search.  It's
not uncommon to differentiate between a search and a match.

 For example:
 touch aardvark
 find . -regex 'a+ardvark'
 nothing

Do it right:
$ find . -regex './a+ardvark'
./aardvark

'a+ardvark' doesn't match the full path to ./aardvark which is clearly
required, as noted by the man page.

 or
 find . -regex 'a*rdvark'
 nothing

$ find . -regex './a*rdvark'
./aardvark


 Or
 find . -regex '\(a\)\1rdvark'
 nothing

$ find . -regex './\(a\)\1rdvark'
./aardvark







Re: regular expressions in bash

2002-04-20 Thread Harry Putnam

Gordon Messmer [EMAIL PROTECTED] writes:

[...]

 However the notation used
 with find is weaker in several ways (As I mentioned in my 1st post in
 this thread) than what I referred to as POSIX.

 Your misunderstanding of a regex match does not constitute a weakness in
 find. :)

Yikes... looks like there's been plenty of that.. (my misunderstanding) 

 In the context of the original post, the comparison was to perl regex.

 Perl searches for a regex in a string, rather than matching a pattern on
 a string.

I disagree, and I think this is the hub of the matter.  Regex always
match a pattern.  That is what regex do.  One may search with a regex,
true enough but the regex always matches a pattern.

What is happening with find is that part of the pattern is predefined,
thereby removing some of the versatility of regex matching.  That is,
it does not allow the operator to choose the match, but predefines it to
some degree.  The operator can choose within what's left.

That is, by definition, less versatile, which is really all I meant
by `weaker'.

 'a+ardvark' doesn't match the full path to ./aardvark which is clearly
 required, as noted by the man page.

Noted by the man page or not, it is nonetheless a limitation on regex
usage.

  Far as I know there is no stipulation on a regex to match in any
  special way.
 
 That's sorta up to the tool that provides the regex match/search.  It's
 not uncommon to differentiate between a search and a match.

Since you've pointed to man 7 regex as the authority on regex, can you
find this distinction explained there?

The distinction was invented in this thread.  I don't think you will
find mention of it in the documentation you point to:


Quoting here, but not to support the above comments:

  DESCRIPTION
     Regular expressions (``RE''s), as defined in POSIX 1003.2, come in
     two forms: modern REs (roughly those of egrep; 1003.2 calls these
     ``extended'' REs) and obsolete REs (roughly those of ed(1); 1003.2
     ``basic'' REs).  Obsolete REs mostly exist for backward
     compatibility in some old programs; they will be discussed at the
     end.  1003.2 leaves some aspects of RE syntax and semantics open;
     `(*)' marks decisions on these aspects that may not be fully
     portable to other 1003.2 implementations.

One might conclude from the above that if it doesn't act like egrep or
at least ed, it isn't posix.

Of course any tool can use regex in any way it sees fit.  The
discussion here was about regex themselves inside find or perl.

My contention has been that the usage in perl, awk, and egrep is more
powerful and versatile than that in `find'.  It's not a distinction of
search versus match as you claim.  In all cases matching is how it
works.  It's just that the match is partially defined (i.e., limited) with
the `find' usage.

And again, I'm not complaining about this -regex addition to find; I think
it is a very nice addition.  I also think it is really only regex-like.
The real McCoy has no such limitations.

Full regex would have matched in either of our examples.  Not searched
but matched.  :-)







Re: regular expressions in bash

2002-04-19 Thread Bill Crawford

On 18 Apr 2002, Gordon Messmer wrote:

 On Thu, 2002-04-18 at 15:26, daniel wrote:
  i'm a perlgeek
  so i'm familiar with its style of regular expressions
  but when i'm trying to use one of those regular expressions in a find
  command,
  
  find /home/ -name (.Apple(.*))|(Network Trash
  Folder)|(TheVolumeSettingsFolder) -print0 | rm -rf
 
 Then tell find to use a regex search.  'man find' would tell you that
 the -regex argument is available, and is what you want instead of -name.
 :)

 Oops.

 I completely missed that one ... how long's that been there?

 I'm guessing it's probably always been there, like Kosh.  I'm sooo
embarrassed now :o)







Re: regular expressions in bash

2002-04-19 Thread Harry Putnam

Bill Crawford [EMAIL PROTECTED] writes:

  Oops.

  I completely missed that one ... how long's that been there?

  I'm guessing it's probably always been there, like Kosh.  I'm sooo
 embarrassed now :o)

It is new within a year or so, I believe, but if you look closely you'll
also notice it isn't posix regex.

The example given shows it.
  `b.*r3'

Does not match

  ./fubar3

Whereas with grep, egrep, awk, or sed it would.  There are other
peculiarities too, making it less useful as a regex search, but it is still
quite a good development over globbing only.






Re: regular expressions in bash

2002-04-19 Thread Gordon Messmer

On Fri, 2002-04-19 at 07:26, Harry Putnam wrote:
 
 It is new within a year or so, I believe but if you look close you'll
 also notice it isn't posix regex 
 
 The example given shows it.
   `b.*r3'
 
 Does not match
   ./fubar3

Sure it's a POSIX regex.  However, the man page points out that the
pattern must *match* the *entire path*.  Match vs. search maybe isn't
clear if you're accustomed to always searching... A regex search will
match if any part of the string fits the regex.  A match is only good
if the pattern fits the string from the very beginning.








Re: regular expressions in bash

2002-04-19 Thread Harry Putnam

Gordon Messmer [EMAIL PROTECTED] writes:

 On Fri, 2002-04-19 at 07:26, Harry Putnam wrote:
 
 It is new within a year or so, I believe but if you look close you'll
 also notice it isn't posix regex 
 
 The example given shows it.
   `b.*r3'
 
 Does not match
   ./fubar3

 Sure it's a POSIX regex.  However, the man page points out that the
 pattern must *match* the *entire path*.  Match vs. search maybe isn't
 clear if you're accustomed to always searching... A regex search will
 match if any part of the string fits the regex.  A match is only good
 if the pattern fits the string from the very beginning.

I'm not really sure what constitutes a posix legal regex but I don't 
think it includes trick riders like having to match a specific part
of a string, unless put into the regex itself with anchors or the
like.






Re: regular expressions in bash

2002-04-19 Thread Bill Crawford

On Fri, 19 Apr 2002, Harry Putnam wrote:

 Gordon Messmer [EMAIL PROTECTED] writes:
 
  On Fri, 2002-04-19 at 07:26, Harry Putnam wrote:
  
  It is new within a year or so, I believe but if you look close you'll
  also notice it isn't posix regex 
  
  The example given shows it.
`b.*r3'
  
  Does not match
./fubar3
 
  Sure it's a POSIX regex.  However, the man page points out that the
  pattern must *match* the *entire path*.  Match vs. search maybe isn't
  clear if you're accustomed to always searching... A regex search will
  match if any part of the string fits the regex.  A match is only good
  if the pattern fits the string from the very beginning.
 
 I'm not really sure what constitutes a posix legal regex but I don't 
 think it includes trick riders like having to match a specific part
 of a string, unless put into the regex itself with anchors or the
 like.

 No, but the find man page actually specifies that it has to match the
whole path, i.e. there is an implicit ^ at the start and $ at the end,
as if it were ^(your_regexp_here)$
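
A quick sketch of that anchoring, assuming GNU find and a scratch
directory from mktemp (the file name fubar3 follows the example quoted
above; everything else is illustrative).  The match is tied to both ends
of the path, so a regex that stops short at either end finds nothing.

```shell
# Scratch directory with the one file from the example.
tmp=$(mktemp -d) || exit 1
touch "$tmp/fubar3"
cd "$tmp" || exit 1

unanchored=$(find . -regex 'b.*r3')     # nothing covers the leading ./fu
start_only=$(find . -regex '.*b.*r')    # the trailing 3 is left uncovered
whole_path=$(find . -regex '.*b.*r3')   # the regex spans all of ./fubar3

printf 'unanchored: [%s]\n' "$unanchored"
printf 'start only: [%s]\n' "$start_only"
printf 'whole path: [%s]\n' "$whole_path"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

Prefixing the expression with .* is the usual way to get substring-like
behaviour back at the front; a trailing .* does the same at the end.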







Re: regular expressions in bash

2002-04-18 Thread Bill Crawford

On Thu, 18 Apr 2002, daniel wrote:

 i'm a perlgeek
 so i'm familiar with its style of regular expressions
 but when i'm trying to use one of those regular expressions in a find
 command,
 i'm not having much luck
 here's what i want to do:
 
 
 find /home/ -name (.Apple(.*))|(Network Trash
 Folder)|(TheVolumeSettingsFolder) -print0 | rm -rf

 You need shell glob style expressions, not full regexps.  You also
need to use multiple -name options joined with -o for "or" between
them, something like this:

find /home \( -name '.Apple*' -o -name 'Network Trash Folder' \
 -o -name TheVolumeSettingsFolder \) -print0 | xargs -0 rm -rf







Re: regular expressions in bash

2002-04-18 Thread Gordon Messmer

On Thu, 2002-04-18 at 15:26, daniel wrote:
 i'm a perlgeek
 so i'm familiar with its style of regular expressions
 but when i'm trying to use one of those regular expressions in a find
 command,
 
 find /home/ -name (.Apple(.*))|(Network Trash
 Folder)|(TheVolumeSettingsFolder) -print0 | rm -rf

Then tell find to use a regex search.  'man find' would tell you that
the -regex argument is available, and is what you want instead of -name.
:)

 so it looks like i'm not understanding bash's use of regexps

Also, bash has nothing to do with find.  Find isn't built-in.
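
For the -regex route, a hedged sketch of what the expression might look
like, assuming GNU find with its default (emacs-style) regex syntax,
where \( \| \) are grouping and alternation.  The directory layout under
the mktemp scratch directory is invented for illustration, and the
command only prints; it deletes nothing but its own scratch directory.

```shell
# Invented layout standing in for /home (illustration only).
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/user1/.AppleDouble" "$tmp/user1/Network Trash Folder" \
         "$tmp/user2/TheVolumeSettingsFolder" "$tmp/user2/Documents"
cd "$tmp" || exit 1

# .*/ covers the leading directories, because -regex must match the whole path.
found=$(find . -regex '.*/\(\.Apple.*\|Network Trash Folder\|TheVolumeSettingsFolder\)' \
        | LC_ALL=C sort)
printf '%s\n' "$found"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$tmp"
```

Note that .Apple.* also matches anything beneath such a directory, since
.* happily crosses slashes.  Once the printed list looks right, the same
expression can feed -print0 | xargs -0 rm -rf as in the -name version
posted elsewhere in this thread.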






Re: regular expressions in bash

2002-04-18 Thread Harry Putnam

daniel [EMAIL PROTECTED] writes:

 i'm a perlgeek
 so i'm familiar with its style of regular expressions
 but when i'm trying to use one of those regular expressions in a find
 command,
 i'm not having much luck
 here's what i want to do:


 find /home/ -name (.Apple(.*))|(Network Trash
 Folder)|(TheVolumeSettingsFolder) -print0 | rm -rf


 to get rid of all the stuff placed by netatalk before i back it up
 now this works:


 find /home/ -name .Apple* -print0 | rm -rf


 so it looks like i'm not understanding bash's use of regexps
 anyone care to help out here?

If you want to use perl regex then use perl; otherwise you'll need old
fashioned file globbing, which is much more limited than regular
expressions, and has different rules.

Since you already know perl, I'd suggest writing a small script like
the one below, which will be useful in many places.

I didn't bother with lots of tests and safeguards, so you will want to
doctor it up as needed.  This script expects you to feed it a regex
and a directory to search.  This script only prints the file names it
finds.  To make it do what you want you'll need to look up the perl
module File::Find and the perl function `unlink'.


perldoc -f unlink
perldoc File::Find 

(If they're installed, these will bring up the info on how to use each.)
Follow the example script below, but when you perldoc File::Find
scroll down to a line that begins like this:

 `The wanted() function does whatever verifications you'

That section explains what the builtin variables will contain.

 Like  $File::Find::dir contains the current directory name

So you'll need to add that in your unlink (in place of print below) so
the pathname is available for the unlinking to be accurate.

NOTE:  This script only prints filenames it finds... It's left to the
   reader to make it `unlink' them.

In one of my directories, ~/no_bak, running it like this:

   ./my_find.pl '(file|9)$' /home/reader/no_bak

You can see the results:
file
nmap_file
19
9
refile
And how they match the regex in cmdline arg $1 '(file|9)$'

cat my_find.pl

#!/usr/bin/perl -w

## get this script name
($myscript = $0) =~ s:^.*/::;

## declare variables
$regex = '';
$directory = '';

use File::Find;

## Both a regex and a directory are required
if (!$ARGV[1]){
   print "Whoa, two cmdline args are required\n";
   print "Example: `$myscript REGEX DIRECTORY'\n";
   exit;
}
if (! -d $ARGV[1]){
   print "Sorry .. no such directory as $ARGV[1]\n";
   exit;
}
## Shift the cmdline args into variables
$regex = shift;
$directory = shift;

## Walk the tree, calling wanted() for every entry found
File::Find::find({wanted => \&wanted}, $directory);
exit;

sub wanted {
## Look for files matching our REGEX ($_ holds the bare file name)
   if ($_ =~ /$regex/){
      print "$_\n";
   }
}


