find(1) -regex/-iregex

2001-02-20 Thread Akinori MUSHA

Hi,

I have implemented -regex and -iregex options for find(1):

http://people.FreeBSD.org/~knu/misc/find_regex.diff

They are meant to be compatible with those of GNU find and NetBSD find:

-regex :

True if the whole path of the file matches pattern using
a basic regular expression.  To match a file named
``./foo/xyzzy'', you can use the regular expression
``.*/[xyz]*'' or ``.*/foo/.*'', but not ``xyzzy'' or
``/foo/''.

-iregex :

Like -regex, but the match is case insensitive.
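
The whole-path rule described above can be sketched with regcomp(3) and
regexec(3).  This is only an illustration under stated assumptions, not
the code from the diff: the helper name whole_path_match() is
hypothetical, and it relies on regexec() reporting the leftmost, longest
match, so a match that spans the entire string can be recognised from
rm_so and rm_eo.  Passing REG_ICASE in cflags gives the -iregex
behaviour.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: returns 1 if `pattern`
 * matches the whole of `path`, 0 if it does not, and -1 if the pattern
 * fails to compile.  Without REG_EXTENDED in `cflags` the pattern is
 * compiled as a basic regular expression.
 */
static int
whole_path_match(const char *pattern, const char *path, int cflags)
{
	regex_t re;
	regmatch_t m;
	int matched;

	assert(pattern != NULL && path != NULL);
	if (regcomp(&re, pattern, cflags) != 0)
		return (-1);
	/* A hit counts only if it starts at offset 0 and ends at the end of path. */
	matched = regexec(&re, path, 1, &m, 0) == 0 &&
	    m.rm_so == 0 && (size_t)m.rm_eo == strlen(path);
	regfree(&re);
	return (matched);
}
```

With this helper, whole_path_match(".*/[xyz]*", "./foo/xyzzy", 0) is
true, while whole_path_match("xyzzy", "./foo/xyzzy", 0) is false because
the match covers only the tail of the path.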

I'd like to commit it after reviews if there is no convincing
objection against it.  Any suggestion is welcome.

Thanks,

-- 
 /
/__  __Akinori.org / MUSHA.org
   / )  )  ) )  / FreeBSD.org / Ruby-lang.org
Akinori MUSHA aka / (_ /  ( (__(  @ iDaemons.org / and.or.jp

"We're only at home when we're on the run, on the wing, on the fly"

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: find(1) -regex/-iregex

2001-02-20 Thread Alfred Perlstein

* Akinori MUSHA <[EMAIL PROTECTED]> [010220 11:19] wrote:
> Hi,
> 
> I have implemented -regex and -iregex options for find(1):
> 

Sounds good; just make sure the regex engine matches the one that
the other find(1)s use.

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."




Re: find(1) -regex/-iregex

2001-02-20 Thread Akinori MUSHA

At Wed, 21 Feb 2001 08:42:19 +1300,
Craig Carey wrote:
> What about the -iname option?
> 
> I recently installed GNU 'find' just to get that -iname problem fixed.
> 
> Can you do -iname too?

Thanks for the info.  It's added now.

I'm ashamed to say that I couldn't resist implementing a -E option to
allow extended regexps. ;)  The traditional, POSIX-compliant basic
regexp is so awkward that you can't even say ".+\.S?o"; you have to
write "..*\.S\{0,1\}o" instead (see re_format(7) for details).

Also, I revised the manpage to describe them in detail.  Please check
it out:

http://people.FreeBSD.org/~knu/misc/find_regex.diff
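
The basic/extended difference mentioned above can be checked directly
against the regex library.  The sketch below is an illustration only:
re_search() is a hypothetical helper, and basic syntax is selected by
passing 0 (FreeBSD names that value REG_BASIC; POSIX does not define
the name).  In basic syntax the interval is written with backslashed
braces, and an unescaped `+' or `?' is an ordinary character.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `pattern`, compiled with `cflags`,
 * matches anywhere in `s`, 0 if not, and -1 on a compile error.
 */
static int
re_search(const char *pattern, int cflags, const char *s)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && s != NULL);
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, s, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}

/* The same idea spelled once per syntax. */
static const char bre_pattern[] = "..*\\.S\\{0,1\\}o";	/* basic */
static const char ere_pattern[] = ".+\\.S?o";		/* extended */
```

Compiled with its own syntax, each pattern accepts names such as
"libfoo.So" and "libfoo.o".  Compiling the extended spelling as a basic
regexp is not an error, but it then looks for a literal `+' and `?'.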

>  >I'd like to commit it after reviews if there is no convincing
>  >objection against it.  Any suggestion is welcome.
>  >
> 
> I would object if it is a new variant of regexp. I'd say it ought
> be between egrep and perl, in its functionality.

I don't think I grasp your meaning.  GNU find(1)'s -regex uses the
"basic regexp", which is neither the "extended regexp" that egrep(1)
uses nor the one perl(1) uses.

Anyway, here are the facts:

  - I implemented -regex/-iregex using FreeBSD's standard regex
  library, which is supposed to be compliant with POSIX.2

  - The match is executed with REG_BASIC, whose behavior is compatible
  with GNU find(1) and NetBSD find(1)

  - I, however, added -E so we can use extended regexps ;)

  - Perl's regexp is known to be a unique variant that differs from
  both the "basic regexp" and the "extended regexp" ;P
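
How the options in the list above might translate into regcomp(3)
flags can be sketched as follows.  The function name
compile_find_regex() and its two boolean parameters are hypothetical,
chosen only for illustration; the diff may well organise this
differently.  Basic syntax is the default (flag value 0, which FreeBSD
calls REG_BASIC), -E adds REG_EXTENDED, and -iregex adds REG_ICASE.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical sketch: compile `pattern` the way -regex/-iregex might,
 * depending on whether -E was given.  Returns regcomp()'s result, so 0
 * means success and the caller must regfree() the compiled regex.
 */
static int
compile_find_regex(regex_t *re, const char *pattern, int have_E,
    int is_iregex)
{
	int cflags = 0;			/* basic regexp by default */

	assert(re != NULL && pattern != NULL);
	if (have_E)
		cflags |= REG_EXTENDED;	/* -E: extended regexp */
	if (is_iregex)
		cflags |= REG_ICASE;	/* -iregex: ignore case */
	return (regcomp(re, pattern, cflags));
}
```

Keeping the flag selection in one place means -regex and -iregex can
share the same matching code and differ only in how the pattern is
compiled.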

> It sounds like a regexp would be nice.

I think so too. :)




Re: find(1) -regex/-iregex

2001-02-20 Thread Daniel C. Sobral

Akinori MUSHA wrote:
> 
> Hi,
> 
> I have implemented -regex and -iregex options for find(1):
> 
> http://people.FreeBSD.org/~knu/misc/find_regex.diff

I'm not familiar with the find sources, but it seems to me you execute
regcomp() for each file name to be compared? If so... change that! :-)
regcomp() does expensive setup so that regexec() can be run
inexpensively many times over.
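
The compile-once, match-many pattern being asked for looks roughly like
the sketch below.  It is not the code from the diff: count_matches()
and its parameters are hypothetical, and a real find(1) would compile
the pattern while parsing its arguments and keep the regex_t around
for the whole directory walk rather than pass an array of paths.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical sketch: count how many of the `npaths` strings in
 * `paths` contain a match for `pattern`, compiling the pattern exactly
 * once.  Returns -1 if the pattern does not compile.
 */
static int
count_matches(const char *pattern, int cflags, const char *const *paths,
    size_t npaths)
{
	regex_t re;
	size_t i;
	int n = 0;

	assert(pattern != NULL && paths != NULL);
	/* Expensive step: performed once, outside the loop. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (-1);
	/* Cheap step: performed once per path. */
	for (i = 0; i < npaths; i++)
		if (regexec(&re, paths[i], 0, NULL, 0) == 0)
			n++;
	regfree(&re);
	return (n);
}
```

The important part is only the placement of regcomp() and regfree()
outside the loop; everything else is scaffolding for the example.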

> They are meant to be compatible with those of GNU's and NetBSD's:
> 
> -regex :
> 
> True if the whole path of the file matches  using
> basic regular expression.  To match a file named
> ``./foo/xyzzy'', you can use the regular expression
> ``.*/[xyz]*'' or ``.*/foo/.*'', but not ``xyzzy'' or
> ``/foo/''.
> 
> -iregex :
> 
> Like -regex, but the match is case insensitive.

You forgot -E (use extended regexp syntax), and the example you show
above is extended regexp syntax, not basic regexp syntax.

> I'd like to commit it after reviews if there is no convincing
> objection against it.  Any suggestion is welcome.

Well, I expressed my concerns above.

-- 
Daniel C. Sobral(8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"Too bad sentience isn't a marketable commodity."




Re: find(1) -regex/-iregex

2001-02-20 Thread Daniel C. Sobral

Alfred Perlstein wrote:
> 
> * Akinori MUSHA <[EMAIL PROTECTED]> [010220 11:19] wrote:
> > Hi,
> >
> > I have implemented -regex and -iregex options for find(1):
> >
> 
> Sounds good, just make sure the regex engine matches the one that
> the other find(1)'s use.

It won't. GNU find certainly uses the GNU regexp library, which has lots
of extra stuff. Naturally, our find will be using our library instead.
Nothing we can do about it. It is the way of the GNU to extend
and embrace.




Re: find(1) -regex/-iregex

2001-02-20 Thread Daniel C. Sobral

Akinori MUSHA wrote:
> 
> > I would object if it is a new variant of regexp. I'd say it ought
> > be between egrep and perl, in its functionality.
...
> 
>   - Perl's regexp is known to be a unique variant that differs from
>   both the "basic regexp" and the "extended regexp" ;P

For that matter, I invite anyone talking about a "standard" regexp of
any kind to take a look at the include file for GNU regexp, to see just
how many slightly different variants of regexp there are out there in
various utilities.




Re: find(1) -regex/-iregex

2001-02-20 Thread Akinori MUSHA

At Wed, 21 Feb 2001 12:35:09 +0900,
Daniel C. Sobral wrote:
> I'm not familiar with find sources, but it seems to me you execute
> regcomp() for each file name to be compared? If so... change that! :-)
> Regcomp() does expensive setup so that regexec() can be run
> inexpensively many times over.

Indeed.  I'll do it soon, thanks.

> You forgot -E (use extended regexp syntax), and the example you show
> above is extended regexp syntax, not basic regexp syntax.

Noted.




Re: find(1) -regex/-iregex

2001-02-20 Thread Akinori MUSHA

At Wed, 21 Feb 2001 14:12:51 +0900,
I wrote:
> At Wed, 21 Feb 2001 12:35:09 +0900,
> Daniel C. Sobral wrote:
> > I'm not familiar with find sources, but it seems to me you execute
> > regcomp() for each file name to be compared? If so... change that! :-)
> > Regcomp() does expensive setup so that regexec() can be run
> > inexpensively many times over.
> 
> Indeed.  I'll do it soon, thanks.

Updated.

http://people.FreeBSD.org/~knu/misc/find_regex.diff

> > You forgot -E (use extended regexp syntax), and the example you show
> > above is extended regexp syntax, not basic regexp syntax.

Well, -E was added after I had posted the original message.  The
find.1 in the latest diff mentions it.

.It Fl E
Interpret regular expressions followed by
.Ic -regex
and
.Ic -iregex
options as extended (modern) regular expressions rather than basic
regular expressions (BRE's).  The
.Xr re_format 7
manual page fully describes both formats.


As for the syntax my examples conform to, I think they are valid in
both basic and extended syntax.

True if the whole path of the file matches
.Ar pattern
using regular expression.  To match a file named ``./foo/xyzzy'', you
can use the regular expression ``.*/[xyz]*'' or ``.*/foo/.*'', but not
``xyzzy'' or ``/foo/''.


Thanks for your suggestions.




Re: find(1) -regex/-iregex

2001-02-20 Thread Craig Carey


Can an -iname option be provided?  Then the FreeBSD find would be
more like GNU find, and lines like this could be written:

find /msdos-disk -iname "*txt" | xargs -n 1 ls -l
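
What an -iname test has to do can be sketched in C with fnmatch(3).
This is an assumption about one possible approach, not a description
of the patch: iname_match() is a hypothetical helper, and
FNM_CASEFOLD is an extension provided by the BSD and GNU C libraries
rather than by POSIX.

```c
#define _GNU_SOURCE		/* glibc exposes FNM_CASEFOLD only with this */
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the last path component `name`
 * matches the shell pattern `glob` without regard to case, else 0.
 */
static int
iname_match(const char *glob, const char *name)
{
	assert(glob != NULL && name != NULL);
	return (fnmatch(glob, name, FNM_CASEFOLD) == 0);
}
```

With the pattern from the command above, iname_match("*txt",
"README.TXT") is true, which is the case a plain -name test would miss
on a DOS file system.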

I am doubtful that -regex needs to be inferior to the
-egrep option. What software would break? It was said that there is
no regexp. There are opinions around saying that egrep is better
than grep.

What is the -E option? Perhaps this: -eregex

Suppose it is case insensitive. Then it could be
  -eiregex  or  -ieregex  or  -eregexi

I hope for no regex if there is no '-iname' feature. [It would be
nice if the advanced regexes settled on the Perl regex, perhaps
throughout all utilities, e.g. -pregex.]




At 21-02-01 14:12 +0900 Wednesday, Akinori MUSHA wrote:
 >At Wed, 21 Feb 2001 12:35:09 +0900,
 >Daniel C. Sobral wrote:
 >> I'm not familiar with find sources, but it seems to me you execute
 >> regcomp() for each file name to be compared? If so... change that! :-)
 >> Regcomp() does expensive setup so that regexec() can be run
 >> inexpensively many times over.
 >
 >Indeed.  I'll do it soon, thanks.
 >
 >> You forgot -E (use extended regexp syntax), and the example you show
 >> above is extended regexp syntax, not basic regexp syntax.
 >
 >Noted.
 >
...

What about improving 'ls' too? Can there be an option so that it
refuses to list any information about directories (useful in the
above example)? Also, is there any plan to stop the wastage of
space in the central columns of "ls"'s output, where it lists
uninteresting information? Maybe a '-p' option, like GNU 'ls'
has.






E-mail: Craig Carey <[EMAIL PROTECTED]>  (backup [EMAIL PROTECTED])
Auckland, NZ. |  Snooz Metasearch: http://www.ijs.co.nz/info/snooz.htm






Re: find(1) -regex/-iregex

2001-02-20 Thread Daniel C. Sobral

Akinori MUSHA wrote:
> 
> > > I'm not familiar with find sources, but it seems to me you execute
> > > regcomp() for each file name to be compared? If so... change that! :-)
> > > Regcomp() does expensive setup so that regexec() can be run
> > > inexpensively many times over.
> >
> > Indeed.  I'll do it soon, thanks.
> 
> Updated.
> 
> http://people.FreeBSD.org/~knu/misc/find_regex.diff

You might have done it, but the version above is not it. :-)




Re: find(1) -regex/-iregex

2001-02-20 Thread Akinori MUSHA

At Wed, 21 Feb 2001 18:53:26 +1300,
Craig Carey wrote:
> Can an -iname option be provided?  Then the FreeBSD find would be
> more like GNU find, and lines like this could be written:

Yes, it's already implemented, as I wrote in my previous mail.

> I am doubtful that -regex needs to be inferior to the
> -egrep option. What software would break? It was said that there is
> no regexp. There are opinions around saying that egrep is better
> than grep.

We are not aiming to be that GNU-ish.  Being compatible with NetBSD is
a good thing in a sense.

> What is the -E option? Perhaps this: -eregex
> 
> Suppose it is case insensitive. Then it could be
>   -eiregex  or  -ieregex  or  -eregexi

I don't much like that.  I chose -E because it is consistent with
(our) grep(1) and sed(1).

> What about improving 'ls' too? Can there be an option so that it
> refuses to list any information about directories (useful in the
> above example)? Also, is there any plan to stop the wastage of
> space in the central columns of "ls"'s output, where it lists
> uninteresting information? Maybe a '-p' option, like GNU 'ls'
> has.

Interesting, but that will have to wait for another time.  I'd like to
focus on regex this time.




Re: find(1) -regex/-iregex

2001-02-20 Thread Akinori MUSHA

At Wed, 21 Feb 2001 15:06:22 +0900,
Daniel C. Sobral wrote:
> > http://people.FreeBSD.org/~knu/misc/find_regex.diff
> 
> You might have done it, but the version above is not it. :-)

Oh, would you please reload it?

When you see a function named do_c_regex(), that's it. :)




Re: find(1) -regex/-iregex

2001-02-22 Thread Alfred Perlstein

* Daniel C. Sobral <[EMAIL PROTECTED]> [010220 19:39] wrote:
> Alfred Perlstein wrote:
> > 
> > * Akinori MUSHA <[EMAIL PROTECTED]> [010220 11:19] wrote:
> > > Hi,
> > >
> > > I have implemented -regex and -iregex options for find(1):
> > >
> > 
> > Sounds good, just make sure the regex engine matches the one that
> > the other find(1)'s use.
> 
> It won't. GNU find certainly uses GNU regexp library, which has lots of
> extra stuff. Naturally, our find will be using our library instead.
>  Nothing we can do about it. It is the way of the Gnu to extend
> and embrace.

Well, a subset or superset is fine, as long as it's the same type
of regex.

-Alfred
