Re: How to match regex in bash? (any character)

2011-11-17 Thread Linda Walsh



Chet Ramey wrote:
> On 9/27/11 6:41 PM, Roger wrote:
>
>> Correct.  After reading the entire Bash Manual page, I didn't see much mention
>> of documentation resources (of ERE) besides maybe something about egrep from
>> Bash's Manual Page or elsewhere on the web.  After extensive research for
>> regex/regexpr, only found Perl Manual Pages.
>>
>> Might be worth mentioning a link or good reference for this ERE within the Bash
>> Manual (Page)?
>
> The bash man page refers to regex(3).  On my BSD (Mac OS X) system, that
> refers to re_format(7), which documents the BRE and ERE regular expression
> formats.  On an Ubuntu box, to choose a representative Linux example, that
> refers to regex(7), which contains the same explanation, and the GNU regex
> manual.  This sort of "chained" man page reference is common.
>
> If you like info, `info regex' on a Linux box should display both pages.

---
If the poor guy looked at the pages you suggest,
he'd see various examples showing:

An atom is a regular expression enclosed in "()"
A bracket expression is a list of characters enclosed in "[]"
To use a literal '-' as the first endpoint of a range, enclose it in
"[." and ".]"
For example, if o and ^ are the members of an equivalence class, then
"[[=o=]]", "[[=^=]]", and "[o^]" are all synonymous.
[...]
A null string is considered longer than no match at all.  For example,
"bb*" matches the three middle characters of "abbbc",
"(wee|week)(knights|nights)" matches all ten characters of
"weeknights", when "(.*).*" is matched against "abc" the parenthesized
subexpression matches all three characters, and when "(a*)*" is matched
against "bc" both the whole RE and the parenthesized subexpression
match the null string.











Re: How to match regex in bash? (any character)

2011-10-03 Thread Stephane CHAZELAS
2011-10-03, 13:48(+02), Andreas Schwab:
> Stephane CHAZELAS  writes:
>
>> The problem and confusion here comes from the fact that "\" is
>> overloaded and used by two different pieces of software (bash
>> and the system regex).
>
> That's nothing new.  The backslash is widely used as a quote character
> in several languages, which requires two levels of quoting if one of
> these languages is embedded in another one.
[...]

Yes, but in this case, contrary to zsh, bash doesn't do two levels of
quoting. In bash, quoting means escaping the RE operators, and
that's where the problem comes from. For it to work fully, bash
would need to implement full RE parsing to know where to put
backslashes when characters are quoted.

Bash turns:

"." to \. before calling the regex(3) API
'[.]' to \[\.\] (fine)
['.'] to [\.] (not fine)
['a]'] to [a\]] (not fine)
(.)\1 to (.)1 (fine or not fine depending on how you want to
  look at it)
(?i:test) to (?i:test) (assuming regex(3) is implemented with
 PCREs: fine or not fine depending on how you want
 to look at it).

In zsh, it's simpler as quoting just quotes shell characters, it
doesn't try to escape regexp operators.
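
A minimal sketch of the two uncontroversial entries in the list above,
assuming bash 3.2 or later with default compatibility settings (the variable
names are illustrative only).  It shows that a quoted "." or '[.]' is matched
as literal text rather than as a regex operator:

```bash
#!/usr/bin/env bash
# Quoted pattern text is escaped by bash before it reaches regex(3),
# so it can only match itself.

[[ "a" =~ "." ]]     && dot_vs_a=match       || dot_vs_a=nomatch
[[ "." =~ "." ]]     && dot_vs_dot=match     || dot_vs_dot=nomatch
[[ "[.]" =~ '[.]' ]] && bracket_text=match   || bracket_text=nomatch
[[ "." =~ '[.]' ]]   && bracket_vs_dot=match || bracket_vs_dot=nomatch

printf '%s %s %s %s\n' "$dot_vs_a" "$dot_vs_dot" "$bracket_text" "$bracket_vs_dot"
```

The "not fine" entries are deliberately left out, since their outcome may
depend on the bash version in use.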

-- 
Stephane




Re: How to match regex in bash? (any character)

2011-10-03 Thread Andreas Schwab
Stephane CHAZELAS  writes:

> The problem and confusion here comes from the fact that "\" is
> overloaded and used by two different pieces of software (bash
> and the system regex).

That's nothing new.  The backslash is widely used as a quote character
in several languages, which requires two levels of quoting if one of
these languages is embedded in another one.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



Re: How to match regex in bash? (any character)

2011-10-03 Thread Stephane CHAZELAS
2011-10-02, 21:51(-04), Chet Ramey:
> On 10/2/11 3:43 PM, Stephane CHAZELAS wrote:
>
>> [*] actually, bash does some (undocumented) preprocessing on the
>> regexps, so even the regex(3) reference is misleading here.
>
> Not really.  The words are documented to undergo quote removal, so
> they undergo quote removal.  That turns \1 into 1, for instance.
[...]

The problem and confusion here comes from the fact that "\" is
overloaded and used by two different pieces of software (bash
and the system regex).

It is used:
  - by bash for quoting
  - by regex(3) to escape regexp characters in some
circumstances (for instance when not inside [...], though that
may vary between implementations; think of the (?{...}) type
extensions)
  - by some regex(3) implementations to introduce new regexp
operators (\w, \b, \<...)
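
One common way to keep the two roles of "\" apart is to store the regex in a
variable and expand it unquoted.  The following is only a sketch under that
assumption; the variable names and the sample pattern are made up for
illustration:

```bash
#!/usr/bin/env bash
# Single quotes take care of the shell level; the unquoted expansion $re
# then hands the text to regex(3) as it stands.

re='^[0-9]+\.[0-9]+$'      # "\." reaches the regex library as an escaped dot

[[ "3.14" =~ $re ]]   && dotted=match  || dotted=nomatch
[[ "3x14" =~ $re ]]   && other=match   || other=nomatch

# Quoting the expansion switches back to literal string matching.
[[ "3.14" =~ "$re" ]] && literal=match || literal=nomatch

printf '%s %s %s\n' "$dotted" "$other" "$literal"
```

Only "\." is used here because an escaped special character is defined by
POSIX ERE; operators such as \w or \b remain library extensions.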

BTW, another bug:

$ bash -c '[[ "\\" =~ ["."] ]]' && echo yes
yes

And what one could consider a bug:

~$ bash -c 'chars="a]"; [[ "a" =~ ["$chars"] ]]' && echo yes
~$ bash -c 'chars="a]"; [[ "a]" =~ ["$chars"] ]]' && echo yes
yes

I was wrong in saying that bash documentation should refer to
POSIX regexps as it disables extensions. It only disables
extensions introduced by "\", not the ones introduced by
sequences that would otherwise be invalid in POSIX EREs like
"(?", {{, **...

It should still refer to POSIX regexps as they're the only ones
guaranteed to work. Any extension provided by the system's
regex(3) API may not work with bash.

-- 
Stephane




Re: How to match regex in bash? (any character)

2011-10-02 Thread Chet Ramey
On 10/2/11 3:43 PM, Stephane CHAZELAS wrote:

> [*] actually, bash does some (undocumented) preprocessing on the
> regexps, so even the regex(3) reference is misleading here.

Not really.  The words are documented to undergo quote removal, so
they undergo quote removal.  That turns \1 into 1, for instance.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-10-02 Thread Stephane CHAZELAS
2011-10-1, 14:39(-08), rogerx@gmail.com:
[...]
> I took some time to examine the three regex references:
>
> 1) 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
> Written more like a technical specification of regex.  Great if you're
> going to be modifying the regex code.  Difficult to follow if you're new,
> looking for info.

One thing to bear in mind is that bash calls a system library to
perform the regexp matching (except for [*]), so it can't
really document how it's going to work because it just can't know;
it may differ from system to system. The only thing that is more
or less guaranteed is that all those various implementations
should comply with that specification.

Above is the specification of the POSIX extended regular
expression, so a bash script writer should refer to that
document if he wants to write a script for all the systems where
bash might be used.
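
As a rough illustration of what "referring to that document" can mean in
practice, the sketch below restricts itself to constructs defined by POSIX ERE
(bracket expressions, character classes, grouping).  The sample line and the
variable names are assumptions made up for the example:

```bash
#!/usr/bin/env bash
# Shorthands such as \w, \d or \s are library extensions; the POSIX
# character classes below are the portable spelling.

line='key_1 =   some value'
re='^([[:alpha:]_][[:alnum:]_]*)[[:space:]]*=[[:space:]]*([^[:space:]].*)$'

name='' value=''
if [[ $line =~ $re ]]; then
    name=${BASH_REMATCH[1]}     # first parenthesized subexpression
    value=${BASH_REMATCH[2]}    # second parenthesized subexpression
fi

printf 'name=%s value=%s\n' "$name" "$value"
```

The regex is kept in a variable so that no shell quoting is needed inside
[[ ... ]].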

> 2) regex(7)
> Although it looks good, upon further examination, I start to see run-on
> sentences.  It's more like a reference, which is what a man file should 
> be.
> At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex
> package"

On the few systems where that man page is available, it may or
may not document the extended regular expressions that are
used when calling the regex(3) API (on my system, it doesn't).
Those regular expressions may or may not have extensions over
the POSIX API, and that document may or may not point out which
ones are extensions and which ones are not, so a script writer may
be able to refer to that document if he wants his script to work
on that particular system (except for [*]).

> 3) grep(1)
> Section "REGULAR EXPRESSIONS".  At about half the size of regex(7), the
> section clearly explains regex and seems to be easily understandable for a
> person new to regex.

That's another utility that may or may not use the same API, in
the same way as bash or not. You get no warranty whatsoever that
the regexps covered there will be the same as bash's.

[*] actually, bash does some (undocumented) preprocessing on the
regexps, so even the regex(3) reference is misleading here.

For instance, on my system the regex(3) Extended REs support \1
for backreference and \b for word boundary, but when calling
[[ aa =~ (.)\1 ]], bash changes it to [[ aa =~ (.)1 ]], so bash
won't behave as regex(3) documents on my system (note that
(.)\1 is not a portable regex anyway, as the behavior is
unspecified).
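
A small sketch of the difference described above, assuming a bash that behaves
as reported in this thread (4.x at the time).  The variable names are
illustrative, and the last result is printed without any claim about its
value, because a back-reference in an ERE is unspecified by POSIX:

```bash
#!/usr/bin/env bash
# Written inline, the backslash is consumed by bash, so regex(3) sees "(.)1".
[[ "a1" =~ (.)\1 ]] && inline_a1=match || inline_a1=nomatch
[[ "aa" =~ (.)\1 ]] && inline_aa=match || inline_aa=nomatch

# Stored in a variable and expanded unquoted, the text reaches regex(3)
# unchanged; whether "\1" then acts as a back-reference depends on the library.
re='(.)\1'
[[ "aa" =~ $re ]] && var_aa=match || var_aa=nomatch

printf 'inline: a1=%s aa=%s  variable: aa=%s\n' "$inline_a1" "$inline_aa" "$var_aa"
```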

Also (and that could be considered a bug), "[\a]" is meant to
match either "\" or "a", but in bash, because of that
preprocessing, it doesn't:

$ bash -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ bash -c '[[ "\\" =~ [\^] ]]' && echo yes
yes

Once that bug is fixed, bash should probably refer to POSIX EREs
(since its preprocessing would disable any extension introduced
by system libraries) rather than regex(3), as that would be more
accurate.

The situation with zsh:
  - it uses the same API as bash (unless the RE_MATCH_PCRE
option is set in which case it uses PCRE regexps)
  - it doesn't do the same preprocessing as bash because...
  - it doesn't implement that confusing business inherited from
ksh whereby quoted RE characters are taken literally.

  So, in zsh
  - [[ aa =~ '(.)\1' ]] works as documented in regex(3) on my
system (but may work differently on other systems as the
behavior is unspecified as per POSIX).
  - [[ '\' =~ '[\a]' ]] works as POSIX specifies
  - after "setopt RE_MATCH_PCRE", one gets a more portable
behavior as there is only one PCRE library (though in different
versions).

The situation with ksh93:
  - Not POSIX either but a bit more consistent:
$ ksh -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ ksh -c '[[ "\\" =~ [\^] ]]' || echo no
no
  - it implements its own regexps with its own many extensions
which therefore can be and are documented in its man page
but are not common to any other regex (though are mostly a
superset of the POSIX ERE).

-- 
Stephane



Re: How to match regex in bash? (any character)

2011-10-01 Thread rogerx . oss
> On Thu, Sep 29, 2011 at 11:53:20PM -0800, Roger wrote:
>> On Fri, Sep 30, 2011 at 06:20:32AM +, Stephane CHAZELAS wrote:
>>2011-09-29, 13:52(-08), Roger:
>>[...]
>>> Since you're saying the regex description is found within either regex(3) or
>>> regex(7), couldn't there be a brief note within the Bash Manual Page be
>>> something to the effect:
>>[...]
>>
>>No, it's not.
>>
>>I suppose bash could say: See your system regex(3)
>>implementation documentation for the description of extended
>>regular expression syntax on your system. That syntax should be
>>compatible with one version or the other of the POSIX Extended
>>Regular Expression syntax whose specification for the latest
>>version as of writing can be found at:
>>http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
>>
>>regex(3) points to the API (regex.h), how the system documents
>>the regexps covered by that API is beyond bash knowledge.


I took some time to examine the three regex references:

1) 
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
Written more like a technical specification of regex.  Great if you're
going to be modifying the regex code.  Difficult to follow if you're new,
looking for info.

2) regex(7)
Although it looks good, upon further examination, I start to see run-on
sentences.  It's more like a reference, which is what a man file should be.
At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex
package"

3) grep(1)
Section "REGULAR EXPRESSIONS".  At about half the size of regex(7), the
section clearly explains regex and seems to be easily understandable for a
person new to regex.


I'm thinking the most people need to know about regex for Bash would be 3 & 2
from the above listing (in order of my preference for readability, as the grep
manual was very concise and easy to follow).

And then, learn regex from other books such as: the Sed & (G)AWK book, which
has an entire chapter devoted to regex; the search/replace functions within the
Learning the VI/VIM Editors book; and the Grep Pocket Ref., which has three
chapters, one each for regex, eregex, and perl regex.  I'm guessing, since the
Grep Manual was good, the Grep Pocket Ref. book will be equally good.

(I fear buying the "Mastering Regular Expressions" as most say half of the
material within the book is only relevant to Perl.)

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-30 Thread Roger
> On Fri, Sep 30, 2011 at 06:20:32AM +, Stephane CHAZELAS wrote:
>2011-09-29, 13:52(-08), Roger:
>[...]
>> Since you're saying the regex description is found within either regex(3) or
>> regex(7), couldn't there be a brief note within the Bash Manual Page be
>> something to the effect:
>[...]
>
>No, it's not.
>
>I suppose bash could say: See your system regex(3)
>implementation documentation for the description of extended
>regular expression syntax on your system. That syntax should be
>compatible with one version or the other of the POSIX Extended
>Regular Expression syntax whose specification for the latest
>version as of writing can be found at:
>http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
>
>regex(3) points to the API (regex.h), how the system documents
>the regexps covered by that API is beyond bash knowledge.


Exactly, a simple "man regex" (aka regex(3)) points to regex.h here.  Few will
know how to search for the regex(7) manual file.

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-29 Thread Stephane CHAZELAS
2011-09-29, 13:52(-08), Roger:
[...]
> Since you're saying the regex description is found within either regex(3) or
> regex(7), couldn't there be a brief note within the Bash Manual Page be
> something to the effect:
[...]

No, it's not.

I suppose bash could say: See your system regex(3)
implementation documentation for the description of extended
regular expression syntax on your system. That syntax should be
compatible with one version or the other of the POSIX Extended
Regular Expression syntax whose specification for the latest
version as of writing can be found at:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04

regex(3) points to the API (regex.h), how the system documents
the regexps covered by that API is beyond bash knowledge.

-- 
Stephane




Re: How to match regex in bash? (any character)

2011-09-29 Thread Roger
> On Thu, Sep 29, 2011 at 12:06:08PM -0400, Chet Ramey wrote:
>On 9/29/11 11:59 AM, Peng Yu wrote:
>> On Thu, Sep 29, 2011 at 10:38 AM, Chet Ramey  wrote:
>>> On 9/29/11 9:48 AM, Peng Yu wrote:
>>>
>>>> Therefore, either bash manpage should specify clearly which regex
>>>> manpage it should be in each system (which a bad choice, because there
>>>> can be a large number of systems), or the bash manpage should omit all
>>>> the non consistent reference and say something like "see more details
>>>> in info" or something else that is platform independent. Referring to
>>>> regex(3) without any quantification is not a very good choice .
>>>
>>> Why, exactly?  regex(3) is the one thing that's portable across systems,
>>> it happens to describe the interfaces bash uses, and it contains the
>>> appropriate system-specific references.  `info' is considerably less
>>> portable and widespread than `man', so a man page reference is the best
>>> choice.
>> 
>> We all have discovered that regex(3) is not consistent across all the
>> platform. Why you say it is portable?
>
>That is not what we discovered.  We discovered that the name and section of
>the man page describing regular expression formats differs across systems.
>We also discovered that every system we checked (at least every system I
>checked) has a regex(3) man page, and that in most cases that page
>(regex(3)) eventually contains the appropriate system-specific reference
>to the page that describes the regular expression format.  That is why it
>is the most portable alternative.

Since you're saying the regex description is found within either regex(3) or
regex(7), couldn't there be a brief note within the Bash Manual Page, something
to the effect of:

"For a description of regex, an unfortunate variable location depending on
system, see either regex(3) or regex(7)."

The Bash Manual Page lists few manual pages within the "See Also" section, and
omits regex(3)/(7) from it entirely.  Also note that there's only *one* mention
of regex (i.e. regex(3)) within the Bash Manual Page.

Checking the Grep Manual Page, it shows many additional manual pages for
"Regular Expressions".  (Info overload really, according to what Greg has stated
about EREs.  Either the ERE website URL or regex(7) is all that is needed.)

Or, adding a "Regular Expression" title under See Also section and pointing to
regex(3), regex(7) or ERE URL?

(For my use, I've got regex(7) and the ERE xbd_chap09 on my E-Book reader for
reference, with preference towards the xbd_chap09 print-to-pdf HTML page.)


As you stated, I'm only seeing "regex(3)" listed within the Bash Manual,
leaving the reader to try to ascertain, is the description really within
regex(3), or is it a little deeper such as regex(7)?  On initial inspection of
regex(3) by a beginner, they're going to be overwhelmed as regex(3) deals
entirely with the C programming language!  A non-programmer will just fail at
locating regex(7) as they'll be too overwhelmed.  A programmer might realize
the regex(3) isn't a real description and will kind of know that they're
misled and should/might look at regex(7).

Many entry level programmers likely use Bash, before moving on to C/C++, etc.

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-29 Thread Chet Ramey
On 9/29/11 1:46 PM, Greg Wooledge wrote:

>An additional binary operator, =~, is available, with the same
>precedence as == and !=.  When it is used, the string to the
>right of the operator is considered an extended regular
>expression and matched accordingly (as in regex(3)). The return
>value is 0 if the string matches the pattern, and 1 otherwise.
>If the regular expression is syntactically incorrect, the
>conditional expression's return value is 2.  If the shell option
>nocasematch is enabled, the match is performed without regard to
>the case of alphabetic characters.  Any part of the pattern may
>be quoted to force it to be matched as a string.
> 
> The last sentence in the quote above.

I've changed that line in the current version of the manual page.  It
now reads:

`Any part of the pattern may be quoted to force the quoted portion
to be matched as a string.'
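
A short sketch of what the revised sentence means when only part of a pattern
is quoted, assuming bash 3.2 or later.  The function name is_txt_name and the
sample file names are hypothetical:

```bash
#!/usr/bin/env bash
# Only the quoted portion ".txt" is literal; the rest is still a regex.
is_txt_name() {
    [[ $1 =~ ^[[:alnum:]_]+".txt"$ ]]
}

is_txt_name "notes.txt"    && r1=yes || r1=no   # literal ".txt" suffix present
is_txt_name "notesxtxt"    && r2=yes || r2=no   # no literal dot
is_txt_name "my notes.txt" && r3=yes || r3=no   # space is not in the class

printf '%s %s %s\n' "$r1" "$r2" "$r3"
```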

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-09-29 Thread Chet Ramey
On 9/29/11 12:06 PM, Greg Wooledge wrote:

>> As I mentioned previously, the best is to add a few examples in man
>> bash.
> 
> I would not object to that, but I can't speak for Chet.

As I said, I will add examples to the info manual and some more
explanation to the man page.  Regular expressions are so fundamental
to Unix, though, I would expect familiarity with them as a given.

> Another option would be to refer to the POSIX definition of
> Extended Regular Expressions as a web site.  I wish they had
> better URLs, though.  The URL I have for it at the moment is
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04

That's old (issue 6, Posix 2004).  The current one is

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-09-29 Thread Greg Wooledge
On Thu, Sep 29, 2011 at 11:18:57AM -0500, Peng Yu wrote:
> Also, regex(3) does not mention the difference between $x =~ .txt
> and $x=~ ".txt". I think that the difference should be addressed
> in man bash.

It already is.

   An additional binary operator, =~, is available, with the same
   precedence as == and !=.  When it is used, the string to the
   right of the operator is considered an extended regular
   expression and matched accordingly (as in regex(3)). The return
   value is 0 if the string matches the pattern, and 1 otherwise.
   If the regular expression is syntactically incorrect, the
   conditional expression's return value is 2.  If the shell option
   nocasematch is enabled, the match is performed without regard to
   the case of alphabetic characters.  Any part of the pattern may
   be quoted to force it to be matched as a string.

The last sentence in the quote above.
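
The return values and the nocasematch option mentioned in the quoted text can
be observed with a sketch like the following.  The variable names are
illustrative, and the lone "(" is assumed to be rejected by the system's
regcomp(3) as an invalid ERE:

```bash
#!/usr/bin/env bash
# Exit status of [[ string =~ regex ]]:
#   0 = match, 1 = no match, 2 = the regex could not be compiled.

[[ "abc" =~ ^a.c$ ]] && st_match=0   || st_match=$?
[[ "abd" =~ ^a.c$ ]] && st_nomatch=0 || st_nomatch=$?

bad='('                              # unbalanced parenthesis
[[ "abc" =~ $bad ]] 2>/dev/null && st_invalid=0 || st_invalid=$?

shopt -s nocasematch                 # ignore case while matching
[[ "ABC" =~ ^abc$ ]] && st_nocase=0  || st_nocase=$?
shopt -u nocasematch

printf '%s %s %s %s\n' "$st_match" "$st_nomatch" "$st_invalid" "$st_nocase"
```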

> Bottom line, regex(3) is not a good manpage to refer in the above
> sentence.

Maybe it's not a good one, but it is the only *possible* one.



Re: How to match regex in bash? (any character)

2011-09-29 Thread Roman Rakus

On 09/29/2011 06:18 PM, Peng Yu wrote:
> Also, regex(3) does not mention the difference between $x =~ .txt
> and $x=~ ".txt". I think that the difference should be addressed
> in man bash.

It is in man bash.

RR
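
For readers who want to see the difference in question, here is a minimal
sketch, assuming bash 3.2 or later; the sample values of x and y are made up:

```bash
#!/usr/bin/env bash
x='report_txt'
y='report.txt'

# Unquoted: "." is the regex operator "any character", so "_txt" matches.
[[ $x =~ .txt ]]   && x_unquoted=match || x_unquoted=nomatch

# Quoted: ".txt" must appear literally, dot included.
[[ $x =~ ".txt" ]] && x_quoted=match   || x_quoted=nomatch
[[ $y =~ ".txt" ]] && y_quoted=match   || y_quoted=nomatch

printf '%s %s %s\n' "$x_unquoted" "$x_quoted" "$y_quoted"
```

Neither form is anchored, so both look for the pattern anywhere in the string.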



Re: How to match regex in bash? (any character)

2011-09-29 Thread Peng Yu
On Thu, Sep 29, 2011 at 11:06 AM, Greg Wooledge  wrote:
> On Thu, Sep 29, 2011 at 10:59:19AM -0500, Peng Yu wrote:
>> We all have discovered that regex(3) is not consistent across all the
>> platform. Why you say it is portable?
>
> The three systems I mentioned earlier today all have regex(3).  Which
> system have you found, which doesn't have it?

I think that I misunderstood some of the previous emails.

However, on Ubuntu, there are both regex(3) and regex(7). Based on the
context in man bash, regex(7) is more relevant than regex(3): although
regex(3) does mention extended regular expressions, it is more of a
document for the C interface.

"When it is used, the string to the right of the operator is
considered an extended regular expression and matched accordingly (as
in regex(3))"

Also, regex(3) does not mention the difference between $x =~ .txt
and $x =~ ".txt". I think that the difference should be addressed
in man bash.

Bottom line, regex(3) is not a good manpage to refer to in the above
sentence. It is better to think of other alternatives rather than
trying to justify why we should be stuck with it.

>> As I mentioned previously, the best is to add a few examples in man
>> bash.
>
> I would not object to that, but I can't speak for Chet.
>
> Another option would be to refer to the POSIX definition of
> Extended Regular Expressions as a web site.  I wish they had
> better URLs, though.  The URL I have for it at the moment is
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
>



-- 
Regards,
Peng



Re: How to match regex in bash? (any character)

2011-09-29 Thread Greg Wooledge
On Thu, Sep 29, 2011 at 10:59:19AM -0500, Peng Yu wrote:
> We all have discovered that regex(3) is not consistent across all the
> platform. Why you say it is portable?

The three systems I mentioned earlier today all have regex(3).  Which
system have you found, which doesn't have it?

> As I mentioned previously, the best is to add a few examples in man
> bash.

I would not object to that, but I can't speak for Chet.

Another option would be to refer to the POSIX definition of
Extended Regular Expressions as a web site.  I wish they had
better URLs, though.  The URL I have for it at the moment is
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04



Re: How to match regex in bash? (any character)

2011-09-29 Thread Chet Ramey
On 9/29/11 11:59 AM, Peng Yu wrote:
> On Thu, Sep 29, 2011 at 10:38 AM, Chet Ramey  wrote:
>> On 9/29/11 9:48 AM, Peng Yu wrote:
>>
>>> Therefore, either bash manpage should specify clearly which regex
>>> manpage it should be in each system (which a bad choice, because there
>>> can be a large number of systems), or the bash manpage should omit all
>>> the non consistent reference and say something like "see more details
>>> in info" or something else that is platform independent. Referring to
>>> regex(3) without any quantification is not a very good choice .
>>
>> Why, exactly?  regex(3) is the one thing that's portable across systems,
>> it happens to describe the interfaces bash uses, and it contains the
>> appropriate system-specific references.  `info' is considerably less
>> portable and widespread than `man', so a man page reference is the best
>> choice.
> 
> We all have discovered that regex(3) is not consistent across all the
> platform. Why you say it is portable?

That is not what we discovered.  We discovered that the name and section of
the man page describing regular expression formats differs across systems.
We also discovered that every system we checked (at least every system I
checked) has a regex(3) man page, and that in most cases that page
(regex(3)) eventually contains the appropriate system-specific reference
to the page that describes the regular expression format.  That is why it
is the most portable alternative.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-09-29 Thread Peng Yu
On Thu, Sep 29, 2011 at 10:38 AM, Chet Ramey  wrote:
> On 9/29/11 9:48 AM, Peng Yu wrote:
>
>> Therefore, either bash manpage should specify clearly which regex
>> manpage it should be in each system (which a bad choice, because there
>> can be a large number of systems), or the bash manpage should omit all
>> the non consistent reference and say something like "see more details
>> in info" or something else that is platform independent. Referring to
>> regex(3) without any quantification is not a very good choice .
>
> Why, exactly?  regex(3) is the one thing that's portable across systems,
> it happens to describe the interfaces bash uses, and it contains the
> appropriate system-specific references.  `info' is considerably less
> portable and widespread than `man', so a man page reference is the best
> choice.

We have all discovered that regex(3) is not consistent across all
platforms. Why do you say it is portable?

As I mentioned previously, the best option is to add a few examples to man
bash. Assuming that you don't want to add an example to
man bash, the next choice is to add a reference to info, even though
it may not be available on every system (but at least it
should be downloadable from the bash GNU website).

-- 
Regards,
Peng



Re: How to match regex in bash? (any character)

2011-09-29 Thread Chet Ramey
On 9/29/11 9:48 AM, Peng Yu wrote:

> Therefore, either bash manpage should specify clearly which regex
> manpage it should be in each system (which a bad choice, because there
> can be a large number of systems), or the bash manpage should omit all
> the non consistent reference and say something like "see more details
> in info" or something else that is platform independent. Referring to
> regex(3) without any quantification is not a very good choice .

Why, exactly?  regex(3) is the one thing that's portable across systems,
it happens to describe the interfaces bash uses, and it contains the
appropriate system-specific references.  `info' is considerably less
portable and widespread than `man', so a man page reference is the best
choice.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-09-29 Thread Peng Yu
On Thu, Sep 29, 2011 at 7:22 AM, Greg Wooledge  wrote:
> On Wed, Sep 28, 2011 at 12:43:01PM -0800, Roger wrote:
>> Seems I used 'man regex' as well here.  AKA regex(3).  But I did
>> realize this a few weeks ago; the real regex description being 'man 7 regex'.
>> The Bash Manual Page denotes only regex(3).
>
> You're relatively fortunate that it's *that* easy to find on Linux.  On
> Linux, regex(3) points directly to regex(7), and you're done.
>
> On HP-UX, regex(3X) points to regcomp(3C) which points to regexp(5) which
> contains the actual definitions.
>
> On OpenBSD, regex(3) doesn't even *have* a SEE ALSO section; it's a dead
> end.  And regcomp(3) is the same page as regex(3), so that doesn't help
> either.  One would have to backtrack entirely, perhaps to grep(1).
> However, buried deep in the regex(3) page is a reference to re_format(7)
> (not even boldface).  And re_format(7) has the definitions, but getting
> there takes perseverance.  (For the record, grep(1) does point straight
> to re_format(7).)
>
> So you see, bash(1) *cannot* just link directly to regex(7), because
> that's not actually the correct final destination on most operating
> systems.  It's only correct on Linux.  Bash uses the regex(3) library
> interface, so that is the correct place for bash to refer the reader.

Therefore, either the bash manpage should specify clearly which regex
manpage applies on each system (which is a bad choice, because there
can be a large number of systems), or the bash manpage should omit all
the inconsistent references and say something like "see more details
in info" or something else that is platform independent. Referring to
regex(3) without any qualification is not a very good choice.

-- 
Regards,
Peng



Re: How to match regex in bash? (any character)

2011-09-29 Thread Greg Wooledge
On Wed, Sep 28, 2011 at 12:43:01PM -0800, Roger wrote:
> Seems I used 'man regex' as well here.  AKA regex(3).  But I did
> realize this a few weeks ago; the real regex description being 'man 7 regex'.
> The Bash Manual Page denotes only regex(3).

You're relatively fortunate that it's *that* easy to find on Linux.  On
Linux, regex(3) points directly to regex(7), and you're done.

On HP-UX, regex(3X) points to regcomp(3C) which points to regexp(5) which
contains the actual definitions.

On OpenBSD, regex(3) doesn't even *have* a SEE ALSO section; it's a dead
end.  And regcomp(3) is the same page as regex(3), so that doesn't help
either.  One would have to backtrack entirely, perhaps to grep(1).
However, buried deep in the regex(3) page is a reference to re_format(7)
(not even boldface).  And re_format(7) has the definitions, but getting
there takes perseverance.  (For the record, grep(1) does point straight
to re_format(7).)

So you see, bash(1) *cannot* just link directly to regex(7), because
that's not actually the correct final destination on most operating
systems.  It's only correct on Linux.  Bash uses the regex(3) library
interface, so that is the correct place for bash to refer the reader.
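
The chain of references described above can be probed with a small helper.
This is only a sketch: the function name find_regex_page is hypothetical, the
candidate list is taken from the pages named in this thread, and "man -w"
(print the location of a page) is assumed to be supported by the local man(1),
as it is in man-db and the BSD implementations:

```bash
#!/usr/bin/env bash
# Report the first regex-format page that the local man(1) can locate.
find_regex_page() {
    local candidate section name
    for candidate in "7 regex" "7 re_format" "5 regexp"; do
        read -r section name <<<"$candidate"
        if man -w "$section" "$name" >/dev/null 2>&1; then
            printf '%s(%s)\n' "$name" "$section"
            return 0
        fi
    done
    return 1    # nothing found; grep(1) may be the next place to look
}

find_regex_page || echo "no regex format page found" >&2
```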



Re: How to match regex in bash? (any character)

2011-09-29 Thread Chet Ramey
> Seems I used 'man regex' as well here.  AKA regex(3).  But I did
> realize this a few weeks ago; the real regex description being 'man 7 regex'.
> The Bash Manual Page denotes only regex(3).

Not all the world is Linux.  The regex(3) reference is the only one
that is consistent across different operating systems.  In addition
to the ones I mentioned above, Solaris and HP-UX use regexp(5), for
example.

I think the texinfo manual could stand a few more examples, especially
for the =~ operator.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-09-28 Thread Roger
> On Tue, Sep 27, 2011 at 07:58:50PM -0500, Peng Yu wrote:
>On Tue, Sep 27, 2011 at 6:51 PM, Chet Ramey  wrote:
>> On 9/27/11 6:41 PM, Roger wrote:
>>
>>> Correct.  After reading the entire Bash Manual page, I didn't see much 
>>> mention
>>> of documentation resources (of ERE) besides maybe something about egrep from
>>> Bash's Manual Page or elsewhere on the web.  After extensive research for
>>> regex/regexpr, only found Perl Manual Pages.
>>>
>>> Might be worth mentioning a link or good reference for this ERE within the Bash
>>> Manual (Page)?
>>
>> The bash man page refers to regex(3).  On my BSD (Mac OS X) system, that
>> refers to re_format(7), which documents the BRE and ERE regular expression
>> formats.  On an Ubuntu box, to choose a representative Linux example, that
>> refers to regex(7), which contains the same explanation, and the GNU regex
>> manual.  This sort of "chained" man page reference is common.
>>
>> If you like info, `info regex' on a Linux box should display both pages.
>
>Since regex(7) is actually what should be referred to on Ubuntu, and
>there is indeed a regex(3) man page on Ubuntu, man bash should specify
>which regex man page to read. I was looking at regex(3) on my Ubuntu
>box, which doesn't have any relevant information.

Ditto.

Seems I used 'man regex' as well here.  AKA regex(3).  But I did
realize this a few weeks ago; the real regex description being 'man 7 regex'.
The Bash Manual Page denotes only regex(3).


>Also, adding a few more examples costs just a few extra lines; I don't
>think that the manpage should be so frugal in terms of adding examples
>to elucidate important concepts.

Ditto.

That is why I'm suggesting one or two examples for Bash Parameter
Expansion. ;-)  However, I think the best examples for Parameter Expansion
are code with sample text next to it showing what the text will look like
after it's passed through the code, e.g. Greg's Wiki Bash FAQ - Parameter
Expansion.
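
Something along these lines is the format I mean (a sketch only; the sample path is made up, and note these operators take glob patterns, not regular expressions). Each comment shows what the sample text looks like after the expansion.

```shell
#!/usr/bin/env bash
file='/tmp/archive.tar.gz'     # hypothetical sample text

echo "${file#*/}"        # tmp/archive.tar.gz    shortest leading match of */ removed
echo "${file##*/}"       # archive.tar.gz        longest leading match of */ removed
echo "${file%.*}"        # /tmp/archive.tar      shortest trailing match of .* removed
echo "${file%%.*}"       # /tmp/archive          longest trailing match of .* removed
echo "${file/tar/zip}"   # /tmp/archive.zip.gz   first "tar" replaced by "zip"
```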

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-27 Thread Peng Yu
On Tue, Sep 27, 2011 at 6:51 PM, Chet Ramey  wrote:
> On 9/27/11 6:41 PM, Roger wrote:
>
>> Correct.  After reading the entire Bash Manual page, I didn't see much mention
>> of documentation resources (of ERE) besides maybe something about egrep from
>> Bash's Manual Page or elsewhere on the web.  After extensive research for
>> regex/regexpr, only found Perl Manual Pages.
>>
>> Might be worth mentioning a link or good reference for this ERE within the Bash
>> Manual (Page)?
>
> The bash man page refers to regex(3).  On my BSD (Mac OS X) system, that
> refers to re_format(7), which documents the BRE and ERE regular expression
> formats.  On an Ubuntu box, to choose a representative Linux example, that
> refers to regex(7), which contains the same explanation, and the GNU regex
> manual.  This sort of "chained" man page reference is common.
>
> If you like info, `info regex' on a Linux box should display both pages.

Since regex(7) is actually what should be referred to on Ubuntu, and
there is indeed a regex(3) man page on Ubuntu, man bash should specify
which regex man page to read. I was looking at regex(3) on my Ubuntu
box, which doesn't have any relevant information.

Also, adding a few more examples costs just a few extra lines; I don't
think that the manpage should be so frugal in terms of adding examples
to elucidate important concepts.

-- 
Regards,
Peng



Re: How to match regex in bash? (any character)

2011-09-27 Thread Chet Ramey
On 9/27/11 6:41 PM, Roger wrote:

> Correct.  After reading the entire Bash Manual page, I didn't see much mention
> of documentation resources (of ERE) besides maybe something about egrep from
> Bash's Manual Page or elsewhere on the web.  After extensive research for
> regex/regexpr, only found Perl Manual Pages.
> 
> Might be worth mentioning a link or good reference for this ERE within the Bash
> Manual (Page)?

The bash man page refers to regex(3).  On my BSD (Mac OS X) system, that
refers to re_format(7), which documents the BRE and ERE regular expression
formats.  On an Ubuntu box, to choose a representative Linux example, that
refers to regex(7), which contains the same explanation, and the GNU regex
manual.  This sort of "chained" man page reference is common.

If you like info, `info regex' on a Linux box should display both pages.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: How to match regex in bash? (any character)

2011-09-27 Thread Roger
> On Tue, Sep 27, 2011 at 08:15:09AM -0400, Greg Wooledge wrote:
>On Mon, Sep 26, 2011 at 07:06:30PM -0800, Roger wrote:
>> Some good reading I found is under the Bash Manual Page section "Parameter
>> Expansion".
>> 
>> From here, to learn more about regex/regexpr as the Bash Manual is quite brief
>> on regex, use the following manual pages:
>> 
>> perlretut - Gives a good from the start explanation of regular expressions,
>> including perl
>
>Perl's regular expressions are not the same as Bash's.  Bash uses standard
>POSIX Extended Regular Expressions (ERE).  You can find formal documentation
>at 
>http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
>
>Or see "man egrep", as egrep (or grep -E) also uses EREs.  Or see any
>web page that discusses EREs.
>
>Avoid reading documentation from a different language (in this case Perl),
>because the features tend to change.  Perl uses a feature set and syntax
>that have been retroactively dubbed Perl Compatible Regular Expressions
>(PCRE).  They're superficially similar to EREs, but have a much broader
>range of features (extensions) that are not compatible and will not work
>in Bash.

Correct.  After reading the entire Bash Manual page, I didn't see much mention
of documentation resources (of ERE) besides maybe something about egrep from
Bash's Manual Page or elsewhere on the web.  After extensive research for
regex/regexpr, only found Perl Manual Pages.

Might be worth mentioning a link or good reference for this ERE within the Bash
Manual (Page)?

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-27 Thread Greg Wooledge
On Mon, Sep 26, 2011 at 07:06:30PM -0800, Roger wrote:
> Some good reading I found is under the Bash Manual Page section "Parameter
> Expansion".
> 
> From here, to learn more about regex/regexpr as the Bash Manual is quite brief
> on regex, use the following manual pages:
> 
> perlretut - Gives a good from the start explanation of regular expressions,
> including perl

Perl's regular expressions are not the same as Bash's.  Bash uses standard
POSIX Extended Regular Expressions (ERE).  You can find formal documentation
at 
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04

Or see "man egrep", as egrep (or grep -E) also uses EREs.  Or see any
web page that discusses EREs.

Avoid reading documentation from a different language (in this case Perl),
because the features tend to change.  Perl uses a feature set and syntax
that have been retroactively dubbed Perl Compatible Regular Expressions
(PCRE).  They're superficially similar to EREs, but have a much broader
range of features (extensions) that are not compatible and will not work
in Bash.
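
As a small illustration of the difference (a sketch; the function names are made up): POSIX ERE does not define Perl shorthands such as \d or \w, so what they do in Bash depends on the underlying regex library. The portable spelling uses bracket expressions with character classes.

```shell
#!/usr/bin/env bash
# Portable ERE spellings of what Perl would write as ^\d+$ and (roughly) ^[A-Za-z_]\w*$
digits_re='^[[:digit:]]+$'
word_re='^[[:alpha:]_][[:alnum:]_]*$'

# Hypothetical helpers wrapping the two patterns.
is_number()     { [[ $1 =~ $digits_re ]]; }
is_identifier() { [[ $1 =~ $word_re ]]; }

is_number 2011    && echo "2011 is a number"
is_identifier x_1 && echo "x_1 is an identifier"
is_number 20a1    || echo "20a1 is not a number"
```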



Re: How to match regex in bash? (any character)

2011-09-26 Thread Roger
> On Mon, Sep 26, 2011 at 09:37:07PM -0500, Dennis Williamson wrote:
>On Mon, Sep 26, 2011 at 8:19 PM, Peng Yu  wrote:
>> Hi,
>>
>> I know that I should use =~ to match regex (bash version 4).
>>
>> However, the man page is not very clear. I don't find how to match
>> (matching any single character). For example, the following regex
>> doesn't match txt. Does anybody know how to match any character
>> (should be '.' in perl) in bash.
>>
>> [[ "$1" =~ "xxx.txt" ]]
>>
>
>When you quote the string on the right hand side of =~ it changes to a
>simple string match instead of a regex match. It is sometimes
>difficult to specify a regex literally (and unquoted), so it's best to
>use a variable as shown in Steven's reply to you.
>

I believe the Bash Manual also strongly suggests using variables for
matching. ;-)

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-26 Thread Roger
> On Mon, Sep 26, 2011 at 08:19:27PM -0500, Peng Yu wrote:
>Hi,
>
>I know that I should use =~ to match regex (bash version 4).
>
>However, the man page is not very clear. I don't find how to match
>(matching any single character). For example, the following regex
>doesn't match txt. Does anybody know how to match any character
>(should be '.' in perl) in bash.
>
>[[ "$1" =~ "xxx.txt" ]]

Some good reading I found is under the Bash Manual Page section "Parameter
Expansion".

From here, to learn more about regex/regexpr as the Bash Manual is quite brief
on regex, use the following manual pages:

perlretut - Gives a good from the start explanation of regular expressions,
including perl

perlrequick - If you already know some perl, then just a quick start should do.


There's a lot of Perl Manual Pages and 'man perltoc' will get you a full
listing of manual pages including descriptions.  (I'm currently reading the
perlretut man page as I do not know much perl language at all!)


Is it possible to get more documentation or examples concerning regex into the
Bash Manual?  Maybe some references to the above manual pages, or are we
talking severe conflict of interest?  At the very least, one or two common Bash
Parameter examples would be nice!

-- 
Roger
http://rogerx.freeshell.org/



Re: How to match regex in bash? (any character)

2011-09-26 Thread Peng Yu
On Mon, Sep 26, 2011 at 9:49 PM, John Reiser  wrote:
> Peng Yu wrote:
>> I know that I should use =~ to match regex (bash version 4).
>>
>> However, the man page is not very clear. I don't find how to match
>> (matching any single character). For example, the following regex
>> doesn't match txt. Does anybody know how to match any character
>> (should be '.' in perl) in bash.
>>
>> [[ "$1" =~ "xxx.txt" ]]
>
> The manual page for bash says that the rules of regex(3) apply:
>
>              An additional binary operator,  =~,  is  available,  with  the  same
>              precedence  as  == and !=.  When it is used, the string to the right
>              of the operator is considered an  extended  regular  expression  and
>              matched  accordingly (as in regex(3)).  The return value is 0 if the
>              string matches the pattern, and 1 otherwise.
> and also:
>              Any part of the pattern may be quoted to force it to be matched as a
>              string.
>
> Thus in the expression   [[ "$1" =~ "xxx.txt" ]]   the fact that the pattern
> is quoted [here the whole pattern appears within double quotes] has turned the
> dot '.' into a plain literal character, instead of a meta-character which
> matches any single character.
>
> The usual method of avoiding quotes in the pattern is to omit them:
>              [[ $1 =~ xxx.txt ]]   # the dot '.' in the pattern is a meta-character
> or to use a variable:
>              pattern="xxx.txt"   # a 7-character string
>              [[ $1 =~ $pattern ]]   # the dot '.' in $pattern is a meta-character
> Example: using all literals in an instance of bash:
>               $ [[ xxxAtxt =~ xxx.txt ]] && echo true
>               true
>               $
>
> Also notice that quotes are not needed around the left-hand side  $1 :
>              Word splitting and pathname expansion are not performed on the
>              words  between the  [[  and  ]] ...
>
> Thus there is no need to use quotation marks to suppress word splitting
> inside double brackets [[ ... ]].

Thanks for the clarifications of all the replies. Now the manual makes
much more sense to me.

-- 
Regards,
Peng



Re: How to match regex in bash? (any character)

2011-09-26 Thread John Reiser
Peng Yu wrote:
> I know that I should use =~ to match regex (bash version 4).
> 
> However, the man page is not very clear. I don't find how to match
> (matching any single character). For example, the following regex
> doesn't match txt. Does anybody know how to match any character
> (should be '.' in perl) in bash.
> 
> [[ "$1" =~ "xxx.txt" ]]

The manual page for bash says that the rules of regex(3) apply:

  An additional binary operator,  =~,  is  available,  with  the  same
  precedence  as  == and !=.  When it is used, the string to the right
  of the operator is considered an  extended  regular  expression  and
  matched  accordingly (as in regex(3)).  The return value is 0 if the
  string matches the pattern, and 1 otherwise.
and also:
  Any part of the pattern may be quoted to force it to be matched as a
  string.

Thus in the expression   [[ "$1" =~ "xxx.txt" ]]   the fact that the pattern
is quoted [here the whole pattern appears within double quotes] has turned the
dot '.' into a plain literal character, instead of a meta-character which
matches any single character.

The usual method of avoiding quotes in the pattern is to omit them:
  [[ $1 =~ xxx.txt ]]   # the dot '.' in the pattern is a meta-character
or to use a variable:
  pattern="xxx.txt"   # a 7-character string
  [[ $1 =~ $pattern ]]   # the dot '.' in $pattern is a meta-character
Example: using all literals in an instance of bash:
   $ [[ xxxAtxt =~ xxx.txt ]] && echo true
   true
   $

Also notice that quotes are not needed around the left-hand side  $1 :
  Word splitting and pathname expansion are not performed on the  words
  between the  [[  and  ]] ...

Thus there is no need to use quotation marks to suppress word splitting
inside double brackets [[ ... ]].
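
Both points can be checked with a short sketch like the following. The helper names and sample strings are made up for illustration, and the quoted-pattern behaviour assumed is the default one in current bash (without compatibility options).

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting an unquoted and a quoted pattern.
matches_regex()   { [[ $1 =~ xxx.txt ]]; }     # '.' matches any single character
matches_literal() { [[ $1 =~ "xxx.txt" ]]; }   # '.' matches only a literal dot

name='xxx txt'   # sample input containing a space; $name needs no quotes inside [[ ]]

[[ $name =~ xxx.txt ]]  && echo "unquoted left-hand side with a space: match"
matches_regex   xxxAtxt && echo "unquoted pattern matches xxxAtxt"
matches_literal xxxAtxt || echo "quoted pattern does not match xxxAtxt"
matches_literal xxx.txt && echo "quoted pattern matches xxx.txt"
```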

-- 



Re: How to match regex in bash? (any character)

2011-09-26 Thread Dennis Williamson
On Mon, Sep 26, 2011 at 8:19 PM, Peng Yu  wrote:
> Hi,
>
> I know that I should use =~ to match regex (bash version 4).
>
> However, the man page is not very clear. I don't find how to match
> (matching any single character). For example, the following regex
> doesn't match txt. Does anybody know how to match any character
> (should be '.' in perl) in bash.
>
> [[ "$1" =~ "xxx.txt" ]]
>
>
> --
> Regards,
> Peng
>
>

When you quote the string on the right hand side of =~ it changes to a
simple string match instead of a regex match. It is sometimes
difficult to specify a regex literally (and unquoted), so it's best to
use a variable as shown in Steven's reply to you.

The quoting is most likely unnecessary on the left hand side as well.
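
A sketch of why the variable helps, assuming a made-up pattern and helper name: a regex containing a space and parentheses is awkward to type unquoted, but causes no trouble once it is stored in a variable and expanded without quotes.

```shell
#!/usr/bin/env bash
# Pattern with a space and a group; single quotes here only protect the assignment.
re='^(Re: )*How to match'

# Hypothetical helper: does a subject line belong to this thread?
is_thread_subject() { [[ $1 =~ $re ]]; }

subject='Re: Re: How to match regex in bash?'   # sample text

if is_thread_subject "$subject"; then
    echo "matched: ${BASH_REMATCH[0]}"
fi

# Quoting the expansion turns the whole pattern back into a literal string.
if ! [[ $subject =~ "$re" ]]; then
    echo "quoted expansion: no match"
fi
```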

-- 
Visit serverfault.com to get your system administration questions answered.



Re: How to match regex in bash? (any character)

2011-09-26 Thread Steven W. Orr

On 9/26/2011 9:19 PM, Peng Yu wrote:

Hi,

I know that I should use =~ to match regex (bash version 4).

However, the man page is not very clear. I don't find how to match
(matching any single character). For example, the following regex
doesn't match txt. Does anybody know how to match any character
(should be '.' in perl) in bash.

[[ "$1" =~ "xxx.txt" ]]




Looks good to me.

513 > regex='xxx.txt'
514 > [[ xxxAtxt =~ $regex ]]
515 > echo $?
0
516 >


--
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net



How to match regex in bash? (any character)

2011-09-26 Thread Peng Yu
Hi,

I know that I should use =~ to match regex (bash version 4).

However, the man page is not very clear. I don't find how to match
(matching any single character). For example, the following regex
doesn't match txt. Does anybody know how to match any character
(should be '.' in perl) in bash.

[[ "$1" =~ "xxx.txt" ]]


-- 
Regards,
Peng