Re: How to match regex in bash? (any character)
Chet Ramey wrote:
> On 9/27/11 6:41 PM, Roger wrote:
>> Correct. After reading the entire Bash Manual page, I didn't see much mention
>> of documentation resources (of ERE) besides maybe something about egrep from
>> Bash's Manual Page or elsewhere on the web. After extensive research for
>> regex/regexpr, only found Perl Manual Pages.
>>
>> Might be worth mentioning a link or good reference for this ERE within the
>> Bash Manual (Page)?
>
> The bash man page refers to regex(3). On my BSD (Mac OS X) system, that
> refers to re_format(7), which documents the BRE and ERE regular expression
> formats. On an Ubuntu box, to choose a representative Linux example, that
> refers to regex(7), which contains the same explanation, and the GNU regex
> manual. This sort of "chained" man page reference is common.
>
> If you like info, `info regex' on a Linux box should display both pages.

---
If the poor guy looked at the pages you suggest, he'd see various examples showing:

An atom is a regular expression enclosed in "()"

A bracket expression is a list of characters enclosed in "[]"

To use a literal '-' as the first endpoint of a range, enclose it in "[." and ".]"

For example, if o and ^ are the members of an equivalence class, then "[[=o=]]", "[[=^=]]", and "[o^]" are all synonymous.

[...] A null string is considered longer than no match at all. For example, "bb*" matches the three middle characters of "abbbc", "(wee|week)(knights|nights)" matches all ten characters of "weeknights", when "(.*).*" is matched against "abc" the parenthesized subexpression matches all three characters, and when "(a*)*" is matched against "bc" both the whole RE and the parenthesized subexpression match the null string.
Re: How to match regex in bash? (any character)
2011-10-03, 13:48(+02), Andreas Schwab:
> Stephane CHAZELAS writes:
>
>> The problem and confusion here comes from the fact that "\" is
>> overloaded and used by two different pieces of software (bash
>> and the system regex).
>
> That's nothing new. The backslash is widely used as a quote character
> in several languages, which requires two levels of quoting if one of
> these languages is embedded in another one.
[...]

Yes, but in this case, contrary to zsh, bash doesn't do two levels of quoting. Bash quoting means to escape the RE operators, and that's where the problem comes from. For it to work fully, bash would need to implement the full RE parsing to know where to put backslashes when characters are quoted.

Bash turns:

"." to \. before calling the regex(3) API
'[.]' to \[\.\] (fine)
['.'] to [\.] (not fine)
['a]'] to [a\]] (not fine)
(.)\1 to (.)1 (fine or not fine depending on how you want to look at it)
(?i:test} to (?i:test) (assuming regex(3) is implemented with PCREs: fine or not fine depending on how you want to look at it).

In zsh, it's simpler as quoting just quotes shell characters; it doesn't try to escape regexp operators.

--
Stephane
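The escaping described above can be observed from the outside. The following is a minimal sketch, assuming bash 3.2 or later at its default compatibility level; the sample strings are invented for illustration. Quoting `[.]` turns the bracket expression into the three literal characters `[.]`.

```shell
#!/usr/bin/env bash
# Unquoted: [.] is a bracket expression that matches a literal dot.
[[ "a.b" =~ a[.]b ]] && echo "unquoted [.] matches a.b"

# Quoted: '[.]' is escaped to \[\.\], so it only matches the three
# literal characters "[.]".
[[ "a.b" =~ a'[.]'b ]] || echo "quoted '[.]' does not match a.b"
[[ "a[.]b" =~ a'[.]'b ]] && echo "quoted '[.]' matches a[.]b"
```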
Re: How to match regex in bash? (any character)
Stephane CHAZELAS writes: > The problem and confusion here comes from the fact that "\" is > overloaded and used by two different pieces of software (bash > and the system regex). That's nothing new. The backslash is widely used as a quote character in several languages, which requires two levels of quoting if one of these languages is embedded in another one. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: How to match regex in bash? (any character)
2011-10-02, 21:51(-04), Chet Ramey:
> On 10/2/11 3:43 PM, Stephane CHAZELAS wrote:
>
>> [*] actually, bash does some (undocumented) preprocessing on the
>> regexps, so even the regex(3) reference is misleading here.
>
> Not really. The words are documented to undergo quote removal, so
> they undergo quote removal. That turns \1 into 1, for instance.
[...]

The problem and confusion here comes from the fact that "\" is overloaded and used by two different pieces of software (bash and the system regex). It is used:

- by bash for quoting
- by regex(3) to escape regexp characters in some circumstances (for instance when not inside [...], but it may vary per implementation; think of the (?{...} type extensions)
- by some regex(3) implementations to introduce new regexp operators (\w, \b, \<...)

BTW, another bug:

$ bash -c '[[ "\\" =~ ["."] ]]' && echo yes
yes

And what one could consider a bug:

~$ bash -c 'chars="a]"; [[ "a" =~ ["$chars"] ]]' && echo yes
~$ bash -c 'chars="a]"; [[ "a]" =~ ["$chars"] ]]' && echo yes
yes

I was wrong in saying that bash documentation should refer to POSIX regexps as it disables extensions. It only disables extensions introduced by "\", not the ones introduced by sequences that would otherwise be invalid in POSIX EREs like "(?", {{, **... It should still refer to POSIX regexps as they are the only ones guaranteed to work. Any extension provided by the system's regex(3) API may not work with bash.

--
Stephane
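One common way to sidestep the interaction between shell quoting and bracket expressions is to build the whole regex in a variable and expand it unquoted, so that the text reaches regex(3) without any added backslashes. This is only a sketch and not something proposed in the thread; the variable name `re` is arbitrary, and it relies on the POSIX rule that `]` is an ordinary member when it comes first in a bracket expression.

```shell
#!/usr/bin/env bash
# A "]" placed first in a bracket expression is an ordinary member,
# so this regex matches exactly one character: "]" or "a".
re='^[]a]$'

for s in 'a' ']' 'b'; do
    if [[ $s =~ $re ]]; then
        echo "'$s' matches $re"
    else
        echo "'$s' does not match $re"
    fi
done
```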
Re: How to match regex in bash? (any character)
On 10/2/11 3:43 PM, Stephane CHAZELAS wrote:
> [*] actually, bash does some (undocumented) preprocessing on the
> regexps, so even the regex(3) reference is misleading here.

Not really. The words are documented to undergo quote removal, so they undergo quote removal. That turns \1 into 1, for instance.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
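The effect of quote removal can be seen directly. In this sketch, which assumes a bash 4.x or later like the one discussed in the thread (older releases may behave differently), the unquoted `\1` loses its backslash before regex(3) sees it, so the pattern that is actually compiled is `(.)1`. The test strings are invented.

```shell
#!/usr/bin/env bash
# After quote removal the regex is (.)1: any character followed by "1".
[[ "a1" =~ (.)\1 ]] && echo '"a1" matches (.)\1'

# "aa" contains no "1", so it cannot match, although a real
# backreference would have matched it.
[[ "aa" =~ (.)\1 ]] || echo '"aa" does not match (.)\1'
```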
Re: How to match regex in bash? (any character)
2011-10-1, 14:39(-08), rogerx@gmail.com:
[...]
> I took some time to examine the three regex references:
>
> 1)
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04
> Written more like a technical specification of regex. Great if you're
> going to be modifying the regex code. Difficult to follow if you're new,
> looking for info.

One thing to bear in mind is that bash calls a system library to perform the regexp matching (except that [*]), so it can't really document how it's gonna work because it just can't know; it may differ from system to system. The only thing that is more or less guaranteed is that all those various implementations should comply with that specification.

Above is the specification of the POSIX extended regular expressions, so a bash script writer should refer to that document if he wants to write a script for all the systems where bash might be used.

> 2) regex(7)
> Although it looks good, upon further examination, I start to see run-on
> sentences. It's more like a reference, which is what a man file should
> be.
> At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex
> package"

On the few systems where that man page is available, it may or may not document the extended regular expressions that are used when calling the regex(3) API (on my system, it doesn't). Those regular expressions may or may not have extensions over the POSIX API, and that document may or may not point out which ones are extensions and which ones are not, so a script writer may be able to refer to that document if he wants his script to work on that particular system (except that [*]).

> 3) grep(1)
> Section "REGULAR EXPRESSIONS". At about half the size of regex(7), the
> section clearly explains regex and seems to be easily understandable for a
> person new to regex.

That's another utility that may or may not use the same API, in the same way as bash or not. You get no warranty whatsoever that the regexps covered there will be the same as bash's.

[*] actually, bash does some (undocumented) preprocessing on the regexps, so even the regex(3) reference is misleading here.

For instance, on my system the regex(3) Extended REs support \1 for backreference and \b for word boundary, but when calling [[ aa =~ (.)\1 ]], bash changes it to [[ aa =~ (.)1 ]] (note that (.)\1 is not a portable regex as the behavior is unspecified), so bash won't behave as regex(3) documents on my system.

Also (and that could be considered a bug), "[\a]" is meant to match either "\" or "a", but in bash, because of that preprocessing, it doesn't:

$ bash -c '[[ "\\" =~ [\a] ]]' || echo no
no
$ bash -c '[[ "\\" =~ [\^] ]]' && echo yes
yes

Once that bug is fixed, bash should probably refer to POSIX EREs (since its preprocessing would disable any extension introduced by system libraries) rather than regex(3), as that would be more accurate.

The situation with zsh:

- it uses the same API as bash (unless the RE_MATCH_PCRE option is set, in which case it uses PCRE regexps)
- it doesn't do the same preprocessing as bash because...
- it doesn't implement that confusing business inherited from ksh whereby quoted RE characters are taken literally.

So, in zsh

- [[ aa =~ '(.)\1' ]] works as documented in regex(3) on my system (but may work differently on other systems as the behavior is unspecified as per POSIX).
- [[ '\' =~ '[\a]' ]] works as POSIX specifies
- after "setopt RE_MATCH_PCRE", one gets a more portable behavior as there is only one PCRE library (though different versions).

The situation with ksh93:

- Not POSIX either but a bit more consistent:

  $ ksh -c '[[ "\\" =~ [\a] ]]' || echo no
  no
  $ ksh -c '[[ "\\" =~ [\^] ]]' || echo no
  no

- it implements its own regexps with its own many extensions, which therefore can be and are documented in its man page but are not common to any other regex (though they are mostly a superset of the POSIX ERE).

--
Stephane
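Following the advice above to stick to POSIX EREs, a script can use character classes inside bracket expressions instead of backslash extensions such as \w or \b, which depend on both the system's regex(3) and bash's preprocessing. The following is a minimal sketch; the function name `is_identifier` and the sample words are made up for illustration.

```shell
#!/usr/bin/env bash
# is_identifier NAME: succeed if NAME looks like a C-style identifier.
# Uses only POSIX ERE syntax: anchors, bracket expressions,
# character classes and "*".
is_identifier() {
    local re='^[[:alpha:]_][[:alnum:]_]*$'
    [[ $1 =~ $re ]]
}

for word in foo_1 _bar 9lives 'two words'; do
    if is_identifier "$word"; then
        echo "$word: identifier"
    else
        echo "$word: not an identifier"
    fi
done
```

Keeping the regex in a variable also means no part of it is subject to the quoting rules discussed in this message.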
Re: How to match regex in bash? (any character)
> On Thu, Sep 29, 2011 at 11:53:20PM -0800, Roger wrote: >> On Fri, Sep 30, 2011 at 06:20:32AM +, Stephane CHAZELAS wrote: >>2011-09-29, 13:52(-08), Roger: >>[...] >>> Since you're saying the regex description is found within either regex(3) or >>> regex(7), couldn't there be a brief note within the Bash Manual Page be >>> something >>> to the effect: >>[...] >> >>No, it's not. >> >>I suppose bash could say: See your system regex(3) >>implementation documentation for the description of extended >>regular expression syntax on your system. That syntax should be >>compatible with one version or the other of the POSIX Extended >>Regular Expression syntax whose specification for the latest >>version as of writing can be found at: >>http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04 >> >>regex(3) points to the API (regex.h), how the system documents >>the regexps covered by that API is beyond bash knowledge. I took some time to examine the three regex references: 1) http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04 Written more like a technical specification of regex. Great if your're going to be modifying the regex code. Difficult to follow if you're new, looking for info. 2) regex(7) Although it looks good, upon further examination, I start to see run-on sentences. It's more like a reference, which is what a man file should be. At the bottom, "AUTHOR - This page was taken from Henry Spencer's regex package" 3) grep(1) Section "REGULAR EXPRESSIONS". At about half the size of regex(7), the section clearly explains regex and seems to be easily understandable for a person new to regex. I'm thinking, the most people need to know about regex for Bash would be 3 & 2 from the above listing (in order of my preference or readability as the grep manual was very concise and easy to follow). 
And then, learn regex from other books such as; Sed & (G)AWK book - has an entire chapter devoted to regex, search/replace functions within Learning the VI/VIM Editors book, and Grep Pocket Ref. - has three chapters for each regex, eregex, and perl regex. I'm guessing, since the Grep Manual was good, the Grep Pocket Ref. book will be equally as good. (I fear buying the "Mastering Regular Expressions" as most say half of the material within the book is only relevant to Perl.) -- Roger http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
> On Fri, Sep 30, 2011 at 06:20:32AM +, Stephane CHAZELAS wrote: >2011-09-29, 13:52(-08), Roger: >[...] >> Since you're saying the regex description is found within either regex(3) or >> regex(7), couldn't there be a brief note within the Bash Manual Page be >> something >> to the effect: >[...] > >No, it's not. > >I suppose bash could say: See your system regex(3) >implementation documentation for the description of extended >regular expression syntax on your system. That syntax should be >compatible with one version or the other of the POSIX Extended >Regular Expression syntax whose specification for the latest >version as of writing can be found at: >http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04 > >regex(3) points to the API (regex.h), how the system documents >the regexps covered by that API is beyond bash knowledge. Exactly, a simple "man regex" (aka regex(3)) points to regex.h here. Few will know how to search for the regex(7) manual file. -- Roger http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
2011-09-29, 13:52(-08), Roger: [...] > Since you're saying the regex description is found within either regex(3) or > regex(7), couldn't there be a brief note within the Bash Manual Page be > something > to the effect: [...] No, it's not. I suppose bash could say: See your system regex(3) implementation documentation for the description of extended regular expression syntax on your system. That syntax should be compatible with one version or the other of the POSIX Extended Regular Expression syntax whose specification for the latest version as of writing can be found at: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04 regex(3) points to the API (regex.h), how the system documents the regexps covered by that API is beyond bash knowledge. -- Stephane
Re: How to match regex in bash? (any character)
> On Thu, Sep 29, 2011 at 12:06:08PM -0400, Chet Ramey wrote: >On 9/29/11 11:59 AM, Peng Yu wrote: >> On Thu, Sep 29, 2011 at 10:38 AM, Chet Ramey wrote: >>> On 9/29/11 9:48 AM, Peng Yu wrote: >>> Therefore, either bash manpage should specify clearly which regex manpage it should be in each system (which a bad choice, because there can be a large number of systems), or the bash manpage should omit all the non consistent reference and say something like "see more details in info" or something else that is platform independent. Referring to regex(3) without any quantification is not a very good choice . >>> >>> Why, exactly? regex(3) is the one thing that's portable across systems, >>> it happens to describe the interfaces bash uses, and it contains the >>> appropriate system-specific references. `info' is considerably less >>> portable and widespread than `man', so a man page reference is the best >>> choice. >> >> We all have discovered that regex(3) is not consistent across all the >> platform. Why you say it is portable? > >That is not what we discovered. We discovered that the name and section of >the man page describing regular expression formats differs across systems. >We also discovered that every system we checked (at least every system I >checked) has a regex(3) man page, and that in most cases that page >(regex(3)) eventually contains the appropriate system-specific reference >to the page that describes the regular expression format. That is why it >is the most portable alternative. Since you're saying the regex description is found within either regex(3) or regex(7), couldn't there be a brief note within the Bash Manual Page be something to the effect: "For a description of regex, an unfortunate variable location depending on system, see either regex(3) or regex(7)." Bash Manual Page shows few manual pages within the "See Also" section, while omitting regex(3)/(7) entirely within the "See Also" section. 
And, to also note, there's only *one* mention of regex (ie. regex(3)) within the Bash Manual Page. Checking the Grep Manual Page, it shows many additional manual pages for "Regular Expressions". (Info overload really according to what Greg has stated about about ERE's. Either the ERE's website URL or regex (7) is all that is needed.) Or, adding a "Regular Expression" title under See Also section and pointing to regex(3), regex(7) or ERE URL? (For my use, I've got regex(7) and the ERE xbd_chap09 on my E-Book reader for reference, with preference towards the xbd_chap09 print-to-pdf HTML page.) As you stated, I'm only seeing "regex(3)" listed within the Bash Manual, leaving the reader to try to ascertain, is the description really within regex(3), or is it a little deeper such as regex(7)? On initial inspection of regex(3) by a beginner, they're going to be overwhelmed as regex(3) deals entirely with the C programming language! A non-programmer will just fail at locating regex(7) as they'll be too overwhelmed. A programmer might realize the regex(3) isn't a real description and will kind of know that they're misled and should/might look at regex(7). Many entry level programmers likely use Bash, before moving on to C/C++, etc. -- Roger http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
On 9/29/11 1:46 PM, Greg Wooledge wrote:
>     An additional binary operator, =~, is available, with the same
>     precedence as == and !=. When it is used, the string to the
>     right of the operator is considered an extended regular
>     expression and matched accordingly (as in regex(3)). The return
>     value is 0 if the string matches the pattern, and 1 otherwise.
>     If the regular expression is syntactically incorrect, the
>     conditional expression's return value is 2. If the shell option
>     nocasematch is enabled, the match is performed without regard to
>     the case of alphabetic characters. Any part of the pattern may
>     be quoted to force it to be matched as a string.
>
> The last sentence in the quote above.

I've changed that line in the current version of the manual page. It now reads:

`Any part of the pattern may be quoted to force the quoted portion to be matched as a string.'

Chet
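The three return values and the nocasematch option described in the quoted paragraph can be checked with a short script. This is a sketch under the assumption that the system regex(3) rejects an unmatched "(" (common implementations such as glibc do); the helper name `try` is invented here.

```shell
#!/usr/bin/env bash
# try STRING REGEX: print the exit status of [[ STRING =~ REGEX ]].
# REGEX is expanded unquoted so that it is treated as a regex.
try() {
    [[ $1 =~ $2 ]]
    echo "$?"
}

try abc 'b'     # 0: the string matches
try abc 'x'     # 1: no match
try abc '('     # 2: the regex is syntactically incorrect

shopt -s nocasematch
try ABC 'abc'   # 0: case is ignored while nocasematch is set
shopt -u nocasematch
try ABC 'abc'   # 1: case matters again
```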
Re: How to match regex in bash? (any character)
On 9/29/11 12:06 PM, Greg Wooledge wrote: >> As I mentioned previously, the best is to add a few examples in man >> bash. > > I would not object to that, but I can't speak for Chet. As I said, I will add examples to the info manual and some more explanation to the man page. Regular expressions are so fundamental to Unix, though, I would expect familiarity with them as a given. > Another option would be to refer to the POSIX definition of > Extended Regular Expressions as a web site. I wish they had > better URLs, though. The URL I have for it at the moment is > http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04 That's old (issue 6, Posix 2004). The current one is http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09 Chet -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: How to match regex in bash? (any character)
On Thu, Sep 29, 2011 at 11:18:57AM -0500, Peng Yu wrote:
> Also, regex(3) does not mention the difference between $x =~ .txt
> and $x=~ ".txt". I think that the difference should be addressed
> in man bash.

It already is.

    An additional binary operator, =~, is available, with the same
    precedence as == and !=. When it is used, the string to the
    right of the operator is considered an extended regular
    expression and matched accordingly (as in regex(3)). The return
    value is 0 if the string matches the pattern, and 1 otherwise.
    If the regular expression is syntactically incorrect, the
    conditional expression's return value is 2. If the shell option
    nocasematch is enabled, the match is performed without regard to
    the case of alphabetic characters. Any part of the pattern may
    be quoted to force it to be matched as a string.

The last sentence in the quote above.

> Bottom line, regex(3) is not a good manpage to refer in the above
> sentence.

Maybe it's not a good one, but it is the only *possible* one.
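The difference Peng asks about can be shown with two file names (both invented for this sketch, as is the helper name `check`): unquoted, the dot matches any character; quoted, it only matches a literal dot. Quoting just part of the pattern leaves the rest active as a regex.

```shell
#!/usr/bin/env bash
# check STRING: report which of three patterns STRING matches.
check() {
    local x=$1 out=""
    [[ $x =~ .txt ]]           && out+=" unquoted"
    [[ $x =~ ".txt" ]]         && out+=" quoted"
    # Only ".txt" is literal here; ^, [a-z]+ and $ remain operators.
    [[ $x =~ ^[a-z]+".txt"$ ]] && out+=" partly-quoted"
    echo "$x:${out:- none}"
}

check notes.txt   # notes.txt: unquoted quoted partly-quoted
check notes_txt   # notes_txt: unquoted
```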
Re: How to match regex in bash? (any character)
On 09/29/2011 06:18 PM, Peng Yu wrote:
> Also, regex(3) does not mention the difference between $x =~ .txt
> and $x=~ ".txt". I think that the difference should be addressed
> in man bash.

It is in man bash.

RR
Re: How to match regex in bash? (any character)
On Thu, Sep 29, 2011 at 11:06 AM, Greg Wooledge wrote:
> On Thu, Sep 29, 2011 at 10:59:19AM -0500, Peng Yu wrote:
>> We all have discovered that regex(3) is not consistent across all the
>> platform. Why you say it is portable?
>
> The three systems I mentioned earlier today all have regex(3). Which
> system have you found, which doesn't have it?

I think that I misunderstood some of the previous emails. However, on Ubuntu, there are both regex(3) and regex(7). Based on the context in man bash, regex(7) is more relevant than regex(3): although regex(3) does mention extended regular expressions, it is more of a document for the C interface.

"When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3))"

Also, regex(3) does not mention the difference between $x =~ .txt and $x=~ ".txt". I think that the difference should be addressed in man bash.

Bottom line, regex(3) is not a good manpage to refer to in the above sentence. It is better to think of other alternatives than to try to justify why we should stick with it.

>> As I mentioned previously, the best is to add a few examples in man
>> bash.
>
> I would not object to that, but I can't speak for Chet.
>
> Another option would be to refer to the POSIX definition of
> Extended Regular Expressions as a web site. I wish they had
> better URLs, though. The URL I have for it at the moment is
> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04

--
Regards,
Peng
Re: How to match regex in bash? (any character)
On Thu, Sep 29, 2011 at 10:59:19AM -0500, Peng Yu wrote: > We all have discovered that regex(3) is not consistent across all the > platform. Why you say it is portable? The three systems I mentioned earlier today all have regex(3). Which system have you found, which doesn't have it? > As I mentioned previously, the best is to add a few examples in man > bash. I would not object to that, but I can't speak for Chet. Another option would be to refer to the POSIX definition of Extended Regular Expressions as a web site. I wish they had better URLs, though. The URL I have for it at the moment is http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
Re: How to match regex in bash? (any character)
On 9/29/11 11:59 AM, Peng Yu wrote: > On Thu, Sep 29, 2011 at 10:38 AM, Chet Ramey wrote: >> On 9/29/11 9:48 AM, Peng Yu wrote: >> >>> Therefore, either bash manpage should specify clearly which regex >>> manpage it should be in each system (which a bad choice, because there >>> can be a large number of systems), or the bash manpage should omit all >>> the non consistent reference and say something like "see more details >>> in info" or something else that is platform independent. Referring to >>> regex(3) without any quantification is not a very good choice . >> >> Why, exactly? regex(3) is the one thing that's portable across systems, >> it happens to describe the interfaces bash uses, and it contains the >> appropriate system-specific references. `info' is considerably less >> portable and widespread than `man', so a man page reference is the best >> choice. > > We all have discovered that regex(3) is not consistent across all the > platform. Why you say it is portable? That is not what we discovered. We discovered that the name and section of the man page describing regular expression formats differs across systems. We also discovered that every system we checked (at least every system I checked) has a regex(3) man page, and that in most cases that page (regex(3)) eventually contains the appropriate system-specific reference to the page that describes the regular expression format. That is why it is the most portable alternative. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: How to match regex in bash? (any character)
On Thu, Sep 29, 2011 at 10:38 AM, Chet Ramey wrote: > On 9/29/11 9:48 AM, Peng Yu wrote: > >> Therefore, either bash manpage should specify clearly which regex >> manpage it should be in each system (which a bad choice, because there >> can be a large number of systems), or the bash manpage should omit all >> the non consistent reference and say something like "see more details >> in info" or something else that is platform independent. Referring to >> regex(3) without any quantification is not a very good choice . > > Why, exactly? regex(3) is the one thing that's portable across systems, > it happens to describe the interfaces bash uses, and it contains the > appropriate system-specific references. `info' is considerably less > portable and widespread than `man', so a man page reference is the best > choice. We all have discovered that regex(3) is not consistent across all the platform. Why you say it is portable? As I mentioned previously, the best is to add a few examples in man bash. Based on the assumption that you don't want to add an example in man bash, then the next choice to add a reference to info, even though it may not always be available in all the system (but as least it should be downloadable from bash gnu website). -- Regards, Peng
Re: How to match regex in bash? (any character)
On 9/29/11 9:48 AM, Peng Yu wrote: > Therefore, either bash manpage should specify clearly which regex > manpage it should be in each system (which a bad choice, because there > can be a large number of systems), or the bash manpage should omit all > the non consistent reference and say something like "see more details > in info" or something else that is platform independent. Referring to > regex(3) without any quantification is not a very good choice . Why, exactly? regex(3) is the one thing that's portable across systems, it happens to describe the interfaces bash uses, and it contains the appropriate system-specific references. `info' is considerably less portable and widespread than `man', so a man page reference is the best choice. -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars longa, vita brevis'' - Hippocrates Chet Ramey, ITS, CWRUc...@case.eduhttp://cnswww.cns.cwru.edu/~chet/
Re: How to match regex in bash? (any character)
On Thu, Sep 29, 2011 at 7:22 AM, Greg Wooledge wrote: > On Wed, Sep 28, 2011 at 12:43:01PM -0800, Roger wrote: >> Seems I used 'man regex' as well here. AKA regex(3). But I did >> realize this a few weeks ago; the real regex description being 'man 7 regex'. >> The Bash Manual Page denotes only regex(3). > > You're relatively fortunate that it's *that* easy to find on Linux. On > Linux, regex(3) points directly to regex(7), and you're done. > > On HP-UX, regex(3X) points to regcomp(3C) which points to regexp(5) which > contains the actual definitions. > > On OpenBSD, regex(3) doesn't even *have* a SEE ALSO section; it's a dead > end. And regcomp(3) is the same page as regex(3), so that doesn't help > either. One would have to backtrack entirely, perhaps to grep(1). > However, buried deep in the regex(3) page is a reference to re_format(7) > (not even boldface). And re_format(7) has the definitions, but getting > there takes perseverance. (For the record, grep(1) does point straight > to re_format(7).) > > So you see, bash(1) *cannot* just link directly to regex(7), because > that's not actually the correct final destination on most operating > systems. It's only correct on Linux. Bash uses the regex(3) library > interface, so that is the correct place for bash to refer the reader. Therefore, either bash manpage should specify clearly which regex manpage it should be in each system (which a bad choice, because there can be a large number of systems), or the bash manpage should omit all the non consistent reference and say something like "see more details in info" or something else that is platform independent. Referring to regex(3) without any quantification is not a very good choice . -- Regards, Peng
Re: How to match regex in bash? (any character)
On Wed, Sep 28, 2011 at 12:43:01PM -0800, Roger wrote: > Seems I used 'man regex' as well here. AKA regex(3). But I did > realize this a few weeks ago; the real regex description being 'man 7 regex'. > The Bash Manual Page denotes only regex(3). You're relatively fortunate that it's *that* easy to find on Linux. On Linux, regex(3) points directly to regex(7), and you're done. On HP-UX, regex(3X) points to regcomp(3C) which points to regexp(5) which contains the actual definitions. On OpenBSD, regex(3) doesn't even *have* a SEE ALSO section; it's a dead end. And regcomp(3) is the same page as regex(3), so that doesn't help either. One would have to backtrack entirely, perhaps to grep(1). However, buried deep in the regex(3) page is a reference to re_format(7) (not even boldface). And re_format(7) has the definitions, but getting there takes perseverance. (For the record, grep(1) does point straight to re_format(7).) So you see, bash(1) *cannot* just link directly to regex(7), because that's not actually the correct final destination on most operating systems. It's only correct on Linux. Bash uses the regex(3) library interface, so that is the correct place for bash to refer the reader.
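Because the page that actually defines the syntax has a different name on each system, a reader can simply probe for the candidates named in this message. The sketch below assumes a man(1) that accepts `-w SECTION NAME` to locate a page (man-db and the BSD man do; HP-UX or Solaris may need a different invocation), and the function name `find_regex_page` is made up.

```shell
#!/usr/bin/env bash
# find_regex_page: print the first candidate page that exists on this
# system, as "SECTION NAME"; fail if none of them is found.
find_regex_page() {
    local section name
    while read -r section name; do
        if man -w "$section" "$name" >/dev/null 2>&1 </dev/null; then
            echo "$section $name"
            return 0
        fi
    done <<'EOF'
7 regex
7 re_format
5 regexp
EOF
    return 1
}

if page=$(find_regex_page); then
    echo "Regex syntax is documented in: man $page"
else
    echo "No regex syntax page found; try the POSIX ERE specification."
fi
```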
Re: How to match regex in bash? (any character)
> Seems I used 'man regex' as well here. AKA regex(3). But I did
> realize this a few weeks ago; the real regex description being 'man 7 regex'.
> The Bash Manual Page denotes only regex(3).

Not all the world is Linux. The regex(3) reference is the only one that is consistent across different operating systems. In addition to the ones I mentioned above, Solaris and HP-UX use regexp(5), for example.

I think the texinfo manual could stand a few more examples, especially for the =~ operator.

Chet
Re: How to match regex in bash? (any character)
> On Tue, Sep 27, 2011 at 07:58:50PM -0500, Peng Yu wrote: >On Tue, Sep 27, 2011 at 6:51 PM, Chet Ramey wrote: >> On 9/27/11 6:41 PM, Roger wrote: >> >>> Correct. After reading the entire Bash Manual page, I didn't see much >>> mention >>> of documentation resources (of ERE) besides maybe something about egrep from >>> Bash's Manual Page or elsewhere on the web. After extensive research for >>> regex/regexpr, only found Perl Manual Pages. >>> >>> Might be worth mentioning a link or good reference for this ERE within the >>> Bash >>> Manual (Page)? >> >> The bash man page refers to regex(3). On my BSD (Mac OS X) system, that >> refers to re_format(7), which documents the BRE and ERE regular expression >> formats. On an Ubuntu box, to choose a representative Linux example, that >> refers to regex(7), which contains the same explanation, and the GNU regex >> manual. This sort of "chained" man page reference is common. >> >> If you like info, `info regex' on a Linux box should display both pages. > >Since regex(7) is actually what should be referred on ubuntu, and >there is indeed a manpage regex(3) on ubuntu, the difference on which >regex man page should be specify in man bash. I was looking at >regex(3) on my ubuntu, which doesn't have any relevant information. Ditto. Seems I used 'man regex' as well here. AKA regex(3). But I did realize this a few weeks ago; the real regex description being 'man 7 regex'. The Bash Manual Page denotes only regex(3). >Also, adding a few more examples just cost a few extra lines, I don't >think that the manpage should be so frugal in terms of adding examples >to elucidate important concepts. Ditto. As to why I'm suggesting one or two examples for Bash Parameter Expansion. ;-) However, I think the best examples for Parameter Expansion is code with a sample text next to the code of what text will look like after it's passed through the code. ie. Greg's Wiki Bash FAQ - Parameter Expansion. -- Roger http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
On Tue, Sep 27, 2011 at 6:51 PM, Chet Ramey wrote:
> On 9/27/11 6:41 PM, Roger wrote:
>
>> Correct. After reading the entire Bash Manual page, I didn't see much
>> mention of documentation resources (of ERE) besides maybe something about
>> egrep from Bash's Manual Page or elsewhere on the web. After extensive
>> research for regex/regexpr, only found Perl Manual Pages.
>>
>> Might be worth mentioning a link or good reference for this ERE within the
>> Bash Manual (Page)?
>
> The bash man page refers to regex(3). On my BSD (Mac OS X) system, that
> refers to re_format(7), which documents the BRE and ERE regular expression
> formats. On an Ubuntu box, to choose a representative Linux example, that
> refers to regex(7), which contains the same explanation, and the GNU regex
> manual. This sort of "chained" man page reference is common.
>
> If you like info, `info regex' on a Linux box should display both pages.

Since regex(7) is actually what should be referred to on ubuntu, and there is indeed a manpage regex(3) on ubuntu, the difference in which regex man page to read should be specified in man bash. I was looking at regex(3) on my ubuntu, which doesn't have any relevant information.

Also, adding a few more examples would cost just a few extra lines. I don't think that the manpage should be so frugal in terms of adding examples to elucidate important concepts.

--
Regards,
Peng
Re: How to match regex in bash? (any character)
On 9/27/11 6:41 PM, Roger wrote:

> Correct. After reading the entire Bash Manual page, I didn't see much mention
> of documentation resources (of ERE) besides maybe something about egrep from
> Bash's Manual Page or elsewhere on the web. After extensive research for
> regex/regexpr, only found Perl Manual Pages.
>
> Might be worth mentioning a link or good reference for this ERE within the
> Bash Manual (Page)?

The bash man page refers to regex(3). On my BSD (Mac OS X) system, that refers to re_format(7), which documents the BRE and ERE regular expression formats. On an Ubuntu box, to choose a representative Linux example, that refers to regex(7), which contains the same explanation, and the GNU regex manual. This sort of "chained" man page reference is common.

If you like info, `info regex' on a Linux box should display both pages.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU c...@case.edu http://cnswww.cns.cwru.edu/~chet/
Re: How to match regex in bash? (any character)
> On Tue, Sep 27, 2011 at 08:15:09AM -0400, Greg Wooledge wrote:
>On Mon, Sep 26, 2011 at 07:06:30PM -0800, Roger wrote:
>> Some good reading I found is under the Bash Manual Page section "Parameter
>> Expansion".
>>
>> From here, to learn more about regex/regexpr as the Bash Manual is quite
>> brief on regex, use the following manual pages:
>>
>> perlretut - Gives a good from the start explanation of regular expressions,
>> including perl
>
>Perl's regular expressions are not the same as Bash's. Bash uses standard
>POSIX Extended Regular Expressions (ERE). You can find formal documentation
>at
>http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
>
>Or see "man egrep", as egrep (or grep -E) also uses EREs. Or see any
>web page that discusses EREs.
>
>Avoid reading documentation from a different language (in this case Perl),
>because the features tend to change. Perl uses a feature set and syntax
>that have been retroactively dubbed Perl Compatible Regular Expressions
>(PCRE). They're superficially similar to EREs, but have a much broader
>range of features (extensions) that are not compatible and will not work
>in Bash.

Correct. After reading the entire Bash Manual page, I didn't see much mention of documentation resources (of ERE) besides maybe something about egrep from Bash's Manual Page or elsewhere on the web. After extensive research for regex/regexpr, I only found Perl Manual Pages.

Might it be worth mentioning a link or good reference for this ERE within the Bash Manual (Page)?

--
Roger
http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
On Mon, Sep 26, 2011 at 07:06:30PM -0800, Roger wrote:
> Some good reading I found is under the Bash Manual Page section "Parameter
> Expansion".
>
> From here, to learn more about regex/regexpr as the Bash Manual is quite brief
> on regex, use the following manual pages:
>
> perlretut - Gives a good from the start explanation of regular expressions,
> including perl

Perl's regular expressions are not the same as Bash's. Bash uses standard POSIX Extended Regular Expressions (ERE). You can find formal documentation at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04

Or see "man egrep", as egrep (or grep -E) also uses EREs. Or see any web page that discusses EREs.

Avoid reading documentation from a different language (in this case Perl), because the features tend to change. Perl uses a feature set and syntax that have been retroactively dubbed Perl Compatible Regular Expressions (PCRE). They're superficially similar to EREs, but have a much broader range of features (extensions) that are not compatible and will not work in Bash.
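As a rough illustration of staying within POSIX ERE syntax in bash, the sketch below uses only a bracket class, an interval, grouping and alternation, and avoids Perl-only shorthands. The function name is_year_month and the sample strings are made up for this example; they do not come from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: is_year_month is a hypothetical helper, not from the thread.
# It sticks to POSIX ERE constructs: a bracket class, an interval,
# grouping, and alternation.

is_year_month() {
    # Four digits, a dash, then a month from 01 to 12.
    local re='^[[:digit:]]{4}-(0[1-9]|1[0-2])$'
    [[ $1 =~ $re ]]
}

for candidate in 2011-09 2011-13 sep-2011; do
    if is_year_month "$candidate"; then
        echo "$candidate: match"
    else
        echo "$candidate: no match"
    fi
done
```

Only the first candidate should be reported as a match. The POSIX class [[:digit:]] is used where Perl documentation would suggest a shorthand that the system regex library may not understand.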
Re: How to match regex in bash? (any character)
> On Mon, Sep 26, 2011 at 09:37:07PM -0500, Dennis Williamson wrote:
>On Mon, Sep 26, 2011 at 8:19 PM, Peng Yu wrote:
>> Hi,
>>
>> I know that I should use =~ to match regex (bash version 4).
>>
>> However, the man page is not very clear. I don't find how to match
>> (matching any single character). For example, the following regex
>> doesn't match txt. Does anybody know how to match any character
>> (should be '.' in perl) in bash.
>>
>> [[ "$1" =~ "xxx.txt" ]]
>>
>
>When you quote the string on the right hand side of =~ it changes to a
>simple string match instead of a regex match. It is sometimes
>difficult to specify a regex literally (and unquoted), so it's best to
>use a variable as shown in Steven's reply to you.
>

I believe the Bash Manual also strongly suggests using variables for matching. ;-)

--
Roger
http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
> On Mon, Sep 26, 2011 at 08:19:27PM -0500, Peng Yu wrote:
>Hi,
>
>I know that I should use =~ to match regex (bash version 4).
>
>However, the man page is not very clear. I don't find how to match
>(matching any single character). For example, the following regex
>doesn't match txt. Does anybody know how to match any character
>(should be '.' in perl) in bash.
>
>[[ "$1" =~ "xxx.txt" ]]

Some good reading I found is under the Bash Manual Page section "Parameter Expansion".

From here, to learn more about regex/regexpr as the Bash Manual is quite brief on regex, use the following manual pages:

perlretut - Gives a good from the start explanation of regular expressions, including perl

perlrequick - If you already know some perl, then just a quick start should do.

There are a lot of Perl Manual Pages, and 'man perltoc' will get you a full listing of manual pages including descriptions. (I'm currently reading the perlretut man page as I do not know much perl language at all!)

Is it possible to get more documentation or examples into the Bash Manual concerning regex? Maybe some references to the above manual pages, or are we talking severe conflict of interest?

At the very least, one or two common Bash Parameter examples would be nice!

--
Roger
http://rogerx.freeshell.org/
Re: How to match regex in bash? (any character)
On Mon, Sep 26, 2011 at 9:49 PM, John Reiser wrote:
> Peng Yu wrote:
>> I know that I should use =~ to match regex (bash version 4).
>>
>> However, the man page is not very clear. I don't find how to match
>> (matching any single character). For example, the following regex
>> doesn't match txt. Does anybody know how to match any character
>> (should be '.' in perl) in bash.
>>
>> [[ "$1" =~ "xxx.txt" ]]
>
> The manual page for bash says that the rules of regex(3) apply:
>
>     An additional binary operator, =~, is available, with the same
>     precedence as == and !=. When it is used, the string to the right
>     of the operator is considered an extended regular expression and
>     matched accordingly (as in regex(3)). The return value is 0 if the
>     string matches the pattern, and 1 otherwise.
>
> and also:
>
>     Any part of the pattern may be quoted to force it to be matched as a
>     string.
>
> Thus in the expression [[ "$1" =~ "xxx.txt" ]] the fact that the pattern
> is quoted [here the whole pattern appears within double quotes] has turned the
> dot '.' into a plain literal character, instead of a meta-character which
> matches any single character.
>
> The usual method of avoiding quotes in the pattern is to omit them:
>     [[ $1 =~ xxx.txt ]]    # the dot '.' in the pattern is a meta-character
> or to use a variable:
>     pattern="xxx.txt"      # a 7-character string
>     [[ $1 =~ $pattern ]]   # the dot '.' in $pattern is a meta-character
> Example: using all literals in an instance of bash:
>     $ [[ xxxxtxt =~ xxx.txt ]] && echo true
>     true
>     $
>
> Also notice that quotes are not needed around the left-hand side $1 :
>
>     Word splitting and pathname expansion are not performed on the words
>     between the [[ and ]] ...
>
> Thus there is no need to use quotation marks to suppress word splitting
> inside double brackets [[ ... ]].

Thanks for the clarifications in all the replies. Now the manual makes much more sense to me.

--
Regards,
Peng
Re: How to match regex in bash? (any character)
Peng Yu wrote:
> I know that I should use =~ to match regex (bash version 4).
>
> However, the man page is not very clear. I don't find how to match
> (matching any single character). For example, the following regex
> doesn't match txt. Does anybody know how to match any character
> (should be '.' in perl) in bash.
>
> [[ "$1" =~ "xxx.txt" ]]

The manual page for bash says that the rules of regex(3) apply:

    An additional binary operator, =~, is available, with the same
    precedence as == and !=. When it is used, the string to the right
    of the operator is considered an extended regular expression and
    matched accordingly (as in regex(3)). The return value is 0 if the
    string matches the pattern, and 1 otherwise.

and also:

    Any part of the pattern may be quoted to force it to be matched as a
    string.

Thus in the expression [[ "$1" =~ "xxx.txt" ]] the fact that the pattern is quoted [here the whole pattern appears within double quotes] has turned the dot '.' into a plain literal character, instead of a meta-character which matches any single character.

The usual method of avoiding quotes in the pattern is to omit them:

    [[ $1 =~ xxx.txt ]]    # the dot '.' in the pattern is a meta-character

or to use a variable:

    pattern="xxx.txt"      # a 7-character string
    [[ $1 =~ $pattern ]]   # the dot '.' in $pattern is a meta-character

Example: using all literals in an instance of bash:

    $ [[ xxxxtxt =~ xxx.txt ]] && echo true
    true
    $

Also notice that quotes are not needed around the left-hand side $1 :

    Word splitting and pathname expansion are not performed on the words
    between the [[ and ]] ...

Thus there is no need to use quotation marks to suppress word splitting inside double brackets [[ ... ]].

--
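The quoting rules described in this message can be compared side by side. The following is a minimal sketch, assuming bash 3.2 or later (where a quoted right-hand side is matched literally); the sample string "xxxAtxt" and the variable names are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: how quoting the right-hand side of =~ changes the match.
# Assumes bash 3.2 or later; "xxxAtxt" is a made-up sample string.

sample="xxxAtxt"

# Quoted pattern: the dot is literal, so only "xxx.txt" itself would match.
if [[ $sample =~ "xxx.txt" ]]; then quoted=match; else quoted=nomatch; fi

# Unquoted pattern: the dot is a meta-character matching any single character.
if [[ $sample =~ xxx.txt ]]; then unquoted=match; else unquoted=nomatch; fi

# Pattern in a variable, expanded unquoted: also treated as a regex.
pattern='xxx.txt'
if [[ $sample =~ $pattern ]]; then from_var=match; else from_var=nomatch; fi

# Same variable, but quoted at the point of use: back to a literal comparison.
if [[ $sample =~ "$pattern" ]]; then quoted_var=match; else quoted_var=nomatch; fi

printf '%s\n' "quoted=$quoted unquoted=$unquoted from_var=$from_var quoted_var=$quoted_var"
```

The last case is the one that tends to surprise people: storing the regex in a variable only helps if the variable is then expanded without quotes.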
Re: How to match regex in bash? (any character)
On Mon, Sep 26, 2011 at 8:19 PM, Peng Yu wrote:
> Hi,
>
> I know that I should use =~ to match regex (bash version 4).
>
> However, the man page is not very clear. I don't find how to match
> (matching any single character). For example, the following regex
> doesn't match txt. Does anybody know how to match any character
> (should be '.' in perl) in bash.
>
> [[ "$1" =~ "xxx.txt" ]]
>
>
> --
> Regards,
> Peng
>
>

When you quote the string on the right hand side of =~ it changes to a simple string match instead of a regex match. It is sometimes difficult to specify a regex literally (and unquoted), so it's best to use a variable as shown in Steven's reply to you.

The quoting is most likely unnecessary on the left hand side as well.

--
Visit serverfault.com to get your system administration questions answered.
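One case where a literal, unquoted regex gets awkward is a pattern containing spaces and other characters the shell treats specially. A small sketch of the variable approach follows; the log format, the variable log_re and the function is_error_line are assumptions made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: the log format and names below are invented for illustration.

# Single quotes keep the spaces, parentheses and '$' away from the shell,
# so the regex is stored exactly as written.
log_re='^error: (disk|net) [0-9]+$'

is_error_line() {
    # $log_re is expanded unquoted so that it is treated as a regex.
    [[ $1 =~ $log_re ]]
}

is_error_line 'error: disk 42' && echo "first line matches"
is_error_line 'warning: disk 42' || echo "second line does not match"
```

Written inline, the same pattern would need each space escaped with a backslash, which is easy to get wrong; keeping it in a single-quoted variable sidesteps that.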
Re: How to match regex in bash? (any character)
On 9/26/2011 9:19 PM, Peng Yu wrote:
> Hi,
>
> I know that I should use =~ to match regex (bash version 4).
>
> However, the man page is not very clear. I don't find how to match
> (matching any single character). For example, the following regex
> doesn't match txt. Does anybody know how to match any character
> (should be '.' in perl) in bash.
>
> [[ "$1" =~ "xxx.txt" ]]

Looks good to me.

    $ regex='xxx.txt'
    $ [[ xxxxtxt =~ $regex ]]
    $ echo $?
    0
    $

--
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net
How to match regex in bash? (any character)
Hi,

I know that I should use =~ to match regex (bash version 4).

However, the man page is not very clear. I don't find how to match (matching any single character). For example, the following regex doesn't match txt. Does anybody know how to match any character (should be '.' in perl) in bash.

[[ "$1" =~ "xxx.txt" ]]

--
Regards,
Peng