Re: Riddle me this: grep / regx experts
> Allegedly, on or about 2 February 2018, R. G. Newbury sent:
>> I am cleaning up some html code, using sed to standardize the
>> formatting. I was searching for specific instances of code to amend
>> using grep.
>
> In case you're not aware of it, there's a HTML tidy command that
> neatens up HTML.

I am using sed basically for search and replace. I already use tidy, but
it does not deal with my problem which is that the text has multiple
variations in the *text* formatting in the many different files. This
screws up the parsing and requires normalization. Tidy does not touch
that. Unfortunately.

Geoff

R. Geoffrey Newbury
954 Owenwood Drive
Mississauga, Ontario, L5H 3J2
t905-271-9600
newb...@mandamus.org
___
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Re: Riddle me this: grep / regx experts
On Fri, 2018-02-02 at 12:32 -0500, R. G. Newbury wrote:
> Thanks to all for the quick responses. I *tried* to RTFM but that was
> not clear, even on a re-read. I took [0-9]* as multiple instances of
> [0-9] but NOT zero instances.

From 'man grep':

   Repetition
       A regular expression may be followed by one of several
       repetition operators:

       ?      The preceding item is optional and matched at most once.
       *      The preceding item will be matched zero or more times.
       +      The preceding item will be matched one or more times.
       {n}    The preceding item is matched exactly n times.
       {n,}   The preceding item is matched n or more times.
       {,m}   The preceding item is matched at most m times.
              This is a GNU extension.
       {n,m}  The preceding item is matched at least n times,
              but not more than m times.

poc
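To see those operators side by side, here is a minimal sketch using grep -E with -x (whole-line match). The four sample lines and the `show` helper are made up for illustration; they are not from the thread.

```shell
# Hypothetical helper: feed four made-up lines to grep and join the lines
# that match the whole-line pattern "$1" with spaces.
show() {
    printf '%s\n' s s1 s12 s123 | grep -E -x "$1" | paste -sd ' ' -
}

show 's[0-9]?'      # zero or one digit:   s s1
show 's[0-9]*'      # zero or more digits: s s1 s12 s123
show 's[0-9]+'      # one or more digits:  s1 s12 s123
show 's[0-9]{2}'    # exactly two digits:  s12
show 's[0-9]{2,}'   # two or more digits:  s12 s123
show 's[0-9]{1,2}'  # one or two digits:   s1 s12
```

Note that the bare "s" line is matched by ? and * but not by +, which is the distinction at issue in this thread.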
Re: Riddle me this: grep / regx experts
Allegedly, on or about 2 February 2018, R. G. Newbury sent:
> I am cleaning up some html code, using sed to standardize the
> formatting. I was searching for specific instances of code to amend
> using grep.

In case you're not aware of it, there's a HTML tidy command that neatens
up HTML.

dnf install tidy

--
[tim@localhost ~]$ uname -rsvp
Linux 4.14.14-200.fc26.x86_64 #1 SMP Fri Jan 19 13:27:06 UTC 2018 x86_64

Boilerplate: All mail to my mailbox is automatically deleted.
There is no point trying to privately email me, I only get to see
the messages posted to the mailing list.

ZNQR LBH YBBX!
Re: Riddle me this: grep / regx experts
On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
> A bug in regx handling???
> I am cleaning up some html code .
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.
> Can anyone explain what is happening?. This isn't politics so the
> group [0-9] should not equal [0-9"#]. Or even [0-9\"\#]. .

Fri, 2 Feb 2018 10:14:37 -0600
From: Chris Adams
> A * in a regex is "0 or more of the previous", so basically you are
> just matching 's[0-9]*">' (because there will always be at least 0 of
> the [0-9] part at the start).
> If you really mean "1 or more", you can use an extended regex (the -E
> argument to grep/sed) and use + instead of *, so '[0-9]+s[0-9]*">'.

Fri, 02 Feb 2018 16:15:37 +
From: Patrick O'Callaghan
> In grep, * matches any number of instances, including 0. You want to
> use + rather than * to guarantee at least one digit.

Date: Fri, 2 Feb 2018 11:26:02 -0500
From: Jon LaBadie
> You are misunderstanding the "*". It means any sequence of the
> associated character including a ZERO length sequence.
> So [0-9]*s matches "s (actually just the s) as it is a zero length
> sequence of digits followed by an s.
> When you grep for [0-9]s, there must be at least one digit before the
> s (but any extra digits are not part of the match).
> Sometimes the sequence [0-9][0-9]*s is useful to say "one or more
> digits before the s".
> jl

Thanks to all for the quick responses. I *tried* to RTFM but that was
not clear, even on a re-read. I took [0-9]* as multiple instances of
[0-9] but NOT zero instances.

Geoff
Re: Riddle me this: grep / regx experts
On Fri, Feb 02, 2018 at 11:04:01AM -0500, R. G. Newbury wrote:
> A bug in regx handling???
>
> I am cleaning up some html code, using sed to standardize the formatting. I
> was searching for specific instances of code to amend using grep.
> I was looking for instances like
> Example text in a file: ( here named, quite originally, temp )
> 8.
>
> And # grep -h '[0-9]s[0-9]*">' temp
> Returns nothing (which is the expected result: there are no [0-9]s[0-9]">
> instances).
>
> BUT!!!
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.
>
> Note that the character before the 's' is either " or #
>
> Can anyone explain what is happening?. This isn't politics so the group
> [0-9] should not equal [0-9"#]. Or even [0-9\"\#].

You are misunderstanding the "*". It means any sequence of the
associated character including a ZERO length sequence.

So [0-9]*s matches "s (actually just the s) as it is a zero length
sequence of digits followed by an s.

When you grep for [0-9]s, there must be at least one digit before the s
(but any extra digits are not part of the match).

Sometimes the sequence [0-9][0-9]*s is useful to say "one or more digits
before the s".

jl
--
Jon H. LaBadie jo...@jgcomp.com
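A small sketch of the last two points, using grep -o (print only the matched text, supported by GNU grep) on a made-up sample string. The sample is an assumption for illustration only; the original example line did not survive in this archive.

```shell
# Hypothetical sample: two digits before the "s".
sample='id="12s3">'

# [0-9]s needs exactly one digit before the "s", so the match starts at
# the "2"; the extra leading "1" is not part of the match.
one=$(printf '%s\n' "$sample" | grep -o '[0-9]s[0-9]*">')

# [0-9][0-9]*s means "one or more digits before the s" in a basic regex,
# so the match takes in both leading digits.
many=$(printf '%s\n' "$sample" | grep -o '[0-9][0-9]*s[0-9]*">')

printf 'one:  %s\nmany: %s\n' "$one" "$many"
# one:  2s3">
# many: 12s3">
```

The [0-9][0-9]* form works without -E, which can be handy where extended regexes are not an option.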
Re: Riddle me this: grep / regx experts
On Fri, 2018-02-02 at 11:04 -0500, R. G. Newbury wrote:
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.

In grep, * matches any number of instances, including 0. You want to use
+ rather than * to guarantee at least one digit.

poc
Re: Riddle me this: grep / regx experts
Once upon a time, R. G. Newbury said:
> # grep -h '[0-9]*s[0-9]*">' temp
> Returns the example line with the 's[0-9]">' highlighted.

A * in a regex is "0 or more of the previous", so basically you are just
matching 's[0-9]*">' (because there will always be at least 0 of the
[0-9] part at the start).

If you really mean "1 or more", you can use an extended regex (the -E
argument to grep/sed) and use + instead of *, so '[0-9]+s[0-9]*">'.

--
Chris Adams
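As a rough illustration of the above, assuming GNU grep and sed: the sample line below is hypothetical (the original example was lost from the archive), chosen so that the character before the "s" is "#" as described in the question.

```shell
# Hypothetical sample line (assumption, not the poster's actual file).
line='<a href="#s8">8.</a>'

# With *, the leading [0-9]* can match zero digits ...
with_star=$(printf '%s\n' "$line" | grep -o '[0-9]*s[0-9]*">')
# ... so on this line it matches the same text as the pattern without it.
without=$(printf '%s\n' "$line" | grep -o 's[0-9]*">')

# With -E and +, at least one digit must precede the "s": no match here.
# (grep exits non-zero on no match, hence the "|| true".)
with_plus=$(printf '%s\n' "$line" | grep -E -o '[0-9]+s[0-9]*">' || true)

# The same pitfall in sed: * happily matches the empty string at the
# start of the line, while + waits for an actual digit.
sed_star=$(printf 'a12b\n' | sed 's/[0-9]*/X/')     # Xa12b
sed_plus=$(printf 'a12b\n' | sed -E 's/[0-9]+/X/')  # aXb

printf '%s\n' "$with_star" "$without" "[$with_plus]" "$sed_star" "$sed_plus"
```

For search and replace with sed, the empty match is the part most likely to surprise: the substitution "succeeds" at the start of the line without touching any digits.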