Re: [Tutor] regexp: a bit lost
On 10/1/10, Steven D'Aprano wrote:
> On Sat, 2 Oct 2010 01:14:27 am Alex Hall wrote:
>> >> Here is my test:
>> >> s=re.search(r"[\d+\s+\d+\s+\d]", l)
>> >
>> > Try this instead:
>> >
>> > re.search(r'\d+\s+\D*\d+\s+\d', l)
> [...]
>> Understood. My intent was to ask why my regexp would match anything
>> at all.
>
> Square brackets create a character set, so your regex tests for a string
> that contains a single character matching a digit (\d), a plus sign (+)
> or a whitespace character (\s). The additional \d + \s in the square
> brackets are redundant and don't add anything.

Oh, that explains it then. :) Thanks.

--
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] regexp: a bit lost
On Sat, 2 Oct 2010 01:14:27 am Alex Hall wrote:
> >> Here is my test:
> >> s=re.search(r"[\d+\s+\d+\s+\d]", l)
> >
> > Try this instead:
> >
> > re.search(r'\d+\s+\D*\d+\s+\d', l)
[...]
> Understood. My intent was to ask why my regexp would match anything
> at all.

Square brackets create a character set, so your regex tests for a string
that contains a single character matching a digit (\d), a plus sign (+)
or a whitespace character (\s). The additional \d + \s in the square
brackets are redundant and don't add anything.

--
Steven D'Aprano
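A minimal sketch of the point about character sets, for anyone following along. The sample strings and the variable names (broken, simplified, samples) are illustrative assumptions, not taken from Alex's program. It shows that the bracketed pattern behaves exactly like the shorter set [\d+\s], and that it fails only on a string with no digit, plus sign, or whitespace in it.

```python
import re

# Alex's original pattern: the brackets make this ONE character set,
# so the '+' signs are literal members, not quantifiers.
broken = r"[\d+\s+\d+\s+\d]"

# The same set with the redundant members removed.
simplified = r"[\d+\s]"

# Illustrative inputs (assumed, not from the thread's real data).
samples = ["7", "+", " ", "word 2 3", "abc"]

results = {s: bool(re.search(broken, s)) for s in samples}
same = all(
    bool(re.search(broken, s)) == bool(re.search(simplified, s))
    for s in samples
)

print(results)
print("patterns agree on every sample:", same)
```

Only "abc" is rejected here, which is why nearly every realistic input line appeared to pass.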
Re: [Tutor] regexp: a bit lost
On 10/1/10, Steven D'Aprano wrote:
> On Fri, 1 Oct 2010 12:45:38 pm Alex Hall wrote:
>> Hi, once again...
>> I have a regexp that I am trying to use to make sure a line matches
>> the format: [c*]n [c*]n n
>> where c* is (optionally) 0 or more non-numeric characters and n is
>> any numeric character. The spacing should not matter. These should
>> pass:
>> v1 v2 5
>> 2 someword7 3
>>
>> while these should not:
>> word 2 3
>> 1 2
>>
>> Here is my test:
>> s=re.search(r"[\d+\s+\d+\s+\d]", l)
>
> Try this instead:
>
> re.search(r'\d+\s+\D*\d+\s+\d', l)
>
> This searches for:
>     one or more digits
>     at least one whitespace char (space, tab, etc)
>     zero or more non-digits
>     at least one digit
>     at least one whitespace
>     exactly one digit

Makes sense.

>> However:
>> 1. this seems to pass with *any* string, even when l is a single
>> character. This causes many problems
> [...]
>
> I'm sure it does.
>
> You don't have to convince us that if the regular expression is broken,
> the rest of your code has a problem. That's a given. It's enough to
> know that the regex doesn't do what you need it to do.

Understood. My intent was to ask why my regexp would match anything at all.

>> 3. Once I get the above working, I will need a way of pulling the
>> characters out of the string and sticking them somewhere. For
>> example, if the string were
>> v9 v10 15
>> I would want an array:
>> n=[9, 10, 15]
>
> Modify the regex to be this:
>
> r'(\d+)\s+\D*(\d+)\s+(\d)'
>
> and then query the groups of the match object that is returned:
>
> >>> mo = re.search(r'(\d+)\s+\D*(\d+)\s+(\d)', 'spam42 eggs23 9')
> >>> mo.groups()
> ('42', '23', '9')
>
> Don't forget that mo will be None if the regex doesn't match, and don't
> forget that the items returned are strings.

Alright that worked perfectly, after a lot of calls to int()! I also
finally understand what a group is and, at a basic level, how to use it.
I have wondered how to extract matched text from a string for a long
time, and this has finally answered that. Thanks!

--
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap
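The "lot of calls to int()" can be folded into one place. The helper below is a hedged sketch only: the name extract_numbers is hypothetical, and the last group is written as (\d+) instead of the (\d) in the posted pattern. That change is an assumption made here so that a multi-digit final number such as the 15 in "v9 v10 15" is captured whole; with the pattern exactly as posted, the third group would hold only '1'.

```python
import re

# Assumption: final group widened from (\d) to (\d+) so that a value
# like 15 is captured in full. Everything else follows the posted pattern.
PATTERN = re.compile(r"(\d+)\s+\D*(\d+)\s+(\d+)")


def extract_numbers(line):
    """Return the three numbers in line as ints, or None if no match."""
    mo = PATTERN.search(line)
    if mo is None:
        # re.search returns None when the pattern is not found anywhere.
        return None
    # groups() yields strings, so convert each one exactly once here.
    return [int(text) for text in mo.groups()]


print(extract_numbers("v9 v10 15"))
print(extract_numbers("word 2 3"))
```

Returning None for a non-matching line keeps the "mo will be None" check inside the helper, so callers only need a single `if result is not None:` test.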
Re: [Tutor] regexp: a bit lost
with coffee:

yes = """
v1 v2 5
2 someword7 3
""".splitlines()[1:]
no = """
word 2 3
1 2
""".splitlines()[1:]

import re
# raw string, so the backslashes reach the regex engine untouched
pattern = r"(\w*\d\s+?)(\w*\d\s+?)(\d)$"
rx = re.compile(pattern)

for line in yes:
    m = rx.match(line)
    assert m
    print([part.rstrip() for part in m.groups()])

for line in no:
    m = rx.match(line)
    assert not m
Re: [Tutor] regexp: a bit lost
On Fri, 1 Oct 2010 12:45:38 pm Alex Hall wrote:
> Hi, once again...
> I have a regexp that I am trying to use to make sure a line matches
> the format: [c*]n [c*]n n
> where c* is (optionally) 0 or more non-numeric characters and n is
> any numeric character. The spacing should not matter. These should
> pass:
> v1 v2 5
> 2 someword7 3
>
> while these should not:
> word 2 3
> 1 2
>
> Here is my test:
> s=re.search(r"[\d+\s+\d+\s+\d]", l)

Try this instead:

re.search(r'\d+\s+\D*\d+\s+\d', l)

This searches for:
    one or more digits
    at least one whitespace char (space, tab, etc)
    zero or more non-digits
    at least one digit
    at least one whitespace
    exactly one digit

> However:
> 1. this seems to pass with *any* string, even when l is a single
> character. This causes many problems
[...]

I'm sure it does.

You don't have to convince us that if the regular expression is broken,
the rest of your code has a problem. That's a given. It's enough to
know that the regex doesn't do what you need it to do.

> 3. Once I get the above working, I will need a way of pulling the
> characters out of the string and sticking them somewhere. For
> example, if the string were
> v9 v10 15
> I would want an array:
> n=[9, 10, 15]

Modify the regex to be this:

r'(\d+)\s+\D*(\d+)\s+(\d)'

and then query the groups of the match object that is returned:

>>> mo = re.search(r'(\d+)\s+\D*(\d+)\s+(\d)', 'spam42 eggs23 9')
>>> mo.groups()
('42', '23', '9')

Don't forget that mo will be None if the regex doesn't match, and don't
forget that the items returned are strings.

--
Steven D'Aprano
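The step-by-step reading of the suggested pattern can be written into the pattern itself with re.VERBOSE. This is a sketch for illustration; the names STEPS, should_pass and should_fail are assumptions, while the four test lines are the ones from the original question.

```python
import re

# The suggested pattern, one piece per line. In VERBOSE mode whitespace
# and '#' comments inside the pattern are ignored by the regex engine.
STEPS = re.compile(r"""
    \d+     # one or more digits
    \s+     # at least one whitespace char (space, tab, etc)
    \D*     # zero or more non-digits
    \d+     # at least one digit
    \s+     # at least one whitespace char
    \d      # exactly one digit
""", re.VERBOSE)

# Sample lines from the original question.
should_pass = ["v1 v2 5", "2 someword7 3"]
should_fail = ["word 2 3", "1 2"]

passed = [bool(STEPS.search(line)) for line in should_pass]
failed = [bool(STEPS.search(line)) for line in should_fail]

print("should pass:", passed)
print("should fail:", failed)
```

Because re.search scans the whole string, this accepts a line that merely contains the pattern somewhere; anchoring is a separate decision.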
Re: [Tutor] regexp: a bit lost
Alex Hall wrote:
> Hi, once again...
> I have a regexp that I am trying to use to make sure a line matches the
> format: [c*]n [c*]n n
> where c* is (optionally) 0 or more non-numeric characters and n is any
> numeric character. The spacing should not matter. These should pass:
> v1 v2 5
> 2 someword7 3
>
> while these should not:
> word 2 3
> 1 2
>
> Here is my test:
> s=re.search(r"[\d+\s+\d+\s+\d]", l)
> if s:
>   #do stuff
>
> However:
> 1. this seems to pass with *any* string, even when l is a single
> character. This causes many problems and cannot happen since I have to
> [...]

You want to match a whole line, so you should use re.match not
re.search. See the docs:
http://docs.python.org/library/re.html#matching-vs-searching

You can also use re.split in this case:

yes = """
v1 v2 5
2 someword7 3
""".splitlines()
yes = [line for line in yes if line.strip()]

import re
pattern = r"(\w*\d\s+?)"  # there may be a better pattern than this
rx = re.compile(pattern)

for line in yes:
    # the capturing group makes split() keep the matched pieces
    print([part for part in rx.split(line) if part])
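One caveat on "match a whole line", sketched below with an assumed toy pattern and assumed input strings: re.match anchors the pattern only at the start of the string, so trailing text still gets through unless the pattern ends in $ or re.fullmatch (Python 3.4 and later) is used.

```python
import re

# Toy pattern for illustration only: two numbers separated by whitespace.
pattern = r"\d+\s+\d+"

# search() looks anywhere in the string.
found_anywhere = re.search(pattern, "word 2 3")

# match() only tries at position 0, so the leading "word" defeats it.
at_start = re.match(pattern, "word 2 3")

# match() does NOT require the pattern to reach the end of the string.
prefix_only = re.match(pattern, "2 3 trailing")

# fullmatch() requires the entire string to match.
whole = re.fullmatch(pattern, "2 3 trailing")

print(found_anywhere.group())  # the text that search() found
print(at_start)                # no match at the start
print(prefix_only.group())     # matched a prefix, trailing text ignored
print(whole)                   # whole string does not match
```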
[Tutor] regexp: a bit lost
Hi, once again...
I have a regexp that I am trying to use to make sure a line matches the
format: [c*]n [c*]n n
where c* is (optionally) 0 or more non-numeric characters and n is any
numeric character. The spacing should not matter. These should pass:
v1 v2 5
2 someword7 3

while these should not:
word 2 3
1 2

Here is my test:
s=re.search(r"[\d+\s+\d+\s+\d]", l)
if s:
  #do stuff

However:
1. this seems to pass with *any* string, even when l is a single
character. This causes many problems and cannot happen since I have to
ignore any strings not formatted as described above. So if I have
for a in b:
  s=re.search(r"[\d+\s+\d+\s+\d]", l)
  if s: c.append(a)
then c will have every string in b, even if the string being examined
looks nothing like the pattern I am after.

2. How would I make my regexp able to match 0-n characters? I know to
use \D*, but I am not sure about brackets or parentheses for putting the
\D* into the parent expression (the \d\s one).

3. Once I get the above working, I will need a way of pulling the
characters out of the string and sticking them somewhere. For example,
if the string were
v9 v10 15
I would want an array:
n=[9, 10, 15]
but the array would be created from a regexp. This has to be possible,
but none of the manuals or tutorials on regexp say just how this is
done. Mentions are made of groups, but nothing explicit (to me at
least).

--
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap