Re: [Tutor] regexp: a bit lost

2010-10-01 Thread Alex Hall
On 10/1/10, Steven D'Aprano  wrote:
> On Sat, 2 Oct 2010 01:14:27 am Alex Hall wrote:
>> >> Here is my test:
>> >> s=re.search(r"[\d+\s+\d+\s+\d]", l)
>> >
>> > Try this instead:
>> >
>> > re.search(r'\d+\s+\D*\d+\s+\d', l)
> [...]
>> Understood. My intent was to ask why my regexp would match anything
>> at all.
>
> Square brackets create a character set, so your regex tests for a string
> that contains a single character matching a digit (\d), a plus sign (+)
> or a whitespace character (\s). The additional \d + \s in the square
> brackets are redundant and don't add anything.
Oh, that explains it then. :) Thanks.
>
> --
> Steven D'Aprano
>


-- 
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] regexp: a bit lost

2010-10-01 Thread Steven D'Aprano
On Sat, 2 Oct 2010 01:14:27 am Alex Hall wrote:
> >> Here is my test:
> >> s=re.search(r"[\d+\s+\d+\s+\d]", l)
> >
> > Try this instead:
> >
> > re.search(r'\d+\s+\D*\d+\s+\d', l)
[...]
> Understood. My intent was to ask why my regexp would match anything
> at all.

Square brackets create a character set, so your regex tests for a string 
that contains a single character matching a digit (\d), a plus sign (+) 
or a whitespace character (\s). The additional \d + \s in the square 
brackets are redundant and don't add anything.
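
A quick way to check this at the interpreter, as a sketch only (the sample strings are made up for illustration): the bracketed pattern behaves the same as the much shorter set [\d+\s], and it only fails on a string that has no digit, no plus sign and no whitespace anywhere.

```python
import re

# The pattern from the question: everything inside [...] is one character set.
broken = re.compile(r"[\d+\s+\d+\s+\d]")
# The same set with the duplicates removed: a digit, a literal '+', or whitespace.
equivalent = re.compile(r"[\d+\s]")

# Illustrative inputs only; they are not taken from the thread.
samples = ["7", "+", " ", "a1", "abc"]

for text in samples:
    # Both patterns agree on every input, since they describe the same set.
    print(repr(text), bool(broken.search(text)), bool(equivalent.search(text)))
```

Only "abc" is rejected here, which is why almost any realistic input line appears to pass.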

-- 
Steven D'Aprano


Re: [Tutor] regexp: a bit lost

2010-10-01 Thread Alex Hall
On 10/1/10, Steven D'Aprano  wrote:
> On Fri, 1 Oct 2010 12:45:38 pm Alex Hall wrote:
>> Hi, once again...
>> I have a regexp that I am trying to use to make sure a line matches
>> the format: [c*]n [c*]n n
>> where c* is (optionally) 0 or more non-numeric characters and n is
>> any numeric character. The spacing should not matter. These should
>> pass: v1 v2   5
>> 2 someword7 3
>>
>> while these should not:
>> word 2  3
>> 1 2
>>
>> Here is my test:
>> s=re.search(r"[\d+\s+\d+\s+\d]", l)
>
> Try this instead:
>
> re.search(r'\d+\s+\D*\d+\s+\d', l)
>
> This searches for:
> one or more digits
> at least one whitespace char (space, tab, etc)
> zero or more non-digits
> at least one digit
> at least one whitespace
> exactly one digit
Makes sense.
>
>
>> However:
>> 1. this seems to pass with *any* string, even when l is a single
>> character. This causes many problems
> [...]
>
> I'm sure it does.
>
> You don't have to convince us that if the regular expression is broken,
> the rest of your code has a problem. That's a given. It's enough to
> know that the regex doesn't do what you need it to do.
Understood. My intent was to ask why my regexp would match anything at all.
>
>
>> 3. Once I get the above working, I will need a way of pulling the
>> characters out of the string and sticking them somewhere. For
>> example, if the string were
>> v9 v10 15
>> I would want an array:
>> n=[9, 10, 15]
>
>
> Modify the regex to be this:
>
> r'(\d+)\s+\D*(\d+)\s+(\d)'
>
> and then query the groups of the match object that is returned:
>
> >>> mo = re.search(r'(\d+)\s+\D*(\d+)\s+(\d)', 'spam42   eggs23 9')
> >>> mo.groups()
> ('42', '23', '9')
>
> Don't forget that mo will be None if the regex doesn't match, and don't
> forget that the items returned are strings.
Alright that worked perfectly, after a lot of calls to int()! I also
finally understand what a group is and, at a basic level, how to use
it. I have wondered how to extract matched text from a string for a
long time, and this has finally answered that. Thanks!
>
>
>
> --
> Steven D'Aprano
>


-- 
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap


Re: [Tutor] regexp: a bit lost

2010-09-30 Thread Gerard Flanagan

with coffee:



yes = """
v1 v2   5
2 someword7 3
""".splitlines()[1:]

no = """
word 2 3
1 2
""".splitlines()[1:]

import re

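# Raw string so the backslashes reach the regex engine unchanged.
pattern = r"(\w*\d\s+?)(\w*\d\s+?)(\d)$"
rx = re.compile(pattern)

# Every line in "yes" must match; print the three captured parts.
for line in yes:
    m = rx.match(line)
    assert m
    print([part.rstrip() for part in m.groups()])

# No line in "no" may match.
for line in no:
    m = rx.match(line)
    assert not m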






Re: [Tutor] regexp: a bit lost

2010-09-30 Thread Steven D'Aprano
On Fri, 1 Oct 2010 12:45:38 pm Alex Hall wrote:
> Hi, once again...
> I have a regexp that I am trying to use to make sure a line matches
> the format: [c*]n [c*]n n
> where c* is (optionally) 0 or more non-numeric characters and n is
> any numeric character. The spacing should not matter. These should
> pass: v1 v2   5
> 2 someword7 3
>
> while these should not:
> word 2  3
> 1 2
>
> Here is my test:
> s=re.search(r"[\d+\s+\d+\s+\d]", l)

Try this instead:

re.search(r'\d+\s+\D*\d+\s+\d', l)

This searches for:
one or more digits
at least one whitespace char (space, tab, etc)
zero or more non-digits
at least one digit
at least one whitespace
exactly one digit
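
As a rough check, the suggested pattern can be run against the four sample lines from the question. This is only a sketch; the list names should_pass and should_fail are made up here.

```python
import re

# The pattern suggested above, compiled once for reuse.
pattern = re.compile(r'\d+\s+\D*\d+\s+\d')

# Sample lines copied from the original question.
should_pass = ['v1 v2   5', '2 someword7 3']
should_fail = ['word 2  3', '1 2']

for line in should_pass + should_fail:
    # search() returns a match object on success and None on failure.
    print(repr(line), bool(pattern.search(line)))
```

Note that search() looks for the pattern anywhere in the line, so a line with extra text before or after a valid part would still pass this test.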


> However:
> 1. this seems to pass with *any* string, even when l is a single
> character. This causes many problems
[...]

I'm sure it does.

You don't have to convince us that if the regular expression is broken, 
the rest of your code has a problem. That's a given. It's enough to 
know that the regex doesn't do what you need it to do.


> 3. Once I get the above working, I will need a way of pulling the
> characters out of the string and sticking them somewhere. For
> example, if the string were
> v9 v10 15
> I would want an array:
> n=[9, 10, 15]


Modify the regex to be this:

r'(\d+)\s+\D*(\d+)\s+(\d)'

and then query the groups of the match object that is returned:

>>> mo = re.search(r'(\d+)\s+\D*(\d+)\s+(\d)', 'spam42   eggs23 9')
>>> mo.groups()
('42', '23', '9')

Don't forget that mo will be None if the regex doesn't match, and don't 
forget that the items returned are strings.
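
Putting both reminders together, here is a minimal sketch of turning a line into a list of ints. The function name extract_numbers is hypothetical, and the last group uses \d+ rather than \d, which is an assumption made so that a multi-digit final number such as the 15 in the question's example is captured whole.

```python
import re

# Assumption: the final group is \d+ (not \d) so that '15' is captured as a whole.
PATTERN = re.compile(r'(\d+)\s+\D*(\d+)\s+(\d+)')

def extract_numbers(line):
    """Return the three numbers in line as ints, or None if the line does not match."""
    mo = PATTERN.search(line)
    if mo is None:
        # No match: there are no groups to read.
        return None
    # groups() returns strings, so convert each one.
    return [int(group) for group in mo.groups()]

print(extract_numbers('v9 v10 15'))  # [9, 10, 15]
print(extract_numbers('1 2'))        # None
```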



-- 
Steven D'Aprano


Re: [Tutor] regexp: a bit lost

2010-09-30 Thread Gerard Flanagan

Alex Hall wrote:

Hi, once again...
I have a regexp that I am trying to use to make sure a line matches the format:
[c*]n [c*]n n
where c* is (optionally) 0 or more non-numeric characters and n is any
numeric character. The spacing should not matter. These should pass:
v1 v2   5
2 someword7 3

while these should not:
word 2  3
1 2

Here is my test:
s=re.search(r"[\d+\s+\d+\s+\d]", l)
if s: #do stuff

However:
1. this seems to pass with *any* string, even when l is a single
character. This causes many problems and cannot happen since I have to

[...]

You want to match a whole line, so you should use re.match not 
re.search. See the docs:


http://docs.python.org/library/re.html#matching-vs-searching
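
A small sketch of the difference, using a deliberately simplified pattern and made-up input strings. One thing worth knowing: re.match anchors only at the start of the string, so the pattern still needs a trailing $ if text after the match should be rejected too (newer Python versions, 3.4 and later, also offer re.fullmatch).

```python
import re

# Simplified pattern for illustration: digits, whitespace, one digit.
pattern = re.compile(r'\d+\s+\d')

# search() scans the whole string and finds '2 3' in the middle.
print(pattern.search('word 2 3'))
# match() only tries at position 0, where 'w' is not a digit, so this is None.
print(pattern.match('word 2 3'))

# match() does not check the end of the string: this still matches '2 3'.
print(pattern.match('2 3 trailing'))

# Adding $ makes the pattern reject trailing text as well.
anchored = re.compile(r'\d+\s+\d$')
print(anchored.match('2 3 trailing'))  # None
print(anchored.match('2 3'))
```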


You can also use re.split in this case:

yes = """
v1 v2   5
2 someword7 3
""".splitlines()
yes = [line for line in yes if line.strip()]

import re

pattern = r"(\w*\d\s+?)" # there may be a better pattern than this
rx = re.compile(pattern)

# The capturing group makes split() keep the matched pieces in the result.
for line in yes:
    print([part for part in rx.split(line) if part])




[Tutor] regexp: a bit lost

2010-09-30 Thread Alex Hall
Hi, once again...
I have a regexp that I am trying to use to make sure a line matches the format:
[c*]n [c*]n n
where c* is (optionally) 0 or more non-numeric characters and n is any
numeric character. The spacing should not matter. These should pass:
v1 v2   5
2 someword7 3

while these should not:
word 2  3
1 2

Here is my test:
s=re.search(r"[\d+\s+\d+\s+\d]", l)
if s: #do stuff

However:
1. this seems to pass with *any* string, even when l is a single
character. This causes many problems and cannot happen since I have to
ignore any strings not formatted as described above. So if I have
for a in b:
  s=re.search(r"[\d+\s+\d+\s+\d]", l)
  if s: c.append(a)

then c will have every string in b, even if the string being examined
looks nothing like the pattern I am after.

2. How would I make my regexp able to match 0-n characters? I know to
use \D*, but I am not sure about brackets or parentheses for putting
the \D* into the parent expression (the \d\s one).

3. Once I get the above working, I will need a way of pulling the
characters out of the string and sticking them somewhere. For example,
if the string were
v9 v10 15
I would want an array:
n=[9, 10, 15]
but the array would be created from a regexp. This has to be possible,
but none of the manuals or tutorials on regexp say just how this is
done. Mentions are made of groups, but nothing explicit (to me at
least).

-- 
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap