Re: regex help

2015-03-13 Thread Steven D'Aprano
Larry Martell wrote:

> I need to remove all trailing zeros to the right of the decimal point,
> but leave one zero if it's whole number. 


def strip_zero(s):
    if '.' not in s:
        return s
    s = s.rstrip('0')
    if s.endswith('.'):
        s += '0'
    return s


And in use:

py> strip_zero('-10.2500')
'-10.25'
py> strip_zero('123000')
'123000'
py> strip_zero('123000.')
'123000.0'


It doesn't support exponential format:

py> strip_zero('1.230e3')
'1.230e3'

because it isn't clear what you intend to do under those circumstances.
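
To run a function like this over the comma-separated line from the original question, one option is to split on commas and process each field. This is only a sketch: it assumes fields never contain embedded commas, and the helper name strip_line is made up for illustration.

```python
def strip_zero(s):
    """Strip trailing zeros after the decimal point, keeping one if needed."""
    if '.' not in s:
        return s
    s = s.rstrip('0')
    if s.endswith('.'):
        s += '0'
    return s


def strip_line(line):
    # hypothetical helper: apply strip_zero to every comma-separated field
    return ','.join(strip_zero(field) for field in line.split(','))


line = '14S,5.,4.5686274500,3.3947368421052630,5.13725490196'
print(strip_line(line))
# -> 14S,5.0,4.56862745,3.394736842105263,5.13725490196
```

Fields without a decimal point, such as '14S', pass through unchanged because of the first check.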


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: regex help

2015-03-13 Thread Cameron Simpson

On 13Mar2015 12:05, Larry Martell  wrote:

I need to remove all trailing zeros to the right of the decimal point,
but leave one zero if it's whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros:

'0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0.
I've been messing with the negative lookbehind, but I haven't found
one that works for this.


Leaving aside the suggested non-greedy match, you can rephrase this: strip 
trailing zeroes _after_ the first decimal digit. Then you can consider a number 
to be:


 digits
 point
 any digit
 other digits to be right-zero stripped

so:

 (\d+\.\d)(\d*[1-9])?0*\b

and keep .group(1) and .group(2) from the match.

Another way of considering the problem.

Or you could two step it. Strip all trailing zeroes. If the result ends in a 
dot, add a single zero.
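
The pattern above can be tried with re.sub and a small replacement function. This is a sketch rather than a tested solution for every input; the names NUMBER and strip_with_regex are made up here.

```python
import re

# Cameron's pattern: group 1 is "digits, point, one digit",
# group 2 is the remaining digits up to the last non-zero one.
NUMBER = re.compile(r'(\d+\.\d)(\d*[1-9])?0*\b')


def strip_with_regex(text):
    # keep group 1 and group 2 (group 2 may be absent)
    return NUMBER.sub(lambda m: m.group(1) + (m.group(2) or ''), text)


print(strip_with_regex('4.5686274500,3.3947368421052630,5.000'))
# -> 4.56862745,3.394736842105263,5.0
print(strip_with_regex('5.'))
# -> 5.   (no digit after the point, so the pattern does not match)
```

Because the pattern requires a digit after the point, a bare "5." would still need the "add a single zero" step from the two-step approach.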


Cheers,
Cameron Simpson 

C'mon. Take the plunge. By the time you go through rehab the first time,
you'll be surrounded by the most interesting people, and if it takes years
off of your life, don't sweat it. They'll be the last ones anyway.
   - Vinnie Jordan, alt.peeves


Re: regex help

2015-03-13 Thread Larry Martell
On Fri, Mar 13, 2015 at 1:29 PM, MRAB  wrote:
> On 2015-03-13 16:05, Larry Martell wrote:
>>
>> I need to remove all trailing zeros to the right of the decimal point,
>> but leave one zero if it's whole number. For example, if I have this:
>>
>>
>> 14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>>
>> I want to end up with:
>>
>>
>> 14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
>>
>> I have a regex to remove the zeros:
>>
>> '0+[,$]', ''
>>
>> But I can't figure out how to get the 5. to be 5.0.
>> I've been messing with the negative lookbehind, but I haven't found
>> one that works for this.
>>
> Search: (\.\d+?)0+\b
> Replace: \1
>
> which is:
>
> re.sub(r'(\.\d+?)0+\b', r'\1', string)

Thanks! That works perfectly.


Re: regex help

2015-03-13 Thread Tim Chase
On 2015-03-13 12:05, Larry Martell wrote:
> I need to remove all trailing zeros to the right of the decimal
> point, but leave one zero if it's whole number. 
> 
> But I can't figure out how to get the 5. to be 5.0.
> I've been messing with the negative lookbehind, but I haven't found
> one that works for this.

You can do it with string-ops, or you can resort to regexp.
Personally, I like the clarity of the string-ops version, but use
what suits you.

-tkc

import re
input = [
    '14S',
    '5.',
    '4.5686274500',
    '3.7272727272727271',
    '3.3947368421052630',
    '5.7307692307692308',
    '5.7547169811320753',
    '4.9423076923076925',
    '5.7884615384615383',
    '5.13725490196',
    ]

output = [
    '14S',
    '5.0',
    '4.56862745',
    '3.7272727272727271',
    '3.394736842105263',
    '5.7307692307692308',
    '5.7547169811320753',
    '4.9423076923076925',
    '5.7884615384615383',
    '5.13725490196',
    ]


def fn1(s):
    if '.' in s:
        s = s.rstrip('0')
        if s.endswith('.'):
            s += '0'
    return s

def fn2(s):
    return re.sub(r'(\.\d+?)0+$', r'\1', s)

for fn in (fn1, fn2):
    for i, o in zip(input, output):
        v = fn(i)
        print("%s: %s -> %s [%s]" % (v == o, i, v, o))


Re: regex help

2015-03-13 Thread MRAB

On 2015-03-13 16:05, Larry Martell wrote:

I need to remove all trailing zeros to the right of the decimal point,
but leave one zero if it's whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros:

'0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0.
I've been messing with the negative lookbehind, but I haven't found
one that works for this.


Search: (\.\d+?)0+\b
Replace: \1

which is:

re.sub(r'(\.\d+?)0+\b', r'\1', string)
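
A quick way to see what this substitution does is to run it on a few fields. The wrapper name strip_trailing_zeros is made up for illustration, and the inputs are chosen here, not taken from a test suite.

```python
import re


def strip_trailing_zeros(text):
    # MRAB's substitution, wrapped in a function for illustration
    return re.sub(r'(\.\d+?)0+\b', r'\1', text)


print(strip_trailing_zeros('14S,4.5686274500,3.3947368421052630,5.000'))
# -> 14S,4.56862745,3.394736842105263,5.0
print(strip_trailing_zeros('5.'))
# -> 5.
```

The lazy \d+? always keeps at least one digit after the point, which is why "5.000" becomes "5.0". A field written as a bare "5." has no digit after the point, so this pattern leaves it alone.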



Re: regex help

2015-03-13 Thread Thomas 'PointedEars' Lahn
Larry Martell wrote:

> I need to remove all trailing zeros to the right of the decimal point,
> but leave one zero if it's whole number. For example, if I have this:
> 
> 
14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
> 
> I want to end up with:
> 
> 
14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
> 
> I have a regex to remove the zeros:
> 
> '0+[,$]', ''
> 
> But I can't figure out how to get the 5. to be 5.0.
> I've been messing with the negative lookbehind, but I haven't found
> one that works for this.

First of all, I find it unlikely that you really want to solve your problem 
with regular expressions.  Google “X-Y problem”.

Second, if you must use regular expressions, the most simple approach is to 
use backreferences.

Third, you need to show the relevant (Python) code.



-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


regex help

2015-03-13 Thread Larry Martell
I need to remove all trailing zeros to the right of the decimal point,
but leave one zero if it's whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros:

'0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0.
I've been messing with the negative lookbehind, but I haven't found
one that works for this.


Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 18:12:33 +0100, Peter Otten wrote:

> By the way:
>  
> >>> print quopri.decodestring("=E4=F6=FC").decode("iso-8859-1")
> äöü
> >>> print r"\xe4\xf6\xfc".decode("string-escape").decode("iso-8859-1")
> äöü

Ah - better than a regex. Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newbie needs regex help

2010-12-06 Thread Peter Otten
Dan M wrote:

> I'm getting bogged down with backslash escaping.
> 
> I have some text files containing characters with the 8th bit set. These
> characters are encoded one of two ways: either "=hh" or "\xhh", where "h"
> represents a hex digit, and "\x" is a literal backslash followed by a
> lower-case x.

By the way:
 
>>> print quopri.decodestring("=E4=F6=FC").decode("iso-8859-1")
äöü
>>> print r"\xe4\xf6\xfc".decode("string-escape").decode("iso-8859-1")
äöü
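
The session above is Python 2. A rough Python 3 equivalent, as a sketch: quopri works on bytes there, and the "string-escape" codec no longer exists, so "unicode_escape" is swapped in here as the closest stand-in for two-digit \xhh escapes.

```python
import quopri

# "=hh" form: quoted-printable decoding takes and returns bytes in Python 3
from_qp = quopri.decodestring(b"=E4=F6=FC").decode("iso-8859-1")

# "\xhh" form: the bytes hold a literal backslash, 'x' and two hex digits
from_escapes = rb"\xe4\xf6\xfc".decode("unicode_escape")

print(from_qp, from_escapes)
# -> äöü äöü
```

unicode_escape maps \xhh to the code point hh, which coincides with ISO-8859-1 for values up to 0xff.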



Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 09:44:39 -0600, Dan M wrote:

> That's what I had initially assumed was the case, but looking at the
> data files with a hex editor showed me that I do indeed have
> four-character sequences. That's what makes this such an interesting
> task!

Sorry, I misunderstood the first time I read your reply.

You're right, the string I showed did indeed contain single-byte 
characters, not four-character sequences. The data file I work with, 
though, does contain four-character sequences.



Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 16:34:56 +0100, Alain Ketterlin wrote:

> Dan M  writes:
> 
>> I took a look at http://docs.python.org/howto/regex.html, especially
>> the section titled "The Backslash Plague". I started out trying :
> 
> >>> import re
> >>> r = re.compile('x([0-9a-fA-F]{2})')
> >>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0
> 
> The backslash trickery applies to string literals also, not only
> regexps.
> 
> Your string does not have the value you think it has. Double each
> backslash (or make your string raw) and you'll get what you expect.
> 
> -- Alain.

D'oh! I hadn't thought of that. If I read my data file in from disk, use 
the raw string version of the regex, and do the search that way I do 
indeed get the results I'm looking for.

Thanks for pointing that out. I guess I need to think a little deeper 
into what I'm doing when I escape stuff.
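
For text that really does contain the four-character sequences, the raw-string regex can also drive the replacement. A minimal sketch, assuming the hex values are ISO-8859-1 code points as in Peter's reply; the function name and the sample string are made up for illustration.

```python
import re

# literal backslash, 'x', then two hex digits
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')


def decode_hex_escapes(text):
    # replace each \xhh sequence with the character whose code point is hh
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


sample = r"caf\xe9 and na\xefve"   # raw literal: the backslashes are real
print(decode_hex_escapes(sample))
# -> café and naïve
```

Text read from a file needs no raw-string treatment at all, since escape processing only applies to string literals in source code.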


Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 10:29:41 -0500, Mel wrote:

> What you're missing is that string `a` doesn't actually contain four-
> character sequences like '\', 'x', 'a', 'a' .  It contains single
> characters that you encode in string literals as '\xaa' and so on.  You
> might do better with
> 
> p1 = r'([\x80-\xff])'
> r1 = re.compile (p1)
> m = r1.search (a)
> 
> I get at least an <_sre.SRE_Match object at 0xb749a6e0> when I try this.
> 
>   Mel.

That's what I had initially assumed was the case, but looking at the data 
files with a hex editor showed me that I do indeed have four-character 
sequences. That's what makes this such an interesting task!


Re: Newbie needs regex help

2010-12-06 Thread Alain Ketterlin
Dan M  writes:

> I took a look at http://docs.python.org/howto/regex.html, especially the 
> section titled "The Backslash Plague". I started out trying :

> >>> import re
> >>> r = re.compile('x([0-9a-fA-F]{2})')
> >>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 

The backslash trickery applies to string literals also, not only regexps.

Your string does not have the value you think it has. Double each
backslash (or make your string raw) and you'll get what you expect.
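
The difference is easy to see by comparing lengths. A small sketch in Python 3, with variable names chosen here for illustration:

```python
import re

pattern = re.compile(r'\\x([0-9a-fA-F]{2})')

cooked = "This \xef file"    # \xef is interpreted: one character
raw = r"This \xef file"      # backslash, x, e, f: four characters

print(len(cooked), len(raw))
# -> 11 14
print(pattern.search(cooked))
# -> None
print(pattern.search(raw).group(1))
# -> ef
```

Writing the literal as "This \\xef file" gives the same 14-character string as the raw form.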

-- Alain.


Re: Newbie needs regex help

2010-12-06 Thread Mel
Dan M wrote:

> I'm getting bogged down with backslash escaping.
> 
> I have some text files containing characters with the 8th bit set. These
> characters are encoded one of two ways: either "=hh" or "\xhh", where "h"
> represents a hex digit, and "\x" is a literal backslash followed by a
> lower-case x.
> 
> Catching the first case with a regex is simple. But when I try to write a
> regex to catch the second case, I mess up the escaping.
> 
> I took a look at http://docs.python.org/howto/regex.html, especially the
> section titled "The Backslash Plague". I started out trying :
> 
> d...@dan:~/personal/usenet$ python
> Python 2.7 (r27:82500, Nov 15 2010, 12:10:23)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> r = re.compile('x([0-9a-fA-F]{2})')
> >>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0
> characters \xefn \xeft."
> >>> m = r.search(a)
> >>> m
> 
> No match.
> 
> I then followed the advice of the above-mentioned document, and expressed
> the regex as a raw string:
> 
> >>> r = re.compile(r'\\x([0-9a-fA-F]{2})')
> >>> r.search(a)
> 
> Still no match.
> 
> I'm obviously missing something. I spent a fair bit of time playing with
> this over the weekend, and I got nowhere. Now it's time to ask for help.
> What am I doing wrong here?

What you're missing is that string `a` doesn't actually contain four-
character sequences like '\', 'x', 'a', 'a' .  It contains single characters 
that you encode in string literals as '\xaa' and so on.  You might do better 
with

p1 = r'([\x80-\xff])'
r1 = re.compile (p1)
m = r1.search (a)

I get at least an <_sre.SRE_Match object at 0xb749a6e0> when I try this.

Mel.



Newbie needs regex help

2010-12-06 Thread Dan M
I'm getting bogged down with backslash escaping.

I have some text files containing characters with the 8th bit set. These 
characters are encoded one of two ways: either "=hh" or "\xhh", where "h" 
represents a hex digit, and "\x" is a literal backslash followed by a 
lower-case x.

Catching the first case with a regex is simple. But when I try to write a 
regex to catch the second case, I mess up the escaping.

I took a look at http://docs.python.org/howto/regex.html, especially the 
section titled "The Backslash Plague". I started out trying :

d...@dan:~/personal/usenet$ python
Python 2.7 (r27:82500, Nov 15 2010, 12:10:23) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile('x([0-9a-fA-F]{2})')
>>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 
characters \xefn \xeft."
>>> m = r.search(a)
>>> m

No match.

I then followed the advice of the above-mentioned document, and expressed 
the regex as a raw string:

>>> r = re.compile(r'\\x([0-9a-fA-F]{2})')
>>> r.search(a)

Still no match.

I'm obviously missing something. I spent a fair bit of time playing with 
this over the weekend, and I got nowhere. Now it's time to ask for help. 
What am I doing wrong here?



Re: regex help: splitting string gets weird groups

2010-04-08 Thread Patrick Maupin
On Apr 8, 3:40 pm, gry  wrote:
> >    >>> s='555tHe-rain.in#=1234'
> >    >>> import re
> >    >>> r=re.compile(r'([a-zA-Z]+|\d+|.)')
> >    >>> r.findall(s)
> >    ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']
>
> This is nice and simple and has the invertible property that Patrick
> mentioned above.  Thanks much!

Yes, like using split(), this is invertible.  But you will see a
difference (and for a given task, you might prefer one way or the
other) if, for example, you put a few consecutive spaces in the middle
of your string, where this pattern and findall() will return each
space individually, and split() will return them all together.

You *can* fix up the pattern for findall() where it will have the same
properties as the split(), but it will almost always be a more
complicated pattern than for the equivalent split().

Another thing you can do with split(): if you *think* you have a
pattern that fully covers every string you expect to throw at it, but
would like to verify this, you can make use of the fact that split()
returns a string between each match (and before the first match and
after the last match).  So if you expect that every character in your
entire string should be a part of a match, you can do something like:

strings = splitter(s)
tokens = strings[1::2]
assert not ''.join(strings[::2])
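
Spelled out as a self-contained sketch, using the splitter from my earlier message in this thread (the wrapper name tokenize_strict is made up for illustration):

```python
import re

splitter = re.compile(r'([A-Za-z]+|[0-9]+|[-.#=])').split


def tokenize_strict(s):
    # hypothetical wrapper around the three lines above
    strings = splitter(s)
    tokens = strings[1::2]             # the matches
    leftover = ''.join(strings[::2])   # whatever fell between matches
    if leftover:
        raise ValueError('unmatched text: %r' % leftover)
    return tokens


print(tokenize_strict('555tHe-rain.in#=1234'))
# -> ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

# Consecutive spaces: findall() with a catch-all '.' yields them one by one,
# while split() hands back the whole run as a single in-between string.
print(re.findall(r'[a-zA-Z]+|\d+|.', 'a   b'))
# -> ['a', ' ', ' ', ' ', 'b']
print(re.split(r'([a-zA-Z]+|\d+)', 'a   b'))
# -> ['', 'a', '   ', 'b', '']
```

An input such as '555 tHe' raises ValueError here, because the space is not covered by the pattern.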

Regards,
Pat


Re: regex help: splitting string gets weird groups

2010-04-08 Thread gry
>    >>> s='555tHe-rain.in#=1234'
>    >>> import re
>    >>> r=re.compile(r'([a-zA-Z]+|\d+|.)')
>    >>> r.findall(s)
>    ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']
This is nice and simple and has the invertible property that Patrick
mentioned above.  Thanks much!


Re: regex help: splitting string gets weird groups

2010-04-08 Thread Jon Clements
On 8 Apr, 19:49, gry  wrote:
> [ python3.1.1, re.__version__='2.2.1' ]
> I'm trying to use re to split a string into (any number of) pieces of
> these kinds:
> 1) contiguous runs of letters
> 2) contiguous runs of digits
> 3) single other characters
>
> e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
> '.', 'in', '#', '=', 1234]
> I tried:
> >>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$',
> ...          '555tHe-rain.in#=1234').groups()
>
> ('1234', 'in', '1234', '=')
>
> Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
> group?  Is my regexp illegal somehow and confusing the engine?
>
> I *would* like to understand what's wrong with this regex, though if
> someone has a neat other way to do the above task, I'm also interested
> in suggestions.

Avoiding re's (for a bit of fun):
(no good for unicode obviously)

import string
from itertools import groupby, chain, repeat, count, izip

s = """555tHe-rain.in#=1234"""

unique_group = count()
lookup = dict(
    chain(
        izip(string.ascii_letters, repeat('L')),
        izip(string.digits, repeat('D')),
        izip(string.punctuation, unique_group)
    )
)
parse = dict(D=int, L=str.capitalize)


print [ parse.get(key, lambda L: L)(''.join(items)) for key, items in
groupby(s, lambda L: lookup[L]) ]
[555, 'The', '-', 'Rain', '.', 'In', '#', '=', 1234]
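
The code above is Python 2 (izip and the print statement). Since the original post mentions python3.1.1, here is a sketch of the same idea for Python 3, where the built-in zip is already lazy. It keeps the same limitation: any character outside ASCII letters, digits and punctuation raises KeyError.

```python
import string
from itertools import chain, count, groupby, repeat

s = "555tHe-rain.in#=1234"

# letters map to 'L', digits to 'D', and each punctuation character
# gets its own number so that neighbouring symbols stay separate
lookup = dict(
    chain(
        zip(string.ascii_letters, repeat('L')),
        zip(string.digits, repeat('D')),
        zip(string.punctuation, count()),
    )
)
parse = {'D': int, 'L': str.capitalize}

result = [parse.get(key, lambda text: text)(''.join(items))
          for key, items in groupby(s, lambda ch: lookup[ch])]
print(result)
# -> [555, 'The', '-', 'Rain', '.', 'In', '#', '=', 1234]
```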

Jon.


Re: regex help: splitting string gets weird groups

2010-04-08 Thread gry
On Apr 8, 3:40 pm, MRAB  wrote:

...
> Group 1 and group 4 match '='.
> Group 1 and group 3 match '1234'.
>
> If a group matches then any earlier match of that group is discarded,
Wow, that makes this much clearer!  I wonder if this behaviour
shouldn't be mentioned in some form in the python docs?
Thanks much!

> so:
>
> Group 1 finishes with '1234'.
> Group 2 finishes with 'in'.
> Group 3 finishes with '1234'.



Re: regex help: splitting string gets weird groups

2010-04-08 Thread Tim Chase

gry wrote:

[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:

re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()

('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
group?  Is my regexp illegal somehow and confusing the engine?


Well, I'm not sure what it thinks it's finding, but nested capture groups 
always produce somewhat weird results for me (I suspect that's what's 
triggering the duplication).  Additionally, you're only searching for 
one match (.match() returns a single match object or None, not all 
possible matches within the repeated super-group).



I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


Tweaking your original, I used

  >>> s='555tHe-rain.in#=1234'
  >>> import re
  >>> r=re.compile(r'([a-zA-Z]+|\d+|.)')
  >>> r.findall(s)
  ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

The only difference between my results and your results is that the 555 
and 1234 come back as strings, not ints.


-tkc






Re: regex help: splitting string gets weird groups

2010-04-08 Thread Patrick Maupin
On Apr 8, 1:49 pm, gry  wrote:
> [ python3.1.1, re.__version__='2.2.1' ]
> I'm trying to use re to split a string into (any number of) pieces of
> these kinds:
> 1) contiguous runs of letters
> 2) contiguous runs of digits
> 3) single other characters
>
> e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
> '.', 'in', '#', '=', 1234]
> I tried:
> >>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$',
> ...          '555tHe-rain.in#=1234').groups()
>
> ('1234', 'in', '1234', '=')
>
> Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
> group?  Is my regexp illegal somehow and confusing the engine?
>
> I *would* like to understand what's wrong with this regex, though if
> someone has a neat other way to do the above task, I'm also interested
> in suggestions.

IMO, for most purposes, for people who don't want to become re
experts, the easiest, fastest, best, most predictable way to use re is
re.split.  You can either call re.split directly, or, if you are going
to be splitting on the same pattern over and over, compile the pattern
and grab its split method.  Use a *single* capture group in the
pattern, that covers the *whole* pattern.  In the case of your example
data:

>>> import re
>>> splitter=re.compile('([A-Za-z]+|[0-9]+|[-.#=])').split
>>> s='555tHe-rain.in#=1234'
>>> [x for x in splitter(s) if x]
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

The reason for the list comprehension is that re.split will always
return a non-matching string between matches.  Sometimes this is
useful even when it is a null string (see recent discussion in the
group about splitting digits out of a string), but if you don't care
to see null (empty) strings, this comprehension will remove them.

The reason for a single capture group that covers the whole pattern is
that it is much easier to reason about the output.  The split will
give you all your data, in order, e.g.

>>> ''.join(splitter(s)) == s
True

HTH,
Pat


Re: regex help: splitting string gets weird groups

2010-04-08 Thread Jon Clements
On 8 Apr, 19:49, gry  wrote:
> [ python3.1.1, re.__version__='2.2.1' ]
> I'm trying to use re to split a string into (any number of) pieces of
> these kinds:
> 1) contiguous runs of letters
> 2) contiguous runs of digits
> 3) single other characters
>
> e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
> '.', 'in', '#', '=', 1234]
> I tried:
> >>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$',
> ...          '555tHe-rain.in#=1234').groups()
>
> ('1234', 'in', '1234', '=')
>
> Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
> group?  Is my regexp illegal somehow and confusing the engine?
>
> I *would* like to understand what's wrong with this regex, though if
> someone has a neat other way to do the above task, I'm also interested
> in suggestions.

I would avoid .match and use .findall
(if you walk through them both together, it'll make sense what's
happening
with your match string).

>>> s = """555tHe-rain.in#=1234"""
>>> re.findall('[A-Za-z]+|[0-9]+|[-.#=]', s)
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

hth,

Jon.


Re: regex help: splitting string gets weird groups

2010-04-08 Thread MRAB

gry wrote:

[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:

re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()

('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
group?  Is my regexp illegal somehow and confusing the engine?

I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


If the regex was illegal then it would raise an exception. It's doing
exactly what you're asking it to do!

First of all, there are 4 groups, with group 1 containing groups 2..4 as
alternatives, so group 1 will match whatever groups 2..4 match:

Group 1: (([A-Za-z]+)|([0-9]+)|([-.#=]))
Group 2: ([A-Za-z]+)
Group 3: ([0-9]+)
Group 4: ([-.#=])

It matches like this:

Group 1 and group 3 match '555'.
Group 1 and group 2 match 'tHe'.
Group 1 and group 4 match '-'.
Group 1 and group 2 match 'rain'.
Group 1 and group 4 match '.'.
Group 1 and group 2 match 'in'.
Group 1 and group 4 match '#'.
Group 1 and group 4 match '='.
Group 1 and group 3 match '1234'.

If a group matches then any earlier match of that group is discarded,
so:

Group 1 finishes with '1234'.
Group 2 finishes with 'in'.
Group 3 finishes with '1234'.
Group 4 finishes with '='.
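
The "last match wins" behaviour can be seen by comparing one big match against one match per piece. A short sketch; the variable names are chosen here for illustration.

```python
import re

s = '555tHe-rain.in#=1234'
unit = r'(([A-Za-z]+)|([0-9]+)|([-.#=]))'

# One match over the whole string: each group keeps only its last capture.
print(re.match('^' + unit + '+$', s).groups())
# -> ('1234', 'in', '1234', '=')

# One match per piece: every capture is visible.
pieces = [m.groups() for m in re.finditer(unit, s)]
print(pieces[0])
# -> ('555', None, '555', None)
print(pieces[1])
# -> ('tHe', 'tHe', None, None)
```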

A solution is:

>>> re.findall('[A-Za-z]+|[0-9]+|[-.#=]', '555tHe-rain.in#=1234')
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

Note: re.findall() returns a list of matches, so if the regex doesn't
contain any groups then it returns the matched substrings. Compare:

>>> re.findall("a(.)", "ax ay")
['x', 'y']
>>> re.findall("a.", "ax ay")
['ax', 'ay']


regex help: splitting string gets weird groups

2010-04-08 Thread gry
[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:
>>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$',
...          '555tHe-rain.in#=1234').groups()
('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't "tHe" appear as a
group?  Is my regexp illegal somehow and confusing the engine?

I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


Re: Regex help needed!

2010-01-07 Thread Rolando Espinoza La Fuente
# http://gist.github.com/271661

import lxml.html
import re

src = """
lksjdfls  kdjff lsdfs  sdjfls sdfsdwelcome
hello, my age is 86 years old and I was born in 1945. Do you know
that
PI is roughly 3.1443534534534534534 """

regex = re.compile('amazon_(\d+)')

doc = lxml.html.document_fromstring(src)

for div in doc.xpath('//div[starts-with(@id, "amazon_")]'):
    match = regex.match(div.get('id'))
    if match:
        print(match.groups()[0])
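
For anyone without lxml installed, the same idea can be sketched with the standard library's html.parser (Python 3). The markup in SAMPLE is hypothetical, since the tags of the original string did not survive in the archive; the class name is also made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: the real markup was lost from the archived post.
SAMPLE = """
<div id="amazon_345343">lksjdfls</div>
<DIV ID="amazon_35343433">sdfsd</DIV>
<div id="other_1">my age is 86 and I was born in 1945</div>
<div id='amazon_8898'>welcome</div>
"""


class AmazonIdCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased
        if tag != 'div':
            return
        for name, value in attrs:
            if name != 'id':
                continue
            match = re.fullmatch(r'amazon_(\d+)', value or '')
            if match:
                self.ids.append(match.group(1))


collector = AmazonIdCollector()
collector.feed(SAMPLE)
print(collector.ids)
# -> ['345343', '35343433', '8898']
```

The regex only ever sees the value of an id attribute, so numbers elsewhere in the text (86, 1945) are never candidates.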



On Thu, Jan 7, 2010 at 4:42 PM, Aahz  wrote:
> In article 
> <19de1d6e-5ba9-42b5-9221-ed7246e39...@u36g2000prn.googlegroups.com>,
> Oltmans   wrote:
>>
>>I've written this regex that's kind of working
>>re.findall("\w+\s*\W+amazon_(\d+)",str)
>>
>>but I was just wondering that there might be a better RegEx to do that
>>same thing. Can you kindly suggest a better/improved Regex. Thank you
>>in advance.
>
> 'Some people, when confronted with a problem, think "I know, I'll use
> regular expressions."  Now they have two problems.'
> --Jamie Zawinski
>
> Take the advice other people gave you and use BeautifulSoup.
> --
> Aahz (a...@pythoncraft.com)           <*>         http://www.pythoncraft.com/
>
> "If you think it's expensive to hire a professional to do the job, wait
> until you hire an amateur."  --Red Adair
> --
> http://mail.python.org/mailman/listinfo/python-list
>



-- 
Rolando Espinoza La fuente
www.rolandoespinoza.info


Re: Regex help needed!

2010-01-07 Thread Aahz
In article <19de1d6e-5ba9-42b5-9221-ed7246e39...@u36g2000prn.googlegroups.com>,
Oltmans   wrote:
>
>I've written this regex that's kind of working
>re.findall("\w+\s*\W+amazon_(\d+)",str)
>
>but I was just wondering that there might be a better RegEx to do that
>same thing. Can you kindly suggest a better/improved Regex. Thank you
>in advance.

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions."  Now they have two problems.'
--Jamie Zawinski

Take the advice other people gave you and use BeautifulSoup.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair


Re: Regex help needed!

2009-12-24 Thread F.R.



On 21.12.2009 12:38, Oltmans wrote:

Hello,. everyone.

I've a string that looks something like

lksjdfls  kdjff lsdfs  sdjflssdfsdwelcome


> From above string I need the digits within the ID attribute. For
example, required output from above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.
   


If you filter in two or even more sequential steps, the problem becomes a
lot simpler, not least because you can test each step separately:

>>> r1 = re.compile (']*')   # Add ignore case and variable white space
>>> r2 = re.compile ('\d+')
>>> [r2.search (item).group () for item in r1.findall (s) if item]   # s is your sample
['345343', '35343433', '8898'] # Supposing all ids have digits

Frederic



Re: Regex help needed!

2009-12-22 Thread Paul McGuire
On Dec 21, 5:38 am, Oltmans  wrote:
> Hello,. everyone.
>
> I've a string that looks something like
> 
> lksjdfls  kdjff lsdfs  sdjfls  =   "amazon_35343433">sdfsdwelcome
> 
>
> From above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
>
> I've written this regex that's kind of working
> re.findall("\w+\s*\W+amazon_(\d+)",str)
>

The issue with using regexen for parsing HTML is that you often get
surprised by attributes that you never expected, or out of order, or
with weird or missing quotation marks, or tags or attributes that are
in upper/lower case.  BeautifulSoup is one tool to use for HTML
scraping, here is a pyparsing example, with hopefully descriptive
comments:


from pyparsing import makeHTMLTags,ParseException

src = """
lksjdfls  kdjff lsdfs  sdjfls sdfsdwelcome
hello, my age is 86 years old and I was born in 1945. Do you know
that
PI is roughly 3.1443534534534534534 """

# use makeHTMLTags to return an expression that will match
# HTML <div> tags, including attributes, upper/lower case,
# etc. (makeHTMLTags will return expressions for both
# opening and closing tags, but we only care about the
# opening one, so just use the [0]th returned item)
div = makeHTMLTags("div")[0]

# define a parse action to filter only for <div> tags
# with the proper id form
def filterByIdStartingWithAmazon(tokens):
    if not tokens.id.startswith("amazon_"):
        raise ParseException(
          "must have id attribute starting with 'amazon_'")

# define a parse action that will add a pseudo-
# attribute 'amazon_id', to make it easier to get the
# numeric portion of the id after the leading 'amazon_'
def makeAmazonIdAttribute(tokens):
    tokens["amazon_id"] = tokens.id[len("amazon_"):]

# attach parse action callbacks to the div expression -
# these will be called during parse time
div.setParseAction(filterByIdStartingWithAmazon,
                     makeAmazonIdAttribute)

# search through the input string for matching <div>s,
# and print out their amazon_id's
for divtag in div.searchString(src):
    print(divtag.amazon_id)


Prints:

345343
35343433
8898



Re: Regex help needed!

2009-12-22 Thread Umakanth
how about re.findall(r'\w+.=\W\D+(\d+)?',str) ?

this will work for any string within id !

~Ukanth

On Dec 21, 6:06 pm, Oltmans  wrote:
> On Dec 21, 5:05 pm, Umakanth  wrote:
>
> > How about re.findall(r'\d+(?:\.\d+)?',str)
>
> > extracts only numbers from any string
>
> Thank you. However, I only need the digits within the ID attribute of
> the DIV. Regex that you suggested fails on the following string
>
> 
> lksjdfls  kdjff lsdfs  sdjfls  =   "amazon_35343433">sdfsdwelcome
> hello, my age is 86 years old and I was born in 1945. Do you know that
> PI is roughly 3.1443534534534534534
> 
>
> > ~uk
>
> > On Dec 21, 4:38 pm, Oltmans  wrote:
>
> > > Hello,. everyone.
>
> > > I've a string that looks something like
> > > 
> > > lksjdfls  kdjff lsdfs  sdjfls  > > =   "amazon_35343433">sdfsdwelcome
> > > 
>
> > > From above string I need the digits within the ID attribute. For
> > > example, required output from above string is
> > > - 35343433
> > > - 345343
> > > - 8898
>
> > > I've written this regex that's kind of working
> > > re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> > > but I was just wondering that there might be a better RegEx to do that
> > > same thing. Can you kindly suggest a better/improved Regex. Thank you
> > > in advance.
>
>

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Johann Spies
> Oltmans wrote:
> >I've a string that looks something like
> >
> >lksjdfls  kdjff lsdfs  sdjfls  >=   "amazon_35343433">sdfsdwelcome
> >
> >
> >>From above string I need the digits within the ID attribute. For
> >example, required output from above string is
> >- 35343433
> >- 345343
> >- 8898
> >

Your string is in /tmp/y in this example:

$ grep -oE '[0-9]+' /tmp/y
345343
35343433
8898

Much simpler, isn't it?  But that is not python.

Regards
Johann

-- 
Johann Spies  Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

 "And there were in the same country shepherds abiding 
  in the field, keeping watch over their flock by night.
  And, lo, the angel of the Lord came upon them, and the
  glory of the Lord shone round about them: and they were 
  sore afraid. And the angel said unto them, Fear not:
  for behold I bring you good tidings of great joy, which
  shall be to all people. For unto you is born this day 
  in the city of David a Saviour, which is Christ the 
  Lord."Luke 2:8-11 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread MRAB

Oltmans wrote:

Hello,. everyone.

I've a string that looks something like

lksjdfls  kdjff lsdfs  sdjfls sdfsdwelcome



From above string I need the digits within the ID attribute. For

example, required output from above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.


Try:

re.findall(r"", str)

You shouldn't be using 'str' as a variable name because it hides the
builtin string class 'str'.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Umakanth
Ok. how about re.findall(r'\w+_(\d+)',str) ?

returns ['345343', '35343433', '8898', '8898'] !

On Dec 21, 6:06 pm, Oltmans  wrote:
> On Dec 21, 5:05 pm, Umakanth  wrote:
>
> > How about re.findall(r'\d+(?:\.\d+)?',str)
>
> > extracts only numbers from any string
>
> Thank you. However, I only need the digits within the ID attribute of
> the DIV. Regex that you suggested fails on the following string
>
> 
> lksjdfls  kdjff lsdfs  sdjfls  =   "amazon_35343433">sdfsdwelcome
> hello, my age is 86 years old and I was born in 1945. Do you know that
> PI is roughly 3.1443534534534534534
> 
>
> > ~uk
>
> > On Dec 21, 4:38 pm, Oltmans  wrote:
>
> > > Hello,. everyone.
>
> > > I've a string that looks something like
> > > 
> > > lksjdfls  kdjff lsdfs  sdjfls  > > =   "amazon_35343433">sdfsdwelcome
> > > 
>
> > > From above string I need the digits within the ID attribute. For
> > > example, required output from above string is
> > > - 35343433
> > > - 345343
> > > - 8898
>
> > > I've written this regex that's kind of working
> > > re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> > > but I was just wondering that there might be a better RegEx to do that
> > > same thing. Can you kindly suggest a better/improved Regex. Thank you
> > > in advance.
>
>

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Oltmans
On Dec 21, 5:05 pm, Umakanth  wrote:
> How about re.findall(r'\d+(?:\.\d+)?',str)
>
> extracts only numbers from any string
>

Thank you. However, I only need the digits within the ID attribute of
the DIV. Regex that you suggested fails on the following string


lksjdfls  kdjff lsdfs  sdjfls sdfsdwelcome
hello, my age is 86 years old and I was born in 1945. Do you know that
PI is roughly 3.1443534534534534534
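
A minimal sketch of one way to anchor the match on the id attribute, so that the age, the year and PI are ignored. The markup in `sample` is an assumption (the tags were lost from the archived message; only the `amazon_` ids and the text survive), and `amazon_ids` is a hypothetical helper name. Written for Python 3.

```python
import re

# Hypothetical markup: tag names and layout are assumed, not the original HTML.
sample = """<div id='amazon_345343'> kdjff lsdfs </div>
<div id  =   "amazon_35343433">sdfsdwelcome</div> <div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know that
PI is roughly 3.1443534534534534534"""

# Require 'id', optional whitespace around '=', a quote, then the amazon_ prefix.
ID_PATTERN = re.compile(r'\bid\s*=\s*["\']amazon_(\d+)["\']', re.IGNORECASE)

def amazon_ids(text):
    """Return the digits of every id="amazon_NNN" attribute, in document order."""
    return ID_PATTERN.findall(text)

print(amazon_ids(sample))                       # ids only
print(re.findall(r'\d+(?:\.\d+)?', sample))     # the broader regex also picks up 86, 1945 and PI
```

This is still a regex over markup, so it only holds up while the attribute is written in roughly this shape.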





> ~uk
>
> On Dec 21, 4:38 pm, Oltmans  wrote:
>
> > Hello,. everyone.
>
> > I've a string that looks something like
> > 
> > lksjdfls  kdjff lsdfs  sdjfls  > =   "amazon_35343433">sdfsdwelcome
> > 
>
> > From above string I need the digits within the ID attribute. For
> > example, required output from above string is
> > - 35343433
> > - 345343
> > - 8898
>
> > I've written this regex that's kind of working
> > re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> > but I was just wondering that there might be a better RegEx to do that
> > same thing. Can you kindly suggest a better/improved Regex. Thank you
> > in advance.
>
>

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Peter Otten
Oltmans wrote:

> I've a string that looks something like
> 
> lksjdfls  kdjff lsdfs  sdjfls  =   "amazon_35343433">sdfsdwelcome
> 
> 
> From above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
> 
> I've written this regex that's kind of working
> re.findall("\w+\s*\W+amazon_(\d+)",str)
> 
> but I was just wondering that there might be a better RegEx to do that
> same thing. Can you kindly suggest a better/improved Regex. Thank you
> in advance.

>>> from BeautifulSoup import BeautifulSoup
>>> bs = BeautifulSoup("""lksjdfls  kdjff lsdfs 
 sdjfls sdfsdwelcome""")
>>> [node["id"][7:] for node in bs(id=lambda id: id.startswith("amazon_"))]
[u'345343', u'35343433', u'8898']

I think BeautifulSoup is a better tool for the task since it actually 
"understands" HTML.

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread mik3
On Dec 21, 7:38 pm, Oltmans  wrote:
> Hello,. everyone.
>
> I've a string that looks something like
> 
> lksjdfls  kdjff lsdfs  sdjfls  =   "amazon_35343433">sdfsdwelcome
> 
>
> From above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
>
> I've written this regex that's kind of working
> re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> but I was just wondering that there might be a better RegEx to do that
> same thing. Can you kindly suggest a better/improved Regex. Thank you
> in advance.

don't need regular expression. just do a split on amazon

>>> s="""lksjdfls  kdjff lsdfs  sdjfls >> =   "amazon_35343433">sdfsdwelcome"""

>>> for item in s.split("amazon_")[1:]:
...   print item
...
345343'> kdjff lsdfs  sdjfls sdfsdwelcome

then find  ' or " indices and do index  slicing.
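
The split-then-slice idea can be sketched roughly as follows. The sample markup is an assumption (the original tags did not survive in the archive), and `ids_by_split` is a hypothetical name. Written for Python 3.

```python
def ids_by_split(text, marker="amazon_"):
    """Split on the marker, then slice each chunk up to its first quote character."""
    ids = []
    for chunk in text.split(marker)[1:]:
        # Positions of the first ' and the first " in this chunk (-1 means absent).
        ends = [i for i in (chunk.find("'"), chunk.find('"')) if i != -1]
        if ends:
            ids.append(chunk[:min(ends)])
    return ids

# Hypothetical input: tag names are assumed for illustration.
s = ("<div id='amazon_345343'> kdjff <div id=\"amazon_35343433\">x</div> "
     "<div id='amazon_8898'>welcome</div>")
print(ids_by_split(s))
```

Note that this takes whatever precedes the closing quote, so it does not check that the value is numeric.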
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Umakanth
How about re.findall(r'\d+(?:\.\d+)?',str)

extracts only numbers from any string

~uk

On Dec 21, 4:38 pm, Oltmans  wrote:
> Hello,. everyone.
>
> I've a string that looks something like
> 
> lksjdfls  kdjff lsdfs  sdjfls  =   "amazon_35343433">sdfsdwelcome
> 
>
> From above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
>
> I've written this regex that's kind of working
> re.findall("\w+\s*\W+amazon_(\d+)",str)
>
> but I was just wondering that there might be a better RegEx to do that
> same thing. Can you kindly suggest a better/improved Regex. Thank you
> in advance.

-- 
http://mail.python.org/mailman/listinfo/python-list


Regex help needed!

2009-12-21 Thread Oltmans
Hello,. everyone.

I've a string that looks something like

lksjdfls  kdjff lsdfs  sdjfls sdfsdwelcome


>From above string I need the digits within the ID attribute. For
example, required output from above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-12-17 Thread S.Selvam
On Wed, Dec 16, 2009 at 10:46 PM, Gabriel Rossetti <
gabriel.rosse...@arimaz.com> wrote:

> Hello everyone,
>
> I'm going nuts with some regex, could someone please show me what I'm doing
> wrong?
>
> I have an XMPP msg :
>
> 
>   
>   
>   123
>   456
>   
>   ...
>   
>   
> 
>
> the  node may be absent or empty (), the  node
> may be absent. I'd like to grab everything exept the  nod and
> create something new using regex, with the XMPP message example above I'd
> get this :
>
> 
>   
>   
>   123
>   456
>   
>   
>   
> 
>
> for some reason my regex doesn't work correctly :
>
> r"().*?( .*?>).*?(?:(.*?)|)?.*?()?"
>
>
If all you need is to remove payload node ,this could be useful,

s1="123456..."

pat=re.compile(r"")
s1=pat.sub("",s1)


-- 
Regards,
S.Selvam
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-12-16 Thread Intchanter / Daniel Fackrell
On Dec 16, 10:22 am, r0g  wrote:
> Gabriel Rossetti wrote:
> > Hello everyone,
>
> > I'm going nuts with some regex, could someone please show me what I'm
> > doing wrong?
>
> > I have an XMPP msg :
>
> 
>
> > Does someone know what is wrong with my expression? Thank you, Gabriel
>
> Gabriel, trying to debug a long regex in situ can be a nightmare however
> the following technique always works for me...
>
> Use the interactive interpreter and see if half the regex works, if it
> does your problem is in the second half, if not it's in the first so try
> the first half of that and so on an so forth. You'll find the point at
> which it goes wrong in a snip.
>
> Non-trivial regexes are always best built up and tested a bit at a time,
> the interactive interpreter is great for this.
>
> Roger.

I'll just add that the "now you have two problems" quip applies here,
especially when there are very good XML parsing libraries for Python
that will keep you from having to reinvent the wheel for every little
change.

See sections 20.5 through 20.13 of the Python Documentation for
several built-in options, and I'm sure there are many community
projects that may fit the bill if none of those happen to.
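
As a rough illustration of the parser route, here is a sketch using the standard library's xml.etree.ElementTree to drop one kind of child node. The element names in `sample` are assumptions (the tags of the original XMPP message were lost in the archive; "payload" is taken from another reply in this thread), and `drop_children` is a hypothetical helper. Written for Python 3.

```python
import xml.etree.ElementTree as ET

def drop_children(xml_text, tag):
    """Return xml_text re-serialized with every element named `tag` removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree is not modified while iterating it.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

# Hypothetical message: element and attribute names are assumed.
sample = ("<message type='mytype'><headers><a>123</a><b>456</b></headers>"
          "<payload type='plain'>...</payload></message>")
print(drop_children(sample, "payload"))
```

A message that declares a default namespace would need the namespace-qualified tag name instead of the bare one.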

Personally, I consider regular expressions of any substantial length
and complexity to be bad practice, as they inhibit readability and
maintainability.  They are also decidedly non-Zen on at least
"Readability counts" and "Sparse is better than dense".

Intchanter
Daniel Fackrell

P.S. I'm not sure how any of these libraries are implemented yet, but
I'd hope they're using a finite state machine tailored to the parsing
task rather than using regexes, but even if they do the latter, having
that abstracted out in a mature library with a clean interface is
still a huge win.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-12-16 Thread r0g
Gabriel Rossetti wrote:
> Hello everyone,
> 
> I'm going nuts with some regex, could someone please show me what I'm
> doing wrong?
> 
> I have an XMPP msg :
> 

> 
> 
> Does someone know what is wrong with my expression? Thank you, Gabriel




Gabriel, trying to debug a long regex in situ can be a nightmare; however,
the following technique always works for me...

Use the interactive interpreter and see if half the regex works. If it
does, your problem is in the second half; if not, it's in the first, so try
the first half of that, and so on and so forth. You'll find the point at
which it goes wrong in a snip.

Non-trivial regexes are always best built up and tested a bit at a time,
the interactive interpreter is great for this.
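
One way to automate that build-it-up approach is sketched below. It assumes the regex has been cut by hand into pieces that are each valid on their own (no group left open across a cut); `first_failing_piece` is a hypothetical helper name, and the pieces shown are illustrative, not the regex from this thread. Written for Python 3.

```python
import re

def first_failing_piece(pieces, text):
    """Grow the pattern one piece at a time; return the index of the first
    piece that makes re.match fail, or None if every prefix matches."""
    pattern = ""
    for index, piece in enumerate(pieces):
        pattern += piece
        if re.match(pattern, text) is None:
            return index
    return None

# Illustrative pieces: the last one expects letters that the text does not have.
pieces = [r"\d+", r"\.", r"\d+", r"[a-z]+"]
print(first_failing_piece(pieces, "3.14"))
```

The returned index points at the piece to look at first; an optional piece never fails, so it will not be reported even if it silently matches nothing.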

Roger.
-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2009-12-16 Thread Gabriel Rossetti

Hello everyone,

I'm going nuts with some regex, could someone please show me what I'm 
doing wrong?


I have an XMPP msg :


   
   
   123
   456
   
   ...
   
   


the  node may be absent or empty (), the  node 
may be absent. I'd like to grab everything except the  node and 
create something new using regex, with the XMPP message example above 
I'd get this :



   
   
   123
   456
   
   
   


for some reason my regex doesn't work correctly :

r"().*?(.*?>).*?(?:(.*?)|)?.*?()?"


I group the opening  node, the opening  node and if the 
 node is present and not empty I group it and if the  
node is present I group it. For some reason this doesn't work correctly :


>>> import re
>>> s1 = "xmlns='myprotocol:core' version='1.0' 
type='mytype'>123456type='plain'>...seconds='15'/>"
>>> s2 = "xmlns='myprotocol:core' version='1.0' 
type='mytype'>type='plain'>...seconds='15'/>"
>>> s3 = "xmlns='myprotocol:core' version='1.0' type='mytype'>type='plain'>...seconds='15'/>"
>>> s4 = "xmlns='myprotocol:core' version='1.0' 
type='mytype'>123456type='plain'>..."
>>> s5 = "xmlns='myprotocol:core' version='1.0' 
type='mytype'>type='plain'>..."
>>> s6 = "xmlns='myprotocol:core' version='1.0' type='mytype'>type='plain'>..."
>>> exp = r"().*?(.*?>).*?(?:(.*?)|)?.*?()?"

>>>
>>> re.match(exp, s1).groups()
("", "xmlns='myprotocol:core' version='1.0' type='mytype'>", 
'123456', None)

>>>
>>> re.match(exp, s2).groups()
("", "xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)

>>>
>>> re.match(exp, s3).groups()
("", "xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)

>>>
>>> re.match(exp, s4).groups()
("", "xmlns='myprotocol:core' version='1.0' type='mytype'>", 
'123456', None)

>>>
>>> re.match(exp, s5).groups()
("", "xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)

>>>
>>> re.match(exp, s6).groups()
("", "xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)

>>>


Does someone know what is wrong with my expression? Thank you, Gabriel
--
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-09 Thread Peter Otten
David wrote:

>  
> 
> Open:
> 
> 5.50
> 
>  
> Mkt Cap:
> 
> 6.92M
> 
>  
> P/E:
> 
> 21.99
> 
> 
> 
> I want to extract the open, mkt cap and P/E values - but apart from
> doing loads of indivdual REs which I think would look messy, I can't
> think of a better and neater looking way. Any ideas?

>>> from BeautifulSoup import BeautifulSoup
>>> bs = BeautifulSoup(""" 
...
... Open:
... 
... 5.50
... 
...  
... Mkt Cap:
... 
... 6.92M
... 
...  
... P/E:
... 
... 21.99
... 
... """)
>>> for key in bs.findAll(attrs={"class": "key"}):
...     value = key.findNext(attrs={"class": "val"})
...     print key.string.strip(), "-->", value.string.strip()
...
Open: --> 5.50
Mkt Cap: --> 6.92M
P/E: --> 21.99


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-08 Thread Rhodri James
On Wed, 08 Jul 2009 23:06:22 +0100, David   
wrote:



Hi

I have a few regexs I need to do, but im struggling to come up with a
nice way of doing them, and more than anything am here to learn some
tricks and some neat code rather than getting an answer - although
thats obviously what i would like to get to.

Problem 1 -

(25.47%)

I want to extract 25.47 from here - so far I've tried -

xPer = re.search('(.*?)%', content)


Supposing that str(xID.group(1)) == "678774", let's see how that string
concatenation turns out:

(.*?)%

The obvious problems here are the spurious double quotes, the spurious
(but harmless) escaping of a double quote, and the lack of (escaped)
backslash and (escaped) open parenthesis.  The latter you can always
strip off later, but the first sink the match rather thoroughly.



and

xPer = re.search('\((\d*)%\)', content)


With only two single quotes present, the biggest problem should be obvious.

Unfortunately if you just fix the obvious in either of the two regular
expressions, you're setting yourself up for a fall later on.  As The Fine
Manual says right at the top of the page on the re module
(http://docs.python.org/library/re.html), you want to be using raw string
literals when you're dealing with regular expressions, because you want
the backslashes getting through without being interpreted specially by
Python's own parser.  As it happens you get away with it in this case,
since neither '\d' nor '\(' have a special meaning to Python, so aren't
changed, and '\"' is interpreted as '"', which happens to be the right
thing anyway.
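
A small sketch of the raw-string point, plus one pattern that does pull out the percentage. The markup in `content` is an assumption (only the id and the text survive in the archive, so the tag name is a guess), and `extract_percent` is a hypothetical helper. Written for Python 3.

```python
import re

# Assumed markup: the tag name is a guess.
content = '<span id="ref_678774_cp">(25.47%)</span>'

# '\b' is a single backspace character to Python; r'\b' is backslash + 'b',
# which is what the re module needs to see for a word boundary.
print(len('\b'), len(r'\b'))

# \d* alone cannot cross the decimal point, so this attempt finds nothing.
print(re.search(r'\((\d*)%\)', content))

# Allow an optional fractional part inside the parentheses.
PERCENT = re.compile(r'\((\d+(?:\.\d+)?)%\)')

def extract_percent(text):
    """Return the number inside '(NN.NN%)' as a string, or None if absent."""
    match = PERCENT.search(text)
    return match.group(1) if match else None

print(extract_percent(content))
```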



Problem 2 -

 

Open:

5.50

 
Mkt Cap:

6.92M

 
P/E:

21.99



I want to extract the open, mkt cap and P/E values - but apart from
doing loads of indivdual REs which I think would look messy, I can't
think of a better and neater looking way. Any ideas?


What you're trying to do is inherently messy.  You might want to use
something like BeautifulSoup to hide the mess, but never having had
cause to use it myself I couldn't say for sure.

--
Rhodri James *-* Wildebeest Herder to the Masses
--
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-08 Thread Tim Harig
On 2009-07-08, Chris Rebert  wrote:
> On Wed, Jul 8, 2009 at 3:06 PM, David wrote:
>> I want to extract the open, mkt cap and P/E values - but apart from
>> doing loads of indivdual REs which I think would look messy, I can't
>> think of a better and neater looking way. Any ideas?

You are downloading market data?  Yahoo offers its stats in CSV format that
is easier to parse without a dedicated parser.

> Use an actual HTML parser? Like BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/), for instance.

I agree with your sentiment exactly.  If the regex he is trying to get is
difficult enough that he has to ask; then, yes, he should be using a
parser.

> I will never understand why so many people try to parse/scrape
> HTML/XML with regexes...

Why?  Because some times it is good enough to get the job done easily.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-08 Thread Chris Rebert
On Wed, Jul 8, 2009 at 3:06 PM, David wrote:
> Hi
>
> I have a few regexs I need to do, but im struggling to come up with a
> nice way of doing them, and more than anything am here to learn some
> tricks and some neat code rather than getting an answer - although
> thats obviously what i would like to get to.
>
> Problem 1 -
>
>                 id="ref_678774_cp">(25.47%)
>
> I want to extract 25.47 from here - so far I've tried -
>
> xPer = re.search(' \">(.*?)%', content)
>
> and
>
> xPer = re.search(' \">\((\d*)%\)', content)
>
> neither of these seem to do what I want - am I not doing this
> correctly? (obviously!)
>
> Problem 2 -
>
>  
>
> Open:
> 
> 5.50
> 
>  
> Mkt Cap:
> 
> 6.92M
> 
>  
> P/E:
> 
> 21.99
> 
>
>
> I want to extract the open, mkt cap and P/E values - but apart from
> doing loads of indivdual REs which I think would look messy, I can't
> think of a better and neater looking way. Any ideas?

Use an actual HTML parser? Like BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/), for instance.

I will never understand why so many people try to parse/scrape
HTML/XML with regexes...

Cheers,
Chris
-- 
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2009-07-08 Thread David
Hi

I have a few regexes I need to write, but I'm struggling to come up with a
nice way of doing them. More than anything, I'm here to learn some
tricks and some neat code rather than just getting an answer, although
that's obviously where I would like to get to.

Problem 1 -

(25.47%)

I want to extract 25.47 from here - so far I've tried -

xPer = re.search('(.*?)%', content)

and

xPer = re.search('\((\d*)%\)', content)

neither of these seem to do what I want - am I not doing this
correctly? (obviously!)

Problem 2 -

 

Open:

5.50

 
Mkt Cap:

6.92M

 
P/E:

21.99



I want to extract the open, mkt cap and P/E values - but apart from
doing loads of individual REs, which I think would look messy, I can't
think of a better and neater-looking way. Any ideas?

Cheers

David

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Regex Help

2008-09-25 Thread Lawrence D'Oliveiro
In message <[EMAIL PROTECTED]>, Support
Desk wrote:

> Thanks for the reply ...

A: The vulture doesn't get Frequent Poster miles.
Q: What's the difference between a top-poster and a vulture?
--
http://mail.python.org/mailman/listinfo/python-list


RE: More regex help

2008-09-24 Thread Support Desk
Kirk, 

That's exactly what I needed. Thx!
 

-Original Message-
From: Kirk Strauser [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 11:42 AM
To: python-list@python.org
Subject: Re: More regex help

At 2008-09-24T16:25:02Z, "Support Desk" <[EMAIL PROTECTED]> writes:

> I am working on a python webcrawler, that will extract all links from an
> html page, and add them to a queue, The problem I am having is building
> absolute links from relative links, as there are so many different types of
> relative links. If I just append the relative links to the current url, some
> websites will send it into a never-ending loop. 

>>> import urllib
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
'/foo')
'http://www.example.com/foo'
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
'http://slashdot.org/foo')
'http://slashdot.org/foo'

-- 
Kirk Strauser
The Day Companies


--
http://mail.python.org/mailman/listinfo/python-list


Re: More regex help

2008-09-24 Thread Kirk Strauser
At 2008-09-24T16:25:02Z, "Support Desk" <[EMAIL PROTECTED]> writes:

> I am working on a python webcrawler, that will extract all links from an
> html page, and add them to a queue, The problem I am having is building
> absolute links from relative links, as there are so many different types of
> relative links. If I just append the relative links to the current url, some
> websites will send it into a never-ending loop. 

>>> import urllib
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
'/foo')
'http://www.example.com/foo'
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
'http://slashdot.org/foo')
'http://slashdot.org/foo'
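
For readers on Python 3, where the same join lives in urllib.parse as urljoin, a minimal equivalent sketch follows. The `absolutize` wrapper is a hypothetical name added for illustration.

```python
from urllib.parse import urljoin

def absolutize(page_url, link):
    """Resolve a link found on page_url into an absolute URL."""
    return urljoin(page_url, link)

base = 'http://www.example.com/path/to/deep/page'
print(absolutize(base, '/foo'))                      # root-relative link
print(absolutize(base, 'other'))                     # sibling of the current page
print(absolutize(base, 'http://slashdot.org/foo'))   # already absolute, returned as is
```

Because the join resolves against the page's own URL rather than appending, repeated relative links do not pile up into ever-longer paths.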

-- 
Kirk Strauser
The Day Companies
--
http://mail.python.org/mailman/listinfo/python-list


More regex help

2008-09-24 Thread Support Desk
I am working on a python webcrawler, that will extract all links from an
html page, and add them to a queue, The problem I am having is building
absolute links from relative links, as there are so many different types of
relative links. If I just append the relative links to the current url, some
websites will send it into a never-ending loop. 

What I am looking for is a regexp that will extract the root url from any 
url string I pass to it, such as

'http://example.com/stuff/stuff/morestuff/index.html'

Regexp = 'http://example.com'

'http://anotherexample.com/stuff/index.php'

Regexp = 'http://anotherexample.com/'

'http://example.com/stuff/stuff/'

Regexp = 'http://example.com'
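
The root of a URL can also be pulled out without a regex. The sketch below uses urlsplit from Python 3's urllib.parse; `root_url` is a hypothetical helper name, and it returns the root without a trailing slash.

```python
from urllib.parse import urlsplit

def root_url(url):
    """Return scheme://host[:port] for an absolute URL."""
    parts = urlsplit(url)
    return parts.scheme + "://" + parts.netloc

for url in ('http://example.com/stuff/stuff/morestuff/index.html',
            'http://anotherexample.com/stuff/index.php',
            'http://example.com/stuff/stuff/'):
    print(root_url(url))
```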





--
http://mail.python.org/mailman/listinfo/python-list


RE: Regex Help

2008-09-24 Thread Support Desk

Thanks for the reply, I found out the problem was occurring later on in the
script. The regexp works well.

-Original Message-
From: Lawrence D'Oliveiro [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 23, 2008 6:51 PM
To: python-list@python.org
Subject: Re: Regex Help

In message <[EMAIL PROTECTED]>, Support
Desk wrote:

> Anybody know of a good regex to parse html links from html code? The one I
> am currently using seems to be cutting off the last letter of some links,
> and returning links like
> 
> http://somesite.co
> 
> or http://somesite.ph
> 
> the code I am using is
> 
> 
> regex = r''

Can you post some example HTML sequences that this regexp is not handling
correctly?


--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Help

2008-09-23 Thread Lawrence D'Oliveiro
In message <[EMAIL PROTECTED]>, Support
Desk wrote:

> Anybody know of a good regex to parse html links from html code? The one I
> am currently using seems to be cutting off the last letter of some links,
> and returning links like
> 
> http://somesite.co
> 
> or http://somesite.ph
> 
> the code I am using is
> 
> 
> regex = r''

Can you post some example HTML sequences that this regexp is not handling
correctly?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Help

2008-09-23 Thread Miki
Hello,

> Anybody know of a good regex to parse html links from html code?
BeautifulSoup is *the* library to handle HTML

from BeautifulSoup import BeautifulSoup
from urllib import urlopen

soup = BeautifulSoup(urlopen("http://python.org/"))
for a in soup("a"):
    print a["href"]

HTH,
--
Miki <[EMAIL PROTECTED]>
http://pythonwise.blogspot.com
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Help

2008-09-22 Thread Fredrik Lundh

Support Desk wrote:

> the code I am using is
>
> regex = r''


that's way too fragile to work with real-life HTML (what if the link has 
a TITLE attribute, for example?  or contains whitespace after the HREF?)


you might want to consider using a real HTML parser for this task.
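
As a sketch of what the parser route can look like with nothing but the standard library, the example below collects href values with html.parser from Python 3. `LinkCollector` and `extract_links` are hypothetical names, and the sample markup is made up to cover the two cases mentioned above (an extra TITLE attribute, whitespace after HREF).

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, whatever the source used.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

# Made-up sample covering an extra attribute and whitespace around '='.
page = ('<A TITLE="x" HREF = "http://somesite.com/a">a</A> '
        "<a href='http://somesite.ph'>b</a>")
print(extract_links(page))
```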


> page_text = urllib.urlopen('http://somesite.com')
> page_text = page_text.read()
>
> links = re.findall(regex, text, re.IGNORECASE)


the RE looks fine for the subset of all valid A elements that it can 
handle, though.


got any examples of pages where you see that behaviour?



--
http://mail.python.org/mailman/listinfo/python-list


Regex Help

2008-09-22 Thread Support Desk
Anybody know of a good regex to parse html links from html code? The one I
am currently using seems to be cutting off the last letter of some links,
and returning links like

http://somesite.co

or http://somesite.ph

the code I am using is 


regex = r''

page_text = urllib.urlopen('http://somesite.com')
page_text = page_text.read()

links = re.findall(regex, text, re.IGNORECASE)



--
http://mail.python.org/mailman/listinfo/python-list


RE: regex help

2008-06-30 Thread Metal Zong
>>> import re
>>>
>>> if __name__ == "__main__":
...     lst = [281, 713, 832, 1281, 1713, 1832, 2281, 2713, 2832]
...     for item in lst:
...         if re.match("^1?(?=281)|^1?(?=713)|^1?(?=832)", str(item)):
...             print "%d invalid" % item
...         else:
...             print "%d valid" % item
...
281 invalid
713 invalid
832 invalid
1281 invalid
1713 invalid
1832 invalid
2281 valid
2713 valid
2832 valid
>>>



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Support Desk
Sent: Monday, June 30, 2008 10:54 PM
To: python-list@python.org
Subject: regex help



Hello, 
   I am working on a web-app, that querys long distance numbers from a
database of call logs. I am trying to put together a regex that matches any
number that does not start with the following. Basically any number that
does'nt start with:

 

281

713

832 

 

or

 

1281

1713

1832 

 

 

is long distance any, help would be appreciated. 

 

--
http://mail.python.org/mailman/listinfo/python-list

Re: regex help

2008-06-30 Thread Cédric Lucantis
Le Monday 30 June 2008 16:53:54 Support Desk, vous avez écrit :
> Hello,
>I am working on a web-app, that querys long distance numbers from a
> database of call logs. I am trying to put together a regex that matches any
> number that does not start with the following. Basically any number that
> does'nt start with:
>
>
>
> 281
>
> 713
>
> 832
>
>
>
> or
>
>
>
> 1281
>
> 1713
>
> 1832
>
>
>
>
>
> is long distance any, help would be appreciated.

sounds like str.startswith() is enough for your needs:

if not number.startswith(('281', '713', '832', ...)):
    ...
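
Spelled out as a runnable sketch (Python 3), with the six prefixes from the question filled in. `LOCAL_PREFIXES` and `is_long_distance` are hypothetical names, and the phone numbers are made up.

```python
# Prefixes that mark a number as local, with and without the leading 1.
LOCAL_PREFIXES = ("281", "713", "832", "1281", "1713", "1832")

def is_long_distance(number):
    """True if the number does not start with any of the local prefixes."""
    # str.startswith accepts a tuple and tests each prefix in turn.
    return not str(number).startswith(LOCAL_PREFIXES)

for number in ("2815551234", "17135551234", "2125551234", 2281):
    print(number, is_long_distance(number))
```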

-- 
Cédric Lucantis
--
http://mail.python.org/mailman/listinfo/python-list

regex help

2008-06-30 Thread Support Desk
Hello,
   I am working on a web app that queries long-distance numbers from a
database of call logs. I am trying to put together a regex that matches any
number that does not start with the following. Basically, any number that
doesn't start with:

 

281

713

832 

 

or

 

1281

1713

1832 

 

 

is long distance. Any help would be appreciated. 

 

--
http://mail.python.org/mailman/listinfo/python-list

RE: regex help

2008-06-03 Thread Support Desk
That’s it exactly..thx

-Original Message-
From: Reedick, Andrew [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2008 9:26 AM
To: Support Desk
Subject: RE: regex help

The regex will now skip anything with an '@' in the filename, on the
assumption that it's already in the correct format.  Uncomment the os.rename
line once you're satisfied you won't mangle anything.


import glob
import os
import re


for filename in glob.glob('*.abook'):
    newname = filename
    newname = re.sub(r'[EMAIL PROTECTED]', '@domain.com.abook', filename)
    if filename != newname:
        print "rename", filename, "to", newname
        #os.rename(filename, newname)
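
Since the pattern above was obscured by the archive, here is a stand-in sketch of the same skip-if-already-renamed idea in Python 3. It uses an explicit '@' check rather than whatever the original regex did; `new_abook_name` and the default domain are assumptions, and nothing is renamed on disk.

```python
import re

def new_abook_name(filename, domain="domain.com"):
    """Return the new address book name, or the name unchanged if no rename applies."""
    if "@" in filename:
        # Assume a name that already contains '@' is in the new format.
        return filename
    return re.sub(r"\.abook$", "@" + domain + ".abook", filename)

# Dry run over made-up names: only report what would be renamed.
for name in ("foo.abook", "foo@domain.com.abook", "abook.foo", "john.doe.abook"):
    new = new_abook_name(name)
    if new != name:
        print("rename", name, "to", new)
```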



> -Original Message-
> From: Support Desk [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2008 10:07 AM
> To: Reedick, Andrew
> Subject: RE: regex help
> 
> Thx for the reply,
> 
> I would first have to list all files matching user.abook then rename
> them to
> [EMAIL PROTECTED] something like Im still new to python and haven't
> had
> much experience with the re module
> 
> import os
> import re
> 
> emails = os.popen('ls').readlines()
> for email in emails:
> print email, '-->',
> print re.findall(r'\.abook$', email)
> 
> 
> 
> -Original Message-
> From: Reedick, Andrew [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2008 8:52 AM
> To: Support Desk; python-list@python.org
> Subject: RE: regex help
> 
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]
> > On Behalf Of Support Desk
> > Sent: Tuesday, June 03, 2008 9:32 AM
> > To: python-list@python.org
> > Subject: regex help
> >
> > I am trying to put together a regular expression that will
> > rename users address books on our server due to a recent
> > change we made.  Users with address books user.abook need
> > to be changed to [EMAIL PROTECTED] I'm having trouble
> > with the regex. Any help would be appreciated.
> 
> 
> import re
> 
> emails = ('foo.abook', 'abook.foo', 'bob.abook.com', 'john.doe.abook')
> 
> for email in emails:
>   print email, '-->',
>   print re.sub(r'\.abook$', '@domain.com.abook', email)
> 
> 
> 
> *
> 
> The information transmitted is intended only for the person or entity
> to
> which it is addressed and may contain confidential, proprietary, and/or
> privileged material. Any review, retransmission, dissemination or other
> use
> of, or taking of any action in reliance upon this information by
> persons or
> entities other than the intended recipient is prohibited. If you
> received
> this in error, please contact the sender and delete the material from
> all
> computers. GA623
> 
> 


--
http://mail.python.org/mailman/listinfo/python-list


RE: regex help

2008-06-03 Thread Reedick, Andrew
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] 
> On Behalf Of Support Desk
> Sent: Tuesday, June 03, 2008 9:32 AM
> To: python-list@python.org
> Subject: regex help
>
> I am trying to put together a regular expression that will 
> rename users address books on our server due to a recent 
> change we made.  Users with address books user.abook need 
> to be changed to [EMAIL PROTECTED] I'm having trouble 
> with the regex. Any help would be appreciated.


import re

emails = ('foo.abook', 'abook.foo', 'bob.abook.com', 'john.doe.abook')

for email in emails:
    print email, '-->', 
    print re.sub(r'\.abook$', '@domain.com.abook', email)





--
http://mail.python.org/mailman/listinfo/python-list


regex help

2008-06-03 Thread Support Desk
I am trying to put together a regular expression that will rename users'
address books on our server due to a recent change we made.  Users with
address books named user.abook need to be changed to [EMAIL PROTECTED] I'm
having trouble with the regex. Any help would be appreciated.

 

-Mike

--
http://mail.python.org/mailman/listinfo/python-list

Re: pexpect regex help

2007-02-23 Thread amadain
On Feb 23, 8:53 am, "amadain" <[EMAIL PROTECTED]> wrote:
> On Feb 23, 8:46 am, "amadain" <[EMAIL PROTECTED]> wrote:
>
>
>
> > On Feb 21, 11:15 pm, [EMAIL PROTECTED] wrote:
>
> > > On Feb 21, 6:13 pm, [EMAIL PROTECTED] wrote:
>
> > > > I have a pexpect script to walk through a cisco terminal server and I
> > > > was hoping to get some help with this regex because I really suck at
> > > > it.
>
> > > > This is the code:
>
> > > > index = s.expect(['login: ',pexpect.EOF,pexpect.TIMEOUT])
> > > > if index == 0:
> > > >     m = re.search('((#.+\r\n){20,25})(\s.*)',
> > > >                   s.before)  #<-- MY PROBLEM
> > > >     print m.group(3),
> > > >     print ' %s %s' % (ip[0], port)
> > > >     s.send(chr(30))
> > > >     s.sendline('x')
> > > >     s.sendline('disco')
> > > >     s.sendline('\n')
> > > > elif index == 1:
> > > >     print s.before
> > > > elif index == 2:
> > > >     print
> > > >     print '%s %s FAILED' % (ip[0], port)
> > > >     print 'This host may be down or locked on the TS'
> > > >     s.send(chr(30))
> > > >     s.sendline('x')
> > > >     s.sendline('disco')
> > > >     s.sendline('\n')
>
> > > > This is attempting to match the hostname of the connected host using
> > > > the output of a motd file which unfortunately is not the same
> > > > everywhere...  It looks like this:
>
> > > > #
> > > > #   This system is the property
> > > > of: #
> > > > #
> > > > #
> > > > #DefNet
> > > > #
> > > > #
> > > > #
> > > > #   Use of this system is for authorized users
> > > > only.#
> > > > #   Individuals using this computer system without authority, or
> > > > in #
> > > > #   excess of their authority, are subject to having all of
> > > > their   #
> > > > #   activities on this system monitored and recorded by
> > > > system  #
> > > > #
> > > > personnel.  #
> > > > #
> > > > #
> > > > #   In the course of monitoring individuals improperly using
> > > > this   #
> > > > #   system, or in the course of system maintenance, the
> > > > activities  #
> > > > #   of authorized users may also be
> > > > monitored.  #
> > > > #
> > > > #
> > > > #   Anyone using this system expressly consents to such
> > > > monitoring  #
> > > > #   and is advised that if such monitoring reveals
> > > > possible #
> > > > #   evidence of criminal activity, system personnel may provide
> > > > the #
> > > > #   evidence of such monitoring to law enforcement
> > > > officials.   #
> > > > #
>
> > > > pa-chi1 console login:
>
> > > > And sometimes it looks like this:
>
> > > > #
> > > > #   This system is the property
> > > > of: #
> > > > #
> > > > #
> > > > #DefNet
> > > > #
> > > > #
> > > > #
> > > > #   Use of this system is for authorized users
> > > > only.#
> > > > #   Individuals using this computer system without authority, or
> > > > in #
> > > > #   excess of their authority, are subject to having all of
> > > > their   #
> > > > #   activities on this system monitored and recorded by
> > > > system  #
> > > > #
> > > > personnel.  #
> > > > #
> > > > #
> > > > #   In the course of monitoring individuals improperly using
> > > > this   #
> > > > #   system, or in the course of system maintenance, the
> > > > activities  #
> > > > #   of authorized users may also be
> > > > monitored.  #
> > > > #
> > > > #
> > > > #   Anyone using this system expressly consents to such
> > > > monitoring  #
> > > > #   and is advised that if such monitoring reveals
> > > > possible #
> > > > #   evidence of criminal activity, system personnel may provide
> > > > the #
> > > > #   evidence of such monitoring to law enforcement
> > > > officials.   #
> > > > #
> > > > pa11-chi1 login:
>
> > > > The second one works and it will print out pa11-chi1  but when there
> > > > is a space or console is in the output it wont print anything or it
> > > > wont match anything...I want to be able to match just the hostname
> > > > and print it out.
>
> > > > Any ideas?
>
> > > > Thanks,
>
> > > > Jonathan
>
> > > It is also posted here more clearly and formatted as it would appear
> > > on the terminal:  http://www.pastebin.ca/366822
>
> > what about using s.before.split("\r\n")[-1]?
>
> > A
>
> 

Re: pexpect regex help

2007-02-23 Thread amadain
On Feb 23, 8:46 am, "amadain" <[EMAIL PROTECTED]> wrote:
> On Feb 21, 11:15 pm, [EMAIL PROTECTED] wrote:
>
>
>
> > On Feb 21, 6:13 pm, [EMAIL PROTECTED] wrote:
>
> > > I have a pexpect script to walk through a cisco terminal server and I
> > > was hoping to get some help with this regex because I really suck at
> > > it.
>
> > > This is the code:
>
> > > index = s.expect(['login: ',pexpect.EOF,pexpect.TIMEOUT])
> > > if index == 0:
> > >     m = re.search('((#.+\r\n){20,25})(\s.*)',
> > >                   s.before)  #<-- MY PROBLEM
> > >     print m.group(3),
> > >     print ' %s %s' % (ip[0], port)
> > >     s.send(chr(30))
> > >     s.sendline('x')
> > >     s.sendline('disco')
> > >     s.sendline('\n')
> > > elif index == 1:
> > >     print s.before
> > > elif index == 2:
> > >     print
> > >     print '%s %s FAILED' % (ip[0], port)
> > >     print 'This host may be down or locked on the TS'
> > >     s.send(chr(30))
> > >     s.sendline('x')
> > >     s.sendline('disco')
> > >     s.sendline('\n')
>
> > > This is attempting to match the hostname of the connected host using
> > > the output of a motd file which unfortunately is not the same
> > > everywhere...  It looks like this:
>
> > > #
> > > #   This system is the property
> > > of: #
> > > #
> > > #
> > > #DefNet
> > > #
> > > #
> > > #
> > > #   Use of this system is for authorized users
> > > only.#
> > > #   Individuals using this computer system without authority, or
> > > in #
> > > #   excess of their authority, are subject to having all of
> > > their   #
> > > #   activities on this system monitored and recorded by
> > > system  #
> > > #
> > > personnel.  #
> > > #
> > > #
> > > #   In the course of monitoring individuals improperly using
> > > this   #
> > > #   system, or in the course of system maintenance, the
> > > activities  #
> > > #   of authorized users may also be
> > > monitored.  #
> > > #
> > > #
> > > #   Anyone using this system expressly consents to such
> > > monitoring  #
> > > #   and is advised that if such monitoring reveals
> > > possible #
> > > #   evidence of criminal activity, system personnel may provide
> > > the #
> > > #   evidence of such monitoring to law enforcement
> > > officials.   #
> > > #
>
> > > pa-chi1 console login:
>
> > > And sometimes it looks like this:
>
> > > #
> > > #   This system is the property
> > > of: #
> > > #
> > > #
> > > #DefNet
> > > #
> > > #
> > > #
> > > #   Use of this system is for authorized users
> > > only.#
> > > #   Individuals using this computer system without authority, or
> > > in #
> > > #   excess of their authority, are subject to having all of
> > > their   #
> > > #   activities on this system monitored and recorded by
> > > system  #
> > > #
> > > personnel.  #
> > > #
> > > #
> > > #   In the course of monitoring individuals improperly using
> > > this   #
> > > #   system, or in the course of system maintenance, the
> > > activities  #
> > > #   of authorized users may also be
> > > monitored.  #
> > > #
> > > #
> > > #   Anyone using this system expressly consents to such
> > > monitoring  #
> > > #   and is advised that if such monitoring reveals
> > > possible #
> > > #   evidence of criminal activity, system personnel may provide
> > > the #
> > > #   evidence of such monitoring to law enforcement
> > > officials.   #
> > > #
> > > pa11-chi1 login:
>
> > > The second one works and it will print out pa11-chi1  but when there
> > > is a space or console is in the output it wont print anything or it
> > > wont match anything...I want to be able to match just the hostname
> > > and print it out.
>
> > > Any ideas?
>
> > > Thanks,
>
> > > Jonathan
>
> > It is also posted here more clearly and formatted as it would appear
> > on the terminal:  http://www.pastebin.ca/366822
>
> what about using s.before.split("\r\n")[-1]?
>
> A



result=[x for x in s.before.split("\r\n") if x != ""]
print result[-1]

should cover the blank line problem

A
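
A rough Python 3 sketch of that idea, for what it is worth. It assumes s.before is already a str (recent pexpect versions return bytes unless the spawn was given an encoding) and that the hostname is the first word on the last non-blank line before 'login: '. The helper name hostname_from_before is made up for illustration.

```python
def hostname_from_before(before):
    """Return the first word of the last non-blank line, or None if there is none."""
    # splitlines() copes with both '\r\n' and '\n' line endings.
    lines = [line.strip() for line in before.splitlines() if line.strip()]
    if not lines:
        return None
    # 'pa-chi1 console ' and 'pa11-chi1 ' both yield just the hostname.
    return lines[-1].split()[0]


banner = "#####\r\n# DefNet #\r\n#####\r\n\r\n"
print(hostname_from_before(banner + "pa-chi1 console "))
print(hostname_from_before(banner + "pa11-chi1 "))
```

Taking only the first word is what drops the extra "console" text, so no regex over the banner is needed.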

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pexpect regex help

2007-02-23 Thread amadain
On Feb 21, 11:15 pm, [EMAIL PROTECTED] wrote:
> On Feb 21, 6:13 pm, [EMAIL PROTECTED] wrote:
>
>
>
> > I have a pexpect script to walk through a cisco terminal server and I
> > was hoping to get some help with this regex because I really suck at
> > it.
>
> > This is the code:
>
> > index = s.expect(['login: ',pexpect.EOF,pexpect.TIMEOUT])
> > if index == 0:
> >     m = re.search('((#.+\r\n){20,25})(\s.*)',
> >                   s.before)  #<-- MY PROBLEM
> >     print m.group(3),
> >     print ' %s %s' % (ip[0], port)
> >     s.send(chr(30))
> >     s.sendline('x')
> >     s.sendline('disco')
> >     s.sendline('\n')
> > elif index == 1:
> >     print s.before
> > elif index == 2:
> >     print
> >     print '%s %s FAILED' % (ip[0], port)
> >     print 'This host may be down or locked on the TS'
> >     s.send(chr(30))
> >     s.sendline('x')
> >     s.sendline('disco')
> >     s.sendline('\n')
>
> > This is attempting to match the hostname of the connected host using
> > the output of a motd file which unfortunately is not the same
> > everywhere...  It looks like this:
>
> > #
> > #   This system is the property
> > of: #
> > #
> > #
> > #DefNet
> > #
> > #
> > #
> > #   Use of this system is for authorized users
> > only.#
> > #   Individuals using this computer system without authority, or
> > in #
> > #   excess of their authority, are subject to having all of
> > their   #
> > #   activities on this system monitored and recorded by
> > system  #
> > #
> > personnel.  #
> > #
> > #
> > #   In the course of monitoring individuals improperly using
> > this   #
> > #   system, or in the course of system maintenance, the
> > activities  #
> > #   of authorized users may also be
> > monitored.  #
> > #
> > #
> > #   Anyone using this system expressly consents to such
> > monitoring  #
> > #   and is advised that if such monitoring reveals
> > possible #
> > #   evidence of criminal activity, system personnel may provide
> > the #
> > #   evidence of such monitoring to law enforcement
> > officials.   #
> > #
>
> > pa-chi1 console login:
>
> > And sometimes it looks like this:
>
> > #
> > #   This system is the property
> > of: #
> > #
> > #
> > #DefNet
> > #
> > #
> > #
> > #   Use of this system is for authorized users
> > only.#
> > #   Individuals using this computer system without authority, or
> > in #
> > #   excess of their authority, are subject to having all of
> > their   #
> > #   activities on this system monitored and recorded by
> > system  #
> > #
> > personnel.  #
> > #
> > #
> > #   In the course of monitoring individuals improperly using
> > this   #
> > #   system, or in the course of system maintenance, the
> > activities  #
> > #   of authorized users may also be
> > monitored.  #
> > #
> > #
> > #   Anyone using this system expressly consents to such
> > monitoring  #
> > #   and is advised that if such monitoring reveals
> > possible #
> > #   evidence of criminal activity, system personnel may provide
> > the #
> > #   evidence of such monitoring to law enforcement
> > officials.   #
> > #
> > pa11-chi1 login:
>
> > The second one works and it will print out pa11-chi1  but when there
> > is a space or console is in the output it wont print anything or it
> > wont match anything...I want to be able to match just the hostname
> > and print it out.
>
> > Any ideas?
>
> > Thanks,
>
> > Jonathan
>
> It is also posted here more clearly and formatted as it would appear
> on the terminal:  http://www.pastebin.ca/366822



what about using s.before.split("\r\n")[-1]?

A

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pexpect regex help

2007-02-21 Thread jonathan . sabo
On Feb 21, 6:13 pm, [EMAIL PROTECTED] wrote:
> I have a pexpect script to walk through a cisco terminal server and I
> was hoping to get some help with this regex because I really suck at
> it.
>
> This is the code:
>
> index = s.expect(['login: ', pexpect.EOF, pexpect.TIMEOUT])
> if index == 0:
>     m = re.search('((#.+\r\n){20,25})(\s.*)',
>                   s.before)  #<-- MY PROBLEM
>     print m.group(3),
>     print ' %s %s' % (ip[0], port)
>     s.send(chr(30))
>     s.sendline('x')
>     s.sendline('disco')
>     s.sendline('\n')
> elif index == 1:
>     print s.before
> elif index == 2:
>     print
>     print '%s %s FAILED' % (ip[0], port)
>     print 'This host may be down or locked on the TS'
>     s.send(chr(30))
>     s.sendline('x')
>     s.sendline('disco')
>     s.sendline('\n')
>
> This is attempting to match the hostname of the connected host using
> the output of a motd file which unfortunately is not the same
> everywhere...  It looks like this:
>
> #
> #   This system is the property
> of: #
> #
> #
> #DefNet
> #
> #
> #
> #   Use of this system is for authorized users
> only.#
> #   Individuals using this computer system without authority, or
> in #
> #   excess of their authority, are subject to having all of
> their   #
> #   activities on this system monitored and recorded by
> system  #
> #
> personnel.  #
> #
> #
> #   In the course of monitoring individuals improperly using
> this   #
> #   system, or in the course of system maintenance, the
> activities  #
> #   of authorized users may also be
> monitored.  #
> #
> #
> #   Anyone using this system expressly consents to such
> monitoring  #
> #   and is advised that if such monitoring reveals
> possible #
> #   evidence of criminal activity, system personnel may provide
> the #
> #   evidence of such monitoring to law enforcement
> officials.   #
> #
>
> pa-chi1 console login:
>
> And sometimes it looks like this:
>
> #
> #   This system is the property
> of: #
> #
> #
> #DefNet
> #
> #
> #
> #   Use of this system is for authorized users
> only.#
> #   Individuals using this computer system without authority, or
> in #
> #   excess of their authority, are subject to having all of
> their   #
> #   activities on this system monitored and recorded by
> system  #
> #
> personnel.  #
> #
> #
> #   In the course of monitoring individuals improperly using
> this   #
> #   system, or in the course of system maintenance, the
> activities  #
> #   of authorized users may also be
> monitored.  #
> #
> #
> #   Anyone using this system expressly consents to such
> monitoring  #
> #   and is advised that if such monitoring reveals
> possible #
> #   evidence of criminal activity, system personnel may provide
> the #
> #   evidence of such monitoring to law enforcement
> officials.   #
> #
> pa11-chi1 login:
>
> The second one works and it will print out pa11-chi1  but when there
> is a space or console is in the output it wont print anything or it
> wont match anything...I want to be able to match just the hostname
> and print it out.
>
> Any ideas?
>
> Thanks,
>
> Jonathan



It is also posted here more clearly and formatted as it would appear
on the terminal:  http://www.pastebin.ca/366822

-- 
http://mail.python.org/mailman/listinfo/python-list


pexpect regex help

2007-02-21 Thread jonathan . sabo
I have a pexpect script to walk through a cisco terminal server and I
was hoping to get some help with this regex because I really suck at
it.

This is the code:

index = s.expect(['login: ', pexpect.EOF, pexpect.TIMEOUT])
if index == 0:
    m = re.search('((#.+\r\n){20,25})(\s.*)',
                  s.before)  #<-- MY PROBLEM
    print m.group(3),
    print ' %s %s' % (ip[0], port)
    s.send(chr(30))
    s.sendline('x')
    s.sendline('disco')
    s.sendline('\n')
elif index == 1:
    print s.before
elif index == 2:
    print
    print '%s %s FAILED' % (ip[0], port)
    print 'This host may be down or locked on the TS'
    s.send(chr(30))
    s.sendline('x')
    s.sendline('disco')
    s.sendline('\n')

This is attempting to match the hostname of the connected host using
the output of a motd file which unfortunately is not the same
everywhere...  It looks like this:

#####################################################################
#   This system is the property of:                                 #
#                                                                   #
#                              DefNet                               #
#                                                                   #
#   Use of this system is for authorized users only.                #
#   Individuals using this computer system without authority, or in #
#   excess of their authority, are subject to having all of their   #
#   activities on this system monitored and recorded by system      #
#   personnel.                                                      #
#                                                                   #
#   In the course of monitoring individuals improperly using this   #
#   system, or in the course of system maintenance, the activities  #
#   of authorized users may also be monitored.                      #
#                                                                   #
#   Anyone using this system expressly consents to such monitoring  #
#   and is advised that if such monitoring reveals possible         #
#   evidence of criminal activity, system personnel may provide the #
#   evidence of such monitoring to law enforcement officials.       #
#####################################################################

pa-chi1 console login:

And sometimes it looks like this:

#####################################################################
#   This system is the property of:                                 #
#                                                                   #
#                              DefNet                               #
#                                                                   #
#   Use of this system is for authorized users only.                #
#   Individuals using this computer system without authority, or in #
#   excess of their authority, are subject to having all of their   #
#   activities on this system monitored and recorded by system      #
#   personnel.                                                      #
#                                                                   #
#   In the course of monitoring individuals improperly using this   #
#   system, or in the course of system maintenance, the activities  #
#   of authorized users may also be monitored.                      #
#                                                                   #
#   Anyone using this system expressly consents to such monitoring  #
#   and is advised that if such monitoring reveals possible         #
#   evidence of criminal activity, system personnel may provide the #
#   evidence of such monitoring to law enforcement officials.       #
#####################################################################
pa11-chi1 login:

The second one works and it will print out pa11-chi1, but when there
is a space or "console" in the output it won't print or match
anything. I want to be able to match just the hostname and print it
out.

Any ideas?

Thanks,

Jonathan

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread vbgunz
MooMaster Wrote:
> I'm trying to develop a little script that does some string
> manipulation. I have some few hundred strings that currently look like
> this:
> cond(a,b,c)
> and I want them to look like this:
> cond(c,a,b)

I zoned out on your question and created a very simple flipper.
Although it will not solve your problem maybe someone looking for a
simpler version may find it useful as a starting point. I hope it
proves useful. I'll post my simple flipper here:

s = 'cond(1,savv(grave(3,2,1),y,x),maxx(c,b,a),0)'
def argFlipper(s):
    ''' take a string of arguments and reverse'em e.g.
    >>> cond(1,savv(grave(3,2,1),y,x),maxx(c,b,a),0)
     -> cond(0,maxx(a,b,c),savv(x,y,grave(1,2,3)),1)

    '''

    count = 0
    keyholder = {}
    while 1:
        if s.find('(') > 0:
            count += 1
            value = '%sph' + '%d' % count
            tempstring = [x for x in s]
            startindex = s.rfind('(')
            limitindex = s.find(')', startindex)
            argtarget = s[startindex + 1:limitindex].split(',')
            argreversed = ','.join(reversed(argtarget))
            keyholder[value] = '(' + argreversed + ')'
            tempstring[startindex:limitindex + 1] = value
            s = ''.join(tempstring)
        else:
            while count and keyholder:
                s = s.replace(value, keyholder[value])
                count -= 1
                value = '%sph' + '%d' % count
            return s

print argFlipper(s)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread Paul McGuire
"MooMaster" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I'm trying to develop a little script that does some string
> manipulation. I have some few hundred strings that currently look like
> this:
>
> cond(a,b,c)
>
> and I want them to look like this:
>
> cond(c,a,b)



Pyparsing makes this a fairly tractable problem.  The hardest part is
defining the valid contents of a relational and arithmetic expression, which
may be found within the arguments of your cond(a,b,c) constructs.

Not guaranteeing this 100%, but it did convert your pathologically nested
example on the first try.

-- Paul

--
from pyparsing import *

ident = ~Literal("cond") + Word(alphas)
number = Combine(Optional("-") + Word(nums) + Optional("." + Word(nums)))

arithExpr = Forward()
funcCall = ident+"("+delimitedList(arithExpr)+")"
operand = number | funcCall | ident
binop = oneOf("+ - * /")
arithExpr << ( ( operand + ZeroOrMore( binop + operand ) ) |
               ( "(" + arithExpr + ")" ) )
relop = oneOf("< > == <= >= != <>")

condDef = Forward()
simpleCondExpr = arithExpr + ZeroOrMore( relop + arithExpr ) | condDef
multCondExpr = simpleCondExpr + "*" + arithExpr
condExpr = Forward()
condExpr << ( simpleCondExpr | multCondExpr | "(" + condExpr + ")" )

def reorderArgs(t):
    return "cond(" + ",".join(["".join(t.arg3), "".join(t.arg1),
                               "".join(t.arg2)]) + ")"

condDef << ( Literal("cond") + "(" +
             Group(condExpr).setResultsName("arg1") + "," +
             Group(condExpr).setResultsName("arg2") + "," +
             Group(condExpr).setResultsName("arg3") + ")"
           ).setParseAction( reorderArgs )

tests = [
"cond(a,b,c)",
"cond(1>2,b,c)",
"cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))",
"cond(a,b,(abs(c) >= d))",
"cond(0,cond(c,cond(e,cond(g,h,(a",condExpr.transformString(t)
--
Prints:
cond(a,b,c) -> cond(c,a,b)
cond(1>2,b,c) -> cond(c,1>2,b)
cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a)) ->
cond(f,-1,1)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
cond(a,b,(abs(c) >= d)) -> cond((abs(c)>=d),a,b)
cond(0,cond(c,cond(e,cond(g,h,(a
cond((a<1),0,cond((a

--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread Simon Forman
MooMaster wrote:
> I'm trying to develop a little script that does some string
> manipulation. I have some few hundred strings that currently look like
> this:
>
> cond(a,b,c)
>
> and I want them to look like this:
>
> cond(c,a,b)
>
> but it gets a little more complicated because the conds themselves may
> have conds within, like the following:
>
> cond(0,cond(c,cond(e,cond(g,h,(a
> What I want to do in this case is move the last parameter to the front
> and then work backwards all the way out (if you're thinking recursion
> too, I'm vindicated) so that it ends up looking like this:
>
> cond((a<1), 0, cond((a
> futhermore, the conds may be multiplied by an expression, such as the
> following:
>
> cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
>
> Here, all I want to do is switch the parameters of the conds without
> touching the expression, like so:
>
> cond(f,-1,1)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
>
> So that's the gist of my problem statement. I immediately thought that
> regular expressions would provide an elegant solution. I would go
> through the string by conds, stripping them & the () off, until I got
> to the lowest level, then move the parameters and work backwards. That
> thought process became this:
> -CODE
> import re
>
> def swap(left, middle, right):
>     left = left.replace("(", "")
>     right = right.replace(")", "")
>     temp = left
>     left = right
>     right = temp
>     temp = middle
>     middle = right
>     right = temp
>     whole = 'cond(' + left + ',' + middle + ',' + right + ')'
>     return whole
>
> def condReplacer(string):
>     #regex = re.compile(r'cond\(.*,.*,.+\)')
>     regex = re.compile(r'cond\(.*,.*,.+?\)')
>     if not regex.search(string):
>         print "whole string is: " + string
>         [left, middle, right] = string.split(',')
>         right = right.replace('\'', ' ')
>         string = swap(left.strip(), middle.strip(), right.strip())
>         print "the new string is:" + string
>         return string
>     else:
>         more_conds = regex.search(string)
>         temp_string = more_conds.group()
>         firstParen = temp_string.find('(')
>         temp_string = temp_string[firstParen:]
>         print "there are more conditionals!" + temp_string
>         condReplacer(temp_string)
>
> def lineReader(file):
>     for line in file:
>         regex = r'cond\(.*,.*,.+\)?'
>         if re.search(regex,line,re.DOTALL):
>             condReplacer(line)
>
> if __name__ == "__main__":
>     input_file = open("only_conds2.txt", 'r')
>     lineReader(input_file)
> -CODE
>
> I think my problem lies in my regular expression... If I use the one
> commented out I do a greedy search and in my test case where I have a
> conditional * an expression, I grab the expression too, like so:
>
> INPUT:
>
> cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
> OUTPUT:
> whole string is:
> (-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float
> (a))
> the new string
> is:cond(f*((float(e*(2**4+(float(d*8+(float(c*4+(float(b*2+float
> (a,-1,1)
>
> when all I really want to do is grab the part associated with the cond.
> But if I do a non-greedy search I avoid that problem but stop too early
> when I have an expression like this:
>
> INPUT:
> cond(a,b,(abs(c) >= d))
> OUTPUT:
> whole string is: (a,b,(abs(c)
> the new string is:cond((abs(c,a,b)
>
> Can anyone help me with the regular expression? Is this even the best
> approach to take? Anyone have any thoughts?
>
> Thanks for your time!

You're gonna want a parser for this.  pyparsing or spark would suffice.
 However, since it looks like your source strings are valid python you
could get some traction out of the tokenize standard library module:

from tokenize import generate_tokens
from StringIO import StringIO

s = 'cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))'

for t in generate_tokens(StringIO(s).readline):
    print t[1],


Prints:
cond ( - 1 , 1 , f ) * ( ( float ( e ) * ( 2 ** 4 ) ) + ( float ( d ) *
8 ) + ( float ( c ) * 4 ) + ( float ( b ) * 2 ) + float ( a ) )

Once you've got that far the rest should be easy.  :)
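
One way the rest might go, sketched for Python 3 with plain parenthesis counting instead of tokenize. The function names split_top_level and rotate_conds are invented for this illustration, and the sketch assumes every cond has balanced parentheses. It also matches the text 'cond(' anywhere, so an identifier that merely ends in "cond" would be rewritten too.

```python
def split_top_level(args):
    """Split a string on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        elif ch == ',' and depth == 0:
            parts.append(args[start:i])
            start = i + 1
    parts.append(args[start:])
    return parts


def rotate_conds(text):
    """Rewrite every cond(a,b,c) in text as cond(c,a,b), nested ones included."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith('cond(', i):
            # Walk forward to the parenthesis that closes this cond( call.
            depth, j = 1, i + 5
            while j < len(text) and depth:
                if text[j] == '(':
                    depth += 1
                elif text[j] == ')':
                    depth -= 1
                j += 1
            if depth:  # unbalanced input: copy the rest through unchanged
                out.append(text[i:])
                break
            inner = text[i + 5:j - 1]
            # Recurse first so nested conds are rotated before the outer one.
            args = [rotate_conds(a) for a in split_top_level(inner)]
            if len(args) == 3:
                args = [args[2], args[0], args[1]]
            out.append('cond(' + ','.join(args) + ')')
            i = j
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)


print(rotate_conds('cond(a,b,(abs(c) >= d))'))
```

Counting parentheses sidesteps the greedy versus non-greedy problem entirely, because the end of each cond is found by depth rather than by pattern.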

Peace,
~Simon

http://pyparsing.wikispaces.com/
http://pages.cpsc.ucalgary.ca/~aycock/spark/
http://docs.python.org/lib/module-tokenize.html

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread Tim Chase
> cond(a,b,c)
> 
> and I want them to look like this:
> 
> cond(c,a,b)
> 
> but it gets a little more complicated because the conds themselves may
> have conds within, like the following:
> 
> cond(0,cond(c,cond(e,cond(g,h,(a

--
http://mail.python.org/mailman/listinfo/python-list


Regex help...pretty please?

2006-08-23 Thread MooMaster
I'm trying to develop a little script that does some string
manipulation. I have some few hundred strings that currently look like
this:

cond(a,b,c)

and I want them to look like this:

cond(c,a,b)

but it gets a little more complicated because the conds themselves may
have conds within, like the following:

cond(0,cond(c,cond(e,cond(g,h,(a= d))
OUTPUT:
whole string is: (a,b,(abs(c)
the new string is:cond((abs(c,a,b)

Can anyone help me with the regular expression? Is this even the best
approach to take? Anyone have any thoughts? 

Thanks for your time!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2006-05-16 Thread johnzenger
Why not use split instead of regular expressions?

>>> ln = "32    32     23      9     9     -     9     9     -      9    11     1    10"
>>> ln.split()
['32', '32', '23', '9', '9', '-', '9', '9', '-', '9', '11', '1', '10']

Much simpler, yes?  Just find the line that comes after a line that
begins with "TIGER," split it, and pick the number you want out of the
resulting list.

Lance Hoffmeyer wrote:
> I have the following table and I am trying to match percentage the 2nd column 
> on the 2nd Tiger line (9.0).
>
> I have tried both of the following.  I expected both to match but neither 
> did?  Is there a modifier
> I am missing?  What changes do I need to make these match?  I need to keep 
> the structure of the regex
> the same.
>
> TIGER.append(re.search("TIGER\s{10}.*?(?:(\d{1,3}\.\d)\s+){2}", 
> target_table).group(1))
> TIGER.append(re.search("^TIGER.*?(?:(\d{1,3}\.\d)\s+){2}", 
> target_table).group(1))
>
>
> BASE - TOTAL TIGER 268   268173 95   101 -   10157 -  
>   5778 276   268   19276230 21
>
> DOG 7979 44 3531 -3117 -  
>   1725 124795524 75  1
>   29.5  29.5   25.4   36.8  30.7 -  30.7  29.8 -  
> 29.8  32.1  50.0  31.6  29.5  28.6  31.6   32.64.8
>
> CAT 4646 28 1820 -20 7 -  
>714 -14463214 39  4
>   17.2  17.2   16.2   18.9  19.8 -  19.8  12.3 -  
> 12.3  17.9 -  18.4  17.2  16.7  18.4   17.0   19.0
>
> LAMB3232 23  910 -10 8 -  
>812 -12322012 28  1
>   11.9  11.9   13.39.5   9.9 -   9.9  14.0 -  
> 14.0  15.4 -  15.8  11.9  10.4  15.8   12.24.8
>
> TRIPOD  3232 23  9 9 - 9 9 -  
>911 110322210 28  3
>   11.9  11.9   13.39.5   8.9 -   8.9  15.8 -  
> 15.8  14.1  50.0  13.2  11.9  11.5  13.2   12.2   14.3
>
> TIGER   2424 16  8 5 - 510 -  
>   10 7 - 72417 7 18  2
>9.0   9.09.28.4   5.0 -   5.0  17.5 -  
> 17.5   9.0 -   9.2   9.0   8.9   9.27.89.5

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2006-05-16 Thread Peter Otten
Lance Hoffmeyer wrote:

> I have the following table and I am trying to match percentage the 2nd
> column on the 2nd Tiger line (9.0).
> 
> I have tried both of the following.  I expected both to match but neither
> did?  Is there a modifier
> I am missing?  What changes do I need to make these match?  I need to keep
> the structure of the regex the same.
> 
> TIGER.append(re.search("TIGER\s{10}.*?(?:(\d{1,3}\.\d)\s+){2}",
> target_table).group(1))
> TIGER.append(re.search("^TIGER.*?(?:(\d{1,3}\.\d)\s+){2}",
> target_table).group(1))

You can try the re.DOTALL flag (prepend the regex string with "(?s)"), but
I'd go with something really simple:

instream = iter(target_table.splitlines()) # or: instream = open(datafile)
for line in instream:
    if line.startswith("TIGER"):
        value = instream.next().split()[1] # or ...[0]? they are both '9.0'
        TIGER.append(value)
        break
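
The same line-by-line idea as a small Python 3 sketch, where iterators use next() rather than the .next() method. The function name value_after_label and the cut-down sample table are made up for illustration; only the TIGER label and the 9.0 values come from the post.

```python
def value_after_label(table, label="TIGER", column=1):
    """Return one field from the line that follows the first line starting with label."""
    lines = iter(table.splitlines())
    for line in lines:
        if line.startswith(label):
            # The percentages sit on the line right after the counts.
            fields = next(lines, "").split()
            if len(fields) > column:
                return fields[column]
            return None
    return None


sample = """\
BASE - TOTAL TIGER   268   268   173    95
TIGER                 24    24    16     8
                     9.0   9.0   9.2   8.4
"""
print(value_after_label(sample))
```

Because "BASE - TOTAL TIGER" does not start with the label, only the real TIGER row triggers the lookup.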

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2006-05-16 Thread Lance Hoffmeyer
I have the following table and I am trying to match percentage the 2nd column 
on the 2nd Tiger line (9.0).

I have tried both of the following.  I expected both to match but neither did?  
Is there a modifier
I am missing?  What changes do I need to make these match?  I need to keep the 
structure of the regex
the same.

TIGER.append(re.search("TIGER\s{10}.*?(?:(\d{1,3}\.\d)\s+){2}", 
target_table).group(1))
TIGER.append(re.search("^TIGER.*?(?:(\d{1,3}\.\d)\s+){2}", 
target_table).group(1))


BASE - TOTAL TIGER   268   268   173    95   101     -   101    57     -    57    78     2    76   268   192    76   230    21

DOG                   79    79    44    35    31     -    31    17     -    17    25     1    24    79    55    24    75     1
                    29.5  29.5  25.4  36.8  30.7     -  30.7  29.8     -  29.8  32.1  50.0  31.6  29.5  28.6  31.6  32.6   4.8

CAT                   46    46    28    18    20     -    20     7     -     7    14     -    14    46    32    14    39     4
                    17.2  17.2  16.2  18.9  19.8     -  19.8  12.3     -  12.3  17.9     -  18.4  17.2  16.7  18.4  17.0  19.0

LAMB                  32    32    23     9    10     -    10     8     -     8    12     -    12    32    20    12    28     1
                    11.9  11.9  13.3   9.5   9.9     -   9.9  14.0     -  14.0  15.4     -  15.8  11.9  10.4  15.8  12.2   4.8

TRIPOD                32    32    23     9     9     -     9     9     -     9    11     1    10    32    22    10    28     3
                    11.9  11.9  13.3   9.5   8.9     -   8.9  15.8     -  15.8  14.1  50.0  13.2  11.9  11.5  13.2  12.2  14.3

TIGER                 24    24    16     8     5     -     5    10     -    10     7     -     7    24    17     7    18     2
                     9.0   9.0   9.2   8.4   5.0     -   5.0  17.5     -  17.5   9.0     -   9.2   9.0   8.9   9.2   7.8   9.5
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread Michael Spencer
rh0dium wrote:
> Michael Spencer wrote:
>>   >>> def parse(source):
>>   ...     source = source.splitlines()
>>   ...     original, rest = source[0], "\n".join(source[1:])
>>   ...     return original, rest_eval(get_tokens(rest))
> 
> This is a very clean and elegant way to separate them - Very nice!!  I
> like this a lot - I will definitely use this in the future!!
> 
>> Cheers
>>
>> Michael
> 
On reflection, this simplifies further (to 9 lines), at least for the test
cases you provide, which don't involve any nested parens:

  >>> import cStringIO, tokenize
  ...
  >>> def get_tokens2(source):
  ...     src = cStringIO.StringIO(source).readline
  ...     src = tokenize.generate_tokens(src)
  ...     return [token[1][1:-1] for token in src if token[0] == tokenize.STRING]
  ...
  >>> def parse2(source):
  ...     source = source.splitlines()
  ...     original, rest = source[0], "\n".join(source[1:])
  ...     return original, get_tokens2(rest)
  ...
  >>>

This matches your main function for the three tests where main works...

  >>> for source in sources[:3]: #matches your main function where it works
  ...     assert parse2(source) == main(source)
  ...
  Original someFunction
  Orig someFunction Results ['test', 'foo']
  Original someFunction
  Orig someFunction Results ['test  foo']
  Original someFunction
  Orig someFunction Results ['test', 'test1', 'foo aasdfasdf', 'newline', 'test2']

...and handles the case where main fails (I think correctly, although I'm not
entirely sure what your desired output is in this case):
  >>> parse2(sources[3])
  ('getVersion()', ['@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $'])
  >>>

If you really do need nested parens, then you'd need the slightly longer
version I posted earlier
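
For anyone trying this on Python 3, where cStringIO is gone, a rough
equivalent of the flat (non-nested) version might look like the sketch below.
The names quoted_strings and parse_flat are made up for illustration. Stripping
each line before tokenizing is an extra precaution added here so that leading
spaces are not reported as indentation; the quotes are simply sliced off, so
escape sequences inside the strings are left unprocessed.

```python
import io
import tokenize


def quoted_strings(text):
    """Return the contents of every string literal in `text`, in order."""
    readline = io.StringIO(text).readline
    return [tok.string[1:-1]                      # drop the surrounding quotes
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.STRING]


def parse_flat(source):
    """Split off the first line, then collect the quoted strings that follow."""
    lines = source.splitlines()
    rest = "\n".join(line.strip() for line in lines[1:]) + "\n"
    return lines[0], quoted_strings(rest)


print(parse_flat('someFunction\r\n "test" "foo"\r\n'))
```

Parentheses are ignored entirely here, so a parenthesised group and a bare
sequence of strings come back in the same flat form.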

Cheers

Michael



Re: Regex help needed

2006-01-10 Thread Paul McGuire
"rh0dium" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>
> Paul McGuire wrote:
>
> > ident = Combine( Word(alpha,alphanums+"_") + LPAR + RPAR )
>
> This will only work for a word with a parentheses ( ie.  somefunction()
> )
>
> > If you *really* want everything on the first line to be the ident, try
this:
> >
> > ident = Word(alpha,alphanums+"_") + restOfLine
> > or
> > ident = Combine( Word(alpha,alphanums+"_") + restOfLine )
>
> This nicely grabs the "\r"..  How can I get around it?
>
> > Now the next step is to assign field names to the results:
> >
> > dataFormat = ident.setResultsName("ident") + ( dblQuotedString |
> > quoteList ).setResultsName("contents")
>
> This is super cool!!
>
> So let's take this for example
>
> test= 'fprintf( outFile "leSetInstSelectable( t )\n" )\r\n ("test"
> "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'
>
> Now I want the ident to pull out 'fprintf( outFile
> "leSetInstSelectable( t )\n" )' so I tried to do this?
>
> ident = Forward()
> ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
> dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)
>
> Borrowing from the example listed previously.  But it bombs out cause
> it wants a ")"  but it has one..  Forward() ROCKS!!
>
> Also how does it know to do this for just the first line?  It would
> seem that this will work for every line - No?
>
This works for me:

test4 = r"""fprintf( outFile "leSetInstSelectable( t )\n" )
("test"
"test1" "foo aasdfasdf"
"newline" "test2")
"""

ident = Forward()
ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)
dataFormat = ident + ( dblQuotedString | quoteList )

print dataFormat.parseString(test4)

Prints:
[['fprintf', '(', 'outFile', '"leSetInstSelectable( t )\\n"', ')'],
['"test"', '"test1"', '"foo aasdfasdf"', '"newline"', '"test2"']]


1. Is there supposed to be a real line break in the string
"leSetInstSelectable( t )\n", or just a slash-n at the end?  pyparsing
quoted strings do not accept multiline quotes, but they do accept escaped
characters such as "\t" "\n", etc.  That is, to pyparsing:

"\n this is a valid \t \n string"

"this is not
a valid string"

Part of the confusion is that your examples include explicit \r\n
characters.  I'm assuming this is to reflect what you see when listing out
the Python variable containing the string.  (Are you opening a text file
with "rb" to read in binary?  Try opening with just "r", and this may
resolve your \r\n problems.)

2. If restOfLine is still giving you \r's at the end, you can redefine
restOfLine to not include them, or to include and suppress them.  Or (this
is easier) define a parse action for restOfLine that strips trailing \r's:

def stripTrailingCRs(st,loc,toks):
    try:
        if toks[0][-1] == '\r':
            return toks[0][:-1]
    except:
        pass

restOfLine.setParseAction( stripTrailingCRs )


3.  How does it know to only do it for the first line?  Presumably you told
it to do so.  pyparsing's parseString method starts at the beginning of the
input string, and matches expressions until it finds a mismatch, or runs out
of expressions to match - even if there is more input string to process,
pyparsing does not continue.  To search through the whole file looking for
idents, try using scanString which returns a generator; for each match, the
generator gives a tuple containing:
- tokens - the matched tokens
- start - the start location of the match
- end - the end location of the match

If your input file consists *only* of these constructs, you can also just
expand dataFormat.parseString to OneOrMore(dataFormat).parseString.


-- Paul




Re: Regex help needed

2006-01-10 Thread rh0dium

Michael Spencer wrote:
>   >>> def parse(source):
>   ... source = source.splitlines()
>   ... original, rest = source[0], "\n".join(source[1:])
>   ... return original, rest_eval(get_tokens(rest))

This is a very clean and elegant way to separate them - Very nice!!  I
like this a lot - I will definitely use this in the future!!

> 
> Cheers
> 
> Michael



Re: Regex help needed

2006-01-10 Thread rh0dium

Paul McGuire wrote:

> ident = Combine( Word(alpha,alphanums+"_") + LPAR + RPAR )

This will only work for a word with a parentheses ( ie.  somefunction()
)

> If you *really* want everything on the first line to be the ident, try this:
>
> ident = Word(alpha,alphanums+"_") + restOfLine
> or
> ident = Combine( Word(alpha,alphanums+"_") + restOfLine )

This nicely grabs the "\r"..  How can I get around it?

> Now the next step is to assign field names to the results:
>
> dataFormat = ident.setResultsName("ident") + ( dblQuotedString |
> quoteList ).setResultsName("contents")

This is super cool!!

So let's take this for example

test= 'fprintf( outFile "leSetInstSelectable( t )\n" )\r\n ("test"
"test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'

Now I want the ident to pull out 'fprintf( outFile
"leSetInstSelectable( t )\n" )' so I tried to do this?

ident = Forward()
ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)

Borrowing from the example listed previously.  But it bombs out cause
it wants a ")"  but it has one..  Forward() ROCKS!!

Also how does it know to do this for just the first line?  It would
seem that this will work for every line - No?



Re: Regex help needed

2006-01-10 Thread Michael Spencer
rh0dium wrote:
> Hi all,
> 
> I am using python to drive another tool using pexpect.  The values
> which I get back I would like to automatically put into a list if there
> is more than one return value. They provide me a way to see that the
> data is in set by parenthesising it.
> 
...

> 
> CAN SOMEONE PLEASE CLEAN THIS UP?
> 

How about using the Python tokenizer rather than re:

  >>> import cStringIO, tokenize
  ...
  >>> def get_tokens(source):
  ...     allowed_tokens = (tokenize.STRING, tokenize.OP)
  ...     src = cStringIO.StringIO(source).readline
  ...     src = tokenize.generate_tokens(src)
  ...     return (token[1] for token in src if token[0] in allowed_tokens)
  ...
  >>> def rest_eval(tokens):
  ...     output = []
  ...     for token in tokens:
  ...         if token == "(":
  ...             output.append(rest_eval(tokens))
  ...         elif token == ")":
  ...             return output
  ...         else:
  ...             output.append(token[1:-1])
  ...     return output
  ...
  >>> def parse(source):
  ...     source = source.splitlines()
  ...     original, rest = source[0], "\n".join(source[1:])
  ...     return original, rest_eval(get_tokens(rest))
  ...
  >>> sources = [
  ...     'someFunction\r\n "test" "foo"\r\n',
  ...     'someFunction\r\n "test  foo"\r\n',
  ...     'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n',
  ...     'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n']
  >>>
  >>> for data in sources: parse(data)
  ...
  ('someFunction', ['test', 'foo'])
  ('someFunction', ['test  foo'])
  ('getVersion()', ['@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $'])
  ('someFunction', [['test', 'test1', 'foo aasdfasdf', 'newline', 'test2']])
  >>>
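
A Python 3 rendering of the same recursive idea, as a sketch only: cStringIO
is replaced by io.StringIO, and the helper names string_and_paren_tokens, nest
and parse_nested are invented here. Unlike the session above, this version
keeps only "(" and ")" out of the operator tokens and strips each line before
tokenizing; both are small changes made for this sketch, not part of the
original code.

```python
import io
import tokenize


def string_and_paren_tokens(text):
    """Yield string literals and parentheses from `text`, in order."""
    readline = io.StringIO(text).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING or tok.string in ("(", ")"):
            yield tok.string


def nest(tokens):
    """Turn a flat token stream into nested lists, one list per paren pair."""
    output = []
    for tok in tokens:
        if tok == "(":
            # The recursive call keeps consuming the same iterator.
            output.append(nest(tokens))
        elif tok == ")":
            return output
        else:
            output.append(tok[1:-1])  # drop the surrounding quotes
    return output


def parse_nested(source):
    lines = source.splitlines()
    rest = "\n".join(line.strip() for line in lines[1:]) + "\n"
    return lines[0], nest(string_and_paren_tokens(rest))


print(parse_nested('f()\r\n("a" ("b" "c")\r\n "d")\r\n'))
```

The recursion relies on the token stream being a single shared iterator, which
is why a generator is passed around rather than a list.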

Cheers

Michael



Re: Regex help needed

2006-01-10 Thread Paul McGuire
"rh0dium" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>
> Paul McGuire wrote:
> > -- Paul
> > (Download pyparsing at http://pyparsing.sourceforge.net.)
>
> Done.
>
>
> Hey this is pretty cool!  I have one small problem that I don't know
> how to resolve.  I want the entire contents (whatever it is) of line 1
> to be the ident.  Now digging into the code showed a method line,
> lineno and LineStart LineEnd.  I tried to use all three but it didn't
> work for a few reasons ( line = type issues, lineno - I needed the data
> and could't get it to work, LineStart/End - I think it matches every
> line and I need the scope to line 1 )
>
> So here is my rendition of the code - But this is REALLY slick..
>
> I think the problem is the parens on line one
>
> def main(data=None):
>
> LPAR = Literal("(")
> RPAR = Literal(")")
>
> # assume function identifiers must start with alphas, followed by
> zero or more
> # alphas, numbers, or '_' - expand this defn as needed
> ident = LineStart + LineEnd
>
> # define a list as one or more quoted strings, inside ()'s - we'll
> tackle nesting
> # in a minute
> quoteList = Group( LPAR.suppress() + OneOrMore(dblQuotedString) +
> RPAR.suppress())
>
> # define format of a line of data - don't bother with \n's or \r's,
>
> # pyparsing just skips 'em
> dataFormat = ident + ( dblQuotedString | quoteList )
>
> return dataFormat.parseString(data)
>
>
> # General run..
> if __name__ == '__main__':
>
>
> # data = 'someFunction\r\n "test" "foo"\r\n'
> # data = 'someFunction\r\n "test  foo"\r\n'
> data = 'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0
> 05/22/2005 23:36 (cicln01) $"\r\n'
> # data = 'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n
> "newline" "test2")\r\n'
>
> foo = main(data)
>
> print foo
>

LineStart() + LineEnd() will only match an empty line.


If you describe in words what you want ident to be, it may be more natural
to translate to pyparsing.

"A word starting with an alpha, followed by zero or more alphas, numbers, or
'_'s, with a trailing pair of parens"

ident = Word(alphas,alphanums+"_") + LPAR + RPAR


If you want the ident all combined into a single token, use:

ident = Combine( Word(alphas,alphanums+"_") + LPAR + RPAR )


LineStart and LineEnd are geared more for line-oriented or
whitespace-sensitive grammars.  Your example doesn't really need them, I
don't think.

If you *really* want everything on the first line to be the ident, try this:

ident = Word(alphas,alphanums+"_") + restOfLine
or
ident = Combine( Word(alphas,alphanums+"_") + restOfLine )


Now the next step is to assign field names to the results:

dataFormat = ident.setResultsName("ident") + ( dblQuotedString |
quoteList ).setResultsName("contents")

test = "blah blah test string"

results = dataFormat.parseString(test)
print results.ident, results.contents

I'm glad pyparsing is working out for you!  There should be a number of
examples that ship with pyparsing that may give you some more ideas on how
to proceed from here.

-- Paul




Re: Regex help needed

2006-01-10 Thread rh0dium

Paul McGuire wrote:
> -- Paul
> (Download pyparsing at http://pyparsing.sourceforge.net.)

Done.


Hey this is pretty cool!  I have one small problem that I don't know
how to resolve.  I want the entire contents (whatever it is) of line 1
to be the ident.  Now digging into the code showed a method line,
lineno and LineStart LineEnd.  I tried to use all three but it didn't
work for a few reasons ( line = type issues, lineno - I needed the data
and couldn't get it to work, LineStart/End - I think it matches every
line and I need the scope to line 1 )

So here is my rendition of the code - But this is REALLY slick..

I think the problem is the parens on line one

def main(data=None):

    LPAR = Literal("(")
    RPAR = Literal(")")

    # assume function identifiers must start with alphas, followed by zero or more
    # alphas, numbers, or '_' - expand this defn as needed
    ident = LineStart + LineEnd

    # define a list as one or more quoted strings, inside ()'s - we'll tackle nesting
    # in a minute
    quoteList = Group( LPAR.suppress() + OneOrMore(dblQuotedString) +
                       RPAR.suppress())

    # define format of a line of data - don't bother with \n's or \r's,
    # pyparsing just skips 'em
    dataFormat = ident + ( dblQuotedString | quoteList )

    return dataFormat.parseString(data)


# General run..
if __name__ == '__main__':

    # data = 'someFunction\r\n "test" "foo"\r\n'
    # data = 'someFunction\r\n "test  foo"\r\n'
    data = 'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'
    # data = 'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'

    foo = main(data)

    print foo



Re: Regex help needed

2006-01-10 Thread Paul McGuire
"rh0dium" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Hi all,
>
> I am using python to drive another tool using pexpect.  The values
> which I get back I would like to automatically put into a list if there
> is more than one return value. They provide me a way to see that the
> data is in set by parenthesising it.
>


Well, you asked for regex help, but a pyparsing rendition may be easier to
read and maintain.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)


# test data strings
test1 = """somefunction()
"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"
"""

test2 = """somefunction()
("." "~"
"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"
"foo")
"""

test3 = """somefunctionWithNestedlist()
("." "~"
"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"
("Hey!"
"this is a nested"
"list")
"foo")
"""

"""
So if you're still reading this I want to parse out data.  Here are the
rules...
- Line 1 ALWAYS is the calling function whatever is there (except
"\r\n") should be kept as "original"
- Anything may occur inside the quotations - I don't care what's in
there per se but it must be maintained.
- Parenthesed items I want to be pushed into a list.  I haven't run
into a case where you have nested paren's but that not to say it won't
happen...
"""

from pyparsing import Literal, Word, alphas, alphanums, \
dblQuotedString, OneOrMore, Group, Forward

LPAR = Literal("(")
RPAR = Literal(")")

# assume function identifiers must start with alphas, followed by zero or more
# alphas, numbers, or '_' - expand this defn as needed
ident = Word(alphas,alphanums+"_")

# define a list as one or more quoted strings, inside ()'s - we'll tackle nesting
# in a minute
quoteList = Group( LPAR.suppress() +
                   OneOrMore(dblQuotedString) +
                   RPAR.suppress() )

# define format of a line of data - don't bother with \n's or \r's,
# pyparsing just skips 'em
dataFormat = ident + LPAR + RPAR + ( dblQuotedString | quoteList )

def test(t):
    print dataFormat.parseString(t)

print "Parse flat lists"
test(test1)
test(test2)

# modifications for nested lists
quoteList = Forward()
quoteList << Group( LPAR.suppress() +
   OneOrMore(dblQuotedString | quoteList) +
   RPAR.suppress() )
dataFormat = ident + LPAR + RPAR + ( dblQuotedString | quoteList )

print
print "Parse using nested lists"
test(test1)
test(test2)
test(test3)

Parsing results:
Parse flat lists
['somefunction', '(', ')', '"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"']
['somefunction', '(', ')', ['"."', '"~"', '"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"', '"foo"']]

Parse using nested lists
['somefunction', '(', ')', '"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"']
['somefunction', '(', ')', ['"."', '"~"', '"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"', '"foo"']]
['somefunctionWithNestedlist', '(', ')', ['"."', '"~"', '"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"', ['"Hey!"', '"this is a nested"', '"list"'], '"foo"']]





Regex help needed

2006-01-10 Thread rh0dium
Hi all,

I am using Python to drive another tool using pexpect.  I would like to
automatically put the values I get back into a list if there is more than
one return value.  The tool signals that the data is a set by wrapping it
in parentheses.

This is all generated as I said using pexpect - Here is how I use it..
 child = pexpect.spawn( _buildCadenceExe(), timeout=timeout)
 child.sendline("somefunction()")
 child.expect("> ")
 data=child.before

Given this data can take on several shapes:

Single return value -- THIS IS THE ONE I CAN'T GET TO WORK..
data = 'somefunction()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'

Multiple return value
data = 'somefunction()\r\n("." "~" "/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile")\r\n'

It may take up several lines...
data = 'somefunction()\r\n("." "~" \r\n"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"\r\n"foo")\r\n'

So if you're still reading this I want to parse out data.  Here are the
rules...
- Line 1 ALWAYS is the calling function whatever is there (except
"\r\n") should be kept as "original"
- Anything may occur inside the quotations - I don't care what's in
there per se but it must be maintained.
- Parenthesized items I want to be pushed into a list.  I haven't run
into a case where you have nested parens, but that's not to say it won't
happen...

So here is my code..  Pardon my hack job..

import os,re

def main(data=None):

    # Get rid of the annoying \r's
    dat=data.split("\r")
    data="".join(dat)

    # Remove the first line - that is the original call
    dat = data.split("\n")
    original=dat[0]
    del dat[0]

    print "Original", original
    # Now join all of the remaining lines
    retl="".join(dat)

    # self.logger.debug("Original = \'%s\'" % original)

    try:
        # Get rid of the parenthesis
        parmatcher = re.compile( r'\(([^()]*)\)' )
        parmatch = parmatcher.search(retl)

        # Get rid of the first and last quotes
        qrmatcher = re.compile( r'\"([^()]*)\"' )
        qrmatch = qrmatcher.search(parmatch.group(1))

        # Split the items
        qmatch=re.compile(r'\"\s+\"')
        results = qmatch.split(qrmatch.group(1))
    except:
        qrmatcher = re.compile( r'\"([^()]*)\"' )
        qrmatch = qrmatcher.search(retl)

        # Split the items
        qmatch=re.compile(r'\"\s+\"')
        results = qmatch.split(qrmatch.group(1))

    print "Orig", original, "Results", results
    return original,results


# General run..
if __name__ == '__main__':

    # data = 'someFunction\r\n "test" "foo"\r\n'
    # data = 'someFunction\r\n "test  foo"\r\n'
    data = 'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'
    # data = 'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'

    main(data)

CAN SOMEONE PLEASE CLEAN THIS UP?
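
One reading of why the single-value case fails: the version string itself
contains parentheses, "(#)" and "(cicln01)", and the pattern "([^()]*)" refuses
to cross them, so the search for the quoted string returns None. A minimal
standard-library sketch that avoids this by only looking at the double quotes
is shown below, in Python 3 syntax. The name parse_reply is made up for
illustration, and the sketch assumes the quoted strings never contain an
embedded double quote and that nesting does not need to be preserved.

```python
import re

# Everything between one double quote and the next, parentheses included.
QUOTED = re.compile(r'"([^"]*)"')


def parse_reply(data):
    """Return (first line, list of quoted strings found on the other lines)."""
    lines = data.splitlines()          # copes with \r\n line endings
    original = lines[0]
    rest = "\n".join(lines[1:])
    return original, QUOTED.findall(rest)


reply = 'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'
print(parse_reply(reply))
```

A single return value comes back as a one-element list, so the caller does not
need a separate code path for it.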



Re: regex help

2005-08-11 Thread Shantanoo Mahajan
John Machin wrote:

> jeff sacksteder wrote:
>> Regex questions seem to be rather resistant to googling.
>> 
>> My regex currently looks like - 'FOO:.*\n\n'
>> 
>> The chunk of text I am attempting to locate is a line beginning with
>> "FOO:", followed by an unknown number of lines, terminating with a
>> blank line. Clearly the ".*" phrase does not match the single newlines
>> occuring inside the block.
>> 
>> Suggestions are warmly welcomed.
> 
> I suggest you read the manual first:
> """
> "."
> (Dot.) In the default mode, this matches any character except a newline.
> If the DOTALL flag has been specified, this matches any character
> including a newline.
> """

I think you need to write your own function. Something like:

flag = 0
for x in open('_file_name'):
    if x == 'FOO:\n':
        flag = 1
    if x == '\n':
        flag = 0
    if flag == 1:
        print x


If the line is 'FOO: _some_more_data_' you may try
    if x.startswith('FOO:'):
instead of
    if x == 'FOO:\n':
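
The flag-based loop can also be written as a small generator, sketched here in
Python 3 syntax. The name foo_blocks and its prefix parameter are invented for
this example; it assumes a block starts at any line beginning with "FOO:" and
ends at the next empty line (or at the end of the input).

```python
def foo_blocks(lines, prefix="FOO:"):
    """Yield each block as a list of lines: from a line starting with
    `prefix` up to, but not including, the next blank line."""
    block = None                      # None means "not inside a block"
    for line in lines:
        stripped = line.rstrip("\r\n")
        if block is None:
            if stripped.startswith(prefix):
                block = [stripped]
        elif stripped == "":
            yield block
            block = None
        else:
            block.append(stripped)
    if block is not None:
        yield block                   # input ended without a blank line


text = "junk\nFOO: a\nb\nc\n\nafter\nFOO: d\n"
for found in foo_blocks(text.splitlines(True)):
    print(found)
```

Because it only needs an iterable of lines, the same function works on an open
file object as well as on a list.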

Hope this helps.

Shantanoo


Re: regex help

2005-08-11 Thread gene tani
when *I* google

http://www.awaretek.com/tutorials.html#regular
http://en.wikibooks.org/wiki/Programming:Python_Strings
http://www.regexlib.com/Default.aspx

http://docs.python.org/lib/module-re.html

http://diveintopython.org/regular_expressions/index.html#re.intro
http://www.amk.ca/python/howto/regex/
http://gnosis.cx/publish/programming/regular_expressions.html

Also look into the ActiveState Komodo regex debugger (I think Wing IDE has
one too).



Re: regex help

2005-08-11 Thread John Machin
jeff sacksteder wrote:
> Regex questions seem to be rather resistant to googling.
> 
> My regex currently looks like - 'FOO:.*\n\n'
> 
> The chunk of text I am attempting to locate is a line beginning with
> "FOO:", followed by an unknown number of lines, terminating with a
> blank line. Clearly the ".*" phrase does not match the single newlines
> occuring inside the block.
> 
> Suggestions are warmly welcomed.

I suggest you read the manual first:
"""
"."
(Dot.) In the default mode, this matches any character except a newline. 
If the DOTALL flag has been specified, this matches any character 
including a newline.
"""


Re: regex help

2005-08-10 Thread Christopher Subich
jeff sacksteder wrote:
> Regex questions seem to be rather resistant to googling.
> 
> My regex currently looks like - 'FOO:.*\n\n'
> 
> The chunk of text I am attempting to locate is a line beginning with
> "FOO:", followed by an unknown number of lines, terminating with a
> blank line. Clearly the ".*" phrase does not match the single newlines
> occuring inside the block.

Include the re.DOTALL flag when you compile the regular expression.
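
As a sketch of what that looks like in practice (the sample text is invented):
re.DOTALL lets "." match newlines. The "?" after ".*" and the re.MULTILINE
flag with a leading "^" are additions made here, not part of the advice above;
without the "?" the greedy ".*" would run on to the last blank line in the
text instead of stopping at the first one.

```python
import re

text = "header\nFOO: first line\nsecond line\nthird line\n\ntrailer\n\nmore\n"

# DOTALL: "." also matches "\n".  MULTILINE: "^" matches at each line start.
block = re.compile(r"^FOO:.*?\n\n", re.DOTALL | re.MULTILINE)
match = block.search(text)
print(repr(match.group(0)))

# For comparison, the greedy form swallows the following paragraph too.
greedy = re.search(r"^FOO:.*\n\n", text, re.DOTALL | re.MULTILINE)
print(repr(greedy.group(0)))
```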


regex help

2005-08-10 Thread jeff sacksteder
Regex questions seem to be rather resistant to googling.

My regex currently looks like - 'FOO:.*\n\n'

The chunk of text I am attempting to locate is a line beginning with
"FOO:", followed by an unknown number of lines, terminating with a
blank line. Clearly the ".*" phrase does not match the single newlines
occuring inside the block.

Suggestions are warmly welcomed.


Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Steven Bethard wrote:
> Kent Johnson wrote:
>> for line in raw_data:
>>     if line.startswith('RelevantInfo1'):
>>         info1 = raw_data.next().strip()
>>     elif line.startswith('RelevantInfo2'):
>>         info2 = raw_data.next().strip()
>>     elif line.startswith('RelevantInfo3'):
>>         info3 = raw_data.next().strip()
>>         scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
>>         info1 = info2 = info3 = None
>
> Very pretty. =)  I have to say, I hadn't ever used iterators this way
> before, that is, calling their next method from within a for-loop.  I
> like it. =)

I confess I have a nagging suspicion that someone who actually knows something about CPython
internals will tell me why it's a bad idea...but it sure is handy!

> Thanks for opening my mind. ;)

My pleasure :-)

Kent


Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Kent Johnson wrote:
> for line in raw_data:
>     if line.startswith('RelevantInfo1'):
>         info1 = raw_data.next().strip()
>     elif line.startswith('RelevantInfo2'):
>         info2 = raw_data.next().strip()
>     elif line.startswith('RelevantInfo3'):
>         info3 = raw_data.next().strip()
>         scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
>         info1 = info2 = info3 = None

Very pretty. =)  I have to say, I hadn't ever used iterators this way
before, that is, calling their next method from within a for-loop.  I
like it. =)

Thanks for opening my mind. ;)
STeVe


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 3 Mar 2005 12:26:37 -0800, James Stroud <[EMAIL PROTECTED]> wrote:
> Have a look at "martel", part of biopython. The world of bioinformatics is 
> filled with files with structure like this.
>
> http://www.biopython.org/docs/api/public/Martel-module.html
>
> James

Thanks for the link. Steve and Kent have provided me with nice solutions but
I will check this out anyway for future reference.

Take care.

-- 
You may easily play a joke on a man who likes to argue -- agree with him.
-- Ed Howe


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 16:25:39 -0500, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Here is another attempt. I'm still not sure I understand what form you want 
> the data in. I made a 
> dict -> dict -> list structure so if you lookup e.g. scores['10/11/04']['60'] 
> you get a list of all 
> the RelevantInfo2 values for Relevant1='10/11/04' and Relevant2='60'.
>
> The parser is a simple-minded state machine that will misbehave if the input 
> does not have entries 
> in the order Relevant1, Relevant2, Relevant3 (with as many intervening lines 
> as you like).
>
> All three values are available when Relevant3 is detected so you could do 
> something else with them 
> if you want.
>
> HTH
> Kent
>
> import cStringIO
>
> raw_data = '''Gibberish
> 53
> MoreGarbage
[mass snippage]
> 60
> Lalala'''
> raw_data = cStringIO.StringIO(raw_data)
>
> scores = {}
> info1 = info2 = info3 = None
>
> for line in raw_data:
>     if line.startswith('RelevantInfo1'):
>         info1 = raw_data.next().strip()
>     elif line.startswith('RelevantInfo2'):
>         info2 = raw_data.next().strip()
>     elif line.startswith('RelevantInfo3'):
>         info3 = raw_data.next().strip()
>         scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
>         info1 = info2 = info3 = None
>
> print scores
> print scores['10/11/04']['60']
> print scores['10/10/04']['23']
>
> ## prints:
> {'10/10/04': {'44': ['33'], '23': ['22', '22']}, '10/11/04': {'60': ['45']}}
> ['45']
> ['22', '22']

Thank you so much. Your solution and Steve's both give me what I'm looking
for. I appreciate both of your incredibly quick replies!

Take care.

-- 
You worry too much about your job.  Stop it.  You are not paid enough to worry.


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 13:45:31 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
> I think if you use the non-greedy .*? instead of the greedy .*, you'll 
> get this behavior.  For example:
>
> py> s = """\
> ... Gibberish
> ... 53
> ... MoreGarbage
> [snip a whole bunch of stuff]
> ... RelevantInfo3
> ... 60
> ... Lalala
> ... """
> py> import re
> py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
> ...                    .*?
> ...                    ^RelevantInfo2\n([^\n]*)
> ...                    .*?
> ...                    ^RelevantInfo3\n([^\n]*)""",
> ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
> py> score = {}
> py> for info1, info2, info3 in m.findall(s):
> ...     score.setdefault(info1, {})[info3] = info2
> ...
> py> score
> {'10/10/04': {'44': '33', '23': '22'}, '10/11/04': {'60': '45'}}
>
> If you might have multiple info2 values for the same (info1, info3) 
> pair, you can try something like:
>
> py> score = {}
> py> for info1, info2, info3 in m.findall(s):
> ...     score.setdefault(info1, {}).setdefault(info3, []).append(info2)
> ...
> py> score
> {'10/10/04': {'44': ['33'], '23': ['22']}, '10/11/04': {'60': ['45']}}
>
Perfect! Thank you so much. This is the behaviour I'm looking for. I will
fiddle around with this some more tonight but the rest should be okay.

Take care.

-- 
Of course power tools and alcohol don't mix.  Everyone knows power
tools aren't soluble in alcohol...
-- Crazy Nigel


Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Here is another attempt. I'm still not sure I understand what form you want the data in. I made a
dict -> dict -> list structure so if you look up e.g. scores['10/11/04']['60'] you get a list of all
the RelevantInfo2 values for RelevantInfo1='10/11/04' and RelevantInfo3='60'.

The parser is a simple-minded state machine that will misbehave if the input does not have entries 
in the order Relevant1, Relevant2, Relevant3 (with as many intervening lines as you like).

All three values are available when Relevant3 is detected so you could do something else with them 
if you want.

HTH
Kent

import cStringIO

raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
SecondSetofGarbage
2423
YouGetThePicture
342342
RelevantInfo1
10/10/04
HoHum
343
MoreStuffNotNeeded
232
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
InsertBoringFillerHere
43234
Stuff
MoreStuff
RelevantInfo2
45
ExcitingIsntIt
324234
RelevantInfo3
60
Lalala'''
raw_data = cStringIO.StringIO(raw_data)

scores = {}
info1 = info2 = info3 = None

for line in raw_data:
    if line.startswith('RelevantInfo1'):
        info1 = raw_data.next().strip()
    elif line.startswith('RelevantInfo2'):
        info2 = raw_data.next().strip()
    elif line.startswith('RelevantInfo3'):
        info3 = raw_data.next().strip()
        scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
        info1 = info2 = info3 = None

print scores
print scores['10/11/04']['60']
print scores['10/10/04']['23']

## prints:
{'10/10/04': {'44': ['33'], '23': ['22', '22']}, '10/11/04': {'60': ['45']}}
['45']
['22', '22']
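
For Python 3 readers, here is a sketch of the same state machine with the
built-in next() in place of the .next() method and io.StringIO in place of
cStringIO. The function name collect_scores and the shortened sample data are
made up for illustration; like the code above, it assumes the three labels
always arrive in the order RelevantInfo1, RelevantInfo2, RelevantInfo3.

```python
import io


def collect_scores(stream):
    """Build {info1: {info3: [info2, ...]}} from label/value line pairs."""
    stream = iter(stream)             # the loop and next() must share one iterator
    scores = {}
    info1 = info2 = None
    for line in stream:
        if line.startswith("RelevantInfo1"):
            info1 = next(stream, "").strip()
        elif line.startswith("RelevantInfo2"):
            info2 = next(stream, "").strip()
        elif line.startswith("RelevantInfo3"):
            info3 = next(stream, "").strip()
            # All three values are known once the third label is seen.
            scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
            info1 = info2 = None
    return scores


sample = io.StringIO(
    "Gibberish\n53\n"
    "RelevantInfo1\n10/10/04\nNothingImportant\n"
    "RelevantInfo2\n22\nBlahBlah\n"
    "RelevantInfo3\n23\nHubris\n"
    "RelevantInfo1\n10/10/04\n"
    "RelevantInfo2\n22\n"
    "RelevantInfo3\n23\n"
    "RelevantInfo1\n10/11/04\n"
    "RelevantInfo2\n45\n"
    "RelevantInfo3\n60\nLalala\n"
)
result = collect_scores(sample)
print(result)
```

Passing a default to next() means a label on the very last line yields an
empty value rather than raising StopIteration inside the loop.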


Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote:
> On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
>> A possible solution, using the re module:
>>
>> py> s = """\
>> ... Gibberish
>> ... 53
>> ... MoreGarbage
>> ... 12
>> ... RelevantInfo1
>> ... 10/10/04
>> ... NothingImportant
>> ... ThisDoesNotMatter
>> ... 44
>> ... RelevantInfo2
>> ... 22
>> ... BlahBlah
>> ... 343
>> ... RelevantInfo3
>> ... 23
>> ... Hubris
>> ... Crap
>> ... 34
>> ... """
>> py> import re
>> py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
>> ...                    .*
>> ...                    ^RelevantInfo2\n([^\n]*)
>> ...                    .*
>> ...                    ^RelevantInfo3\n([^\n]*)""",
>> ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
>> py> score = {}
>> py> for info1, info2, info3 in m.findall(s):
>> ...     score.setdefault(info1, {})[info3] = info2
>> ...
>> py> score
>> {'10/10/04': {'23': '22'}}
>>
>> Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE
>> to have ^ apply at the start of each line, and VERBOSE to allow me to
>> write the re in a more readable form.
>>
>> If I didn't get your dict update quite right, hopefully you can see how
>> to fix it!
>
> Thanks! That was very helpful. Unfortunately, I wasn't completely clear when
> describing the problem. Is there any way to extract multiple scores from the
> same file and from multiple files

I think if you use the non-greedy .*? instead of the greedy .*, you'll 
get this behavior.  For example:

py> s = """\
... Gibberish
... 53
... MoreGarbage
[snip a whole bunch of stuff]
... RelevantInfo3
... 60
... Lalala
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*?
...                    ^RelevantInfo2\n([^\n]*)
...                    .*?
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'44': '33', '23': '22'}, '10/11/04': {'60': '45'}}
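
To see why the non-greedy form matters, here is a cut-down sketch. It is written in Python 3 syntax (the sessions in this thread are Python 2), and the field names A and B and the sample string are stand-ins made up for illustration, not taken from the real files:

```python
import re

# Two records, each with an "A" field followed by a "B" field.
s = "A\n1\nB\n2\nA\n3\nB\n4\n"
flags = re.DOTALL | re.MULTILINE

# Greedy .* runs to the end of the string and backtracks to the LAST
# "B" field, so both records collapse into a single match.
greedy = re.findall(r"^A\n(\d+).*^B\n(\d+)", s, flags)

# Non-greedy .*? stops at the FIRST "B" field after each "A" field,
# so every record produces its own match.
lazy = re.findall(r"^A\n(\d+).*?^B\n(\d+)", s, flags)

print(greedy)  # [('1', '4')]
print(lazy)    # [('1', '2'), ('3', '4')]
```
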
If you might have multiple info2 values for the same (info1, info3) 
pair, you can try something like:

py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {}).setdefault(info3, []).append(info2)
...
py> score
{'10/10/04': {'44': ['33'], '23': ['22']}, '10/11/04': {'60': ['45']}}
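
The chained setdefault calls can also be spelled with collections.defaultdict, which some people find easier to read. This is only a sketch in Python 3 syntax, and the list of triples is invented here to stand in for what m.findall(s) would return:

```python
from collections import defaultdict

# Hypothetical (info1, info2, info3) triples, in findall order.
# The last one repeats an (info1, info3) pair that was already seen.
triples = [("10/10/04", "22", "23"),
           ("10/10/04", "33", "44"),
           ("10/11/04", "45", "60"),
           ("10/10/04", "99", "23")]

# Outer dict keyed by info1, inner dict keyed by info3, values are lists.
score = defaultdict(lambda: defaultdict(list))
for info1, info2, info3 in triples:
    score[info1][info3].append(info2)

# Convert back to plain dicts so the result prints cleanly.
plain = {info1: dict(by_info3) for info1, by_info3 in score.items()}
print(plain)
```

Note that merely looking up a missing key in a defaultdict creates it, so the conversion to plain dicts is worth doing before handing the result to other code.
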
HTH,
STeVe


Re: Multiline regex help

2005-03-03 Thread James Stroud
I found the original paper for Martel:

http://www.dalkescientific.com/Martel/ipc9/

On Thursday 03 March 2005 12:26 pm, James Stroud wrote:
> Have a look at "martel", part of biopython. The world of bioinformatics is
> filled with files with structure like this.
>
> http://www.biopython.org/docs/api/public/Martel-module.html
>
> James
>
> On Thursday 03 March 2005 12:03 pm, Yatima wrote:

-- 
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 07:14:50 -0500, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
> Here is a way to create a list of [RelevantInfo, value] pairs:
> import cStringIO
>
> raw_data = '''Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34'''
> raw_data = cStringIO.StringIO(raw_data)
>
> data = []
> for line in raw_data:
>     if line.startswith('RelevantInfo'):
>         key = line.strip()
>         value = raw_data.next().strip()
>         data.append([key, value])
>
> print data
>

Thank you. This isn't exactly what I'm looking for (I wasn't clear in
describing the problem -- please see my reply to Steve for a, hopefully,
better explanation) but it does give me a few ideas.
>
>> 
>> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
>
> I'm not sure what you mean by this. Do you want to build a Score dictionary 
> as well?

Sure... Uhhh.. I think. Okay, what I want is some kind of awk-like
associative array because the raw data files will have repeats for certain
field values such that there would be, for example, multiple RelevantInfo2's
and RelevantInfo3's for the same RelevantInfo1 (i.e. on the same date). To
make matters more exciting, there will be multiple RelevantInfo1's (dates)
for the same RelevantInfo3 (e.g. a subject ID). RelevantInfo2 will be the
value for all unique combinations of RelevantInfo1 and RelevantInfo3. There
will be multiple occurrences of these fields in the same file (original data
sample was not very good for this reason) and multiple files as well. The
interesting three fields will always be repeated in the same order although
the amount of irrelevant data in between may vary. So:

RelevantInfo1
10/10/04

RelevantInfo2
12

RelevantInfo3
43

RelevantInfo1
10/10/04    <- The same as the first occurrence of RelevantInfo1

RelevantInfo2
22

RelevantInfo3
25

RelevantInfo1
10/11/04

RelevantInfo2
34

RelevantInfo3
28

RelevantInfo1
10/12/04

RelevantInfo2
98

RelevantInfo3
25          <- The same as the second occurrence of RelevantInfo3
...

Sorry for the long and tedious "data" example.

There will be missing values for some combinations of RelevantInfo1 and
RelevantInfo3 so hopefully that won't be an issue.
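
A rough sketch of the associative-array idea described above, in Python 3 syntax (the code elsewhere in this thread is Python 2). The function name collect_scores and the sample text are made up for illustration. It assumes the three fields always arrive in the order RelevantInfo1, RelevantInfo2, RelevantInfo3, as stated, and that each value sits on the line right after its label:

```python
import io

def collect_scores(lines, scores=None):
    """Accumulate scores[info1][info3] -> list of info2 values.

    Pass the same dict back in to keep accumulating across files.
    """
    if scores is None:
        scores = {}
    lines = iter(lines)
    info1 = info2 = None
    for line in lines:
        label = line.strip()
        if label == 'RelevantInfo1':
            info1 = next(lines, '').strip()
        elif label == 'RelevantInfo2':
            info2 = next(lines, '').strip()
        elif label == 'RelevantInfo3':
            info3 = next(lines, '').strip()
            # Repeated (info1, info3) pairs append instead of overwriting.
            scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
            info1 = info2 = None
    return scores

sample = """\
RelevantInfo1
10/10/04
junk
RelevantInfo2
12
RelevantInfo3
43
RelevantInfo1
10/10/04
RelevantInfo2
22
RelevantInfo3
25
RelevantInfo1
10/12/04
RelevantInfo2
98
RelevantInfo3
25
"""

scores = collect_scores(io.StringIO(sample))
print(scores)
```

Combinations that never occur are simply absent from the dict, so missing values need no special handling. For several files, calling collect_scores once per open file with the same scores dict should work; feeding it fileinput.input(...) ought to work as well, though that is untested here.
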

Thanks again for your reply.

Take care.

-- 
"I figured there was this holocaust, right, and the only ones left alive were
 Donna Reed, Ozzie and Harriet, and the Cleavers."
-- Wil Wheaton explains why everyone in "Star Trek: The Next Generation" 
is so nice


Re: Multiline regex help

2005-03-03 Thread James Stroud
Have a look at "martel", part of biopython. The world of bioinformatics is 
filled with files with structure like this.

http://www.biopython.org/docs/api/public/Martel-module.html

James

On Thursday 03 March 2005 12:03 pm, Yatima wrote:
> On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
> > A possible solution, using the re module:
> >
> > py> s = """\
> > ... Gibberish
> > ... 53
> > ... MoreGarbage
> > ... 12
> > ... RelevantInfo1
> > ... 10/10/04
> > ... NothingImportant
> > ... ThisDoesNotMatter
> > ... 44
> > ... RelevantInfo2
> > ... 22
> > ... BlahBlah
> > ... 343
> > ... RelevantInfo3
> > ... 23
> > ... Hubris
> > ... Crap
> > ... 34
> > ... """
> > py> import re
> > py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
> > ...                    .*
> > ...                    ^RelevantInfo2\n([^\n]*)
> > ...                    .*
> > ...                    ^RelevantInfo3\n([^\n]*)""",
> > ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
> > py> score = {}
> > py> for info1, info2, info3 in m.findall(s):
> > ...     score.setdefault(info1, {})[info3] = info2
> > ...
> > py> score
> > {'10/10/04': {'23': '22'}}
> >
> > Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE
> > to have ^ apply at the start of each line, and VERBOSE to allow me to
> > write the re in a more readable form.
> >
> > If I didn't get your dict update quite right, hopefully you can see how
> > to fix it!
>
> Thanks! That was very helpful. Unfortunately, I wasn't completely clear
> when describing the problem. Is there any way to extract multiple scores
> from the same file and from multiple files (I will probably use the
> "fileinput" module to deal with multiple files). So, if I've got say:
>
> Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34
>
> SecondSetofGarbage
> 2423
> YouGetThePicture
> 342342
> RelevantInfo1
> 10/10/04
> HoHum
> 343
> MoreStuffNotNeeded
> 232
> RelevantInfo2
> 33
> RelevantInfo3
> 44
> sdfsdf
> RelevantInfo1
> 10/11/04
> InsertBoringFillerHere
> 43234
> Stuff
> MoreStuff
> RelevantInfo2
> 45
> ExcitingIsntIt
> 324234
> RelevantInfo3
> 60
> Lalala
>
> Sorry for the long and painful example input. Notice that the first two
> "RelevantInfo1" fields have the same info but that the RelevantInfo2 and
> RelevantInfo3 fields have different info. Also, there will be cases where
> RelevantInfo3 might be the same with a different RelevantInfo2. What I'm
> hoping for is something along the lines of being able to organize it like
> so (don't worry about the format of the output -- I'll deal with that
> later; "RelevantInfo" shortened to "Info" for readability):
>
>             Info1[0],                  Info1[1],                  Info1[2] ...
> Info3[0]    Info2[Info1[0],Info3[0]]   Info2[Info1[1],Info3[0]]   ...
> Info3[1]    Info2[Info1[0],Info3[1]]   ...
> Info3[2]    Info2[Info1[0],Info3[2]]   ...
> ...
>
> I don't really care if it's a list, dictionary, array etc.
>
> Thanks again for your help. The multiline option in the re module is very
> useful.
>
> Take care.
>
> --
> Clarke's Conclusion:
>   Never let your sense of morals interfere with doing the right thing.

-- 
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAIL PROTECTED]> wrote:
>
> A possible solution, using the re module:
>
> py> s = """\
> ... Gibberish
> ... 53
> ... MoreGarbage
> ... 12
> ... RelevantInfo1
> ... 10/10/04
> ... NothingImportant
> ... ThisDoesNotMatter
> ... 44
> ... RelevantInfo2
> ... 22
> ... BlahBlah
> ... 343
> ... RelevantInfo3
> ... 23
> ... Hubris
> ... Crap
> ... 34
> ... """
> py> import re
> py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
> ...                    .*
> ...                    ^RelevantInfo2\n([^\n]*)
> ...                    .*
> ...                    ^RelevantInfo3\n([^\n]*)""",
> ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
> py> score = {}
> py> for info1, info2, info3 in m.findall(s):
> ...     score.setdefault(info1, {})[info3] = info2
> ...
> py> score
> {'10/10/04': {'23': '22'}}
>
> Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE 
> to have ^ apply at the start of each line, and VERBOSE to allow me to 
> write the re in a more readable form.
>
> If I didn't get your dict update quite right, hopefully you can see how 
> to fix it!

Thanks! That was very helpful. Unfortunately, I wasn't completely clear when
describing the problem. Is there any way to extract multiple scores from the
same file and from multiple files (I will probably use the "fileinput"
module to deal with multiple files). So, if I've got say:

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

SecondSetofGarbage
2423
YouGetThePicture
342342
RelevantInfo1
10/10/04
HoHum
343
MoreStuffNotNeeded
232
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
InsertBoringFillerHere
43234
Stuff
MoreStuff
RelevantInfo2
45
ExcitingIsntIt
324234
RelevantInfo3
60
Lalala

Sorry for the long and painful example input. Notice that the first two
"RelevantInfo1" fields have the same info but that the RelevantInfo2 and
RelevantInfo3 fields have different info. Also, there will be cases where
RelevantInfo3 might be the same with a different RelevantInfo2. What I'm
hoping for is something along the lines of being able to organize it like
so (don't worry about the format of the output -- I'll deal with that
later; "RelevantInfo" shortened to "Info" for readability):

            Info1[0],                  Info1[1],                  Info1[2] ...
Info3[0]    Info2[Info1[0],Info3[0]]   Info2[Info1[1],Info3[0]]   ...
Info3[1]    Info2[Info1[0],Info3[1]]   ...
Info3[2]    Info2[Info1[0],Info3[2]]   ...
...

I don't really care if it's a list, dictionary, array etc. 
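
One way such a layout could be produced, assuming the data has already been gathered into a nested dict of the form scores[info1][info3] -> list of info2 strings. That shape, the function name format_table, and the tab-separated output are all assumptions made for this sketch (Python 3 syntax):

```python
def format_table(scores):
    """Render scores[info1][info3] as rows of info3 by columns of info1."""
    # Column headers: every info1 value; row labels: every info3 value seen.
    cols = sorted(scores)
    rows = sorted({info3 for by_info3 in scores.values() for info3 in by_info3})
    lines = ['\t'.join([''] + cols)]
    for info3 in rows:
        # A missing (info1, info3) combination becomes an empty cell.
        cells = [','.join(scores[info1].get(info3, [])) for info1 in cols]
        lines.append('\t'.join([info3] + cells))
    return '\n'.join(lines)

scores = {'10/10/04': {'23': ['22'], '44': ['33']},
          '10/11/04': {'60': ['45']}}
table = format_table(scores)
print(table)
```

The keys are sorted as plain strings here, so dates in mm/dd/yy form will not sort chronologically across years, and numeric IDs will sort lexically; both would need a proper sort key in real use.
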

Thanks again for your help. The multiline option in the re module is very
useful. 

Take care.

-- 
Clarke's Conclusion:
Never let your sense of morals interfere with doing the right thing.


Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote:
Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these "fields" repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
"RelevantInfo" lines is really what I'm after. Ideally, I would like to have
something like so:
RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23       # it's just there to illustrate what info I'm
                         # trying to snag.
Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2

A possible solution, using the re module:
py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
... BlahBlah
... 343
... RelevantInfo3
... 23
... Hubris
... Crap
... 34
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*
...                    ^RelevantInfo2\n([^\n]*)
...                    .*
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'23': '22'}}
Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE 
to have ^ apply at the start of each line, and VERBOSE to allow me to 
write the re in a more readable form.

If I didn't get your dict update quite right, hopefully you can see how 
to fix it!
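
For anyone unsure what each of the three flags contributes, here is a minimal sketch that switches them on one at a time. It uses Python 3 syntax (the session above is Python 2), and the short sample string and variable names are made up for illustration:

```python
import re

text = "junk\nRelevantInfo1\n10/10/04\nmore junk\nRelevantInfo2\n22\n"

# Without MULTILINE, ^ matches only at the very start of the string,
# so a label in the middle of the text is not found.
no_multiline = re.search(r"^RelevantInfo1\n([^\n]*)", text)

# With MULTILINE, ^ also matches immediately after each newline.
with_multiline = re.search(r"^RelevantInfo1\n([^\n]*)", text, re.MULTILINE)

# Without DOTALL, . does not match a newline, so .* cannot bridge the
# lines between the two labels.
no_dotall = re.search(r"RelevantInfo1.*RelevantInfo2", text)

# With DOTALL, . matches newlines as well.
with_dotall = re.search(r"RelevantInfo1.*RelevantInfo2", text, re.DOTALL)

# VERBOSE ignores unescaped whitespace inside the pattern, which is why
# the pattern can be laid out over several lines and why the real
# newlines have to be written as \n.
pat = re.compile(r"""^RelevantInfo1\n([^\n]*)
                     .*
                     ^RelevantInfo2\n([^\n]*)""",
                 re.DOTALL | re.MULTILINE | re.VERBOSE)
pairs = pat.findall(text)

print(no_multiline)             # None
print(with_multiline.group(1))  # 10/10/04
print(no_dotall)                # None
print(with_dotall is not None)  # True
print(pairs)                    # [('10/10/04', '22')]
```
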

HTH,
STeVe

