Re: regex help

2015-03-13 Thread Thomas 'PointedEars' Lahn
Larry Martell wrote:

 I need to remove all trailing zeros to the right of the decimal point,
 but leave one zero if it's whole number. For example, if I have this:
 
 
14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
 
 I want to end up with:
 
 
14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196
 
 I have a regex to remove the zeros:
 
 '0+[,$]', ''
 
 But I can't figure out how to get the 5. to be 5.0.
 I've been messing with the negative lookbehind, but I haven't found
 one that works for this.

First of all, I find it unlikely that you really want to solve your problem 
with regular expressions.  Google “X-Y problem”.

Second, if you must use regular expressions, the simplest approach is to 
use backreferences.

Third, you need to show the relevant (Python) code.

http://www.catb.org/~esr/faqs/smart-questions.html

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: regex help

2015-03-13 Thread Tim Chase
On 2015-03-13 12:05, Larry Martell wrote:
 I need to remove all trailing zeros to the right of the decimal
 point, but leave one zero if it's whole number. 
 
 But I can't figure out how to get the 5. to be 5.0.
 I've been messing with the negative lookbehind, but I haven't found
 one that works for this.

You can do it with string-ops, or you can resort to regexp.
Personally, I like the clarity of the string-ops version, but use
what suits you.

-tkc

import re
input = [
'14S',
'5.',
'4.5686274500',
'3.7272727272727271',
'3.3947368421052630',
'5.7307692307692308',
'5.7547169811320753',
'4.9423076923076925',
'5.7884615384615383',
'5.13725490196',
]

output = [
'14S',
'5.0',
'4.56862745',
'3.7272727272727271',
'3.394736842105263',
'5.7307692307692308',
'5.7547169811320753',
'4.9423076923076925',
'5.7884615384615383',
'5.13725490196',
]


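def fn1(s):
    # String-ops version: only touch values that contain a decimal point.
    if '.' in s:
        s = s.rstrip('0')
        if s.endswith('.'):
            s += '0'
    return s

def fn2(s):
    # Regex version: drop trailing zeros after at least one decimal digit.
    # Note that a bare '5.' is left unchanged, because \d+? needs a digit.
    return re.sub(r'(\.\d+?)0+$', r'\1', s)

for fn in (fn1, fn2):
    for i, o in zip(input, output):
        v = fn(i)
        print("%s: %s - %s [%s]" % (v == o, i, v, o))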


Re: regex help

2015-03-13 Thread MRAB

On 2015-03-13 16:05, Larry Martell wrote:

I need to remove all trailing zeros to the right of the decimal point,
but leave one zero if it's whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros:

'0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0.
I've been messing with the negative lookbehind, but I haven't found
one that works for this.


Search: (\.\d+?)0+\b
Replace: \1

which is:

re.sub(r'(\.\d+?)0+\b', r'\1', string)
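
For anyone trying this on the whole comma-separated line: as far as I can tell, the pattern above needs at least one digit after the point, so a bare "5." is left alone. The sketch below adds a second substitution for that case. The function name and the second pattern are my own additions for illustration, not part of the reply above.

```python
import re

line = ("14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,"
        "5.7307692307692308,5.7547169811320753,4.9423076923076925,"
        "5.7884615384615383,5.13725490196")

def strip_trailing_zeros(text):
    # Step 1: the pattern from the post; drops zeros that end a decimal part.
    text = re.sub(r'(\.\d+?)0+\b', r'\1', text)
    # Step 2 (an assumption, not from the post): a '.' directly followed by
    # a comma or the end of the string has no decimals, so append one '0'.
    return re.sub(r'\.(?=,|$)', '.0', text)

print(strip_trailing_zeros(line))
```

Step 2 assumes the fields are separated by commas only; other separators would need to be added to the lookahead.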



Re: regex help

2015-03-13 Thread Larry Martell
On Fri, Mar 13, 2015 at 1:29 PM, MRAB pyt...@mrabarnett.plus.com wrote:
 On 2015-03-13 16:05, Larry Martell wrote:

 I need to remove all trailing zeros to the right of the decimal point,
 but leave one zero if it's whole number. For example, if I have this:


 14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

 I want to end up with:


 14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

 I have a regex to remove the zeros:

 '0+[,$]', ''

 But I can't figure out how to get the 5. to be 5.0.
 I've been messing with the negative lookbehind, but I haven't found
 one that works for this.

 Search: (\.\d+?)0+\b
 Replace: \1

 which is:

 re.sub(r'(\.\d+?)0+\b', r'\1', string)

Thanks! That works perfectly.


Re: regex help

2015-03-13 Thread Cameron Simpson

On 13Mar2015 12:05, Larry Martell larry.mart...@gmail.com wrote:

I need to remove all trailing zeros to the right of the decimal point,
but leave one zero if it's whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros:

'0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0.
I've been messing with the negative lookbehind, but I haven't found
one that works for this.


Leaving aside the suggested non-greedy match, you can rephrase this: strip 
trailing zeroes _after_ the first decimal digit. Then you can consider a number 
to be:


 digits
 point
 any digit
 other digits to be right-zero stripped

so:

 (\d+\.\d)(\d*[1-9])?0*\b

and keep .group(1) and .group(2) from the match.

Another way of considering the problem.

Or you could two step it. Strip all trailing zeroes. If the result ends in a 
dot, add a single zero.
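
A minimal sketch of the group-based pattern above, assuming the substitution is done with a replacement function (the names PATTERN, keep_groups and strip_zeros are mine). A function is used rather than r'\1\2' because group 2 is None when only zeros follow the first decimal digit, and older Python versions raise an error for an unmatched group in a replacement template.

```python
import re

# digits, point, one digit, then optional digits ending in a non-zero
PATTERN = re.compile(r'(\d+\.\d)(\d*[1-9])?0*\b')

def keep_groups(match):
    # group(2) is None when nothing but zeros follows the first decimal digit.
    return match.group(1) + (match.group(2) or '')

def strip_zeros(text):
    return PATTERN.sub(keep_groups, text)

for sample in ('4.5686274500', '5.000', '3.3947368421052630', '14S'):
    print(sample, '->', strip_zeros(sample))
```

Like the non-greedy version, this leaves a bare "5." untouched, so the "add a single zero" step is still needed for that case.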


Cheers,
Cameron Simpson c...@zip.com.au

C'mon. Take the plunge. By the time you go through rehab the first time,
you'll be surrounded by the most interesting people, and if it takes years
off of your life, don't sweat it. They'll be the last ones anyway.
   - Vinnie Jordan, alt.peeves


Re: regex help

2015-03-13 Thread Steven D'Aprano
Larry Martell wrote:

 I need to remove all trailing zeros to the right of the decimal point,
 but leave one zero if it's whole number. 


def strip_zero(s):
    if '.' not in s:
        return s
    s = s.rstrip('0')
    if s.endswith('.'):
        s += '0'
    return s


And in use:

py> strip_zero('-10.2500')
'-10.25'
py> strip_zero('123000')
'123000'
py> strip_zero('123000.')
'123000.0'


It doesn't support exponential format:

py> strip_zero('1.230e3')
'1.230e3'

because it isn't clear what you intend to do under those circumstances.
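
To apply this to the comma-separated line from the original question, one option is to split, fix each field, and join again. This is only a sketch: strip_zero_fields and its sep parameter are hypothetical names, and it assumes fields never contain the separator themselves.

```python
def strip_zero(s):
    # Same logic as above: leave values without a decimal point alone.
    if '.' not in s:
        return s
    s = s.rstrip('0')
    if s.endswith('.'):
        s += '0'
    return s

def strip_zero_fields(line, sep=','):
    # Treat each separated field independently, then rebuild the line.
    return sep.join(strip_zero(field) for field in line.split(sep))

print(strip_zero_fields('14S,5.,4.5686274500,3.3947368421052630'))
```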


-- 
Steven



regex help

2015-03-13 Thread Larry Martell
I need to remove all trailing zeros to the right of the decimal point,
but leave one zero if it's a whole number. For example, if I have this:

14S,5.,4.5686274500,3.7272727272727271,3.3947368421052630,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I want to end up with:

14S,5.0,4.56862745,3.7272727272727271,3.394736842105263,5.7307692307692308,5.7547169811320753,4.9423076923076925,5.7884615384615383,5.13725490196

I have a regex to remove the zeros:

'0+[,$]', ''

But I can't figure out how to get the 5. to be 5.0.
I've been messing with the negative lookbehind, but I haven't found
one that works for this.


Newbie needs regex help

2010-12-06 Thread Dan M
I'm getting bogged down with backslash escaping.

I have some text files containing characters with the 8th bit set. These 
characters are encoded one of two ways: either =hh or \xhh, where h 
represents a hex digit, and \x is a literal backslash followed by a 
lower-case x.

Catching the first case with a regex is simple. But when I try to write a 
regex to catch the second case, I mess up the escaping.

I took a look at http://docs.python.org/howto/regex.html, especially the 
section titled "The Backslash Plague". I started out trying:

d...@dan:~/personal/usenet$ python
Python 2.7 (r27:82500, Nov 15 2010, 12:10:23) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile('x([0-9a-fA-F]{2})')
>>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 characters \xefn \xeft."
>>> m = r.search(a)
>>> m

No match.

I then followed the advice of the above-mentioned document, and expressed 
the regex as a raw string:

>>> r = re.compile(r'\\x([0-9a-fA-F]{2})')
>>> r.search(a)

Still no match.

I'm obviously missing something. I spent a fair bit of time playing with 
this over the weekend, and I got nowhere. Now it's time to ask for help. 
What am I doing wrong here?



Re: Newbie needs regex help

2010-12-06 Thread Mel
Dan M wrote:

 I'm getting bogged down with backslash escaping.
 
 I have some text files containing characters with the 8th bit set. These
 characters are encoded one of two ways: either =hh or \xhh, where h
 represents a hex digit, and \x is a literal backslash followed by a
 lower-case x.
 
 Catching the first case with a regex is simple. But when I try to write a
 regex to catch the second case, I mess up the escaping.
 
 I took at look at http://docs.python.org/howto/regex.html, especially the
 section titled The Backslash Plague. I started out trying :
 
 d...@dan:~/personal/usenet$ python
 Python 2.7 (r27:82500, Nov 15 2010, 12:10:23)
 [GCC 4.3.2] on linux2
 Type help, copyright, credits or license for more information.
 import re
 r = re.compile('x([0-9a-fA-F]{2})')
 a = This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0
 characters \xefn \xeft.
 m = r.search(a)
 m
 
 No match.
 
 I then followed the advice of the above-mentioned document, and expressed
 the regex as a raw string:
 
 r = re.compile(r'\\x([0-9a-fA-F]{2})')
 r.search(a)
 
 Still no match.
 
 I'm obviously missing something. I spent a fair bit of time playing with
 this over the weekend, and I got nowhere. Now it's time to ask for help.
 What am I doing wrong here?

What you're missing is that string `a` doesn't actually contain four-
character sequences like '\', 'x', 'a', 'a' .  It contains single characters 
that you encode in string literals as '\xaa' and so on.  You might do better 
with

p1 = r'([\x80-\xff])'
r1 = re.compile (p1)
m = r1.search (a)

I get at least an <_sre.SRE_Match object at 0xb749a6e0> when I try this.

Mel.



Re: Newbie needs regex help

2010-12-06 Thread Alain Ketterlin
Dan M d...@catfolks.net writes:

 I took at look at http://docs.python.org/howto/regex.html, especially the 
 section titled The Backslash Plague. I started out trying :

 import re
 r = re.compile('x([0-9a-fA-F]{2})')
 a = This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 

The backslash trickery applies to string literals also, not only regexps.

Your string does not have the value you think it has. Double each
backslash (or make your string raw) and you'll get what you expect.
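
A small sketch of the difference, using made-up sample strings (decoded and literal are hypothetical names). In a normal string literal "\xe9" is a single character; in a raw literal the same text is four characters, which is what the data file on disk contains.

```python
import re

# Raw-string pattern: a literal backslash, 'x', then two hex digits.
pattern = re.compile(r'\\x([0-9a-fA-F]{2})')

decoded = "caf\xe9"    # normal literal: \xe9 becomes ONE character
literal = r"caf\xe9"   # raw literal: backslash, x, e, 9 stay as FOUR characters

print(len(decoded), len(literal))
print(pattern.findall(decoded))   # nothing to find: no backslash in the string
print(pattern.findall(literal))   # finds the two hex digits
```

Text read from a file behaves like the raw literal, since escape processing only happens when Python parses string literals in source code.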

-- Alain.


Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 10:29:41 -0500, Mel wrote:

 What you're missing is that string `a` doesn't actually contain four-
 character sequences like '\', 'x', 'a', 'a' .  It contains single
 characters that you encode in string literals as '\xaa' and so on.  You
 might do better with
 
 p1 = r'([\x80-\xff])'
 r1 = re.compile (p1)
 m = r1.search (a)
 
 I get at least an _sre.SRE_Match object at 0xb749a6e0 when I try this.
 
   Mel.

That's what I had initially assumed was the case, but looking at the data 
files with a hex editor showed me that I do indeed have four-character 
sequences. That's what makes this such an interesting task!


Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 16:34:56 +0100, Alain Ketterlin wrote:

 Dan M d...@catfolks.net writes:
 
 I took at look at http://docs.python.org/howto/regex.html, especially
 the section titled The Backslash Plague. I started out trying :
 
 import re
 r = re.compile('x([0-9a-fA-F]{2})') a = This \xef file \xef has
 \x20 a bunch \xa0 of \xb0 crap \xc0
 
 The backslash trickery applies to string literals also, not only
 regexps.
 
 Your string does not have the value you think it has. Double each
 backslash (or make your string raw) and you'll get what you expect.
 
 -- Alain.

D'oh! I hadn't thought of that. If I read my data file in from disk, use 
the raw string version of the regex, and do the search that way I do 
indeed get the results I'm looking for.

Thanks for pointing that out. I guess I need to think a little deeper 
into what I'm doing when I escape stuff.


Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 09:44:39 -0600, Dan M wrote:

 That's what I had initially assumed was the case, but looking at the
 data files with a hex editor showed me that I do indeed have
 four-character sequences. That's what makes this such an interesting
 task!

Sorry, I misunderstood the first time I read your reply.

You're right, the string I showed did indeed contain single-byte 
characters, not four-character sequences. The data file I work with, 
though, does contain four-character sequences.



Re: Newbie needs regex help

2010-12-06 Thread Peter Otten
Dan M wrote:

 I'm getting bogged down with backslash escaping.
 
 I have some text files containing characters with the 8th bit set. These
 characters are encoded one of two ways: either =hh or \xhh, where h
 represents a hex digit, and \x is a literal backslash followed by a
 lower-case x.

By the way:
 
>>> print quopri.decodestring("=E4=F6=FC").decode("iso-8859-1")
äöü
>>> print r"\xe4\xf6\xfc".decode("string-escape").decode("iso-8859-1")
äöü
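
The session above is Python 2. A rough Python 3 equivalent is sketched below, assuming the data is handled as bytes; the "string-escape" codec no longer exists there, so "unicode_escape" is swapped in (my substitution, not part of the reply above).

```python
import quopri

# "=hh" form: quoted-printable. decodestring() takes bytes and returns bytes.
qp_bytes = quopri.decodestring(b"=E4=F6=FC")
print(qp_bytes.decode("iso-8859-1"))

# "\xhh" form: a literal backslash, 'x' and two hex digits in the data.
escaped = rb"\xe4\xf6\xfc"
print(escaped.decode("unicode_escape"))
```

Note that unicode_escape also interprets other backslash escapes (such as \n or \uXXXX) and reads the remaining bytes as Latin-1, so it only fits data where that is acceptable.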



Re: Newbie needs regex help

2010-12-06 Thread Dan M
On Mon, 06 Dec 2010 18:12:33 +0100, Peter Otten wrote:

 By the way:
  
 print quopri.decodestring(=E4=F6=FC).decode(iso-8859-1)
 äöü
 print r\xe4\xf6\xfc.decode(string-escape).decode(iso-8859-1)
 äöü

Ah - better than a regex. Thanks!


regex help: splitting string gets weird groups

2010-04-08 Thread gry
[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:
>>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()
('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't tHe appear as a
group?  Is my regexp illegal somehow and confusing the engine?

I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


Re: regex help: splitting string gets weird groups

2010-04-08 Thread MRAB

gry wrote:

[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:

re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()

('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't tHe appear as a
group?  Is my regexp illegal somehow and confusing the engine?

I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


If the regex was illegal then it would raise an exception. It's doing
exactly what you're asking it to do!

First of all, there are 4 groups, with group 1 containing groups 2..4 as
alternatives, so group 1 will match whatever groups 2..4 match:

Group 1: (([A-Za-z]+)|([0-9]+)|([-.#=]))
Group 2: ([A-Za-z]+)
Group 3: ([0-9]+)
Group 4: ([-.#=])

It matches like this:

Group 1 and group 3 match '555'.
Group 1 and group 2 match 'tHe'.
Group 1 and group 4 match '-'.
Group 1 and group 2 match 'rain'.
Group 1 and group 4 match '.'.
Group 1 and group 2 match 'in'.
Group 1 and group 4 match '#'.
Group 1 and group 4 match '='.
Group 1 and group 3 match '1234'.

If a group matches then any earlier match of that group is discarded,
so:

Group 1 finishes with '1234'.
Group 2 finishes with 'in'.
Group 3 finishes with '1234'.
Group 4 finishes with '='.

A solution is:

>>> re.findall('[A-Za-z]+|[0-9]+|[-.#=]', '555tHe-rain.in#=1234')
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

Note: re.findall() returns a list of matches. If the regex contains a
group, it returns what the group captured; if it doesn't contain any
groups, it returns the matched substrings. Compare:

>>> re.findall("a(.)", "ax ay")
['x', 'y']
>>> re.findall("a.", "ax ay")
['ax', 'ay']
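
The "last match wins" behaviour described above can be checked directly. The second half of this sketch is an addition of mine: it uses named groups with finditer() so that each piece also reports which alternative matched it (the group names letters, digits and other are made up).

```python
import re

s = '555tHe-rain.in#=1234'

# A repeated group only remembers the LAST thing it matched.
m = re.match(r'^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', s)
print(m.groups())

# One match per piece instead: lastgroup names the alternative that matched.
token_re = re.compile(
    r'(?P<letters>[A-Za-z]+)|(?P<digits>[0-9]+)|(?P<other>[-.#=])')
tokens = [(mo.lastgroup, mo.group()) for mo in token_re.finditer(s)]
print(tokens)
```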


Re: regex help: splitting string gets weird groups

2010-04-08 Thread Jon Clements
On 8 Apr, 19:49, gry georgeryo...@gmail.com wrote:
 [ python3.1.1, re.__version__='2.2.1' ]
 I'm trying to use re to split a string into (any number of) pieces of
 these kinds:
 1) contiguous runs of letters
 2) contiguous runs of digits
 3) single other characters

 e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
 '.', 'in', '#', '=', 1234]
 I tried: re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', 
 '555tHe-rain.in#=1234').groups()

 ('1234', 'in', '1234', '=')

 Why is 1234 repeated in two groups?  and why doesn't tHe appear as a
 group?  Is my regexp illegal somehow and confusing the engine?

 I *would* like to understand what's wrong with this regex, though if
 someone has a neat other way to do the above task, I'm also interested
 in suggestions.

I would avoid .match and use .findall
(if you walk through them both together, it'll make sense what's
happening
with your match string).

>>> s = "555tHe-rain.in#=1234"
>>> re.findall('[A-Za-z]+|[0-9]+|[-.#=]', s)
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

hth,

Jon.


Re: regex help: splitting string gets weird groups

2010-04-08 Thread Patrick Maupin
On Apr 8, 1:49 pm, gry georgeryo...@gmail.com wrote:
 [ python3.1.1, re.__version__='2.2.1' ]
 I'm trying to use re to split a string into (any number of) pieces of
 these kinds:
 1) contiguous runs of letters
 2) contiguous runs of digits
 3) single other characters

 e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
 '.', 'in', '#', '=', 1234]
 I tried: re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', 
 '555tHe-rain.in#=1234').groups()

 ('1234', 'in', '1234', '=')

 Why is 1234 repeated in two groups?  and why doesn't tHe appear as a
 group?  Is my regexp illegal somehow and confusing the engine?

 I *would* like to understand what's wrong with this regex, though if
 someone has a neat other way to do the above task, I'm also interested
 in suggestions.

IMO, for most purposes, for people who don't want to become re
experts, the easiest, fastest, best, most predictable way to use re is
re.split.  You can either call re.split directly, or, if you are going
to be splitting on the same pattern over and over, compile the pattern
and grab its split method.  Use a *single* capture group in the
pattern, that covers the *whole* pattern.  In the case of your example
data:

>>> import re
>>> splitter=re.compile('([A-Za-z]+|[0-9]+|[-.#=])').split
>>> s='555tHe-rain.in#=1234'
>>> [x for x in splitter(s) if x]
['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

The reason for the list comprehension is that re.split will always
return a non-matching string between matches.  Sometimes this is
useful even when it is a null string (see recent discussion in the
group about splitting digits out of a string), but if you don't care
to see null (empty) strings, this comprehension will remove them.

The reason for a single capture group that covers the whole pattern is
that it is much easier to reason about the output.  The split will
give you all your data, in order, e.g.

>>> ''.join(splitter(s)) == s
True

HTH,
Pat


Re: regex help: splitting string gets weird groups

2010-04-08 Thread Tim Chase

gry wrote:

[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
'.', 'in', '#', '=', 1234]
I tried:

re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()

('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups?  and why doesn't tHe appear as a
group?  Is my regexp illegal somehow and confusing the engine?


Well, I'm not sure what it thinks it's finding, but nested capture-groups 
always produce somewhat weird results for me (I suspect that's what's 
triggering the duplication).  Additionally, you're only searching for 
one match (.match() returns a single match-object or None; not all 
possible matches within the repeated super-group).



I *would* like to understand what's wrong with this regex, though if
someone has a neat other way to do the above task, I'm also interested
in suggestions.


Tweaking your original, I used

   >>> s='555tHe-rain.in#=1234'
   >>> import re
   >>> r=re.compile(r'([a-zA-Z]+|\d+|.)')
   >>> r.findall(s)
   ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

The only difference between my results and your results is that the 555 
and 1234 come back as strings, not ints.


-tkc






Re: regex help: splitting string gets weird groups

2010-04-08 Thread gry
On Apr 8, 3:40 pm, MRAB pyt...@mrabarnett.plus.com wrote:

...
 Group 1 and group 4 match '='.
 Group 1 and group 3 match '1234'.

 If a group matches then any earlier match of that group is discarded,
Wow, that makes this much clearer!  I wonder if this behaviour
shouldn't be mentioned in some form in the python docs?
Thanks much!

 so:

 Group 1 finishes with '1234'.
 Group 2 finishes with 'in'.
 Group 3 finishes with '1234'.



Re: regex help: splitting string gets weird groups

2010-04-08 Thread Jon Clements
On 8 Apr, 19:49, gry georgeryo...@gmail.com wrote:
 [ python3.1.1, re.__version__='2.2.1' ]
 I'm trying to use re to split a string into (any number of) pieces of
 these kinds:
 1) contiguous runs of letters
 2) contiguous runs of digits
 3) single other characters

 e.g.   555tHe-rain.in#=1234   should give:   [555, 'tHe', '-', 'rain',
 '.', 'in', '#', '=', 1234]
 I tried: re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', 
 '555tHe-rain.in#=1234').groups()

 ('1234', 'in', '1234', '=')

 Why is 1234 repeated in two groups?  and why doesn't tHe appear as a
 group?  Is my regexp illegal somehow and confusing the engine?

 I *would* like to understand what's wrong with this regex, though if
 someone has a neat other way to do the above task, I'm also interested
 in suggestions.

Avoiding re's (for a bit of fun):
(no good for unicode obviously)

import string
from itertools import groupby, chain, repeat, count

s = "555tHe-rain.in#=1234"

# Map each character to a group key: 'L' for letters, 'D' for digits,
# and a distinct integer per punctuation character so each stands alone.
# (Python 3 spelling: the built-in zip replaces itertools.izip.)
unique_group = count()
lookup = dict(
    chain(
        zip(string.ascii_letters, repeat('L')),
        zip(string.digits, repeat('D')),
        zip(string.punctuation, unique_group)
    )
)
parse = dict(D=int, L=str.capitalize)


print([parse.get(key, lambda L: L)(''.join(items)) for key, items in
       groupby(s, lambda L: lookup[L])])
[555, 'The', '-', 'Rain', '.', 'In', '#', '=', 1234]

Jon.


Re: regex help: splitting string gets weird groups

2010-04-08 Thread gry
     s='555tHe-rain.in#=1234'
     import re
     r=re.compile(r'([a-zA-Z]+|\d+|.)')
     r.findall(s)
    ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']
This is nice and simple and has the invertible property that Patrick
mentioned above.  Thanks much!


Re: regex help: splitting string gets weird groups

2010-04-08 Thread Patrick Maupin
On Apr 8, 3:40 pm, gry georgeryo...@gmail.com wrote:
      s='555tHe-rain.in#=1234'
      import re
      r=re.compile(r'([a-zA-Z]+|\d+|.)')
      r.findall(s)
     ['555', 'tHe', '-', 'rain', '.', 'in', '#', '=', '1234']

 This is nice and simple and has the invertible property that Patrick
 mentioned above.  Thanks much!

Yes, like using split(), this is invertible.  But you will see a
difference (and for a given task, you might prefer one way or the
other) if, for example, you put a few consecutive spaces in the middle
of your string, where this pattern and findall() will return each
space individually, and split() will return them all together.

You *can* fix up the pattern for findall() where it will have the same
properties as the split(), but it will almost always be a more
complicated pattern than for the equivalent split().

Another thing you can do with split(): if you *think* you have a
pattern that fully covers every string you expect to throw at it, but
would like to verify this, you can make use of the fact that split()
returns a string between each match (and before the first match and
after the last match).  So if you expect that every character in your
entire string should be a part of a match, you can do something like:

strings = splitter(s)
tokens = strings[1::2]
assert not ''.join(strings[::2])
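
Put together as a self-contained sketch (tokenize_strict is a hypothetical name, and raising ValueError instead of asserting is my choice): the odd-indexed items from split() are the tokens, and the even-indexed items are whatever the pattern did not cover.

```python
import re

# One capture group around the whole pattern, as recommended above.
splitter = re.compile(r'([A-Za-z]+|[0-9]+|[-.#=])').split

def tokenize_strict(text):
    parts = splitter(text)
    tokens = parts[1::2]               # the matched pieces
    leftovers = ''.join(parts[::2])    # text between, before and after matches
    if leftovers:
        raise ValueError('unmatched characters: %r' % leftovers)
    return tokens

print(tokenize_strict('555tHe-rain.in#=1234'))
```

With an input such as '12 ab', the space is not covered by the pattern and the function raises instead of silently dropping it.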

Regards,
Pat


Re: Regex help needed!

2010-01-07 Thread Aahz
In article 19de1d6e-5ba9-42b5-9221-ed7246e39...@u36g2000prn.googlegroups.com,
Oltmans  rolf.oltm...@gmail.com wrote:

I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions."  Now they have two problems.'
--Jamie Zawinski

Take the advice other people gave you and use BeautifulSoup.
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur.  --Red Adair


Re: Regex help needed!

2010-01-07 Thread Rolando Espinoza La Fuente
# http://gist.github.com/271661

import lxml.html
import re

src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know
that
PI is roughly 3.1443534534534534534
"""

regex = re.compile(r'amazon_(\d+)')

doc = lxml.html.document_fromstring(src)

# Select only the div elements whose id starts with "amazon_".
for div in doc.xpath('//div[starts-with(@id, "amazon_")]'):
    match = regex.match(div.get('id'))
    if match:
        print(match.group(1))



On Thu, Jan 7, 2010 at 4:42 PM, Aahz a...@pythoncraft.com wrote:
 In article 
 19de1d6e-5ba9-42b5-9221-ed7246e39...@u36g2000prn.googlegroups.com,
 Oltmans  rolf.oltm...@gmail.com wrote:

I've written this regex that's kind of working
re.findall(\w+\s*\W+amazon_(\d+),str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.

 'Some people, when confronted with a problem, think I know, I'll use
 regular expressions.  Now they have two problems.'
 --Jamie Zawinski

 Take the advice other people gave you and use BeautifulSoup.
 --
 Aahz (a...@pythoncraft.com)           *         http://www.pythoncraft.com/

 If you think it's expensive to hire a professional to do the job, wait
 until you hire an amateur.  --Red Adair
 --
 http://mail.python.org/mailman/listinfo/python-list




-- 
Rolando Espinoza La fuente
www.rolandoespinoza.info


Re: Regex help needed!

2009-12-24 Thread F.R.



On 21.12.2009 12:38, Oltmans wrote:

Hello,. everyone.

I've a string that looks something like

lksjdflsdiv id ='amazon_345343'  kdjff lsdfs/div  sdjflsdiv id
=   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div


 From above string I need the digits within the ID attribute. For
example, required output from above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall(\w+\s*\W+amazon_(\d+),str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.
   


If you filter in two or even more sequential steps, the problem becomes a 
lot simpler, not least because you can test each step separately:

>>> r1 = re.compile (r'<div id\D*\d+[^>]*')   # Add ignore case and variable white space
>>> r2 = re.compile (r'\d+')
>>> [r2.search (item).group () for item in r1.findall (s) if item]   # s is your sample
['345343', '35343433', '8898'] # Supposing all ids have digits

Frederic



Re: Regex help needed!

2009-12-22 Thread Umakanth
how about re.findall(r'\w+.=\W\D+(\d+)?',str) ?

this will work for any string within id !

~Ukanth

On Dec 21, 6:06 pm, Oltmans rolf.oltm...@gmail.com wrote:
 On Dec 21, 5:05 pm, Umakanth cum...@gmail.com wrote:

  How about re.findall(r'\d+(?:\.\d+)?',str)

  extracts only numbers from any string

 Thank you. However, I only need the digits within the ID attribute of
 the DIV. Regex that you suggested fails on the following string

 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 hello, my age is 86 years old and I was born in 1945. Do you know that
 PI is roughly 3.1443534534534534534
 

  ~uk

  On Dec 21, 4:38 pm, Oltmans rolf.oltm...@gmail.com wrote:

   Hello,. everyone.

   I've a string that looks something like
   
   lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
   =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
   

   From above string I need the digits within the ID attribute. For
   example, required output from above string is
   - 35343433
   - 345343
   - 8898

   I've written this regex that's kind of working
   re.findall(\w+\s*\W+amazon_(\d+),str)

   but I was just wondering that there might be a better RegEx to do that
   same thing. Can you kindly suggest a better/improved Regex. Thank you
   in advance.





Re: Regex help needed!

2009-12-22 Thread Paul McGuire
On Dec 21, 5:38 am, Oltmans rolf.oltm...@gmail.com wrote:
 Hello,. everyone.

 I've a string that looks something like
 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 

 From above string I need the digits within the ID attribute. For
 example, required output from above string is
 - 35343433
 - 345343
 - 8898

 I've written this regex that's kind of working
 re.findall(\w+\s*\W+amazon_(\d+),str)


The issue with using regexen for parsing HTML is that you often get
surprised by attributes that you never expected, or out of order, or
with weird or missing quotation marks, or tags or attributes that are
in upper/lower case.  BeautifulSoup is one tool to use for HTML
scraping, here is a pyparsing example, with hopefully descriptive
comments:


from pyparsing import makeHTMLTags,ParseException

src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know
that
PI is roughly 3.1443534534534534534
"""

# use makeHTMLTags to return an expression that will match
# HTML div tags, including attributes, upper/lower case,
# etc. (makeHTMLTags will return expressions for both
# opening and closing tags, but we only care about the
# opening one, so just use the [0]th returned item)
div = makeHTMLTags("div")[0]

# define a parse action to filter only for div tags
# with the proper id form
def filterByIdStartingWithAmazon(tokens):
    if not tokens.id.startswith("amazon_"):
        raise ParseException(
          "must have id attribute starting with 'amazon_'")

# define a parse action that will add a pseudo-
# attribute 'amazon_id', to make it easier to get the
# numeric portion of the id after the leading 'amazon_'
def makeAmazonIdAttribute(tokens):
    tokens["amazon_id"] = tokens.id[len("amazon_"):]

# attach parse action callbacks to the div expression -
# these will be called during parse time
div.setParseAction(filterByIdStartingWithAmazon,
                     makeAmazonIdAttribute)

# search through the input string for matching divs,
# and print out their amazon_id's
for divtag in div.searchString(src):
    print(divtag.amazon_id)


Prints:

345343
35343433
8898

-- 
http://mail.python.org/mailman/listinfo/python-list


Regex help needed!

2009-12-21 Thread Oltmans
Hello, everyone.

I have a string that looks something like

lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>


From the above string I need the digits within the ID attribute. For
example, the required output from the above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall("\w+\s*\W+amazon_(\d+)",str)

but I was wondering whether there might be a better regex to do the
same thing. Can you kindly suggest a better/improved regex? Thank you
in advance.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Umakanth
How about re.findall(r'\d+(?:\.\d+)?',str)

extracts only numbers from any string

~uk

On Dec 21, 4:38 pm, Oltmans rolf.oltm...@gmail.com wrote:
 Hello,. everyone.

 I've a string that looks something like
 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 

 From above string I need the digits within the ID attribute. For
 example, required output from above string is
 - 35343433
 - 345343
 - 8898

 I've written this regex that's kind of working
 re.findall(\w+\s*\W+amazon_(\d+),str)

 but I was just wondering that there might be a better RegEx to do that
 same thing. Can you kindly suggest a better/improved Regex. Thank you
 in advance.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread mik3
On Dec 21, 7:38 pm, Oltmans rolf.oltm...@gmail.com wrote:
 Hello,. everyone.

 I've a string that looks something like
 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 

 From above string I need the digits within the ID attribute. For
 example, required output from above string is
 - 35343433
 - 345343
 - 8898

 I've written this regex that's kind of working
 re.findall(\w+\s*\W+amazon_(\d+),str)

 but I was just wondering that there might be a better RegEx to do that
 same thing. Can you kindly suggest a better/improved Regex. Thank you
 in advance.

You don't need a regular expression. Just do a split on "amazon_":

>>> s="lksjdfls <div id =\'amazon_345343\'> kdjff lsdfs </div> sdjfls <div id =   \"amazon_35343433\">sdfsd</div><div id=\'amazon_8898\'>welcome</div>"

>>> for item in s.split("amazon_")[1:]:
...   print item
...
345343'> kdjff lsdfs </div> sdjfls <div id =   "
35343433">sdfsd</div><div id='
8898'>welcome</div>

Then find the ' or " indices and do index slicing.
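
A minimal sketch of the split-then-slice idea in Python 3, assuming the digits always follow "amazon_" directly. Taking the leading run of digits from each chunk stands in for the index slicing, and the variable names are illustrative, not from the post:

```python
from itertools import takewhile

s = ("lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id "
     "=   \"amazon_35343433\">sdfsd</div><div id='amazon_8898'>welcome</div>")

# Split on the marker; every chunk after the first starts with the id digits.
chunks = s.split("amazon_")[1:]

# Keep only the leading run of digits in each chunk.
ids = ["".join(takewhile(str.isdigit, chunk)) for chunk in chunks]
print(ids)
```

Note that this keys on the text "amazon_" anywhere in the input, not only inside an id attribute, so it is only as reliable as that assumption.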
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Peter Otten
Oltmans wrote:

 I've a string that looks something like
 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 
 
 From above string I need the digits within the ID attribute. For
 example, required output from above string is
 - 35343433
 - 345343
 - 8898
 
 I've written this regex that's kind of working
 re.findall(\w+\s*\W+amazon_(\d+),str)
 
 but I was just wondering that there might be a better RegEx to do that
 same thing. Can you kindly suggest a better/improved Regex. Thank you
 in advance.

>>> from BeautifulSoup import BeautifulSoup
>>> bs = BeautifulSoup("""lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
... =   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>""")
>>> [node["id"][7:] for node in bs(id=lambda id: id.startswith("amazon_"))]
[u'345343', u'35343433', u'8898']

I think BeautifulSoup is a better tool for the task since it actually 
understands HTML.
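
For readers without BeautifulSoup installed, here is a rough Python 3 sketch of the same idea using only the standard library's html.parser. This is a swapped-in technique, not the code from this post, and the class name AmazonIdParser is made up for illustration:

```python
from html.parser import HTMLParser

class AmazonIdParser(HTMLParser):
    """Collect the numeric part of every <div id="amazon_NNN"> attribute."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser.
        if tag != "div":
            return
        div_id = dict(attrs).get("id") or ""
        number = div_id[len("amazon_"):]
        if div_id.startswith("amazon_") and number.isdigit():
            self.ids.append(number)

src = """lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>"""

parser = AmazonIdParser()
parser.feed(src)
parser.close()
print(parser.ids)
```

The parser deals with the odd spacing and mixed quote styles around the id attribute, which is the part a regex has to spell out by hand.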

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Oltmans
On Dec 21, 5:05 pm, Umakanth cum...@gmail.com wrote:
 How about re.findall(r'\d+(?:\.\d+)?',str)

 extracts only numbers from any string


Thank you. However, I only need the digits within the ID attribute of
the DIV. Regex that you suggested fails on the following string


lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know that
PI is roughly 3.1443534534534534534





 ~uk

 On Dec 21, 4:38 pm, Oltmans rolf.oltm...@gmail.com wrote:

  Hello,. everyone.

  I've a string that looks something like
  
  lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
  =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
  

  From above string I need the digits within the ID attribute. For
  example, required output from above string is
  - 35343433
  - 345343
  - 8898

  I've written this regex that's kind of working
  re.findall(\w+\s*\W+amazon_(\d+),str)

  but I was just wondering that there might be a better RegEx to do that
  same thing. Can you kindly suggest a better/improved Regex. Thank you
  in advance.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Umakanth
Ok. how about re.findall(r'\w+_(\d+)',str) ?

returns ['345343', '35343433', '8898', '8898'] !

On Dec 21, 6:06 pm, Oltmans rolf.oltm...@gmail.com wrote:
 On Dec 21, 5:05 pm, Umakanth cum...@gmail.com wrote:

  How about re.findall(r'\d+(?:\.\d+)?',str)

  extracts only numbers from any string

 Thank you. However, I only need the digits within the ID attribute of
 the DIV. Regex that you suggested fails on the following string

 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 hello, my age is 86 years old and I was born in 1945. Do you know that
 PI is roughly 3.1443534534534534534
 

  ~uk

  On Dec 21, 4:38 pm, Oltmans rolf.oltm...@gmail.com wrote:

   Hello,. everyone.

   I've a string that looks something like
   
   lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
   =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
   

   From above string I need the digits within the ID attribute. For
   example, required output from above string is
   - 35343433
   - 345343
   - 8898

   I've written this regex that's kind of working
   re.findall(\w+\s*\W+amazon_(\d+),str)

   but I was just wondering that there might be a better RegEx to do that
   same thing. Can you kindly suggest a better/improved Regex. Thank you
   in advance.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread MRAB

Oltmans wrote:

Hello,. everyone.

I've a string that looks something like

lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
=   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div



From above string I need the digits within the ID attribute. For

example, required output from above string is
- 35343433
- 345343
- 8898

I've written this regex that's kind of working
re.findall(\w+\s*\W+amazon_(\d+),str)

but I was just wondering that there might be a better RegEx to do that
same thing. Can you kindly suggest a better/improved Regex. Thank you
in advance.


Try:

re.findall(r"""<div\s*id\s*=\s*['"]amazon_(\d+)['"]""", str)

You shouldn't be using 'str' as a variable name because it hides the
builtin string class 'str'.
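
A short Python 3 sketch of that suggestion, run against the longer sample string from earlier in the thread. The variable is called text here (an illustrative name) so that the builtin str stays usable:

```python
import re

text = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know that
PI is roughly 3.1443534534534534534
"""

# Anchor on the <div id=...> context so stray numbers in the text are ignored.
pattern = re.compile(r"""<div\s*id\s*=\s*['"]amazon_(\d+)['"]""")
ids = pattern.findall(text)
print(ids)
```

Because the pattern requires the surrounding <div id= context, the age, year and PI digits in the trailing sentence are not picked up.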
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-21 Thread Johann Spies
 Oltmans wrote:
 I've a string that looks something like
 
 lksjdfls div id ='amazon_345343' kdjff lsdfs /div sdjfls div id
 =   amazon_35343433sdfsd/divdiv id='amazon_8898'welcome/div
 
 
 From above string I need the digits within the ID attribute. For
 example, required output from above string is
 - 35343433
 - 345343
 - 8898
 

Your string is in /tmp/y in this example:

$ grep -oE "[0-9]+" /tmp/y
345343
35343433
8898

Much simpler, isn't it?  But that is not python.

Regards
Johann

-- 
Johann Spies  Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

 And there were in the same country shepherds abiding 
  in the field, keeping watch over their flock by night.
  And, lo, the angel of the Lord came upon them, and the
  glory of the Lord shone round about them: and they were 
  sore afraid. And the angel said unto them, Fear not:
  for behold I bring you good tidings of great joy, which
  shall be to all people. For unto you is born this day 
  in the city of David a Saviour, which is Christ the 
  Lord.Luke 2:8-11 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-12-17 Thread S.Selvam
On Wed, Dec 16, 2009 at 10:46 PM, Gabriel Rossetti 
gabriel.rosse...@arimaz.com wrote:

 Hello everyone,

 I'm going nuts with some regex, could someone please show me what I'm doing
 wrong?

 I have an XMPP msg :

 message xmlns='jabber:client' to='n...@host.com'
   mynode xmlns='myprotocol:core' version='1.0' type='mytype'
   parameters
   param1123/param1
   param2456/param2
   /parameters
   payload type='plain'.../payload
   /mynode
   x xmlns='jabber:x:expire' seconds='15'/
 /message

 the parameter node may be absent or empty (parameter/), the x node
 may be absent. I'd like to grab everything exept the payload nod and
 create something new using regex, with the XMPP message example above I'd
 get this :

 message xmlns='jabber:client' to='n...@host.com'
   mynode xmlns='myprotocol:core' version='1.0' type='mytype'
   parameters
   param1123/param1
   param2456/param2
   /parameters
   /mynode
   x xmlns='jabber:x:expire' seconds='15'/
 /message

 for some reason my regex doesn't work correctly :

 r(message .*?).*?(mynode
 .*?).*?(?:(parameters.*?/parameters)|parameters/)?.*?(x .*/)?


If all you need is to remove the payload node, this could be useful:

import re

s1 = ("<message xmlns='jabber:client' to='n...@host.com'><mynode "
      "xmlns='myprotocol:core' version='1.0' "
      "type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload "
      "type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' "
      "seconds='15'/></message>")

# greedy match: assumes there is only one payload node in the message
pat=re.compile(r"<payload.*<\/payload>")
s1=pat.sub("",s1)


-- 
Regards,
S.Selvam
-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2009-12-16 Thread Gabriel Rossetti

Hello everyone,

I'm going nuts with some regex, could someone please show me what I'm 
doing wrong?


I have an XMPP msg :

<message xmlns='jabber:client' to='n...@host.com'>
   <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
   <parameters>
   <param1>123</param1>
   <param2>456</param2>
   </parameters>
   <payload type='plain'>...</payload>
   </mynode>
   <x xmlns='jabber:x:expire' seconds='15'/>
</message>

The parameters node may be absent or empty (<parameters/>), and the x node 
may be absent. I'd like to grab everything except the payload node and 
create something new using regex. With the XMPP message example above 
I'd get this :


<message xmlns='jabber:client' to='n...@host.com'>
   <mynode xmlns='myprotocol:core' version='1.0' type='mytype'>
   <parameters>
   <param1>123</param1>
   <param2>456</param2>
   </parameters>
   </mynode>
   <x xmlns='jabber:x:expire' seconds='15'/>
</message>

for some reason my regex doesn't work correctly :

r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"


I group the opening <message> node, the opening <mynode> node and if the 
<parameters> node is present and not empty I group it and if the <x> 
node is present I group it. For some reason this doesn't work correctly :


>>> import re
>>> s1 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s2 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s3 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode><x xmlns='jabber:x:expire' seconds='15'/></message>"
>>> s4 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters><param1>123</param1><param2>456</param2></parameters><payload type='plain'>...</payload></mynode></message>"
>>> s5 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><parameters/><payload type='plain'>...</payload></mynode></message>"
>>> s6 = "<message xmlns='jabber:client' to='n...@host.com'><mynode xmlns='myprotocol:core' version='1.0' type='mytype'><payload type='plain'>...</payload></mynode></message>"
>>> exp = r"(<message .*?>).*?(<mynode .*?>).*?(?:(<parameters>.*?</parameters>)|<parameters/>)?.*?(<x .*/>)?"
>>>
>>> re.match(exp, s1).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s2).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s3).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s4).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", '<parameters><param1>123</param1><param2>456</param2></parameters>', None)
>>>
>>> re.match(exp, s5).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)
>>>
>>> re.match(exp, s6).groups()
("<message xmlns='jabber:client' to='n...@host.com'>", "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>", None, None)




Does someone know what is wrong with my expression? Thank you, Gabriel
--
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-12-16 Thread r0g
Gabriel Rossetti wrote:
 Hello everyone,
 
 I'm going nuts with some regex, could someone please show me what I'm
 doing wrong?
 
 I have an XMPP msg :
 
snip
 
 
 Does someone know what is wrong with my expression? Thank you, Gabriel




Gabriel, trying to debug a long regex in situ can be a nightmare; however,
the following technique always works for me...

Use the interactive interpreter and see if half the regex works. If it
does, your problem is in the second half; if not, it's in the first, so try
the first half of that, and so on and so forth. You'll find the point at
which it goes wrong in a snip.

Non-trivial regexes are always best built up and tested a bit at a time,
the interactive interpreter is great for this.
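
A small Python 3 sketch of that build-it-up approach, assuming a shortened, made-up message and the pattern from the original post split into four pieces. Each round matches one more piece and prints the groups so far:

```python
import re

# Shortened stand-in for the XMPP message (illustrative, not the real data).
s = ("<message to='a'><mynode type='t'><parameters><p>1</p></parameters>"
     "<payload>...</payload></mynode><x seconds='15'/></message>")

# The original pattern, cut into the pieces it was built from.
pieces = [
    r"(<message .*?>)",
    r".*?(<mynode .*?>)",
    r".*?(?:(<parameters>.*?</parameters>)|<parameters/>)?",
    r".*?(<x .*/>)?",
]

results = []
for n in range(1, len(pieces) + 1):
    m = re.match("".join(pieces[:n]), s)
    results.append(m.groups() if m else None)
    print(n, results[-1])
```

In this sketch the first three pieces capture what they should, and the last group comes back as None. A lazy `.*?` followed by an optional group is satisfied by matching nothing at all, so the engine never has to look further for the x node.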

Roger.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-12-16 Thread Intchanter / Daniel Fackrell
On Dec 16, 10:22 am, r0g aioe@technicalbloke.com wrote:
 Gabriel Rossetti wrote:
  Hello everyone,

  I'm going nuts with some regex, could someone please show me what I'm
  doing wrong?

  I have an XMPP msg :

 snip

  Does someone know what is wrong with my expression? Thank you, Gabriel

 Gabriel, trying to debug a long regex in situ can be a nightmare however
 the following technique always works for me...

 Use the interactive interpreter and see if half the regex works, if it
 does your problem is in the second half, if not it's in the first so try
 the first half of that and so on an so forth. You'll find the point at
 which it goes wrong in a snip.

 Non-trivial regexes are always best built up and tested a bit at a time,
 the interactive interpreter is great for this.

 Roger.

I'll just add that the "now you have two problems" quip applies here,
especially when there are very good XML parsing libraries for Python
that will keep you from having to reinvent the wheel for every little
change.

See sections 20.5 through 20.13 of the Python Documentation for
several built-in options, and I'm sure there are many community
projects that may fit the bill if none of those happen to.
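
As one hedged illustration of the parser route, here is a Python 3 sketch using xml.etree.ElementTree from the standard library to drop the payload node. The message text is taken from the original post; the "core" prefix and the variable names are assumptions made for this example:

```python
import xml.etree.ElementTree as ET

msg = ("<message xmlns='jabber:client' to='n...@host.com'>"
       "<mynode xmlns='myprotocol:core' version='1.0' type='mytype'>"
       "<parameters><param1>123</param1><param2>456</param2></parameters>"
       "<payload type='plain'>...</payload>"
       "</mynode>"
       "<x xmlns='jabber:x:expire' seconds='15'/>"
       "</message>")

root = ET.fromstring(msg)

# Map a local prefix to the namespace that mynode and its children live in.
ns = {"core": "myprotocol:core"}

# Remove every payload child of every mynode; absent nodes are simply skipped.
for mynode in root.findall("core:mynode", ns):
    for payload in mynode.findall("core:payload", ns):
        mynode.remove(payload)

remaining = [child.tag for child in root]
print(remaining)
print(root.find("core:mynode/core:parameters/core:param1", ns).text)
```

Missing or empty parameters nodes and a missing x node need no special handling here, since the loops only touch what is actually present.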

Personally, I consider regular expressions of any substantial length
and complexity to be bad practice, as they inhibit readability and
maintainability.  They are also decidedly non-Zen on at least
"Readability counts" and "Sparse is better than dense."

Intchanter
Daniel Fackrell

P.S. I'm not sure how any of these libraries are implemented yet, but
I'd hope they're using a finite state machine tailored to the parsing
task rather than using regexes, but even if they do the latter, having
that abstracted out in a mature library with a clean interface is
still a huge win.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-09 Thread Peter Otten
David wrote:

 tdnbsp;/td
 
 td width=1% class=keyOpen:
 /td
 td width=1% class=val5.50
 /td
 tdnbsp;/td
 td width=1% class=keyMkt Cap:
 /td
 td width=1% class=val6.92M
 /td
 tdnbsp;/td
 td width=1% class=keyP/E:
 /td
 td width=1% class=val21.99
 /td
 
 
 I want to extract the open, mkt cap and P/E values - but apart from
 doing loads of indivdual REs which I think would look messy, I can't
 think of a better and neater looking way. Any ideas?

>>> from BeautifulSoup import BeautifulSoup
>>> bs = BeautifulSoup("""<td>&nbsp;</td>
...
... <td width="1%" class="key">Open:
... </td>
... <td width="1%" class="val">5.50
... </td>
... <td>&nbsp;</td>
... <td width="1%" class="key">Mkt Cap:
... </td>
... <td width="1%" class="val">6.92M
... </td>
... <td>&nbsp;</td>
... <td width="1%" class="key">P/E:
... </td>
... <td width="1%" class="val">21.99
... </td>
... """)
>>> for key in bs.findAll(attrs={"class": "key"}):
...     value = key.findNext(attrs={"class": "val"})
...     print key.string.strip(), "--", value.string.strip()
...
Open: -- 5.50
Mkt Cap: -- 6.92M
P/E: -- 21.99


-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2009-07-08 Thread David
Hi

I have a few regexes I need to write, but I'm struggling to come up with a
nice way of doing them, and more than anything am here to learn some
tricks and some neat code rather than just getting an answer - although
that's obviously what I would like to get to.

Problem 1 -

<span class="chg"
id="ref_678774_cp">(25.47%)</span><br>

I want to extract 25.47 from here - so far I've tried -

xPer = re.search('span class=chg id=ref_'+str(xID.group(1))+'_cp
\(.*?)%', content)

and

xPer = re.search('span class=\chg\ id=\ref_+str(xID.group(1))+_cp
\\((\d*)%\)/spanbr', content)

Neither of these seems to do what I want - am I not doing this
correctly? (obviously!)

Problem 2 -

<td>&nbsp;</td>

<td width="1%" class="key">Open:
</td>
<td width="1%" class="val">5.50
</td>
<td>&nbsp;</td>
<td width="1%" class="key">Mkt Cap:
</td>
<td width="1%" class="val">6.92M
</td>
<td>&nbsp;</td>
<td width="1%" class="key">P/E:
</td>
<td width="1%" class="val">21.99
</td>


I want to extract the Open, Mkt Cap and P/E values - but apart from
doing loads of individual REs, which I think would look messy, I can't
think of a better and neater-looking way. Any ideas?

Cheers

David

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-08 Thread Chris Rebert
On Wed, Jul 8, 2009 at 3:06 PM, Daviddavid.bra...@googlemail.com wrote:
 Hi

 I have a few regexs I need to do, but im struggling to come up with a
 nice way of doing them, and more than anything am here to learn some
 tricks and some neat code rather than getting an answer - although
 thats obviously what i would like to get to.

 Problem 1 -

 span class=chg
                id=ref_678774_cp(25.47%)/spanbr

 I want to extract 25.47 from here - so far I've tried -

 xPer = re.search('span class=chg id=ref_'+str(xID.group(1))+'_cp
 \(.*?)%', content)

 and

 xPer = re.search('span class=\chg\ id=\ref_+str(xID.group(1))+_cp
 \\((\d*)%\)/spanbr', content)

 neither of these seem to do what I want - am I not doing this
 correctly? (obviously!)

 Problem 2 -

 tdnbsp;/td

 td width=1% class=keyOpen:
 /td
 td width=1% class=val5.50
 /td
 tdnbsp;/td
 td width=1% class=keyMkt Cap:
 /td
 td width=1% class=val6.92M
 /td
 tdnbsp;/td
 td width=1% class=keyP/E:
 /td
 td width=1% class=val21.99
 /td


 I want to extract the open, mkt cap and P/E values - but apart from
 doing loads of indivdual REs which I think would look messy, I can't
 think of a better and neater looking way. Any ideas?

Use an actual HTML parser? Like BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/), for instance.
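
If installing BeautifulSoup is not an option, a rough Python 3 sketch with the standard library's html.parser can pair the key and val cells. This swaps in a different parser from the one suggested above, and the class name KeyValParser is invented for the example:

```python
from html.parser import HTMLParser

class KeyValParser(HTMLParser):
    """Collect the text of <td class="key"> and <td class="val"> cells."""

    def __init__(self):
        super().__init__()
        self.current = None  # "key" or "val" while inside such a cell
        self.keys = []
        self.vals = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cls = dict(attrs).get("class")
            self.current = cls if cls in ("key", "val") else None

    def handle_endtag(self, tag):
        if tag == "td":
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.current == "key":
            self.keys.append(text)
        elif self.current == "val":
            self.vals.append(text)

html = """<td>&nbsp;</td>
<td width="1%" class="key">Open:
</td>
<td width="1%" class="val">5.50
</td>
<td>&nbsp;</td>
<td width="1%" class="key">Mkt Cap:
</td>
<td width="1%" class="val">6.92M
</td>
<td>&nbsp;</td>
<td width="1%" class="key">P/E:
</td>
<td width="1%" class="val">21.99
</td>
"""

parser = KeyValParser()
parser.feed(html)
parser.close()
pairs = list(zip(parser.keys, parser.vals))
print(pairs)
```

Pairing by position assumes every key cell is followed by exactly one val cell, as in the sample.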

I will never understand why so many people try to parse/scrape
HTML/XML with regexes...

Cheers,
Chris
-- 
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-08 Thread Tim Harig
On 2009-07-08, Chris Rebert c...@rebertia.com wrote:
 On Wed, Jul 8, 2009 at 3:06 PM, Daviddavid.bra...@googlemail.com wrote:
 I want to extract the open, mkt cap and P/E values - but apart from
 doing loads of indivdual REs which I think would look messy, I can't
 think of a better and neater looking way. Any ideas?

You are downloading market data?  Yahoo offers its stats in CSV format that
is easier to parse without a dedicated parser.

 Use an actual HTML parser? Like BeautifulSoup
 (http://www.crummy.com/software/BeautifulSoup/), for instance.

I agree with your sentiment exactly.  If the regex he is trying to get is
difficult enough that he has to ask; then, yes, he should be using a
parser.

 I will never understand why so many people try to parse/scrape
 HTML/XML with regexes...

Why?  Because sometimes it is good enough to get the job done easily.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2009-07-08 Thread Rhodri James
On Wed, 08 Jul 2009 23:06:22 +0100, David david.bra...@googlemail.com  
wrote:



Hi

I have a few regexs I need to do, but im struggling to come up with a
nice way of doing them, and more than anything am here to learn some
tricks and some neat code rather than getting an answer - although
thats obviously what i would like to get to.

Problem 1 -

span class=chg
id=ref_678774_cp(25.47%)/spanbr

I want to extract 25.47 from here - so far I've tried -

xPer = re.search('span class=chg id=ref_'+str(xID.group(1))+'_cp
\(.*?)%', content)


Supposing that str(xID.group(1)) == "678774", let's see how that string
concatenation turns out:

span class=chg id=ref_678774_cp(.*?)%

The obvious problems here are the spurious double quotes, the spurious
(but harmless) escaping of a double quote, and the lack of (escaped)
backslash and (escaped) open parenthesis.  The latter you can always
strip off later, but the first sinks the match rather thoroughly.



and

xPer = re.search('span class=\chg\ id=\ref_+str(xID.group(1))+_cp
\\((\d*)%\)/spanbr', content)


With only two single quotes present, the biggest problem should be obvious.

Unfortunately if you just fix the obvious in either of the two regular
expressions, you're setting yourself up for a fall later on.  As The Fine
Manual says right at the top of the page on the re module
(http://docs.python.org/library/re.html), you want to be using raw string
literals when you're dealing with regular expressions, because you want
the backslashes getting through without being interpreted specially by
Python's own parser.  As it happens you get away with it in this case,
since neither '\d' nor '\(' has a special meaning to Python, so they aren't
changed, and '\"' is interpreted as '"', which happens to be the right
thing anyway.
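
A minimal Python 3 sketch of Problem 1 written with raw string literals, assuming the span looks exactly like the sample and that the id number (678774 here) is already known. The names content and ref_id are placeholders for this example:

```python
import re

# The sample markup from Problem 1, including its line break.
content = '<span class="chg"\nid="ref_678774_cp">(25.47%)</span><br>'
ref_id = "678774"  # stands in for str(xID.group(1))

# Raw strings keep the backslashes intact for the regex engine;
# re.escape guards against special characters in the inserted id.
pattern = (r'<span class="chg"\s+id="ref_' + re.escape(ref_id) +
           r'_cp">\((\d+(?:\.\d+)?)%\)</span>')

match = re.search(pattern, content)
percent = match.group(1) if match else None
print(percent)
```

The literal parentheses around the percentage are escaped, and the unescaped pair forms the capturing group, so group(1) holds only the number.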



Problem 2 -

tdnbsp;/td

td width=1% class=keyOpen:
/td
td width=1% class=val5.50
/td
tdnbsp;/td
td width=1% class=keyMkt Cap:
/td
td width=1% class=val6.92M
/td
tdnbsp;/td
td width=1% class=keyP/E:
/td
td width=1% class=val21.99
/td


I want to extract the open, mkt cap and P/E values - but apart from
doing loads of indivdual REs which I think would look messy, I can't
think of a better and neater looking way. Any ideas?


What you're trying to do is inherently messy.  You might want to use
something like BeautifulSoup to hide the mess, but never having had
cause to use it myself I couldn't say for sure.

--
Rhodri James *-* Wildebeest Herder to the Masses
--
http://mail.python.org/mailman/listinfo/python-list


RE: Regex Help

2008-09-25 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Support
Desk wrote:

 Thanks for the reply ...

A: The vulture doesn't get Frequent Poster miles.
Q: What's the difference between a top-poster and a vulture?
--
http://mail.python.org/mailman/listinfo/python-list


RE: Regex Help

2008-09-24 Thread Support Desk

Thanks for the reply, I found out the problem was occurring later on in the
script. The regexp works well.

-Original Message-
From: Lawrence D'Oliveiro [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 23, 2008 6:51 PM
To: python-list@python.org
Subject: Re: Regex Help

In message [EMAIL PROTECTED], Support
Desk wrote:

 Anybody know of a good regex to parse html links from html code? The one I
 am currently using seems to be cutting off the last letter of some links,
 and returning links like
 
 http://somesite.co
 
 or http://somesite.ph
 
 the code I am using is
 
 
 regex = r'<a href=["|\']([^"|\']+)["|\']'

Can you post some example HTML sequences that this regexp is not handling
correctly?


--
http://mail.python.org/mailman/listinfo/python-list


More regex help

2008-09-24 Thread Support Desk
I am working on a python webcrawler, that will extract all links from an
html page, and add them to a queue, The problem I am having is building
absolute links from relative links, as there are so many different types of
relative links. If I just append the relative links to the current url, some
websites will send it into a never-ending loop. 

What I am looking for is a regexp that will extract the root url from any 
url string I pass to it, such as

'http://example.com/stuff/stuff/morestuff/index.html'

Regexp = 'http://example.com'

'http://anotherexample.com/stuff/index.php'

Regexp = 'http://anotherexample.com'

'http://example.com/stuff/stuff/'

Regexp = 'http://example.com'





--
http://mail.python.org/mailman/listinfo/python-list


Re: More regex help

2008-09-24 Thread Kirk Strauser
At 2008-09-24T16:25:02Z, Support Desk [EMAIL PROTECTED] writes:

 I am working on a python webcrawler, that will extract all links from an
 html page, and add them to a queue, The problem I am having is building
 absolute links from relative links, as there are so many different types of
 relative links. If I just append the relative links to the current url, some
 websites will send it into a never-ending loop. 

>>> import urllib
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
... '/foo')
'http://www.example.com/foo'
>>> urllib.basejoin('http://www.example.com/path/to/deep/page',
... 'http://slashdot.org/foo')
'http://slashdot.org/foo'
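
On Python 3 the same job is done by urllib.parse.urljoin. The sketch below also shows one way to get the root URL that was asked for; root_url is a made-up helper name, not a library function:

```python
from urllib.parse import urljoin, urlsplit

def root_url(url):
    """Return scheme://host for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

base = "http://www.example.com/path/to/deep/page"

# Relative, absolute-path and fully qualified links all resolve against base.
joined = [
    urljoin(base, "/foo"),
    urljoin(base, "../other"),
    urljoin(base, "http://slashdot.org/foo"),
]
print(joined)
print(root_url("http://example.com/stuff/stuff/morestuff/index.html"))
```

Resolving each link against the page it was found on, rather than appending strings, is what prevents the ever-growing URLs described in the question.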

-- 
Kirk Strauser
The Day Companies
--
http://mail.python.org/mailman/listinfo/python-list


RE: More regex help

2008-09-24 Thread Support Desk
Kirk, 

That's exactly what I needed. Thx!
 

-Original Message-
From: Kirk Strauser [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 11:42 AM
To: python-list@python.org
Subject: Re: More regex help

At 2008-09-24T16:25:02Z, Support Desk [EMAIL PROTECTED] writes:

 I am working on a python webcrawler, that will extract all links from an
 html page, and add them to a queue, The problem I am having is building
 absolute links from relative links, as there are so many different types
of
 relative links. If I just append the relative links to the current url,
some
 websites will send it into a never-ending loop. 

 import urllib
 urllib.basejoin('http://www.example.com/path/to/deep/page',
'/foo')
'http://www.example.com/foo'
 urllib.basejoin('http://www.example.com/path/to/deep/page',
'http://slashdot.org/foo')
'http://slashdot.org/foo'

-- 
Kirk Strauser
The Day Companies


--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Help

2008-09-23 Thread Miki
Hello,

 Anybody know of a good regex to parse html links from html code?
BeautifulSoup is *the* library to handle HTML

from BeautifulSoup import BeautifulSoup
from urllib import urlopen

soup = BeautifulSoup(urlopen("http://python.org/"))
for a in soup("a"):
    print a["href"]

HTH,
--
Miki [EMAIL PROTECTED]
http://pythonwise.blogspot.com
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Help

2008-09-23 Thread Lawrence D'Oliveiro
In message [EMAIL PROTECTED], Support
Desk wrote:

 Anybody know of a good regex to parse html links from html code? The one I
 am currently using seems to be cutting off the last letter of some links,
 and returning links like
 
 http://somesite.co
 
 or http://somesite.ph
 
 the code I am using is
 
 
 regex = r'<a href=["|\']([^"|\']+)["|\']'

Can you post some example HTML sequences that this regexp is not handling
correctly?
--
http://mail.python.org/mailman/listinfo/python-list


Regex Help

2008-09-22 Thread Support Desk
Anybody know of a good regex to parse html links from html code? The one I
am currently using seems to be cutting off the last letter of some links,
and returning links like

http://somesite.co

or http://somesite.ph

the code I am using is 


import re
import urllib

regex = r'<a href=["|\']([^"|\']+)["|\']'

page_text = urllib.urlopen('http://somesite.com')
page_text = page_text.read()

links = re.findall(regex, page_text, re.IGNORECASE)



--
http://mail.python.org/mailman/listinfo/python-list


Re: Regex Help

2008-09-22 Thread Fredrik Lundh

Support Desk wrote:

the code I am using is 


regex = r'<a href=["|\']([^"|\']+)["|\']'


that's way too fragile to work with real-life HTML (what if the link has 
a TITLE attribute, for example?  or contains whitespace after the HREF?)


you might want to consider using a real HTML parser for this task.
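
One possible shape for that, sketched in Python 3 with the standard library's html.parser. The sample markup and the class name LinkCollector are invented for illustration; the parser copes with extra attributes, whitespace around the equals sign, and upper-case tags:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

sample = ('<a title="home" href = "http://somesite.com">home</a> '
          "<A HREF='http://somesite.ph/page'>page</A>")

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```
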


page_text = urllib.urlopen('http://somesite.com')
page_text = page_text.read()

links = re.findall(regex, text, re.IGNORECASE)


the RE looks fine for the subset of all valid A elements that it can 
handle, though.


got any examples of pages where you see that behaviour?

/F

--
http://mail.python.org/mailman/listinfo/python-list


regex help

2008-06-30 Thread Support Desk
Hello, 
   I am working on a web app that queries long-distance numbers from a
database of call logs. I am trying to put together a regex that matches any
number that does not start with the following. Basically, any number that
doesn't start with:

281
713
832

or

1281
1713
1832

is long distance. Any help would be appreciated.


--
http://mail.python.org/mailman/listinfo/python-list

Re: regex help

2008-06-30 Thread Cédric Lucantis
On Monday 30 June 2008 16:53:54, Support Desk wrote:
 Hello,
I am working on a web-app, that querys long distance numbers from a
 database of call logs. I am trying to put together a regex that matches any
 number that does not start with the following. Basically any number that
 does'nt start with:



 281

 713

 832



 or



 1281

 1713

 1832





 is long distance any, help would be appreciated.

sounds like str.startswith() is enough for your needs:

if not number.startswith(('281', '713', '832', ...)) :
...
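
Spelled out in full, that might look like the sketch below. The prefix tuple is copied from the question; the function name and the sample numbers are made up for illustration:

```python
# Prefixes that mark a number as local, as listed in the question.
LOCAL_PREFIXES = ("281", "713", "832", "1281", "1713", "1832")


def is_long_distance(number):
    """Return True when the number starts with none of the local prefixes."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return not str(number).startswith(LOCAL_PREFIXES)


for number in ("2815551234", "17135551234", "2125551234"):
    print(number, is_long_distance(number))
```

Passing a tuple to startswith() keeps the list of prefixes in one place, so adding an area code later is a one-line change.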

-- 
Cédric Lucantis
--
http://mail.python.org/mailman/listinfo/python-list

RE: regex help

2008-06-30 Thread Metal Zong
>>> import re
>>>
>>> if __name__ == "__main__":
...     lst = [281, 713, 832, 1281, 1713, 1832, 2281, 2713, 2832]
...     for item in lst:
...         if re.match("^1?(?=281)|^1?(?=713)|^1?(?=832)", str(item)):
...             print "%d invalid" % item
...         else:
...             print "%d valid" % item
...
281 invalid
713 invalid
832 invalid
1281 invalid
1713 invalid
1832 invalid
2281 valid
2713 valid
2832 valid




--
http://mail.python.org/mailman/listinfo/python-list

regex help

2008-06-03 Thread Support Desk
I am trying to put together a regular expression that will rename users'
address books on our server due to a recent change we made.  Users with
address books named user.abook need to be changed to [EMAIL PROTECTED] I'm
having trouble with the regex. Any help would be appreciated.

 

-Mike

--
http://mail.python.org/mailman/listinfo/python-list

RE: regex help

2008-06-03 Thread Reedick, Andrew
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] 
 On Behalf Of Support Desk
 Sent: Tuesday, June 03, 2008 9:32 AM
 To: python-list@python.org
 Subject: regex help

 I am trying to put together a regular expression that will 
 rename users address books on our server due to a recent 
 change we made.  Users with address books user.abook need 
 to be changed to [EMAIL PROTECTED] I'm having trouble 
 with the regex. Any help would be appreciated.


import re

emails = ('foo.abook', 'abook.foo', 'bob.abook.com', 'john.doe.abook')

for email in emails:
    print email, '--',
    print re.sub(r'\.abook$', '@domain.com.abook', email)
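
For anyone running this on Python 3, here is a rough equivalent with the results collected in a dict so they can be inspected. The helper name is made up, and domain.com is the placeholder from the code above:

```python
import re


def add_domain(name, domain="domain.com"):
    """Rewrite 'user.abook' to 'user@<domain>.abook'; other names pass through."""
    # The "$" anchor means only a trailing ".abook" is rewritten.
    return re.sub(r"\.abook$", "@" + domain + ".abook", name)


emails = ("foo.abook", "abook.foo", "bob.abook.com", "john.doe.abook")
results = {email: add_domain(email) for email in emails}
for old, new in results.items():
    print(old, "--", new)
```

Only the names ending in .abook change; abook.foo and bob.abook.com come back untouched because the pattern is anchored to the end of the string.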





--
http://mail.python.org/mailman/listinfo/python-list


RE: regex help

2008-06-03 Thread Support Desk
That’s it exactly..thx

-Original Message-
From: Reedick, Andrew [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 03, 2008 9:26 AM
To: Support Desk
Subject: RE: regex help

The regex will now skip anything with an '@' in the filename, on the
assumption that it's already in the correct format.  Uncomment the
os.rename line once you're satisfied you won't mangle anything.


import glob
import os
import re


for filename in glob.glob('*.abook'):
    newname = filename
    newname = re.sub(r'[EMAIL PROTECTED]', '@domain.com.abook', filename)
    if filename != newname:
        print 'rename', filename, 'to', newname
        #os.rename(filename, newname)
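
The pattern in the post above was redacted by the archive, so the sketch below substitutes a guess: a name with no '@' that ends in .abook. The pattern, the function name, and the domain.com default are all assumptions. It only builds the list of renames, which keeps the dry-run idea from the post:

```python
import re

# Assumed pattern: the whole name contains no '@' and ends in ".abook".
ABOOK = re.compile(r"^([^@]+)\.abook$")


def plan_renames(filenames, domain="domain.com"):
    """Return (old, new) pairs for names that still need the domain added."""
    plan = []
    for filename in filenames:
        newname = ABOOK.sub(
            lambda m: "%s@%s.abook" % (m.group(1), domain), filename
        )
        if newname != filename:
            plan.append((filename, newname))
    return plan


for old, new in plan_renames(["foo.abook", "bob@domain.com.abook", "notes.txt"]):
    print("rename", old, "to", new)
    # os.rename(old, new)  # enable (with "import os") once the plan looks right
```

Keeping the planning step separate from os.rename makes it easy to eyeball the output before touching any files.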



 -Original Message-
 From: Support Desk [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 03, 2008 10:07 AM
 To: Reedick, Andrew
 Subject: RE: regex help
 
 Thx for the reply,
 
 I would first have to list all files matching user.abook, then rename
 them to [EMAIL PROTECTED], something like the code below. I'm still new
 to Python and haven't had much experience with the re module.
 
 import os
 import re
 
 emails = os.popen('ls').readlines()
 for email in emails:
     print email, '--',
     print re.findall(r'\.abook$', email)
 
 
 
 


--
http://mail.python.org/mailman/listinfo/python-list


Re: pexpect regex help

2007-02-23 Thread amadain
On Feb 21, 11:15 pm, [EMAIL PROTECTED] wrote:
 It is also posted here more clearly and formatted as it would appear
 on the terminal:  http://www.pastebin.ca/366822



what about using s.before.split("\r\n")[-1]?

A

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pexpect regex help

2007-02-23 Thread amadain
On Feb 23, 8:46 am, amadain [EMAIL PROTECTED] wrote:
 On Feb 21, 11:15 pm, [EMAIL PROTECTED] wrote:



  It is also posted here more clearly and formatted as it would appear
  on the terminal:  http://www.pastebin.ca/366822

 what about using s.before.split("\r\n")[-1]?

 A



result = [x for x in s.before.split("\r\n") if x != ""]
print result[-1]

should cover the blank line problem
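
A self-contained sketch of the same idea (Python 3), run on a plain string instead of a live pexpect session. The function names are made up, and taking the first word of the last line as the hostname is an assumption based on the two prompts shown in the question:

```python
def last_nonblank_line(text):
    """Return the last line of text that is not empty or whitespace-only."""
    # splitlines() copes with "\r\n" as well as "\n" line endings.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    return lines[-1] if lines else ""


def hostname_from_before(before):
    """Assume the hostname is the first word on the last non-blank line."""
    line = last_nonblank_line(before)
    return line.split()[0] if line else ""


# Stand-ins for s.before after expect() matched 'login: '.
print(hostname_from_before("# banner #\r\n\r\npa-chi1 console "))
print(hostname_from_before("# banner #\r\npa11-chi1 "))
```

Taking the first word means the extra "console" in the first prompt is ignored without any regex at all.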

A

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pexpect regex help

2007-02-23 Thread amadain
On Feb 23, 8:53 am, amadain [EMAIL PROTECTED] wrote:
 On Feb 23, 8:46 am, amadain [EMAIL PROTECTED] wrote:



  On Feb 21, 11:15 pm, [EMAIL PROTECTED] wrote:

   It is also posted here more clearly and formatted as it would appear
   on the terminal:  http://www.pastebin.ca/366822

  what about using s.before.split("\r\n")[-1]?

  A

 result = [x for x in s.before.split("\r\n") if x != ""]
 print result[-1]

 should cover the blank line problem

 A



Sorry, I just read that you are not matching sometimes. Try expecting
'ogin:' (without the first letter and the trailing space). There could
be no space after 'login:', or there could be a \t (tab).
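
Along the same lines, a pattern that tolerates the optional "console" word and does not care what follows "login:" might look like this. It is a sketch using the re module on a plain string that is assumed to hold the banner plus the prompt; the names are made up for illustration:

```python
import re

# A hostname, an optional "console" word, then "login:" with nothing
# required after the colon (space, tab, or end of text are all fine).
PROMPT = re.compile(r"(\S+)(?:[ \t]+console)?[ \t]+login:")


def hostname_from_prompt(text):
    """Return the word in front of the login prompt, or None if absent."""
    match = PROMPT.search(text)
    return match.group(1) if match else None


print(hostname_from_prompt("# banner #\r\n\r\npa-chi1 console login: "))
print(hostname_from_prompt("# banner #\r\npa11-chi1 login:\t"))
```

Since the search is not tied to a fixed number of banner lines, a longer or shorter motd should make no difference.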

A

-- 
http://mail.python.org/mailman/listinfo/python-list


pexpect regex help

2007-02-21 Thread jonathan . sabo
I have a pexpect script to walk through a cisco terminal server and I
was hoping to get some help with this regex because I really suck at
it.

This is the code:

index = s.expect(['login: ', pexpect.EOF, pexpect.TIMEOUT])
if index == 0:
    m = re.search('((#.+\r\n){20,25})(\s.*)',
                  s.before)  #-- MY PROBLEM
    print m.group(3),
    print ' %s %s' % (ip[0], port)
    s.send(chr(30))
    s.sendline('x')
    s.sendline('disco')
    s.sendline('\n')
elif index == 1:
    print s.before
elif index == 2:
    print
    print '%s %s FAILED' % (ip[0], port)
    print 'This host may be down or locked on the TS'
    s.send(chr(30))
    s.sendline('x')
    s.sendline('disco')
    s.sendline('\n')

This is attempting to match the hostname of the connected host using
the output of a motd file which unfortunately is not the same
everywhere...  It looks like this:

#####################################################################
#   This system is the property of:                                 #
#                                                                   #
#                              DefNet                               #
#                                                                   #
#   Use of this system is for authorized users only.                #
#   Individuals using this computer system without authority, or in #
#   excess of their authority, are subject to having all of their   #
#   activities on this system monitored and recorded by system      #
#   personnel.                                                      #
#                                                                   #
#   In the course of monitoring individuals improperly using this   #
#   system, or in the course of system maintenance, the activities  #
#   of authorized users may also be monitored.                      #
#                                                                   #
#   Anyone using this system expressly consents to such monitoring  #
#   and is advised that if such monitoring reveals possible         #
#   evidence of criminal activity, system personnel may provide the #
#   evidence of such monitoring to law enforcement officials.       #
#####################################################################

pa-chi1 console login:

And sometimes it looks like this:

#####################################################################
#   This system is the property of:                                 #
#                                                                   #
#                              DefNet                               #
#                                                                   #
#   Use of this system is for authorized users only.                #
#   Individuals using this computer system without authority, or in #
#   excess of their authority, are subject to having all of their   #
#   activities on this system monitored and recorded by system      #
#   personnel.                                                      #
#                                                                   #
#   In the course of monitoring individuals improperly using this   #
#   system, or in the course of system maintenance, the activities  #
#   of authorized users may also be monitored.                      #
#                                                                   #
#   Anyone using this system expressly consents to such monitoring  #
#   and is advised that if such monitoring reveals possible         #
#   evidence of criminal activity, system personnel may provide the #
#   evidence of such monitoring to law enforcement officials.       #
#####################################################################
pa11-chi1 login:

The second one works and it will print out pa11-chi1, but when there
is a blank line or "console" is in the output it won't print anything,
or it won't match anything... I want to be able to match just the
hostname and print it out.

Any ideas?

Thanks,

Jonathan

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pexpect regex help

2007-02-21 Thread jonathan . sabo

It is also posted here more clearly and formatted as it would appear
on the terminal:  http://www.pastebin.ca/366822

-- 
http://mail.python.org/mailman/listinfo/python-list


Regex help...pretty please?

2006-08-23 Thread MooMaster
I'm trying to develop a little script that does some string
manipulation. I have some few hundred strings that currently look like
this:

cond(a,b,c)

and I want them to look like this:

cond(c,a,b)

but it gets a little more complicated because the conds themselves may
have conds within, like the following:

cond(0,cond(c,cond(e,cond(g,h,(af)),(ad)),(ab)),(a1))

What I want to do in this case is move the last parameter to the front
and then work backwards all the way out (if you're thinking recursion
too, I'm vindicated) so that it ends up looking like this:

cond((a1), 0, cond((ab),c,cond((ad), e, cond((af), g, h))))

Furthermore, the conds may be multiplied by an expression, such as the
following:

cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))

Here, all I want to do is switch the parameters of the conds without
touching the expression, like so:

cond(f,-1,1)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))

So that's the gist of my problem statement. I immediately thought that
regular expressions would provide an elegant solution. I would go
through the string by conds, stripping them and the () off, until I got
to the lowest level, then move the parameters and work backwards. That
thought process became this:
-CODE
import re

def swap(left, middle, right):
    left = left.replace("(", "")
    right = right.replace(")", "")
    temp = left
    left = right
    right = temp
    temp = middle
    middle = right
    right = temp
    whole = 'cond(' + left + ',' + middle + ',' + right + ')'
    return whole

def condReplacer(string):
    #regex = re.compile(r'cond\(.*,.*,.+\)')
    regex = re.compile(r'cond\(.*,.*,.+?\)')
    if not regex.search(string):
        print "whole string is: " + string
        [left, middle, right] = string.split(',')
        right = right.replace('\'', ' ')
        string = swap(left.strip(), middle.strip(), right.strip())
        print "the new string is:" + string
        return string
    else:
        more_conds = regex.search(string)
        temp_string = more_conds.group()
        firstParen = temp_string.find('(')
        temp_string = temp_string[firstParen:]
        print "there are more conditionals!" + temp_string
        condReplacer(temp_string)

def lineReader(file):
    for line in file:
        regex = r'cond\(.*,.*,.+\)?'
        if re.search(regex, line, re.DOTALL):
            condReplacer(line)

if __name__ == "__main__":
    input_file = open("only_conds2.txt", 'r')
    lineReader(input_file)
-CODE

I think my problem lies in my regular expression... If I use the one
commented out I do a greedy search and in my test case where I have a
conditional * an expression, I grab the expression too, like so:

INPUT:
cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
OUTPUT:
whole string is: (-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
the new string is:cond(f*((float(e*(2**4+(float(d*8+(float(c*4+(float(b*2+float(a,-1,1)
when all I really want to do is grab the part associated with the cond.
But if I do a non-greedy search I avoid that problem but stop too early
when I have an expression like this:

INPUT:
cond(a,b,(abs(c) = d))
OUTPUT:
whole string is: (a,b,(abs(c)
the new string is:cond((abs(c,a,b)

Can anyone help me with the regular expression? Is this even the best
approach to take? Anyone have any thoughts? 

Thanks for your time!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread Tim Chase
 cond(a,b,c)
 
 and I want them to look like this:
 
 cond(c,a,b)
 
 but it gets a little more complicated because the conds themselves may
 have conds within, like the following:
 
 cond(0,cond(c,cond(e,cond(g,h,(af)),(ad)),(ab)),(a1))

Regexps are *really* *REALLY* *bad* at arbitrarily nested 
structures.  really.

Sounds more like you want something like a lex/yacc sort of 
solution.  IIUC, pyparsing may do the trick for you.  I'm not a 
pyparsing wonk, but I can hold my own when it comes to crazy 
regexps, and can tell you from experience that regexps are *not* 
a good path to try and go down for this problem.

Many times, a regexp can be hammered into solving problems that 
have superior solutions not involving regexps.  This case is not 
even one of those.

If you know the maximum depth of nesting you'll encounter, you 
can do some hackish stunts to shoehorn regexps to solve the 
problem.  But if they are truly of arbitrary nesting-depth, 
*good* *luck*! :)
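
For contrast, a small hand-rolled sketch (Python 3) that tracks parenthesis depth instead of using a regexp. It is not from the thread: all names are made up, and it assumes the parentheses are balanced and that the text "cond(" never appears as the tail of a longer identifier:

```python
def find_matching_paren(text, open_index):
    """Return the index of the ')' that closes the '(' at open_index."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def split_top_level(args):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i])
            start = i + 1
    parts.append(args[start:])
    return parts


def rotate_conds(text):
    """Rewrite every cond(a,b,c) as cond(c,a,b), innermost calls included."""
    out, pos = [], 0
    while True:
        start = text.find("cond(", pos)
        if start == -1:
            out.append(text[pos:])
            return "".join(out)
        open_index = start + len("cond")
        close_index = find_matching_paren(text, open_index)
        args = split_top_level(text[open_index + 1:close_index])
        # Recurse so that conds nested inside an argument are rotated too.
        args = [rotate_conds(arg.strip()) for arg in args]
        if len(args) == 3:
            args = [args[2], args[0], args[1]]
        out.append(text[pos:start])
        out.append("cond(" + ",".join(args) + ")")
        pos = close_index + 1


print(rotate_conds("cond(0,cond(c,cond(e,cond(g,h,(af)),(ad)),(ab)),(a1))"))
```

Counting depth is the piece a plain regexp cannot do for arbitrary nesting; everything outside a cond(...) call is copied through unchanged.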

-tkc




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread Simon Forman
You're gonna want a parser for this.  pyparsing or spark would suffice.
However, since it looks like your source strings are valid python you
could get some traction out of the tokenize standard library module:

from tokenize import generate_tokens
from StringIO import StringIO

s = 'cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))'

for t in generate_tokens(StringIO(s).readline):
    print t[1],


Prints:
cond ( - 1 , 1 , f ) * ( ( float ( e ) * ( 2 ** 4 ) ) + ( float ( d ) *
8 ) + ( float ( c ) * 4 ) + ( float ( b ) * 2 ) + float ( a ) )

Once you've got that far the rest should be easy.  :)
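
On Python 3 the StringIO class lives in the io module and tokens are named tuples, so a rough equivalent of the snippet above could look like this. The helper name is made up, and the filter on empty strings is there to drop the end-of-input bookkeeping tokens:

```python
import io
import tokenize


def token_strings(source):
    """Return the non-empty token texts of a one-line Python expression."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()  # skip NEWLINE/ENDMARKER placeholders
    ]


print(" ".join(token_strings("cond(-1,1,f)*(float(a)+2**4)")))
```

With the tokens in a list, matching up parentheses is a matter of counting "(" and ")" entries rather than scanning characters.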

Peace,
~Simon

http://pyparsing.wikispaces.com/
http://pages.cpsc.ucalgary.ca/~aycock/spark/
http://docs.python.org/lib/module-tokenize.html

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread Paul McGuire
MooMaster [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 I'm trying to develop a little script that does some string
 manipulation. I have some few hundred strings that currently look like
 this:

 cond(a,b,c)

 and I want them to look like this:

 cond(c,a,b)

snip

Pyparsing makes this a fairly tractable problem.  The hardest part is
defining the valid contents of a relational and arithmetic expression, which
may be found within the arguments of your cond(a,b,c) constructs.

Not guaranteeing this 100%, but it did convert your pathologically nested
example on the first try.

-- Paul

--
from pyparsing import *

ident = ~Literal("cond") + Word(alphas)
number = Combine(Optional("-") + Word(nums) + Optional("." + Word(nums)))

arithExpr = Forward()
funcCall = ident+"("+delimitedList(arithExpr)+")"
operand = number | funcCall | ident
binop = oneOf("+ - * /")
arithExpr << ( ( operand + ZeroOrMore( binop + operand ) ) | ("(" +
arithExpr + ")" ) )
relop = oneOf("< > == <= >= !=")

condDef = Forward()
simpleCondExpr = arithExpr + ZeroOrMore( relop + arithExpr ) | condDef
multCondExpr = simpleCondExpr + "*" + arithExpr
condExpr = Forward()
condExpr << ( simpleCondExpr | multCondExpr | "(" + condExpr + ")" )

def reorderArgs(t):
    return "cond(" + ",".join(["".join(t.arg3), "".join(t.arg1),
                               "".join(t.arg2)]) + ")"

condDef << ( Literal("cond") + "(" + Group(condExpr).setResultsName("arg1")
             + "," +
             Group(condExpr).setResultsName("arg2")
             + "," +
             Group(condExpr).setResultsName("arg3")
             + ")" ).setParseAction( reorderArgs )

tests = [
    "cond(a,b,c)",
    "cond(12,b,c)",
    "cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))",
    "cond(a,b,(abs(c) = d))",
    "cond(0,cond(c,cond(e,cond(g,h,(af)),(ad)),(ab)),(a1))",
    ]

for t in tests:
    print t,"->",condExpr.transformString(t)
--
Prints:
cond(a,b,c) -> cond(c,a,b)
cond(12,b,c) -> cond(c,12,b)
cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a)) ->
cond(f,-1,1)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
cond(a,b,(abs(c) = d)) -> cond((abs(c)=d),a,b)
cond(0,cond(c,cond(e,cond(g,h,(af)),(ad)),(ab)),(a1)) ->
cond((a1),0,cond((ab),c,cond((ad),e,cond((af),g,h))))


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help...pretty please?

2006-08-23 Thread vbgunz
MooMaster Wrote:
 I'm trying to develop a little script that does some string
 manipulation. I have some few hundred strings that currently look like
 this:
 cond(a,b,c)
 and I want them to look like this:
 cond(c,a,b)

I zoned out on your question and created a very simple flipper.
Although it will not solve your problem maybe someone looking for a
simpler version may find it useful as a starting point. I hope it
proves useful. I'll post my simple flipper here:

s = 'cond(1,savv(grave(3,2,1),y,x),maxx(c,b,a),0)'
def argFlipper(s):
    ''' take a string of arguments and reverse'em e.g.
     cond(1,savv(grave(3,2,1),y,x),maxx(c,b,a),0)
     -> cond(0,maxx(a,b,c),savv(x,y,grave(1,2,3)),1)

    '''

    count = 0
    keyholder = {}
    while 1:
        if s.find('(') > 0:
            count += 1
            value = '%sph' + '%d' % count
            tempstring = [x for x in s]
            startindex = s.rfind('(')
            limitindex = s.find(')', startindex)
            argtarget = s[startindex + 1:limitindex].split(',')
            argreversed = ','.join(reversed(argtarget))
            keyholder[value] = '(' + argreversed + ')'
            tempstring[startindex:limitindex + 1] = value
            s = ''.join(tempstring)
        else:
            while count and keyholder:
                s = s.replace(value, keyholder[value])
                count -= 1
                value = '%sph' + '%d' % count
            return s

print argFlipper(s)

-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2006-05-16 Thread Lance Hoffmeyer
I have the following table and I am trying to match the percentage in the 2nd
column on the 2nd TIGER line (9.0).

I have tried both of the following.  I expected both to match, but neither did.
Is there a modifier I am missing?  What changes do I need to make so that these
match?  I need to keep the structure of the regex the same.

TIGER.append(re.search("TIGER\s{10}.*?(?:(\d{1,3}\.\d)\s+){2}",
target_table).group(1))
TIGER.append(re.search("^TIGER.*?(?:(\d{1,3}\.\d)\s+){2}",
target_table).group(1))


BASE - TOTAL TIGER 268   268173 95   101 -   10157 -
5778 276   268   19276230 21

DOG 7979 44 3531 -3117 -
1725 124795524 75  1
  29.5  29.5   25.4   36.8  30.7 -  30.7  29.8 -  
29.8  32.1  50.0  31.6  29.5  28.6  31.6   32.64.8

CAT 4646 28 1820 -20 7 -
 714 -14463214 39  4
  17.2  17.2   16.2   18.9  19.8 -  19.8  12.3 -  
12.3  17.9 -  18.4  17.2  16.7  18.4   17.0   19.0

LAMB3232 23  910 -10 8 -
 812 -12322012 28  1
  11.9  11.9   13.39.5   9.9 -   9.9  14.0 -  
14.0  15.4 -  15.8  11.9  10.4  15.8   12.24.8

TRIPOD  3232 23  9 9 - 9 9 -
 911 110322210 28  3
  11.9  11.9   13.39.5   8.9 -   8.9  15.8 -  
15.8  14.1  50.0  13.2  11.9  11.5  13.2   12.2   14.3

TIGER   2424 16  8 5 - 510 -
10 7 - 72417 7 18  2
   9.0   9.09.28.4   5.0 -   5.0  17.5 -  
17.5   9.0 -   9.2   9.0   8.9   9.27.89.5
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2006-05-16 Thread Peter Otten
Lance Hoffmeyer wrote:

 I have the following table and I am trying to match percentage the 2nd
 column on the 2nd Tiger line (9.0).
 
 I have tried both of the following.  I expected both to match but neither
 did?  Is there a modifier
 I am missing?  What changes do I need to make these match?  I need to keep
 the structure of the regex the same.
 
 TIGER.append(re.search(TIGER\s{10}.*?(?:(\d{1,3}\.\d)\s+){2},
 target_table).group(1))
 TIGER.append(re.search(^TIGER.*?(?:(\d{1,3}\.\d)\s+){2},
 target_table).group(1))

You can try the re.DOTALL flag (prepend the regex string with (?s)), but
I'd go with something really simple:

instream = iter(target_table.splitlines()) # or: instream = open(datafile)
for line in instream:
    if line.startswith("TIGER"):
        value = instream.next().split()[1] # or ...[0]? they are both '9.0'
        TIGER.append(value)
        break
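
For readers on Python 3, the same idea (find the label line, then split the line after it) might be sketched as below. This is not from the original post: the helper name `value_after` is hypothetical, and the table is an abbreviated stand-in for the real one.

```python
def value_after(lines, label, column):
    """Return field `column` of the line that follows the first line
    starting with `label`, or None if there is no such line."""
    it = iter(lines)
    for line in it:
        if line.startswith(label):
            following = next(it, None)  # the percentage row under the label row
            if following is None:
                return None
            return following.split()[column]
    return None

# Abbreviated sample (assumption): only the first few columns of each row.
table = """\
BASE - TOTAL TIGER 268 268 173
DOG 79 79 44
 29.5 29.5 25.4
TIGER 24 24 16
 9.0 9.0 9.2
"""

print(value_after(table.splitlines(), "TIGER", 1))
```

Because `startswith` only looks at the beginning of the line, the "BASE - TOTAL TIGER" header is skipped without any extra anchoring.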

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2006-05-16 Thread johnzenger
Why not use split instead of regular expressions?

>>> ln = "32 32 23 9 9 - 9 9 - 9 11 1 10"
>>> ln.split()
['32', '32', '23', '9', '9', '-', '9', '9', '-', '9', '11', '1', '10']

Much simpler, yes?  Just find the line that comes after a line that
begins with TIGER, split it, and pick the number you want out of the
resulting list.

Lance Hoffmeyer wrote:
 I have the following table and I am trying to match percentage the 2nd column 
 on the 2nd Tiger line (9.0).

 I have tried both of the following.  I expected both to match but neither 
 did?  Is there a modifier
 I am missing?  What changes do I need to make these match?  I need to keep 
 the structure of the regex
 the same.

 TIGER.append(re.search(TIGER\s{10}.*?(?:(\d{1,3}\.\d)\s+){2}, 
 target_table).group(1))
 TIGER.append(re.search(^TIGER.*?(?:(\d{1,3}\.\d)\s+){2}, 
 target_table).group(1))


 BASE - TOTAL TIGER 268   268173 95   101 -   10157 -  
   5778 276   268   19276230 21

 DOG 7979 44 3531 -3117 -  
   1725 124795524 75  1
   29.5  29.5   25.4   36.8  30.7 -  30.7  29.8 -  
 29.8  32.1  50.0  31.6  29.5  28.6  31.6   32.64.8

 CAT 4646 28 1820 -20 7 -  
714 -14463214 39  4
   17.2  17.2   16.2   18.9  19.8 -  19.8  12.3 -  
 12.3  17.9 -  18.4  17.2  16.7  18.4   17.0   19.0

 LAMB3232 23  910 -10 8 -  
812 -12322012 28  1
   11.9  11.9   13.39.5   9.9 -   9.9  14.0 -  
 14.0  15.4 -  15.8  11.9  10.4  15.8   12.24.8

 TRIPOD  3232 23  9 9 - 9 9 -  
911 110322210 28  3
   11.9  11.9   13.39.5   8.9 -   8.9  15.8 -  
 15.8  14.1  50.0  13.2  11.9  11.5  13.2   12.2   14.3

 TIGER   2424 16  8 5 - 510 -  
   10 7 - 72417 7 18  2
9.0   9.09.28.4   5.0 -   5.0  17.5 -  
 17.5   9.0 -   9.2   9.0   8.9   9.27.89.5

-- 
http://mail.python.org/mailman/listinfo/python-list


Regex help needed

2006-01-10 Thread rh0dium
Hi all,

I am using python to drive another tool using pexpect.  The values
which I get back I would like to automatically put into a list if there
is more than one return value. They provide me a way to see that the
data is in set by parenthesising it.

This is all generated as I said using pexpect - Here is how I use it..
 child = pexpect.spawn( _buildCadenceExe(), timeout=timeout)
 child.sendline(somefunction())
 child.expect( )
 data=child.before

Given this data can take on several shapes:

Single return value -- THIS IS THE ONE I CAN'T GET TO WORK..
data = 'somefunction()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'

Multiple return value
data = 'somefunction()\r\n("." "~" "/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile")\r\n'

It may take up several lines...
data = 'somefunction()\r\n("." "~"\r\n"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"\r\n"foo")\r\n'

So if you're still reading this I want to parse out data.  Here are the
rules...
- Line 1 ALWAYS is the calling function whatever is there (except
\r\n) should be kept as original
- Anything may occur inside the quotations - I don't care what's in
there per se but it must be maintained.
- Parenthesed items I want to be pushed into a list.  I haven't run
into a case where you have nested paren's but that not to say it won't
happen...

So here is my code..  Pardon my hack job..

import os,re

def main(data=None):

    # Get rid of the annoying \r's
    dat=data.split("\r")
    data="".join(dat)

    # Remove the first line - that is the original call
    dat = data.split("\n")
    original=dat[0]
    del dat[0]

    print "Original", original
    # Now join all of the remaining lines
    retl="".join(dat)

    # self.logger.debug("Original = \'%s\'" % original)

    try:
        # Get rid of the parenthesis
        parmatcher = re.compile( r'\(([^()]*)\)' )
        parmatch = parmatcher.search(retl)

        # Get rid of the first and last quotes
        qrmatcher = re.compile( r'\"([^()]*)\"' )
        qrmatch = qrmatcher.search(parmatch.group(1))

        # Split the items
        qmatch=re.compile(r'\"\s+\"')
        results = qmatch.split(qrmatch.group(1))
    except:
        qrmatcher = re.compile( r'\"([^()]*)\"' )
        qrmatch = qrmatcher.search(retl)

        # Split the items
        qmatch=re.compile(r'\"\s+\"')
        results = qmatch.split(qrmatch.group(1))

    print "Orig", original, "Results", results
    return original,results


# General run..
if __name__ == '__main__':


    # data = 'someFunction\r\n "test" "foo"\r\n'
    # data = 'someFunction\r\n "test  foo"\r\n'
    data = 'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'
    # data = 'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'

    main(data)

CAN SOMEONE PLEASE CLEAN THIS UP?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread Paul McGuire
rh0dium [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 Hi all,

 I am using python to drive another tool using pexpect.  The values
 which I get back I would like to automatically put into a list if there
 is more than one return value. They provide me a way to see that the
 data is in set by parenthesising it.

snip

Well, you asked for regex help, but a pyparsing rendition may be easier to
read and maintain.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)


# test data strings
test1 = """somefunction()
"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"
"""

test2 = """somefunction()
("." "~"
"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"
"foo")
"""

test3 = """somefunctionWithNestedlist()
("." "~"
"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"
("Hey!"
"this is a nested"
"list")
"foo")
"""


So if you're still reading this I want to parse out data.  Here are the
rules...
- Line 1 ALWAYS is the calling function whatever is there (except
\r\n) should be kept as original
- Anything may occur inside the quotations - I don't care what's in
there per se but it must be maintained.
- Parenthesed items I want to be pushed into a list.  I haven't run
into a case where you have nested paren's but that not to say it won't
happen...


from pyparsing import Literal, Word, alphas, alphanums, \
    dblQuotedString, OneOrMore, Group, Forward

LPAR = Literal("(")
RPAR = Literal(")")

# assume function identifiers must start with alphas, followed by zero or more
# alphas, numbers, or '_' - expand this defn as needed
ident = Word(alphas,alphanums+"_")

# define a list as one or more quoted strings, inside ()'s - we'll tackle
# nesting in a minute
quoteList = Group( LPAR.suppress() +
                   OneOrMore(dblQuotedString) +
                   RPAR.suppress() )

# define format of a line of data - don't bother with \n's or \r's,
# pyparsing just skips 'em
dataFormat = ident + LPAR + RPAR + ( dblQuotedString | quoteList )

def test(t):
    print dataFormat.parseString(t)

print "Parse flat lists"
test(test1)
test(test2)

# modifications for nested lists
quoteList = Forward()
quoteList << Group( LPAR.suppress() +
                    OneOrMore(dblQuotedString | quoteList) +
                    RPAR.suppress() )
dataFormat = ident + LPAR + RPAR + ( dblQuotedString | quoteList )

print
print "Parse using nested lists"
test(test1)
test(test2)
test(test3)

Parsing results:
Parse flat lists
['somefunction', '(', ')', '@(#)$CDS: icfb.exe version 5.1.0 05/22/2005
23:36 (cicln01) $']
['somefunction', '(', ')', ['.', '~',
'/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile', 'foo']]

Parse using nested lists
['somefunction', '(', ')', '@(#)$CDS: icfb.exe version 5.1.0 05/22/2005
23:36 (cicln01) $']
['somefunction', '(', ')', ['.', '~',
'/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile', 'foo']]
['somefunctionWithNestedlist', '(', ')', ['.', '~',
'/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile', ['Hey!',
'this is a nested', 'list'], 'foo']]



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread rh0dium

Paul McGuire wrote:
 -- Paul
 (Download pyparsing at http://pyparsing.sourceforge.net.)

Done.


Hey this is pretty cool!  I have one small problem that I don't know
how to resolve.  I want the entire contents (whatever it is) of line 1
to be the ident.  Digging into the code turned up a method line, plus
lineno and LineStart/LineEnd.  I tried all three, but none of them
worked, for a few reasons (line: type issues; lineno: I needed the data
and couldn't get it to work; LineStart/LineEnd: I think it matches every
line, and I need to limit the scope to line 1).

So here is my rendition of the code - But this is REALLY slick..

I think the problem is the parens on line one

def main(data=None):

    LPAR = Literal("(")
    RPAR = Literal(")")

    # assume function identifiers must start with alphas, followed by
    # zero or more alphas, numbers, or '_' - expand this defn as needed
    ident = LineStart + LineEnd

    # define a list as one or more quoted strings, inside ()'s - we'll
    # tackle nesting in a minute
    quoteList = Group( LPAR.suppress() + OneOrMore(dblQuotedString) +
                       RPAR.suppress())

    # define format of a line of data - don't bother with \n's or \r's,
    # pyparsing just skips 'em
    dataFormat = ident + ( dblQuotedString | quoteList )

    return dataFormat.parseString(data)


# General run..
if __name__ == '__main__':


    # data = 'someFunction\r\n "test" "foo"\r\n'
    # data = 'someFunction\r\n "test  foo"\r\n'
    data = 'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n'
    # data = 'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'

    foo = main(data)

    print foo

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread Paul McGuire
rh0dium [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]

 Paul McGuire wrote:
  -- Paul
  (Download pyparsing at http://pyparsing.sourceforge.net.)

 Done.


 Hey this is pretty cool!  I have one small problem that I don't know
 how to resolve.  I want the entire contents (whatever it is) of line 1
 to be the ident.  Now digging into the code showed a method line,
 lineno and LineStart LineEnd.  I tried to use all three but it didn't
 work for a few reasons ( line = type issues, lineno - I needed the data
 and could't get it to work, LineStart/End - I think it matches every
 line and I need the scope to line 1 )

 So here is my rendition of the code - But this is REALLY slick..

 I think the problem is the parens on line one

 def main(data=None):

 LPAR = Literal(()
 RPAR = Literal())

 # assume function identifiers must start with alphas, followed by
 zero or more
 # alphas, numbers, or '_' - expand this defn as needed
 ident = LineStart + LineEnd

 # define a list as one or more quoted strings, inside ()'s - we'll
 tackle nesting
 # in a minute
 quoteList = Group( LPAR.suppress() + OneOrMore(dblQuotedString) +
 RPAR.suppress())

 # define format of a line of data - don't bother with \n's or \r's,

 # pyparsing just skips 'em
 dataFormat = ident + ( dblQuotedString | quoteList )

 return dataFormat.parseString(data)


 # General run..
 if __name__ == '__main__':


 # data = 'someFunction\r\n test foo\r\n'
 # data = 'someFunction\r\n test  foo\r\n'
 data = 'getVersion()\r\n@(#)$CDS: icfb.exe version 5.1.0
 05/22/2005 23:36 (cicln01) $\r\n'
 # data = 'someFunction\r\n (test test1 foo aasdfasdf\r\n
 newline test2)\r\n'

 foo = main(data)

 print foo


LineStart() + LineEnd() will only match an empty line.


If you describe in words what you want ident to be, it may be more natural
to translate to pyparsing.

A word starting with an alpha, followed by zero or more alphas, numbers, or
'_'s, with a trailing pair of parens

ident = Word(alphas,alphanums+"_") + LPAR + RPAR


If you want the ident all combined into a single token, use:

ident = Combine( Word(alphas,alphanums+"_") + LPAR + RPAR )


LineStart and LineEnd are geared more for line-oriented or
whitespace-sensitive grammars.  Your example doesn't really need them, I
don't think.

If you *really* want everything on the first line to be the ident, try this:

ident = Word(alphas,alphanums+"_") + restOfLine
or
ident = Combine( Word(alphas,alphanums+"_") + restOfLine )


Now the next step is to assign field names to the results:

dataFormat = ident.setResultsName("ident") + ( dblQuotedString |
quoteList ).setResultsName("contents")

test = "blah blah test string"

results = dataFormat.parseString(test)
print results.ident, results.contents

I'm glad pyparsing is working out for you!  There should be a number of
examples that ship with pyparsing that may give you some more ideas on how
to proceed from here.

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread Michael Spencer
rh0dium wrote:
 Hi all,
 
 I am using python to drive another tool using pexpect.  The values
 which I get back I would like to automatically put into a list if there
 is more than one return value. They provide me a way to see that the
 data is in set by parenthesising it.
 
...

 
 CAN SOMEONE PLEASE CLEAN THIS UP?
 

How about using the Python tokenizer rather than re:

 >>> import cStringIO, tokenize
 ...
 >>> def get_tokens(source):
 ...     allowed_tokens = (tokenize.STRING, tokenize.OP)
 ...     src = cStringIO.StringIO(source).readline
 ...     src = tokenize.generate_tokens(src)
 ...     return (token[1] for token in src if token[0] in allowed_tokens)
 ...
 >>> def rest_eval(tokens):
 ...     output = []
 ...     for token in tokens:
 ...         if token == "(":
 ...             output.append(rest_eval(tokens))
 ...         elif token == ")":
 ...             return output
 ...         else:
 ...             output.append(token[1:-1])
 ...     return output
 ...
 >>> def parse(source):
 ...     source = source.splitlines()
 ...     original, rest = source[0], "\n".join(source[1:])
 ...     return original, rest_eval(get_tokens(rest))
 ...
 >>> sources = [
 ...     'someFunction\r\n "test" "foo"\r\n',
 ...     'someFunction\r\n "test  foo"\r\n',
 ...     'getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\r\n',
 ...     'someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n']
 >>>
 >>> for data in sources: parse(data)
 ...
 ('someFunction', ['test', 'foo'])
 ('someFunction', ['test  foo'])
 ('getVersion()', ['@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $'])
 ('someFunction', [['test', 'test1', 'foo aasdfasdf', 'newline', 'test2']])
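
The session above is Python 2. A rough Python 3 equivalent of the same tokenizer approach could look like the sketch below; it is an adaptation, not the code from the post. It assumes `io.StringIO` in place of `cStringIO`, and it strips each continuation line before tokenizing so that leading spaces are not read as indentation (an added precaution). The function names are kept from the post.

```python
import io
import tokenize

def get_tokens(source):
    """Yield the text of string and operator tokens only."""
    allowed = (tokenize.STRING, tokenize.OP)
    readline = io.StringIO(source).readline
    return (tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in allowed)

def rest_eval(tokens):
    """Build (possibly nested) lists from a shared token iterator."""
    output = []
    for token in tokens:
        if token == "(":
            output.append(rest_eval(tokens))  # recurse into a sub-list
        elif token == ")":
            return output                     # close the current sub-list
        else:
            output.append(token[1:-1])        # drop the surrounding quotes
    return output

def parse(source):
    """Split off line 1 (the original call) and parse the rest."""
    lines = source.splitlines()
    rest = "\n".join(line.strip() for line in lines[1:])
    return lines[0], rest_eval(get_tokens(rest))

print(parse('getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 (cicln01) $"\r\n'))
print(parse('someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'))
```

The recursion works because every call to `rest_eval` consumes the same generator, so a nested call picks up exactly where its caller left off.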
  

Cheers

Michael

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread rh0dium

Paul McGuire wrote:

 ident = Combine( Word(alpha,alphanums+_) + LPAR + RPAR )

This will only work for a word with a parentheses ( ie.  somefunction()
)

 If you *really* want everything on the first line to be the ident, try this:

 ident = Word(alpha,alphanums+_) + restOfLine
 or
 ident = Combine( Word(alpha,alphanums+_) + restOfLine )

This nicely grabs the \r..  How can I get around it?

 Now the next step is to assign field names to the results:

 dataFormat = ident.setResultsName(ident) + ( dblQuotedString |
 quoteList ).setResultsName(contents)

This is super cool!!

So let's take this for example

test= 'fprintf( outFile "leSetInstSelectable( t )\n" )\r\n ("test" "test1" "foo aasdfasdf"\r\n "newline" "test2")\r\n'

Now I want the ident to pull out 'fprintf( outFile
"leSetInstSelectable( t )\n" )' so I tried to do this?

ident = Forward()
ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)

Borrowing from the example listed previously.  But it bombs out cause
it wants a )  but it has one..  Forward() ROCKS!!

Also how does it know to do this for just the first line?  It would
seem that this will work for every line - No?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread rh0dium

Michael Spencer wrote:
def parse(source):
   ... source = source.splitlines()
   ... original, rest = source[0], \n.join(source[1:])
   ... return original, rest_eval(get_tokens(rest))

This is a very clean and elegant way to separate them - Very nice!!  I
like this alot - I will definately use this in the future!!

 
 Cheers
 
 Michael

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread Paul McGuire
rh0dium [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]

 Paul McGuire wrote:

  ident = Combine( Word(alpha,alphanums+_) + LPAR + RPAR )

 This will only work for a word with a parentheses ( ie.  somefunction()
 )

  If you *really* want everything on the first line to be the ident, try
this:
 
  ident = Word(alpha,alphanums+_) + restOfLine
  or
  ident = Combine( Word(alpha,alphanums+_) + restOfLine )

 This nicely grabs the \r..  How can I get around it?

  Now the next step is to assign field names to the results:
 
  dataFormat = ident.setResultsName(ident) + ( dblQuotedString |
  quoteList ).setResultsName(contents)

 This is super cool!!

 So let's take this for example

 test= 'fprintf( outFile leSetInstSelectable( t )\n )\r\n (test
 test1 foo aasdfasdf\r\n newline test2)\r\n'

 Now I want the ident to pull out 'fprintf( outFile
 leSetInstSelectable( t )\n )' so I tried to do this?

 ident = Forward()
 ident  Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
 dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)

 Borrowing from the example listed previously.  But it bombs out cause
 it wants a )  but it has one..  Forward() ROCKS!!

 Also how does it know to do this for just the first line?  It would
 seem that this will work for every line - No?

This works for me:

test4 = r"""fprintf( outFile "leSetInstSelectable( t )\n" )
("test"
"test1" "foo aasdfasdf"
"newline" "test2")
"""

ident = Forward()
ident << Group( Word(alphas,alphanums) + LPAR + ZeroOrMore(
dblQuotedString | ident | Word(alphas,alphanums) ) + RPAR)
dataFormat = ident + ( dblQuotedString | quoteList )

print dataFormat.parseString(test4)

Prints:
[['fprintf', '(', 'outFile', 'leSetInstSelectable( t )\\n', ')'],
['test', 'test1', 'foo aasdfasdf', 'newline', 'test2']]


1. Is there supposed to be a real line break in the string
"leSetInstSelectable( t )\n", or just a slash-n at the end?  pyparsing
quoted strings do not accept multiline quotes, but they do accept escaped
characters such as "\t", "\n", etc.  That is, to pyparsing:

"\n this is a valid \t \n string"

"this is not
a valid string"

Part of the confusion is that your examples include explicit \r\n
characters.  I'm assuming this is to reflect what you see when listing out
the Python variable containing the string.  (Are you opening a text file
with "rb" to read in binary?  Try opening with just "r", and this may
resolve your \r\n problems.)

2. If restOfLine is still giving you \r's at the end, you can redefine
restOfLine to not include them, or to include and suppress them.  Or (this
is easier) define a parse action for restOfLine that strips trailing \r's:

def stripTrailingCRs(st,loc,toks):
    try:
        if toks[0][-1] == '\r':
            return toks[0][:-1]
    except:
        pass

restOfLine.setParseAction( stripTrailingCRs )


3.  How does it know to only do it for the first line?  Presumably you told
it to do so.  pyparsing's parseString method starts at the beginning of the
input string, and matches expressions until it finds a mismatch, or runs out
of expressions to match - even if there is more input string to process,
pyparsing does not continue.  To search through the whole file looking for
idents, try using scanString which returns a generator; for each match, the
generator gives a tuple containing:
- tokens - the matched tokens
- start - the start location of the match
- end - the end location of the match

If your input file consists *only* of these constructs, you can also just
expand dataFormat.parseString to OneOrMore(dataFormat).parseString.


-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed

2006-01-10 Thread Michael Spencer
rh0dium wrote:
 Michael Spencer wrote:
def parse(source):
   ... source = source.splitlines()
   ... original, rest = source[0], \n.join(source[1:])
   ... return original, rest_eval(get_tokens(rest))
 
 This is a very clean and elegant way to separate them - Very nice!!  I
 like this alot - I will definately use this in the future!!
 
 Cheers

 Michael
 
On reflection, this simplifies further (to 9 lines), at least for the test
cases you provide, which don't involve any nested parens:

 >>> import cStringIO, tokenize
 ...
 >>> def get_tokens2(source):
 ...     src = cStringIO.StringIO(source).readline
 ...     src = tokenize.generate_tokens(src)
 ...     return [token[1][1:-1] for token in src if token[0] == tokenize.STRING]
 ...
 >>> def parse2(source):
 ...     source = source.splitlines()
 ...     original, rest = source[0], "\n".join(source[1:])
 ...     return original, get_tokens2(rest)
 ...
 >>>

This matches your main function for the three tests where main works...

 >>> for source in sources[:3]: #matches your main function where it works
 ...     assert parse2(source) == main(source)
 ...
 Original someFunction
 Orig someFunction Results ['test', 'foo']
 Original someFunction
 Orig someFunction Results ['test  foo']
 Original someFunction
 Orig someFunction Results ['test', 'test1', 'foo aasdfasdf', 'newline', 'test2']

...and handles the case where main fails (I think correctly, although I'm not
entirely sure what your desired output is in this case):
 >>> parse2(sources[3])
 ('getVersion()', ['@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $'])
 >>>

If you really do need nested parens, then you'd need the slightly longer
version I posted earlier.

Cheers

Michael

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2005-08-11 Thread John Machin
jeff sacksteder wrote:
 Regex questions seem to be rather resistant to googling.
 
 My regex currently looks like - 'FOO:.*\n\n'
 
 The chunk of text I am attempting to locate is a line beginning with
 FOO:, followed by an unknown number of lines, terminating with a
 blank line. Clearly the .* phrase does not match the single newlines
 occuring inside the block.
 
 Suggestions are warmly welcomed.

I suggest you read the manual first:

.
(Dot.) In the default mode, this matches any character except a newline. 
If the DOTALL flag has been specified, this matches any character 
including a newline.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2005-08-11 Thread gene tani
when *I* google

http://www.awaretek.com/tutorials.html#regular
http://en.wikibooks.org/wiki/Programming:Python_Strings
http://www.regexlib.com/Default.aspx

http://docs.python.org/lib/module-re.html

http://diveintopython.org/regular_expressions/index.html#re.intro
http://www.amk.ca/python/howto/regex/
http://gnosis.cx/publish/programming/regular_expressions.html

Also look into ActiveState Komodo's regex debugger (I think Wing IDE has
one too).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2005-08-11 Thread Shantanoo Mahajan
John Machin wrote:

 jeff sacksteder wrote:
 Regex questions seem to be rather resistant to googling.
 
 My regex currently looks like - 'FOO:.*\n\n'
 
 The chunk of text I am attempting to locate is a line beginning with
 FOO:, followed by an unknown number of lines, terminating with a
 blank line. Clearly the .* phrase does not match the single newlines
 occuring inside the block.
 
 Suggestions are warmly welcomed.
 
 I suggest you read the manual first:
 
 .
 (Dot.) In the default mode, this matches any character except a newline.
 If the DOTALL flag has been specified, this matches any character
 including a newline.
 

I think you need to write your own function. Something like:

flag = 0
for x in open('_file_name'):
    if x == 'FOO:\n':
        flag = 1
    if x == '\n':
        flag = 0
    if flag == 1:
        print x


If the line is 'FOO: _some_more_data_' you may try
    if x.startswith('FOO:'):
instead of
    if x == 'FOO:\n':

Hope this helps.
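
The same flag idea can be packaged as a Python 3 generator that returns each block as a list instead of printing it. This is only a sketch: the name `foo_blocks` and the sample lines are made up for illustration, and it assumes a block ends at the first blank line or at end of input.

```python
def foo_blocks(lines, marker="FOO:"):
    """Yield each run of lines from a `marker` line up to the next blank line."""
    block = None                       # None means "not inside a block"
    for line in lines:
        if block is None:
            if line.startswith(marker):
                block = [line]         # a new block starts here
        elif line.strip() == "":
            yield block                # a blank line closes the block
            block = None
        else:
            block.append(line)
    if block is not None:              # input ended without a blank line
        yield block

sample = ["junk\n", "FOO: first\n", "second\n", "\n",
          "other\n", "FOO: again\n", "tail\n"]
blocks = list(foo_blocks(sample))
print(blocks)
```

Passing an open file object instead of `sample` works the same way, since both are iterables of lines.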

Shantanoo
-- 
http://mail.python.org/mailman/listinfo/python-list


regex help

2005-08-10 Thread jeff sacksteder
Regex questions seem to be rather resistant to googling.

My regex currently looks like - 'FOO:.*\n\n'

The chunk of text I am attempting to locate is a line beginning with
FOO:, followed by an unknown number of lines, terminating with a
blank line. Clearly the .* phrase does not match the single newlines
occuring inside the block.

Suggestions are warmly welcomed.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regex help

2005-08-10 Thread Christopher Subich
jeff sacksteder wrote:
 Regex questions seem to be rather resistant to googling.
 
 My regex currently looks like - 'FOO:.*\n\n'
 
 The chunk of text I am attempting to locate is a line beginning with
 FOO:, followed by an unknown number of lines, terminating with a
 blank line. Clearly the .* phrase does not match the single newlines
 occuring inside the block.

Include the re.DOTALL flag when you compile the regular expression.
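
A minimal sketch of that suggestion follows (Python 3 syntax; the sample text is invented). Two details go beyond the one-line answer and are assumptions about what is wanted: the quantifier is made non-greedy so the match stops at the first blank line, and re.MULTILINE is added so that ^ can anchor FOO: to the start of a line.

```python
import re

text = "header\nFOO: first\nsecond line\nthird line\n\ntrailer\n\nmore\n"

# re.DOTALL lets "." match newlines; re.MULTILINE makes "^" match at every
# line start. The non-greedy ".*?" ends the match at the first blank line.
pattern = re.compile(r"^FOO:.*?\n\n", re.DOTALL | re.MULTILINE)

match = pattern.search(text)
block = match.group(0) if match else None
print(repr(block))
```

With a greedy `.*` the same pattern would run on to the last blank line in the text, which matters as soon as a file holds more than one block.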
-- 
http://mail.python.org/mailman/listinfo/python-list


Multiline regex help

2005-03-03 Thread Yatima
Hey Folks,

I've got some info in a bunch of files that kind of looks like so:

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

and so on...

Anyhow, these fields repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
RelevantInfo lines is really what I'm after. Ideally, I would like to have
something like so:

RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23   # it's just there to illustrate what info I'm
 # trying to snag.

Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2

Collected from all of the files.

So, there would be several of these scores per file and there are a bunch
of files. Ultimately, I am interested in printing them out as a csv file but
that should be relatively easy once they are trapped in my array of doom
cue evil laughter.

I've got a fairly ugly solution (I am using this term *very* loosely)
using awk and his faithfail companion sed, but I would prefer something in
python.

Thanks for your time.

-- 
McGowan's Madison Avenue Axiom:
If an item is advertised as under $50, you can bet it's not $19.95.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Yatima wrote:
Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these fields repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
RelevantInfo lines is really what I'm after. Ideally, I would like to have
something like so:
RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23   # it's just there to illustrate what info I'm
 # trying to snag.
Here is a way to create a list of [RelevantInfo, value] pairs:
import cStringIO
raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34'''
raw_data = cStringIO.StringIO(raw_data)
data = []
for line in raw_data:
    if line.startswith('RelevantInfo'):
        key = line.strip()
        value = raw_data.next().strip()
        data.append([key, value])
print data
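
Going one step further, the pair list could be folded into the nested Score dictionary from the question. The Python 3 sketch below is an illustration only: it assumes every record lists RelevantInfo1, RelevantInfo2 and RelevantInfo3 in that order, and the shortened sample text and the names `pairs` and `score` are made up here.

```python
import io

raw = io.StringIO("""Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
RelevantInfo2
22
BlahBlah
RelevantInfo3
23
Crap
34
""")

# Collect (label, value) pairs: the value is the line after each label.
pairs = []
for line in raw:
    if line.startswith("RelevantInfo"):
        pairs.append((line.strip(), next(raw, "").strip()))

# Group the pairs three at a time: score[info1][info3] = info2.
score = {}
for i in range(0, len(pairs) - 2, 3):
    (_, info1), (_, info2), (_, info3) = pairs[i:i + 3]
    score.setdefault(info1, {})[info3] = info2

print(score)
```

If a file can omit one of the three fields, the fixed grouping by threes no longer holds and the labels would have to be checked explicitly.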

Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
I'm not sure what you mean by this. Do you want to build a Score dictionary 
as well?
Kent
Collected from all of the files.
So, there would be several of these scores per file and there are a bunch
of files. Ultimately, I am interested in printing them out as a csv file but
that should be relatively easy once they are trapped in my array of doom
cue evil laughter.
I've got a fairly ugly solution (I am using this term *very* loosely)
using awk and his faithful companion sed, but I would prefer something in
python.
Thanks for your time.


Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote:
Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these fields repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
RelevantInfo lines is really what I'm after. Ideally, I would like to have
something like so:
RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23   # it's just there to illustrate what info I'm
 # trying to snag.
Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
A possible solution, using the re module:
py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
... BlahBlah
... 343
... RelevantInfo3
... 23
... Hubris
... Crap
... 34
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*
...                    ^RelevantInfo2\n([^\n]*)
...                    .*
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'23': '22'}}
Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE 
to have ^ apply at the start of each line, and VERBOSE to allow me to 
write the re in a more readable form.
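
To see what each of the three flags contributes, here is a small sketch that runs a two-marker version of the pattern with one flag left out at a time. The shortened sample text and the variable names are made up for illustration; only the re module calls are standard.

```python
import re

text = "junk\nRelevantInfo1\n10/10/04\nnoise\nRelevantInfo2\n22\n"

pattern = r"""^RelevantInfo1\n([^\n]*)   # value on the line after the first marker
              .*                         # anything in between
              ^RelevantInfo2\n([^\n]*)   # value on the line after the second marker
           """

# Without DOTALL, .* cannot cross a newline, so the two markers never connect.
no_dotall = re.search(pattern, text, re.MULTILINE | re.VERBOSE)
# Without MULTILINE, ^ only matches at the very start of the string.
no_multiline = re.search(pattern, text, re.DOTALL | re.VERBOSE)
# With all three flags the pattern matches and captures both values.
all_flags = re.search(pattern, text, re.DOTALL | re.MULTILINE | re.VERBOSE)

print(no_dotall, no_multiline, all_flags.groups())
```

VERBOSE is what allows the pattern to be spread over several lines with comments; the \n escapes still match real newlines because they are written as escapes, not as literal whitespace.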

If I didn't get your dict update quite right, hopefully you can see how 
to fix it!

HTH,
STeVe


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard [EMAIL PROTECTED] wrote:

 A possible solution, using the re module:

 py> s = """\
 ... Gibberish
 ... 53
 ... MoreGarbage
 ... 12
 ... RelevantInfo1
 ... 10/10/04
 ... NothingImportant
 ... ThisDoesNotMatter
 ... 44
 ... RelevantInfo2
 ... 22
 ... BlahBlah
 ... 343
 ... RelevantInfo3
 ... 23
 ... Hubris
 ... Crap
 ... 34
 ... """
 py> import re
 py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
 ...                    .*
 ...                    ^RelevantInfo2\n([^\n]*)
 ...                    .*
 ...                    ^RelevantInfo3\n([^\n]*)""",
 ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
 py> score = {}
 py> for info1, info2, info3 in m.findall(s):
 ...     score.setdefault(info1, {})[info3] = info2
 ...
 py> score
 {'10/10/04': {'23': '22'}}

 Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE 
 to have ^ apply at the start of each line, and VERBOSE to allow me to 
 write the re in a more readable form.

 If I didn't get your dict update quite right, hopefully you can see how 
 to fix it!

Thanks! That was very helpful. Unfortunately, I wasn't completely clear when
describing the problem. Is there any way to extract multiple scores from the
same file and from multiple files? (I will probably use the fileinput
module to deal with multiple files.) So, if I've got, say:

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

SecondSetofGarbage
2423
YouGetThePicture
342342
RelevantInfo1
10/10/04
HoHum
343
MoreStuffNotNeeded
232
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
InsertBoringFillerHere
43234
Stuff
MoreStuff
RelevantInfo2
45
ExcitingIsntIt
324234
RelevantInfo3
60
Lalala

Sorry for the long and painful example input. Notice that the first two
RelevantInfo1 fields have the same info but that the RelevantInfo2 and
RelevantInfo3 fields have different info. Also, there will be cases where
RelevantInfo3 might be the same with a different RelevantInfo2. What I'm
hoping for is something along the lines of being able to organize it like
so (don't worry about the format of the output -- I'll deal with that
later; RelevantInfo shortened to Info for readability):

          Info1[0],                 Info[1],                  Info[2] ...
Info3[0]  Info2[Info1[0],Info3[0]]  Info2[Info1[1],Info3[1]]  ...
Info3[1]  Info2[Info1[0],Info3[1]]  ...
Info3[2]  Info2[Info1[0],Info3[2]]  ...
...

I don't really care if it's a list, dictionary, array etc. 

Thanks again for your help. The multiline option in the re module is very
useful. 

Take care.

-- 
Clarke's Conclusion:
Never let your sense of morals interfere with doing the right thing.


Re: Multiline regex help

2005-03-03 Thread James Stroud
Have a look at martel, part of biopython. The world of bioinformatics is 
filled with files with structure like this.

http://www.biopython.org/docs/api/public/Martel-module.html

James

On Thursday 03 March 2005 12:03 pm, Yatima wrote:
 [rest of Yatima's message, quoted in full, snipped]

-- 
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 07:14:50 -0500, Kent Johnson [EMAIL PROTECTED] wrote:

 Here is a way to create a list of [RelevantInfo, value] pairs:
 import cStringIO

 raw_data = '''Gibberish
 53
 MoreGarbage
 12
 RelevantInfo1
 10/10/04
 NothingImportant
 ThisDoesNotMatter
 44
 RelevantInfo2
 22
 BlahBlah
 343
 RelevantInfo3
 23
 Hubris
 Crap
 34'''
 raw_data = cStringIO.StringIO(raw_data)

 data = []
 for line in raw_data:
     if line.startswith('RelevantInfo'):
         key = line.strip()
         value = raw_data.next().strip()
         data.append([key, value])

 print data


Thank you. This isn't exactly what I'm looking for (I wasn't clear in
describing the problem -- please see my reply to Steve for a, hopefully,
better explanation) but it does give me a few ideas.

 
 Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2

 I'm not sure what you mean by this. Do you want to build a Score dictionary 
 as well?

Sure... Uhhh.. I think. Okay, what I want is some kind of awk-like
associative array because the raw data files will have repeats for certain
field values such that there would be, for example, multiple RelevantInfo2's
and RelevantInfo3's for the same RelevantInfo1 (i.e. on the same date). To
make matters more exciting, there will be multiple RelevantInfo1's (dates)
for the same RelevantInfo3 (e.g. a subject ID). RelevantInfo2 will be the
value for all unique combinations of RelevantInfo1 and RelevantInfo3. There
will be multiple occurrences of these fields in the same file (original data
sample was not very good for this reason) and multiple files as well. The
interesting three fields will always be repeated in the same order although
the amount of irrelevant data in between may vary. So:

RelevantInfo1
10/10/04
snipped crap
RelevantInfo2
12
more snippage
RelevantInfo3
43
more snippage
RelevantInfo1
10/10/04    <- The same as the first occurrence of RelevantInfo1
snipped
RelevantInfo2
22
snipped
RelevantInfo3
25
snipped
RelevantInfo1
10/11/04
snipped
RelevantInfo2
34
snipped
RelevantInfo3
28
snipped
RelevantInfo1
10/12/04
snipped
RelevantInfo2
98
snipped
RelevantInfo3
25    <- The same as the second occurrence of RelevantInfo3
...

Sorry for the long and tedious data example.

There will be missing values for some combinations of RelevantInfo1 and
RelevantInfo3 so hopefully that won't be an issue.

Thanks again for your reply.

Take care.

-- 
I figured there was this holocaust, right, and the only ones left alive were
 Donna Reed, Ozzie and Harriet, and the Cleavers.
-- Wil Wheaton explains why everyone in Star Trek: The Next Generation 
is so nice


Re: Multiline regex help

2005-03-03 Thread James Stroud
I found the original paper for Martel:

http://www.dalkescientific.com/Martel/ipc9/

On Thursday 03 March 2005 12:26 pm, James Stroud wrote:
 Have a look at martel, part of biopython. The world of bioinformatics is
 filled with files with structure like this.

 http://www.biopython.org/docs/api/public/Martel-module.html

 James

 On Thursday 03 March 2005 12:03 pm, Yatima wrote:

-- 
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095


Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote:
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard [EMAIL PROTECTED] wrote:
A possible solution, using the re module:
py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
... BlahBlah
... 343
... RelevantInfo3
... 23
... Hubris
... Crap
... 34
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*
...                    ^RelevantInfo2\n([^\n]*)
...                    .*
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'23': '22'}}
Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE 
to have ^ apply at the start of each line, and VERBOSE to allow me to 
write the re in a more readable form.

If I didn't get your dict update quite right, hopefully you can see how 
to fix it!

Thanks! That was very helpful. Unfortunately, I wasn't completely clear when
describing the problem. Is there any way to extract multiple scores from the
same file and from multiple files
I think if you use the non-greedy .*? instead of the greedy .*, you'll 
get this behavior.  For example:

py> s = """\
... Gibberish
... 53
... MoreGarbage
[snip a whole bunch of stuff]
... RelevantInfo3
... 60
... Lalala
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*?
...                    ^RelevantInfo2\n([^\n]*)
...                    .*?
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'44': '33', '23': '22'}, '10/11/04': {'60': '45'}}
If you might have multiple info2 values for the same (info1, info3) 
pair, you can try something like:

py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {}).setdefault(info3, []).append(info2)
...
py> score
{'10/10/04': {'44': ['33'], '23': ['22']}, '10/11/04': {'60': ['45']}}
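
The difference between the greedy and non-greedy gaps can be checked on a tiny two-record input. The sketch below is only an illustration: the helper name build and the cut-down sample text are invented here, and the pattern is written without VERBOSE so the pieces can be concatenated.

```python
import re

text = (
    "RelevantInfo1\n10/10/04\nRelevantInfo2\n22\nRelevantInfo3\n23\n"
    "filler\n"
    "RelevantInfo1\n10/11/04\nRelevantInfo2\n45\nRelevantInfo3\n60\n"
)

def build(gap):
    # gap is either ".*" (greedy) or ".*?" (non-greedy)
    return re.compile(
        r"^RelevantInfo1\n([^\n]*)" + gap +
        r"^RelevantInfo2\n([^\n]*)" + gap +
        r"^RelevantInfo3\n([^\n]*)",
        re.DOTALL | re.MULTILINE,
    )

# Greedy gaps stretch to the LAST RelevantInfo2/RelevantInfo3, gluing records together.
greedy = build(".*").findall(text)
# Non-greedy gaps stop at the NEXT marker, giving one tuple per record.
lazy = build(".*?").findall(text)

score = {}
for info1, info2, info3 in lazy:
    score.setdefault(info1, {}).setdefault(info3, []).append(info2)

print(greedy)
print(lazy)
print(score)
```

With the greedy version the first date ends up paired with the values of the last record, which is why it only looked right on the original single-record sample.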
HTH,
STeVe


Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Here is another attempt. I'm still not sure I understand what form you want the data in. I made a 
dict -> dict -> list structure, so if you look up e.g. scores['10/11/04']['60'] you get a list of all 
the RelevantInfo2 values for RelevantInfo1='10/11/04' and RelevantInfo3='60'.

The parser is a simple-minded state machine that will misbehave if the input does not have entries 
in the order Relevant1, Relevant2, Relevant3 (with as many intervening lines as you like).

All three values are available when Relevant3 is detected so you could do something else with them 
if you want.

HTH
Kent
import cStringIO
raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
SecondSetofGarbage
2423
YouGetThePicture
342342
RelevantInfo1
10/10/04
HoHum
343
MoreStuffNotNeeded
232
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
InsertBoringFillerHere
43234
Stuff
MoreStuff
RelevantInfo2
45
ExcitingIsntIt
324234
RelevantInfo3
60
Lalala'''
raw_data = cStringIO.StringIO(raw_data)
scores = {}
info1 = info2 = info3 = None
for line in raw_data:
    if line.startswith('RelevantInfo1'):
        info1 = raw_data.next().strip()
    elif line.startswith('RelevantInfo2'):
        info2 = raw_data.next().strip()
    elif line.startswith('RelevantInfo3'):
        info3 = raw_data.next().strip()
        # all three values are known here, so record them and start over
        scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
        info1 = info2 = info3 = None
print scores
print scores['10/11/04']['60']
print scores['10/10/04']['23']
## prints:
{'10/10/04': {'44': ['33'], '23': ['22', '22']}, '10/11/04': {'60': ['45']}}
['45']
['22', '22']
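
Since the stated end goal was a CSV file, here is a rough Python 3 sketch of the same state machine followed by a pivot into CSV text. It is an adaptation under assumptions, not the posted code: the names parse_scores and to_csv are invented, joining repeated values with ";" is an arbitrary choice, and columns and rows are simply sorted as strings.

```python
import csv
import io

def parse_scores(lines):
    """State machine from above, adapted for Python 3: {info1: {info3: [info2, ...]}}."""
    lines = iter(lines)
    scores = {}
    info1 = info2 = None
    for line in lines:
        if line.startswith("RelevantInfo1"):
            info1 = next(lines, "").strip()
        elif line.startswith("RelevantInfo2"):
            info2 = next(lines, "").strip()
        elif line.startswith("RelevantInfo3"):
            info3 = next(lines, "").strip()
            scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
            info1 = info2 = None
    return scores

def to_csv(scores):
    """One column per info1 value, one row per info3 value; missing cells stay empty."""
    columns = sorted(scores)
    rows = sorted({info3 for by_info3 in scores.values() for info3 in by_info3})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + columns)
    for info3 in rows:
        # Several info2 values for one cell are joined with ";" (an assumption).
        writer.writerow([info3] + [";".join(scores[c].get(info3, [])) for c in columns])
    return out.getvalue()

sample = """RelevantInfo1
10/10/04
junk
RelevantInfo2
22
RelevantInfo3
23
RelevantInfo1
10/11/04
RelevantInfo2
45
RelevantInfo3
60
""".splitlines(keepends=True)

scores = parse_scores(sample)
print(to_csv(scores))
```

Because parse_scores only needs an iterable of lines, passing it fileinput.input() should also work for a batch of files, though that is untested here.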


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 16:25:39 -0500, Kent Johnson [EMAIL PROTECTED] wrote:
 Here is another attempt. I'm still not sure I understand what form you want
 the data in. I made a dict -> dict -> list structure, so if you look up e.g.
 scores['10/11/04']['60'] you get a list of all the RelevantInfo2 values for
 RelevantInfo1='10/11/04' and RelevantInfo3='60'.

 The parser is a simple-minded state machine that will misbehave if the input 
 does not have entries 
 in the order Relevant1, Relevant2, Relevant3 (with as many intervening lines 
 as you like).

 All three values are available when Relevant3 is detected so you could do 
 something else with them 
 if you want.

 HTH
 Kent

 import cStringIO

 raw_data = '''Gibberish
 53
 MoreGarbage
[mass snippage]
 60
 Lalala'''
 raw_data = cStringIO.StringIO(raw_data)

 scores = {}
 info1 = info2 = info3 = None

 for line in raw_data:
     if line.startswith('RelevantInfo1'):
         info1 = raw_data.next().strip()
     elif line.startswith('RelevantInfo2'):
         info2 = raw_data.next().strip()
     elif line.startswith('RelevantInfo3'):
         info3 = raw_data.next().strip()
         scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
         info1 = info2 = info3 = None

 print scores
 print scores['10/11/04']['60']
 print scores['10/10/04']['23']

 ## prints:
 {'10/10/04': {'44': ['33'], '23': ['22', '22']}, '10/11/04': {'60': ['45']}}
 ['45']
 ['22', '22']

Thank you so much. Your solution and Steve's both give me what I'm looking
for. I appreciate both of your incredibly quick replies!

Take care.

-- 
You worry too much about your job.  Stop it.  You are not paid enough to worry.


Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 3 Mar 2005 12:26:37 -0800, James Stroud [EMAIL PROTECTED] wrote:
 Have a look at martel, part of biopython. The world of bioinformatics is 
 filled with files with structure like this.

 http://www.biopython.org/docs/api/public/Martel-module.html

 James

Thanks for the link. Steve and Kent have provided me with nice solutions but
I will check this out anyway for future reference.

Take care.

-- 
You may easily play a joke on a man who likes to argue -- agree with him.
-- Ed Howe


Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Kent Johnson wrote:
for line in raw_data:
    if line.startswith('RelevantInfo1'):
        info1 = raw_data.next().strip()
    elif line.startswith('RelevantInfo2'):
        info2 = raw_data.next().strip()
    elif line.startswith('RelevantInfo3'):
        info3 = raw_data.next().strip()
        scores.setdefault(info1, {}).setdefault(info3, []).append(info2)
        info1 = info2 = info3 = None
Very pretty. =)  I have to say, I hadn't ever used iterators this way 
before, that is, calling their next method from within a for-loop.  I 
like it. =)
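
For anyone who has not seen the trick either, a stripped-down sketch of it in Python 3 spelling (next(it) instead of it.next()) is below; the list contents are made up. The one thing to watch is that the loop must run over an iterator, not over a list directly, since a for-loop over a list creates its own hidden iterator.

```python
# Wrap the list in iter() so the loop and next() share one iterator.
lines = iter(["key1", "value1", "noise", "key2", "value2"])

pairs = []
for item in lines:
    if item.startswith("key"):
        # next() advances the very iterator the for-loop is reading from,
        # so the loop never sees the value line itself.
        pairs.append((item, next(lines)))

print(pairs)
```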

Thanks for opening my mind. ;)
STeVe

