
regular expression help

2010-11-29 Thread goldtech

Hi,

say:
>>> import re
>>> m = "cccvlvlvlvnnnflfllffccclfnnnooo"
>>> p = re.compile(r'ccc.*nnn')
>>> rtt = p.sub("||", m)
>>> rtt
'||ooo'

The regex is eating up too much. What I want is every non-overlapping
occurrence I think.

so rtt would be:

'||flfllff||ooo'

just like findall behaves, but in this case I want sub to act like that.

Thanks

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: regular expression help

2010-11-29 Thread Yingjie Lan

--- On Tue, 11/30/10, goldtech goldt...@worldpost.com wrote:

 From: goldtech goldt...@worldpost.com
 Subject: regular expression help
 To: python-list@python.org
 Date: Tuesday, November 30, 2010, 9:17 AM
 The regex is eating up too much. What I want is every
 non-overlapping
 occurrence I think.

 so rtt would be:

 '||flfllff||ooo'

Hi, I'll just let Python do most of the talk here.

>>> import re
>>> m = "cccvlvlvlvnnnflfllffccclfnnnooo"
>>> p = re.compile(r'ccc.*?nnn')
>>> p.sub("||", m)
'||flfllff||ooo'
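To see why the trailing ? matters, here is a small Python 3 sketch (not part of the original reply) that runs the greedy and the non-greedy pattern side by side on goldtech's string. findall shows exactly what each pattern consumes, which explains the two different sub results:

```python
import re

m = "cccvlvlvlvnnnflfllffccclfnnnooo"

greedy = re.compile(r"ccc.*nnn")   # .* grabs as much as it can, then backs off to the LAST "nnn"
lazy = re.compile(r"ccc.*?nnn")    # .*? grabs as little as it can, stopping at the FIRST "nnn"

print(greedy.findall(m))    # ['cccvlvlvlvnnnflfllffccclfnnn']
print(lazy.findall(m))      # ['cccvlvlvlvnnn', 'ccclfnnn']
print(greedy.sub("||", m))  # ||ooo
print(lazy.sub("||", m))    # ||flfllff||ooo
```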

Cheers,

Yingjie


Re: regular expression help

2010-11-29 Thread Tim Harig

On 2010-11-30, goldtech goldt...@worldpost.com wrote:
 Hi,

 say:
 >>> import re
 >>> m = "cccvlvlvlvnnnflfllffccclfnnnooo"
 >>> p = re.compile(r'ccc.*nnn')
 >>> rtt = p.sub("||", m)
 >>> rtt
 '||ooo'

 The regex is eating up too much. What I want is every non-overlapping
 occurrence I think.

 so rtt would be:

 '||flfllff||ooo'

Python 3.1.2 (r312:79147, Oct  9 2010, 00:16:06)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = "cccvlvlvlvnnnflfllffccclfnnnooo"
>>> pattern = re.compile(r'ccc[^n]*nnn')
>>> pattern.sub("||", m)
'||flfllff||ooo'


Re: regular expression help

2010-11-29 Thread Tim Harig

Python 3.1.2 (r312:79147, Oct  9 2010, 00:16:06)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = "cccvlvlvlvnnnflfllffccclfnnnooo"
>>> pattern = re.compile(r'ccc[^n]*nnn')
>>> pattern.sub("||", m)
'||flfllff||ooo'
>>> # or, assuming that the middle sequence might contain singular or
>>> # double 'n's
>>> pattern = re.compile(r'ccc.*?nnn')
>>> pattern.sub("||", m)
'||flfllff||ooo'
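The two patterns agree on goldtech's string but are not interchangeable. A minimal Python 3 sketch, using a made-up input whose middle section contains a lone 'n', shows where the negated class [^n]* stops matching while .*? keeps working:

```python
import re

tricky = "cccabnabnnnxyz"   # hypothetical input: a single "n" sits between ccc and nnn

strict = re.compile(r"ccc[^n]*nnn")   # the middle may not contain any "n" at all
lazy = re.compile(r"ccc.*?nnn")       # the middle may contain anything, shortest first

print(strict.sub("||", tricky))  # cccabnabnnnxyz  (no match, string unchanged)
print(lazy.sub("||", tricky))    # ||xyz
```

The negated class is a reasonable choice when the terminating character can never appear in the middle; otherwise the non-greedy form is the safer default.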


Re: regular expression help

2010-11-29 Thread goldtech

 .*?  fixed it. Every occurrence of the pattern is now affected, which
is what I want.

Thank you very much.

Python's regular expression help

2010-04-29 Thread goldtech

Hi,
Trying to start out with simple things, but apparently there are some
basics I need help with. This works OK:
>>> import re
>>> p = re.compile('(ab*)(sss)')
>>> m = p.match( 'absss' )
>>> m.group(0)
'absss'
>>> m.group(1)
'ab'
>>> m.group(2)
'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:

>>> f=r'abss'
>>> f
'abss'
>>> m = p.match( f )
>>> m.group(0)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

How do I implement a regex on a multiline string?  I thought this
might work but there's a problem:

>>> p = re.compile('(ab*)(sss)', re.S)
>>> m = p.match( 'ab\nsss' )
>>> m.group(0)
Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Thanks for the newbie regex help, Lee

Re: Python's regular expression help

2010-04-29 Thread Dodo


On 29/04/2010 20:00, goldtech wrote:

Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )
m.group(0)

'absss'

m.group(1)

'ab'

m.group(2)

'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:


f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
   File "<pyshell#15>", line 1, in <module>
     m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
   File "<pyshell#26>", line 1, in <module>
     m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'




Thanks for the newbie regex help, Lee


For multiline, I use re.DOTALL.

I do not know match(); findall is pretty efficient:
>>> my = "<a href=\"hello world.com\">LINK</a>"
>>> res = re.findall(">(.*?)<", my)
>>> res
['LINK']

Dorian

Re: Python's regular expression help

2010-04-29 Thread MRAB


goldtech wrote:

Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )
m.group(0)

'absss'

m.group(1)

'ab'

m.group(2)

'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:


f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Look closely: the regex contains 3 letter 's', but the string referred
to by f has only 2.
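Because p.match() returns None instead of raising an exception when nothing matches, it helps to test the result before calling .group(). A short Python 3 sketch using goldtech's pattern (first_group is a hypothetical helper name):

```python
import re

p = re.compile(r"(ab*)(sss)")

def first_group(text):
    """Return the whole match, or None when the pattern does not match."""
    m = p.match(text)
    if m is None:          # match() signals failure with None, not an exception
        return None
    return m.group(0)

print(first_group("absss"))  # absss
print(first_group("abss"))   # None: only two 's' characters
```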


How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks for the newbie regex help, Lee


The string contains a newline between the 'b' and the 's', but the regex
isn't expecting any newline (or any other character) between the 'b' and
the 's', hence no match.

Re: Python's regular expression help

2010-04-29 Thread Tim Chase


On 04/29/2010 01:00 PM, goldtech wrote:

Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )



f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
   File "<pyshell#15>", line 1, in <module>
     m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


'absss' != 'abss'

Your regexp looks for 3 "s" characters, but your f contains only 2.  So the 
regexp object doesn't, well, match.  Try


  f = 'absss'

and it will work.  As an aside, using raw-strings for this text 
doesn't change anything, but if you want, you _can_ write it as


  f = r'absss'

if it will make you feel better :)


How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
   File "<pyshell#26>", line 1, in <module>
     m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Well, it depends on what you want to do -- regexps are fairly 
precise, so if you want to allow whitespace between the two, you 
can use


  r = re.compile(r'(ab*)\s*(sss)')

If you want to allow whitespace anywhere, it gets uglier, and 
your capture/group results will contain that whitespace:


  r'(a\s*b*)\s*(s\s*s\s*s)'

Alternatively, if you don't want to allow arbitrary whitespace 
but only newlines, you can use \n* instead of \s*
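A quick Python 3 check of the three patterns above against goldtech's string, plus a made-up string with scattered spaces for the "whitespace anywhere" version, shows how the groups come out:

```python
import re

text = "ab\nsss"

between = re.compile(r"(ab*)\s*(sss)")           # any whitespace between the two groups
newlines_only = re.compile(r"(ab*)\n*(sss)")     # only newlines between the two groups
anywhere = re.compile(r"(a\s*b*)\s*(s\s*s\s*s)")  # whitespace allowed almost everywhere

print(between.match(text).groups())        # ('ab', 'sss')
print(newlines_only.match(text).groups())  # ('ab', 'sss')
print(newlines_only.match("ab sss"))       # None: a space is not a newline

spread = "a b\ns s s"                      # hypothetical input with scattered spaces
print(anywhere.match(spread).groups())     # ('a b', 's s s'): the whitespace lands in the groups
```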


-tkc




Re: Python's regular expression help

2010-04-29 Thread goldtech

On Apr 29, 11:49 am, Tim Chase python.l...@tim.thechases.com wrote:
 On 04/29/2010 01:00 PM, goldtech wrote:

  Trying to start out with simple things but apparently there's some
  basics I need help with. This works OK:
  import re
  p = re.compile('(ab*)(sss)')
  m = p.match( 'absss' )

  f=r'abss'
  f
  'abss'
  m = p.match( f )
  m.group(0)
  Traceback (most recent call last):
     File "<pyshell#15>", line 1, in <module>
       m.group(0)
  AttributeError: 'NoneType' object has no attribute 'group'

 'absss' != 'abss'

 Your regexp looks for 3 s, your f contains only 2.  So the
 regexp object doesn't, well, match.  Try

    f = 'absss'

 and it will work.  As an aside, using raw-strings for this text
 doesn't change anything, but if you want, you _can_ write it as

    f = r'absss'

 if it will make you feel better :)

  How do I implement a regex on a multiline string?  I thought this
  might work but there's problem:

  p = re.compile('(ab*)(sss)', re.S)
  m = p.match( 'ab\nsss' )
  m.group(0)
  Traceback (most recent call last):
     File "<pyshell#26>", line 1, in <module>
       m.group(0)
  AttributeError: 'NoneType' object has no attribute 'group'

 Well, it depends on what you want to do -- regexps are fairly
 precise, so if you want to allow whitespace between the two, you
 can use

    r = re.compile(r'(ab*)\s*(sss)')

 If you want to allow whitespace anywhere, it gets uglier, and
 your capture/group results will contain that whitespace:

    r'(a\s*b*)\s*(s\s*s\s*s)'

 Alternatively, if you don't want to allow arbitrary whitespace
 but only newlines, you can use \n* instead of \s*

 -tkc

Yes, most of my problem is w/my patterns not w/any python re syntax.

I thought re.S would take a multiline string with any spaces or
newlines and make it appear as one line to the regex, i.e. make \n be
ignored in a way... still playing w/it. Thanks for the help!
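re.S (the same flag as re.DOTALL) does not remove or ignore newlines; it only lets "." match a newline, which "." otherwise refuses to do. The pattern still needs something that consumes the "\n". A Python 3 sketch; stripping whitespace with re.sub before matching is shown as one possible workaround, not the only one:

```python
import re

text = "ab\nsss"

print(re.match(r"(ab*)(sss)", text, re.S))            # None: nothing in the pattern consumes "\n"
print(re.match(r"(ab*).(sss)", text))                 # None: "." skips "\n" without re.S
print(re.match(r"(ab*).(sss)", text, re.S).groups())  # ('ab', 'sss')

flat = re.sub(r"\s+", "", text)                       # drop all whitespace first
print(re.match(r"(ab*)(sss)", flat).groups())         # ('ab', 'sss')
```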

Re: Regular Expression Help

2009-04-13 Thread Graham Breed


Jean-Claude Neveu wrote:

Hello,

I was wondering if someone could tell me where I'm going wrong with my 
regular expression. I'm trying to write a regexp that identifies whether 
a string contains a correctly-formatted currency amount. I want to 
support dollars, UK pounds and Euros, but the example below deliberately 
omits Euros in case the Euro symbol get mangled anywhere in email or 
listserver processing. I also want people to be able to omit the 
currency symbol if they wish.


If Euro symbols can get mangled, so can Pound signs. 
They're both outside ASCII.



My regexp that I'm matching against is: ^\$\£?\d{0,10}(\.\d{2})?$

Here's how I think it should work (but clearly I'm wrong, because it 
does not actually work):


^\$\£?  Require zero or one instance of $ or £ at the start of the 
string.


^[$£]? is correct.  And, as you're using re.match, the ^ is 
superfluous.  (A previous message suggested ^[\$£]? which 
will also work.  You generally need to escape a Dollar sign 
but not here.)


You should also think about the encoding.  In my terminal, 
£ is identical to '\xc2\xa3'.  That is, two bytes for one 
UTF-8 code point.  If you assume this encoding, it's best to 
make it explicit.  And if you don't assume a specific 
encoding it's best to convert to unicode to do the 
comparisons, so for 2.x (or portability) your pattern string 
should start with the u prefix (u"...").



d{0,10} Next, require between zero and ten alpha characters.


There's a backslash missing, but not from your original 
expression.  Digits are not alpha characters.


(\.\d{2})?  Optionally, two characters can follow. They must be preceded 
by a decimal point.


That works.  Of course, \d{2} is longer than the simpler \d\d.

Note that you can comment the original expression like this:

rex = u"""(?x)
    ^[$£]?        # Zero or one instance of $ or £
                  # at the start of the string.
    \d{0,10}      # Between zero and ten digits
    (\.\d{2})?    # Optionally, two digits.
                  # They must be preceded by a decimal point.
    $             # End of line
"""


Then anybody (including you) who comes to read this in the 
future will have some idea what you were trying to do.


Examples of acceptable input should be:


$12.42
$12
£12.42
$12,482.96  (now I think about it, I have not catered for this in my 
regexp)


Yes, you need to think about that.
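One way to cater for the thousands separator, sketched in Python 3 under stated assumptions: CURRENCY_RE and is_currency are illustrative names, commas are only accepted in proper groups of three, and the pattern keeps the thread's \d{0,10}, so a bare "$" is still accepted. It allows either comma-grouped digits or plain digits before the optional fraction:

```python
import re

# Illustrative pattern: optional $ or £, then either comma-grouped digits
# (12,482) or up to ten plain digits, then an optional ".dd" fraction.
CURRENCY_RE = re.compile(r"""
    [$£]?                     # optional currency symbol
    (?:\d{1,3}(?:,\d{3})+     # digits grouped in threes with commas
     | \d{0,10})              # or up to ten plain digits
    (?:\.\d{2})?              # optional decimal point and two digits
    \Z                        # end of string (re.match already anchors the start)
""", re.VERBOSE)

def is_currency(s):
    return CURRENCY_RE.match(s) is not None

for amount in ["$12.42", "£12.42", "$12,482.96", "12588.47", "$12b.42", "blah"]:
    print(amount, is_currency(amount))
```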


   Graham


Regular Expression Help

2009-04-11 Thread Jean-Claude Neveu


Hello,

I was wondering if someone could tell me where 
I'm going wrong with my regular expression. I'm 
trying to write a regexp that identifies whether 
a string contains a correctly-formatted currency 
amount. I want to support dollars, UK pounds and 
Euros, but the example below deliberately omits 
Euros in case the Euro symbol get mangled 
anywhere in email or listserver processing. I 
also want people to be able to omit the currency symbol if they wish.


My regexp that I'm matching against is: ^\$\£?\d{0,10}(\.\d{2})?$

Here's how I think it should work (but clearly 
I'm wrong, because it does not actually work):


^\$\£?  Require zero or one instance of $ or £ at the start of the string.
d{0,10} Next, require between zero and ten alpha characters.
(\.\d{2})?  Optionally, two characters can 
follow. They must be preceded by a decimal point.


Examples of acceptable input should be:

$12.42
$12
£12.42
$12,482.96  (now I think about it, I have not catered for this in my regexp)

And unacceptable input would be:

$12b.42
blah
$blah
etc


Here is my Python script:

#
import re

def is_currency(str):
    rex = "^\$\£?\d{0,10}(\.\d{2})?$"
    if re.match(rex, str):
        return 1
    else:
        return 0

def test_match(str):
    if is_currency (str):
        print str + " is a match"
    else:
        print str + " is not a match"

# All should match except the last two
test_match("$12.47")
test_match("12.47")
test_match("£12.47")
test_match("£12")
test_match("$12")
test_match("$12588.47")
test_match("$12,588.47")
test_match("£12588.47")
test_match("12588.47")
test_match("£12588")
test_match("$12588")
test_match("blah")
test_match("$12b.56")
test_match($12b.56)


AND HERE IS THE OUTPUT FROM THE ABOVE SCRIPT:
$12.47 is a match
12.47 is not a match
£12.47 is not a match
£12 is not a match
$12 is a match
$12588.47 is a match
$12,588.47 is not a match
£12588.47 is not a match
12588.47 is not a match
£12588 is not a match
$12588 is a match
blah is not a match
$12b.56 is not a match

Many thanks in advance. Regular expressions are not my strong suit :)

J-C


Re: Regular Expression Help

2009-04-11 Thread rurpy

On Apr 11, 9:42 pm, Jean-Claude Neveu jcn-france1...@pobox.com
wrote:

 My regexp that I'm matching against is: ^\$\£?\d{0,10}(\.\d{2})?$

 Here's how I think it should work (but clearly
 I'm wrong, because it does not actually work):

 ^\$\£?  Require zero or one instance of $ or £ at the start of the string.

The "or" in "$ or £" above is written as a vertical bar in a regex.  You
want ^(\$|£)? here.

 d{0,10} Next, require between zero and ten alpha characters.
 (\.\d{2})?  Optionally, two characters can
 follow. They must be preceded by a decimal point.

Re: Regular Expression Help

2009-04-11 Thread John Machin

On Apr 12, 2:19 pm, ru...@yahoo.com wrote:
 On Apr 11, 9:42 pm, Jean-Claude Neveu jcn-france1...@pobox.com
 wrote:

  My regexp that I'm matching against is: ^\$\£?\d{0,10}(\.\d{2})?$

  Here's how I think it should work (but clearly
  I'm wrong, because it does not actually work):

  ^\$\£?      Require zero or one instance of $ or £ at the start of the 
  string.

 The or in $ or £ above is a vertical bar.  You
 want ^(\$|£)? here.

Best not to use a capturing group (blah) when you don't need to
capture ... use (?:blah) instead.

When the alternatives are all single characters, for greater typing
efficiency and computing efficiency use a character class:

^[\$£]?
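The difference shows up in what groups() returns. A small Python 3 sketch on one of J-C's sample amounts (the trailing (\d+) group is only there so that something gets captured):

```python
import re

amount = "£12.47"

capturing = re.match(r"(\$|£)?(\d+)", amount)
non_capturing = re.match(r"(?:\$|£)?(\d+)", amount)
char_class = re.match(r"[$£]?(\d+)", amount)

print(capturing.groups())      # ('£', '12'): the symbol occupies group 1
print(non_capturing.groups())  # ('12',): only the digits are captured
print(char_class.groups())     # ('12',): same result, simpler pattern
```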

regular expression, help

2009-01-27 Thread Vincent Davis

I think there are two parts to this question, and I am sure there is a lot I am
missing. I am hoping an example will help me. I have an HTML doc that I am
trying to use regular expressions to get a value out of.
Here is an example of the line:
<td colspan='2'>Parcel ID: 39-034-15-009 </td>
I want to get the number 39-034-15-009 after "Parcel ID:". The number will
be different each time but always in the same format.
I think I can match "Parcel ID:" but am not sure how to get the number after it.
"Parcel ID:" only occurs once in the document.

Is this how I need to start?
pid = re.compile('Parcel ID: ')

Basically I am completely lost and am not finding any examples helpful.

I am getting the HTML using myurl=urllib.urlopen().
Can I use RE like this?
thenum=pid.match(myurl)


I think the two key things I need to know are:
1. How do I get the text after a match?
2. When I use myurl=urllib.urlopen("http://..."), can I use myurl as
the string in a RE, i.e. thenum=pid.match(myurl)?

Thanks
Vincent

Re: regular expression, help

2009-01-27 Thread Vincent Davis

Is BeautifulSoup really better? Since I don't know either, I would prefer to
learn only one for now.
Thanks
Vincent Davis



On Tue, Jan 27, 2009 at 10:39 AM, MRAB goo...@mrabarnett.plus.com wrote:

 Vincent Davis wrote:

 I think there are two parts to this question and I am sure lots I am
 missing. I am hoping an example will help me
 I have a html doc that I am trying to use regular expressions to get a
 value out of.
 here is an example or the line
 <td colspan='2'>Parcel ID: 39-034-15-009 </td>
 I want to get the number 39-034-15-009 after Parcel ID: The number
 will be different each time but always the same format.
 I think I can match Parcel ID: but not sure how to get the number after.
 Parcel ID: only occurs once in the document.

 is this how i need to start?
 pid = re.compile('Parcel ID: ')

 Basically I am completely lost and am not finding examples I find helpful.

 I am getting the html using myurl=urllib.urlopen(). Can I use RE like this
 thenum=pid.match(myurl)

 I think the two key things I need to know are
 1, how do I get the text after a match?
 2, when I use myurl=urllib.urlopen("http://..."). can I use the myurl
 as the string in a RE, thenum=pid.match(myurl)

  Something like:

 pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
 myurl = urllib.urlopen(url)
 text = myurl.read()
 myurl.close()
 thenum = pid.search(text).group(1)

 Although BeautifulSoup is the preferred solution.

Re: regular expression, help

2009-01-27 Thread MRAB


Vincent Davis wrote:
I think there are two parts to this question and I am sure lots I am 
missing. I am hoping an example will help me
I have a html doc that I am trying to use regular expressions to get a 
value out of.

here is an example or the line
<td colspan='2'>Parcel ID: 39-034-15-009 </td>
I want to get the number 39-034-15-009 after Parcel ID: The number 
will be different each time but always the same format.
I think I can match Parcel ID: but not sure how to get the number 
after. Parcel ID: only occurs once in the document.


is this how i need to start?
pid = re.compile('Parcel ID: ')

Basically I am completely lost and am not finding examples I find helpful.

I am getting the html using myurl=urllib.urlopen(). 
Can I use RE like this
thenum=pid.match(myurl) 



I think the two key things I need to know are
1, how do I get the text after a match?
2, when I use myurl=urllib.urlopen("http://..."). can I use the myurl 
as the string in a RE, thenum=pid.match(myurl)



Something like:

pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
myurl = urllib.urlopen(url)
text = myurl.read()
myurl.close()
thenum = pid.search(text).group(1)

Although BeautifulSoup is the preferred solution.
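MRAB's snippet uses Python 2's urllib.urlopen; in Python 3 the same fetch would be urllib.request.urlopen(url).read(), which returns bytes that must be decoded (the page's encoding is something you would have to check). The sketch below skips the network and runs on the sample line from the question, showing why search() rather than match() is needed and how to avoid calling .group() on None; find_parcel_id is a hypothetical helper name:

```python
import re

PARCEL_RE = re.compile(r"Parcel ID: (\d+(?:-\d+)*)")

def find_parcel_id(html):
    """Return the parcel number as a string, or None if it is not present."""
    m = PARCEL_RE.search(html)   # search() scans the whole text for the pattern
    return m.group(1) if m else None

html = "<td colspan='2'>Parcel ID: 39-034-15-009 </td>"
print(PARCEL_RE.match(html))     # None: match() only tries position 0
print(find_parcel_id(html))      # 39-034-15-009
```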

Regular expression help: unable to search ' # ' character in the file

2008-09-27 Thread dudeja . rajat

Hi,

Can someone help me with a regular expression? I'm looking to search for the
'#' character in my file.

My file has contents:

###

Hello World

###

length = 10
breadth = 20
height = 30

###



###

Hello World

###

length = 20
breadth = 30
height = 40

###


I used the following search:

import re

fd = open(file, 'r')
line = fd.readline
pat1 = re.compile("\#*")
while(line):
    mat1 = pat1.search(line)
    if mat1:
        print line
    line = fd.readline()


But the above prints the whole file instead of the hash lines only.


Please help


Regards,
Rajat

Re: Regular expression help: unable to search ' # ' character in the file

2008-09-27 Thread Fredrik Lundh


[EMAIL PROTECTED] wrote:


import re

fd = open(file, 'r')
line = fd.readline
pat1 = re.compile("\#*")
while(line):
    mat1 = pat1.search(line)
    if mat1:
        print line
    line = fd.readline()


I strongly doubt that this is the code you used.


But the above prints the whole file instead of the hash lines only.


"*" means "zero or more matches".  All lines in a file contain zero or 
more "#" characters.


But using a RE is overkill in this case, of course.  To check for a 
character or substring, use the "in" operator:


for line in open(file):
    if "#" in line:
        print line
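Fredrik's point is easy to confirm: a pattern that can match zero characters finds an (empty) match in every line. A small Python 3 sketch with three lines taken from Rajat's file:

```python
import re

lines = ["###", "Hello World", "length = 10"]

star = re.compile(r"#*")   # "zero or more #" also matches the empty string
print([bool(star.search(line)) for line in lines])         # [True, True, True]

print([line for line in lines if re.search(r"#", line)])   # ['###']
print([line for line in lines if "#" in line])             # ['###']
```

Note that "#" needs no backslash in an ordinary pattern; it is only special in verbose (re.X) mode.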

/F


Re: Regular expression help: unable to search ' # ' character in the file

2008-09-27 Thread dudeja . rajat

On Sat, Sep 27, 2008 at 1:58 PM, Fredrik Lundh [EMAIL PROTECTED]wrote:

 [EMAIL PROTECTED] wrote:

  import re

 fd = open(file, 'r')
 line = fd.readline
 pat1 = re.compile("\#*")
 while(line):
     mat1 = pat1.search(line)
     if mat1:
         print line
     line = fd.readline()


 I strongly doubt that this is the code you used.

  But the above prints the whole file instead of the hash lines only.


 * means zero or more matches.  all lines is a file contain zero or more #
 characters.

 but using a RE is overkill in this case, of course.  to check for a
 character or substring, use the in operator:

    for line in open(file):
        if "#" in line:
            print line

 /F



Thanks Fredrik, this works. Indeed  it is a much better and cleaner
approach.

-- 
Regards,
Rajat

Re: Regular expression help

2008-07-18 Thread Russell Blau

[EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 I am new to Python, with a background in scientific computing. I'm
 trying to write a script that will take a file with lines like

 c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
 3pv=0

 extract the values of afrac and etot and plot them.
...

 What is being stored in energy is '<_sre.SRE_Match object at
 0x2a955e4ed0>', not '-11.020107'. Why?

because the re.match() method returns a match object, as documented at 
http://www.python.org/doc/current/lib/match-objects.html

But this looks like a problem where regular expressions are overkill. 
Assuming all your lines are formatted as in the example above (every value 
you are interested in contains an equals sign and is surrounded by spaces), 
you could do this:

values = {}
for expression in line.split(" "):
    if "=" in expression:
        name, val = expression.split("=")
        values[name] = val

I'd wager that this will run a fair bit faster than any regex-based 
solution.  Then you just use values['afrac'] and values['etot'] when you 
need them.

And when you get to be a really hard-core Pythonista, you could write the 
whole routine above in one line, but this seems clearer.  ;-)
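For reference, Russell's loop wrapped as a reusable function, as a Python 3 sketch (parse_line is a hypothetical name). It uses split() with no argument so runs of spaces are tolerated, passes maxsplit=1 so a value containing a second "=" does not break the unpacking, and converts the two fields Nicole wants to float, since splitting leaves them as strings:

```python
def parse_line(line):
    """Split 'name=value' tokens into a dict; tokens without '=' are skipped."""
    values = {}
    for expression in line.split():
        if "=" in expression:
            name, val = expression.split("=", 1)   # maxsplit=1 guards against a second '='
            values[name] = val
    return values

sample = "c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0"
values = parse_line(sample)
afrac = float(values["afrac"])   # the values are strings until converted
etot = float(values["etot"])
print(afrac, etot)               # 0.7 -11.020107
```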

Russ




Re: Regular expression help

2008-07-18 Thread Brad


[EMAIL PROTECTED] wrote:

Hello,

I am new to Python, with a background in scientific computing. I'm
trying to write a script that will take a file with lines like

c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
3pv=0

extract the values of afrac and etot...


Why not just split them out instead of using REs?

fp = open("test.txt")
lines = fp.readlines()
fp.close()

for line in lines:
    split = line.split()
    for pair in split:
        pair_split = pair.split("=")
        if len(pair_split) == 2:
            print pair_split[0], "is", pair_split[1]

Results:

IDLE 1.2.2   No Subprocess 

afrac is .7
mmom is 0
sev is -9.56646
erep is 0
etot is -11.020107
emad is -3.597647
3pv is 0


Re: Regular expression help

2008-07-18 Thread Gerard flanagan


[EMAIL PROTECTED] wrote:

Hello,

I am new to Python, with a background in scientific computing. I'm
trying to write a script that will take a file with lines like

c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
3pv=0

extract the values of afrac and etot and plot them. I'm really
struggling with getting the values of efrac and etot. So far I have
come up with (small snippet of script just to get the energy, etot):

def get_data_points(filename):
    file = open(filename,'r')
    data_points = []
    while 1:
        line = file.readline()
        if not line: break
        energy = get_total_energy(line)
        data_points.append(energy)
    return data_points

def get_total_energy(line):
    rawstr = r"(?P<key>.*?)=(?P<value>.*?)\s"
    p = re.compile(rawstr)
    return p.match(line,5)

What is being stored in energy is '<_sre.SRE_Match object at
0x2a955e4ed0>', not '-11.020107'. Why? 




1. Consider using the 'split' method on each line rather than regexes
2. In your code you are compiling the regex for every line in the file; 
you should lift it out of the 'get_total_energy' function so that the 
compilation is only done once.
3. A Match object has a 'groups' method, which is what you need to 
retrieve the data.

4. Also look at the findall method:

data = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0 '


import re

rx = re.compile(r'(\w+)=(\S+)')

data = dict(rx.findall(data))

print data
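Points 2 and 3 together might look like this in Python 3: a hypothetical rewrite of get_total_energy that compiles its pattern once at module level and reads the named groups off each Match object instead of returning the Match itself. The key/value pattern is borrowed from point 4:

```python
import re

# Compiled once, outside the per-line function (point 2)
PAIR_RE = re.compile(r"(?P<key>\w+)=(?P<value>\S+)")

def get_total_energy(line):
    """Return etot as a float, or None if the line has no etot field."""
    for m in PAIR_RE.finditer(line):
        if m.group("key") == "etot":          # a Match object exposes groups by name
            return float(m.group("value"))
    return None

line = "c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0"
print(get_total_energy(line))          # -11.020107
print(get_total_energy("c afrac=.7"))  # None
```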

hth

G.


Re: Regular expression help

2008-07-18 Thread Nick Dumas


I think you're over-complicating this. I'm assuming that you're going to
do a line graph of some sort, and each new line of the file contains a
new set of data.

The problem you mentioned with your regex returning a match object
rather than a string is because you're simply using a re function that
doesn't return strings. re.findall() is what you want. That being said,
here is working code to mine data from your file.

[code]
import re

line = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0'

energypat = r'\betot=(-?\d*?[.]\d*)'

#Note: To change the data grabbed from the line, you can change the
#'etot' to 'afrac' or 'emad' or anything that doesn't contain a regex
#special character.

energypat = re.compile(energypat)

re.findall(energypat, line)  # returns a list containing the string '-11.020107'

[/code]

This returns a list of strings, and each string is easy enough to convert to a
float with float(). After that, you can datapoints.append() to your heart's
content. Good luck with your work.

[EMAIL PROTECTED] wrote:
 Hello,
 
 I am new to Python, with a background in scientific computing. I'm
 trying to write a script that will take a file with lines like
 
 c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
 3pv=0
 
 extract the values of afrac and etot and plot them. I'm really
 struggling with getting the values of efrac and etot. So far I have
 come up with (small snippet of script just to get the energy, etot):
 
 def get_data_points(filename):
     file = open(filename,'r')
     data_points = []
     while 1:
         line = file.readline()
         if not line: break
         energy = get_total_energy(line)
         data_points.append(energy)
     return data_points
 
 def get_total_energy(line):
     rawstr = r"(?P<key>.*?)=(?P<value>.*?)\s"
     p = re.compile(rawstr)
     return p.match(line,5)
 
 What is being stored in energy is '<_sre.SRE_Match object at
 0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
 regular expressions for two days now, with no luck. Could someone
 please put me out of my misery and give me a clue as to what's going
 on? Apologies if it's blindingly obvious or if this question has been
 asked and answered before.
 
 Thanks,
 
 Nicole

Re: Regular expression help

2008-07-18 Thread nclbndk759

On Jul 18, 3:35 pm, Nick Dumas [EMAIL PROTECTED] wrote:

Thanks guys :-)

Re: Regular expression help

2008-07-18 Thread Marc 'BlackJack' Rintsch

On Fri, 18 Jul 2008 10:04:29 -0400, Russell Blau wrote:

 values = {}
 for expression in line.split(" "):
     if "=" in expression:
         name, val = expression.split("=")
         values[name] = val
 […]

 And when you get to be a really hard-core Pythonista, you could write
 the whole routine above in one line, but this seems clearer.  ;-)

I know it's a matter of taste but I think the one liner is still clear
(enough)::

  values = dict(s.split('=') for s in line.split() if '=' in s)
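
Applied to Nicole's sample line, the one-liner yields a dict of strings, so the values she wants to plot still need float(). A Python 3 sketch; the helper name is made up, and splitting on the first '=' only (maxsplit=1) is an added precaution, not part of the version above:

```python
def parse_line(line):
    """Map each key=value token on a line to its (string) value."""
    # Keep only key=value tokens; split each one on its first '=' only.
    return dict(s.split("=", 1) for s in line.split() if "=" in s)


line = "c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0"
values = parse_line(line)
afrac = float(values["afrac"])   # 0.7
etot = float(values["etot"])     # -11.020107
```

The leading "c" has no '=', so it is simply skipped.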

Ciao,
Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list

Regular Expression Help

2008-03-16 Thread santhosh kumar

Hi all,
 I have text like this:
STRINGTABLE
BEGIN
ID_NEXT_PANE "Cambiar a la siguiente sección de la ventana
\nSiguiente sección"
ID_PREV_PANE "Regresar a la sección anterior de
la ventana\nSección anterior"
END
STRINGTABLE
BEGIN
ID_VIEW_TOOLBAR "Mostrar u ocultar la barra de
herramientas\nMostrar/Ocultar la barra de herramientas"
ID_VIEW_STATUS_BAR  "Mostrar u ocultar la barra de
estado\nMostrar/Ocultar la barra de estado"
END


..

and I need to parse everything from STRINGTABLE to END into a list object. What kind of
regular expression should I write?

-- 
Regards,
Santhoshkumar.S
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expression Help

2008-03-16 Thread Duncan Booth

santhosh kumar [EMAIL PROTECTED] wrote:

  I have text like ,
 STRINGTABLE
 BEGIN
 ID_NEXT_PANE "Cambiar a la siguiente sección de la ventana
 \nSiguiente sección"
 ID_PREV_PANE "Regresar a la sección anterior de
 la ventana\nSección anterior"
 END
 STRINGTABLE
 BEGIN
 ID_VIEW_TOOLBAR "Mostrar u ocultar la barra de
 herramientas\nMostrar/Ocultar la barra de herramientas"
 ID_VIEW_STATUS_BAR  "Mostrar u ocultar la barra de
 estado\nMostrar/Ocultar la barra de estado"
 END
 
 
 ..
 and i need to parse from STRINGTABLE to END as a list object. whatkind of
 regular expression should i write.
 

I doubt very much whether you want any regular expressions at all. I'd do 
something along these lines:

find a line == STRINGTABLE
assert the next line == BEGIN
then, until we find a line == END:
    idvalue = line.strip().split(None, 1)
    assert len(idvalue) == 2
    result.append(idvalue)
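
The outline above translates fairly directly into Python 3. A minimal sketch, assuming each entry sits on a single line (a wrapped entry like the ones in the quoted sample would trip the assert) and skipping blank lines inside a block, which the outline does not mention; the function name is hypothetical:

```python
def parse_stringtables(lines):
    """Collect [id, text] pairs from every STRINGTABLE/BEGIN ... END block."""
    result = []
    it = iter(lines)
    for line in it:
        if line.strip() != "STRINGTABLE":
            continue
        assert next(it).strip() == "BEGIN"
        # The inner loop shares the iterator, so the outer loop resumes after END.
        for line in it:
            if line.strip() == "END":
                break
            if not line.strip():
                continue          # tolerate blank lines inside a block
            idvalue = line.strip().split(None, 1)
            assert len(idvalue) == 2
            result.append(idvalue)
    return result


sample = """STRINGTABLE
BEGIN
IDS_CONTEXT_API_ "API Totalizer Control Dialog"
IDS_CONTEXT "Gas Analyzer"
END
STRINGTABLE
BEGIN
ID_VIEW_TOOLBAR "Mostrar u ocultar la barra de herramientas"
END
"""
entries = parse_stringtables(sample.splitlines())
```

An open file object can be passed straight in, since iterating a file yields its lines.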

-- 
http://mail.python.org/mailman/listinfo/python-list

Regular Expression Help

2008-02-26 Thread Lythoner

Hi All,

I have a Python utility which helps to generate an Excel file for
language translation. For any new language, we generate an Excel
file which has the English text and a column for the target
translation language. The translator will provide the translated strings,
and another Python utility reads the target-language strings from the
Excel file and updates/generates the resource file and database
records. Our application is a VC++ application; we use an MS Access db.


We have string tables like this:

STRINGTABLE
BEGIN
IDS_CONTEXT_API_ "API Totalizer Control Dialog"
IDS_CONTEXT "Gas Analyzer"
END

STRINGTABLE
BEGIN
ID_APITOTALIZER_CONTROL
"Start, stop, and reset API volume flow
\nTotalizer Control"
END

This pattern repeats.


I read the file line by line and pick the contents inside the
STRINGTABLE.


I want to use a regular expression which should give me all the
entries within
STRINGTABLE
BEGIN
Get whatever is put in here
END


I tried a little, but had no luck. Note that these are multi-line string
entries, which we cannot make into single lines.


Regards,

Krish
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expression Help

2008-02-26 Thread John Machin

On Feb 27, 6:28 am, [EMAIL PROTECTED] wrote:
 Hi All,

 I have a python utility which helps to generate an excel file for
 language translation. For any new language, we will generate the excel
 file which will have the English text and column for interested
 translation language. The translator  will provide the language string
 and again I will have python utility to read the excel file target
 language string and update/generate the resource file  database
 records. Our application is VC++ application, we use MS Access db.

 We have string table like this.

 STRINGTABLE
 BEGIN
 IDS_CONTEXT_API_ "API Totalizer Control Dialog"
 IDS_CONTEXT "Gas Analyzer"
 END

 STRINGTABLE
 BEGIN
 ID_APITOTALIZER_CONTROL
 "Start, stop, and reset API volume flow
 \nTotalizer Control"
 END
 
 this repeats.

 I read the file line by line and pick the contents inside the
 STRINGTABLE.

 I want to use the regular expression while should give me all the
 entries with in
 STRINGTABLE
 BEGIN
 Get what ever put in this
 END

 I tried little bit, but no luck. Note that it is multi-line string
 entries which we cannot make as single line


Looks to me like you have a very simple grammar:
entry ::= id quoted_string

id is matched by r'[A-Z]+[A-Z_]+'
quoted_string is matched by r'"[^"]*"'

So a pattern which will pick out one entry would be something like
r'([A-Z]+[A-Z_]+)\s+("[^"]*")'
Note that using \s+ (whitespace) allows for having \n etc. between id
and quoted_string.

You need to build a string containing all the lines between BEGIN and
END, and then use re.findall.
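
Putting those two steps together in Python 3: first pull out the text between BEGIN and END, then run the entry pattern over it with re.findall. A minimal sketch; the block pattern and the function name are assumptions, and IDs containing digits would need [A-Z0-9_] in the character classes:

```python
import re

# Entry pattern from above: an ID, whitespace (newlines allowed), a quoted string.
ENTRY_RE = re.compile(r'([A-Z]+[A-Z_]+)\s+("[^"]*")')
# Assumed block pattern: the body between STRINGTABLE/BEGIN and a line starting with END.
BLOCK_RE = re.compile(r"^STRINGTABLE\s*\nBEGIN\s*\n(.*?)^END\b", re.DOTALL | re.MULTILINE)


def parse_rc(text):
    """Return (id, quoted_string) tuples from every STRINGTABLE block in text."""
    entries = []
    for body in BLOCK_RE.findall(text):
        entries.extend(ENTRY_RE.findall(body))
    return entries


sample = r'''STRINGTABLE
BEGIN
    IDS_CONTEXT_API_ "API Totalizer Control Dialog"
    IDS_CONTEXT "Gas Analyzer"
END

STRINGTABLE
BEGIN
    ID_APITOTALIZER_CONTROL
        "Start, stop, and reset API volume flow\nTotalizer Control"
END
'''

entries = parse_rc(sample)
```

The raw string keeps the resource file's \n as two literal characters, as it would appear in the .rc file.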

If you still can't get it to work, ask again -- but do show the code
from your best attempt, and reduce ambiguity by showing your test
input as a Python expression e.g.
test1_in = """\
ID_F "fough"
ID_B_
"barre"
ID__Z
"zotte start
  zotte end"
"""

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-12 Thread 7stud

On Apr 11, 11:15 pm, [EMAIL PROTECTED] wrote:
 On Apr 11, 9:50 pm, Gabriel Genellina [EMAIL PROTECTED]
 lhs = re.compile(r'\s*(\b\w+\s*=)')
 for s in [ "a = 4 b =3.4 5.4 c = 4.5",
 "a = 4.5 b = 'h'  'd' c = 4.5 3.5"]:
     tokens = lhs.split(s)
     results = [tokens[_] + tokens[_+1] for _ in range(1,len(tokens),

The only thing I can think when I look at that is: what a syntactic
abomination.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-12 Thread Qilong Ren

Hi,

Yeah, a little bit tricky. Actually it is part of some Fortran input file.

Thanks for the suggestion! It helps a lot!

Thanks,
Qilong

- Original Message 
From: Gabriel Genellina [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, April 11, 2007 9:50:00 PM
Subject: Re: python regular expression help

On Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren [EMAIL PROTECTED]
wrote:

 Thanks for reply. That actually is not what I want. Strings I am dealing  
 with may look like this:
  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
 What I want is
  a = 4.5
  b = 'h' 'd'
  c = 4.5 3.5

That's a bit tricky. You have LHS = RHS where RHS includes all the  
following text *except* the very next word before the following = (which  
is the LHS of the next expression). Or something like that :)

py> import re
py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
py> for item in r.findall(s):
...   print item
...
a = 4.5
b = 'h'  'd'
c = 4.5 3.5

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list

python regular expression help

2007-04-11 Thread Qilong Ren

Hi, everyone,

I am extracting some information from a given string using Python RE. The 
string is, for example,
   s = 'a = 4 b =3.4 5.4 c = 4.5'
What I want is:
   a = 4
   b = 3.4 5.4
   c = 4.5
Right now I use:
   pattern = re.compile(r'\w+\s*=\s*.*?\s+')
   lists = pattern.findall(s)
It works for strings like 'a = 4 b = 3.4 c = 4.5', but does not work with 
strings like 'a=4 b=3.4 5.4 c = 4.5'.
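
Why the pattern above misbehaves, as a small Python 3 check: the lazy .*? stops at the first whitespace it reaches, so '5.4' is dropped, and the last pair survives only as 'c = ' because the engine backtracks \s* to find some whitespace to end on. The value 4.5 is lost even in the string that seems to work:

```python
import re

pattern = re.compile(r'\w+\s*=\s*.*?\s+')

# .*? is lazy: each match ends at the first whitespace after "name =".
print(pattern.findall('a=4 b=3.4 5.4 c = 4.5'))   # ['a=4 ', 'b=3.4 ', 'c = ']
print(pattern.findall('a = 4 b = 3.4 c = 4.5'))   # ['a = 4 ', 'b = 3.4 ', 'c = ']
```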

Any suggestion?

Thanks,
Qilong

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread liupeng

pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
lists = pattern.findall(s)
print lists
['a=4 ', 'b=3.4 ', 'c=4.5']
On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
 Hi, everyone,
 
 I am extracting some information from a given string using python RE. The
 string is ,for example,
s = 'a = 4 b =3.4 5.4 c = 4.5'
 What I want is :
a = 4
 b = 3.4 5.4
c = 4.5
 Right now I use :
pattern = re.compile(r'\w+\s*=\s*.*?\s+')
lists = pattern.findall(s)
 It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
 strings like 'a=4 b=3.4 5.4 c = 4.5'
 
 Any suggestion?
 
 Thanks,Qilong
 

 -- 
 http://mail.python.org/mailman/listinfo/python-list


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread Qilong Ren

Hi,

Thanks for the reply. That is actually not what I want. The strings I am dealing with 
may look like this:
 s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
What I want is
 a = 4.5
 b = 'h' 'd'
 c = 4.5 3.5


- Original Message 
From: liupeng [EMAIL PROTECTED]
To: python-list@python.org
Sent: Wednesday, April 11, 2007 6:41:30 PM
Subject: Re: python regular expression help

pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
lists = pattern.findall(s)
print lists
['a=4 ', 'b=3.4 ', 'c=4.5']
On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
 Hi, everyone,
 
 I am extracting some information from a given string using python RE. The
 string is ,for example,
s = 'a = 4 b =3.4 5.4 c = 4.5'
 What I want is :
a = 4
 b = 3.4 5.4
c = 4.5
 Right now I use :
pattern = re.compile(r'\w+\s*=\s*.*?\s+')
lists = pattern.findall(s)
 It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
 strings like 'a=4 b=3.4 5.4 c = 4.5'
 
 Any suggestion?
 
 Thanks,Qilong
 

 -- 
 http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread 7stud

On Apr 11, 7:41 pm, liupeng [EMAIL PROTECTED] wrote:
 pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
 lists = pattern.findall(s)
 print lists
 ['a=4 ', 'b=3.4 ', 'c=4.5']

 On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
  Hi, everyone,

  I am extracting some information from a given string using python RE. The
  string is ,for example,
     s = 'a = 4 b =3.4 5.4 c = 4.5'
  What I want is :
     a = 4
      b = 3.4 5.4
     c = 4.5
  Right now I use :
     pattern = re.compile(r'\w+\s*=\s*.*?\s+')
     lists = pattern.findall(s)
  It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
  strings like 'a=4 b=3.4 5.4 c = 4.5'

  Any suggestion?

  Thanks,Qilong

  --
 http://mail.python.org/mailman/listinfo/python-list


Try this:

import re

s = 'a = 4 b =3.4 5.4 c = 4.5'
r = re.compile("[a-z]+.*?(?=[a-z]|$)")
l = r.findall(s)
print l
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread Gabriel Genellina

On Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren [EMAIL PROTECTED]
wrote:

 Thanks for reply. That actually is not what I want. Strings I am dealing  
 with may look like this:
  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
 What I want is
  a = 4.5
  b = 'h' 'd'
  c = 4.5 3.5

That's a bit tricky. You have LHS = RHS where RHS includes all the  
following text *except* the very next word before the following = (which  
is the LHS of the next expression). Or something like that :)

py> import re
py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
py> for item in r.findall(s):
...   print item
...
a = 4.5
b = 'h'  'd'
c = 4.5 3.5
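
The same lookahead approach in Python 3, wrapped in a small helper whose name is made up for illustration. Each match carries along the space before the next name, so the sketch strips it off. Values containing lowercase letters, such as 'c = h,d', also come through intact:

```python
import re

# "name =" followed lazily by anything up to the next "name =" or the end of the string.
ASSIGN_RE = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")


def split_assignments(s):
    """Return each 'name = value' chunk of s, without surrounding whitespace."""
    return [item.strip() for item in ASSIGN_RE.findall(s)]


for s in ["a = 4 b =3.4 5.4 c = 4.5", "a = 4.5 b = 'h'  'd' c = 4.5 3.5"]:
    print(split_assignments(s))
```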

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread Qilong Ren


Hi,

I don't quite understand the regular expression:
  re.compile("[a-z]+.*?(?=[a-z]|$)")
and I tried it. In some cases it works, but if the string looks like:
   s = 'a = 3.4 b = 4.5 5.6 c = h,d'
it fails.

What I came up with is:
 names = re.compile(r'(\w+)\s*=').findall(s)
and for the corresponding values:
 values = re.split(r'\w+\s*=',s)[1:]
It does not look good, but it works. What do you think?
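
This two-step approach works because findall() and split() use the same \w+\s*= delimiter, so the Nth name lines up with the Nth value. A Python 3 sketch that pairs them into a dict; the dict step is an addition, not part of the message above:

```python
import re

s = 'a = 3.4 b = 4.5 5.6 c = h,d'

names = re.findall(r'(\w+)\s*=', s)        # ['a', 'b', 'c']
values = re.split(r'\w+\s*=', s)[1:]       # [' 3.4 ', ' 4.5 5.6 ', ' h,d']

# zip() pairs them in order; strip() trims the surrounding spaces.
pairs = dict(zip(names, (v.strip() for v in values)))
print(pairs)   # {'a': '3.4', 'b': '4.5 5.6', 'c': 'h,d'}
```

A value that itself contains "word =" would shift the pairing, and a repeated name keeps only its last value in the dict.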

Thanks,
Qilong



- Original Message 
From: 7stud [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, April 11, 2007 8:27:57 PM
Subject: Re: python regular expression help

On Apr 11, 7:41 pm, liupeng [EMAIL PROTECTED] wrote:
 pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
 lists = pattern.findall(s)
 print lists
 ['a=4 ', 'b=3.4 ', 'c=4.5']

 On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
  Hi, everyone,

  I am extracting some information from a given string using python RE. The
  string is ,for example,
 s = 'a = 4 b =3.4 5.4 c = 4.5'
  What I want is :
 a = 4
  b = 3.4 5.4
 c = 4.5
  Right now I use :
 pattern = re.compile(r'\w+\s*=\s*.*?\s+')
 lists = pattern.findall(s)
  It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
  strings like 'a=4 b=3.4 5.4 c = 4.5'

  Any suggestion?

  Thanks,Qilong

  --
 http://mail.python.org/mailman/listinfo/python-list


Try this:

import re

s = 'a = 4 b =3.4 5.4 c = 4.5'
r = re.compile("[a-z]+.*?(?=[a-z]|$)")
l = r.findall(s)
print l
-- 
http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread 7stud

On Apr 11, 10:50 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
 On Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren [EMAIL PROTECTED]
 wrote:

  Thanks for reply. That actually is not what I want. Strings I am dealing  
  with may look like this:
   s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
  What I want is
   a = 4.5
   b = 'h' 'd'
   c = 4.5 3.5

I suppose next you'll post that your strings can also look like this:

[EMAIL PROTECTED]@[EMAIL PROTECTED]@%12341234qeerasdfdae

and you want A = 3


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread attn . steven . kuo

On Apr 11, 9:50 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
 On Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren [EMAIL PROTECTED]
 wrote:

  Thanks for reply. That actually is not what I want. Strings I am dealing
  with may look like this:
   s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
  What I want is
   a = 4.5
   b = 'h' 'd'
   c = 4.5 3.5

 That's a bit tricky. You have LHS = RHS where RHS includes all the
 following text *except* the very next word before the following = (which
 is the LHS of the next expression). Or something like that :)

 py> import re
 py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
 py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
 py> for item in r.findall(s):
 ...   print item
 ...
 a = 4.5
 b = 'h'  'd'
 c = 4.5 3.5



Another way is to use split:

import re

lhs = re.compile(r'\s*(\b\w+\s*=)')
for s in [ "a = 4 b =3.4 5.4 c = 4.5",
           "a = 4.5 b = 'h'  'd' c = 4.5 3.5"]:
    tokens = lhs.split(s)
    results = [tokens[_] + tokens[_+1] for _ in range(1,len(tokens),2)]
    print s
    print results
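
The split-based version in Python 3. Because the pattern has a capturing group, re.split() keeps each "name =" in its output, so names sit at the odd indexes with their values right after them; slicing with [1::2] and [2::2] is just another way of writing the range(1, len(tokens), 2) loop. The function name is made up for illustration:

```python
import re

# The capturing group makes split() return the "name =" delimiters too.
LHS_RE = re.compile(r'\s*(\b\w+\s*=)')


def assignments(s):
    tokens = LHS_RE.split(s)   # e.g. ['', 'a =', ' 4', 'b =', '3.4 5.4', 'c =', ' 4.5']
    return [name + value for name, value in zip(tokens[1::2], tokens[2::2])]


print(assignments("a = 4 b =3.4 5.4 c = 4.5"))
# ['a = 4', 'b =3.4 5.4', 'c = 4.5']
```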

--
Regards,
Steven


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread Paul McGuire

On Apr 11, 11:50 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
 On Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren [EMAIL PROTECTED]
 wrote:

  Thanks for reply. That actually is not what I want. Strings I am dealing  
  with may look like this:
   s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
  What I want is
   a = 4.5
   b = 'h' 'd'
   c = 4.5 3.5

 That's a bit tricky. You have LHS = RHS where RHS includes all the  
 following text *except* the very next word before the following = (which  
 is the LHS of the next expression). Or something like that :)

 py> import re
 py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
 py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
 py> for item in r.findall(s):
 ...   print item
 ...
 a = 4.5
 b = 'h'  'd'
 c = 4.5 3.5

 --
 Gabriel Genellina

The pyparsing version is a bit more readable, and probably simpler to come
back to later to expand the definition of varName, for example.

from pyparsing import Word,alphas,nums,FollowedBy,sglQuotedString,OneOrMore

realNum = Word(nums,nums+".").setParseAction(lambda t:float(t[0]))
varName = Word(alphas)
LHS = varName + FollowedBy("=")
RHSval = sglQuotedString | realNum | varName
RHS = OneOrMore( ~LHS + RHSval )
assignment = LHS.setResultsName("LHS") + '=' + RHS.setResultsName("RHS")

s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
for a in assignment.searchString(s):
    print a.LHS, '=', a.RHS

prints:
['a'] = [4.5]
['b'] = ['h', 'd']
['c'] = [4.5, 3.5]

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expression help for parsing html tables

2006-10-29 Thread Odalrick


[EMAIL PROTECTED] wrote:

 Hello,

 I am having some difficulty creating a regular expression for the
 following string situation in html. I want to find a table that has
 specific text in it and then extract the html just for that immediate
 table.

 the string would look something like this:

 ...stuff here...
 <table>
 ...stuff here...
 <table>
 ...stuff here...
 <table>
 ...
 text i'm searching for
 ...
 </table>
 ...stuff here...
 </table>
 ...stuff here...
 </table>
 ...stuff here...


 My question:  is there a way in RE to say:   when I find this text I'm
 looking for, search backwards and find the immediate instance of the
 string <table>  and then search forwards and find the immediate
 instance of the string </table>.   ?

 any help is appreciated.

 Steve.

It would have been easier if you'd said what the text you are looking
for is, but I think:

regex = re.compile( r'<table>(.*?text you are looking for.*?)</table>',
re.DOTALL )
match = regex.search( html_string )
found_table = match.group( 1 )

would work.
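
One caveat, sketched in Python 3: search() returns the leftmost match, so with nested tables the pattern above starts at the outermost <table>, and the captured group still contains the inner opening tags. A "tempered" token, (?:(?!<table>).)*?, that refuses to step over another <table> before the target text is one way to get the innermost table instead. This assumes bare <table> tags without attributes and no further nested table after the target text; the helper name is hypothetical, and a real HTML parser remains the more robust route:

```python
import re


def innermost_table(html, text):
    """Return the body of the innermost <table> that contains `text`, or None."""
    pattern = re.compile(
        r"<table>"
        r"((?:(?!<table>).)*?"   # never step over another opening <table>
        + re.escape(text)
        + r".*?)</table>",       # then stop at the first closing tag
        re.DOTALL,
    )
    m = pattern.search(html)
    return m.group(1) if m else None


html = """<table>
<tr><td>outer</td></tr>
<table>
<tr><td>middle</td></tr>
<table>
<tr><td>text i'm searching for</td></tr>
</table>
</table>
</table>"""

print(repr(innermost_table(html, "text i'm searching for")))

# For comparison, the non-tempered pattern starts at the outermost <table>:
naive = re.search(r"<table>(.*?text i'm searching for.*?)</table>", html, re.DOTALL)
print(naive.group(1).startswith("\n<tr><td>outer"))   # True
```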

/Odalrick

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expression help for parsing html tables

2006-10-29 Thread Paddy


[EMAIL PROTECTED] wrote:
 Hello,

 I am having some difficulty creating a regular expression for the
 following string situation in html. I want to find a table that has
 specific text in it and then extract the html just for that immediate
 table.

 the string would look something like this:

 ...stuff here...
 <table>
 ...stuff here...
 <table>
 ...stuff here...
 <table>
 ...
 text i'm searching for
 ...
 </table>
 ...stuff here...
 </table>
 ...stuff here...
 </table>
 ...stuff here...


 My question:  is there a way in RE to say:   when I find this text I'm
 looking for, search backwards and find the immediate instance of the
 string <table>  and then search forwards and find the immediate
 instance of the string </table>.   ?

 any help is appreciated.

 Steve.

Might searching the output of BeautifulSoup(html).prettify() make
things easier?

http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20HTML

- Paddy

-- 
http://mail.python.org/mailman/listinfo/python-list

Regular Expression help for parsing html tables

2006-10-28 Thread steve551979

Hello,

I am having some difficulty creating a regular expression for the
following string situation in html. I want to find a table that has
specific text in it and then extract the html just for that immediate
table.

The string would look something like this:

...stuff here...
<table>
...stuff here...
<table>
...stuff here...
<table>
...
text I'm searching for
...
</table>
...stuff here...
</table>
...stuff here...
</table>
...stuff here...


My question: is there a way in RE to say: when I find this text I'm
looking for, search backwards and find the immediate instance of the
string <table>, and then search forwards and find the immediate
instance of the string </table>?

Any help is appreciated.

Steve.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Regular Expression help for parsing html tables

2006-10-28 Thread Stefan Behnel

Hi Steve,

[EMAIL PROTECTED] wrote:
 I am having some difficulty creating a regular expression for the
 following string situation in html. I want to find a table that has
 specific text in it and then extract the html just for that immediate
 table.

Any reason why you can't use a real HTML parser and API (e.g. the one provided
by lxml)? That can really make things easier here.

http://codespeak.net/lxml/
http://codespeak.net/lxml/api.html#parsers
http://codespeak.net/lxml/api.html#trees-and-documents
http://effbot.org/zone/element-index.htm

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Diez B. Roggisch


hanumizzle wrote:
 On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch [EMAIL PROTECTED] wrote:
 
  Chris wrote:
   I need a pattern that  matches a string that has the same number of '('
   as ')':
   findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
   '((2x+2)sin(x))', '(log(2)/log(5))' ]
   Can anybody help me out?
 
  This is not possible with regular expressions - they can't remember
  how many parens they already encountered.

 Remember that regular expressions are used to represent regular
 grammars. Most regex engines actually aren't regular in that they
 support fancy things like look-behind/ahead and capture groups...IIRC,
 these cannot be part of a true regular expression library.

Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times.

 With that said, the quote-unquote regexes in Lua have a special
 feature that supports balanced expressions. I believe Python has a
 PCRE lib somewhere; you may be able to use the experimental ??{ }
 construct in that case.

Even if it has - I'm not sure if it really does you good, for several
reasons:

 - regexes - even enhanced ones - don't build trees. But that is what
   you ultimately want from an expression like sin(log(x)).

 - even if they are more powerful these days, the theory of context-free
   grammars still applies. So if what you need isn't LL(k) but LR(k),
   how do you specify that to the regex engine?

 - regexes are useful because of their compact notation; parsers
   allow for a better structured outcome.


Diez

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Theerasak Photha

On 8 Oct 2006 01:49:50 -0700, Diez B. Roggisch [EMAIL PROTECTED] wrote:

 Even if it has - I'm not sure if it really does you good, for several
 reasons:

  - regexes - even enhanced ones - don't build trees. But that is what
 you ultimately want
from an expression like sin(log(x))

  - even if they are more powerful these days, the theory of context
 free grammars still applies.
so if what you need isn't LL(k) but LR(k), how do you specify that
 to the regex engine?

  - the regexes are useful because of their compact notations, parsers
 allow for better structured outcome

Just wait for Perl 6 :D

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread bearophileHUGS

Tim Chase:
 It still doesn't solve the aforementioned problem
 of things like ')))((('  which is balanced, but psychotic. :)

This may solve the problem:

def balanced(txt):
    d = {'(':1, ')':-1}
    tot = 0
    for c in txt:
        tot += d.get(c, 0)
        if tot < 0:
            return False
    return tot == 0

print balanced("42^((2x+2)sin(x)) + (log(2)/log(5))") # True
print balanced("42^((2x+2)sin(x) + (log(2)/log(5))") # False
print balanced("42^((2x+2)sin(x))) + (log(2)/log(5))") # False
print balanced(")))(((") # False

A possible alternative for Py 2.5. The dict solution looks better, but
this may be faster:

def balanced2(txt):
    tot = 0
    for c in txt:
        tot += 1 if c=="(" else (-1 if c==")" else 0)
        if tot < 0:
            return False
    return tot == 0

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Fredrik Lundh

[EMAIL PROTECTED] wrote:

  The dict solution looks better, but this may be faster:

it's slightly faster, but both your alternatives are about 10x slower 
than a straightforward:

def balanced(txt):
    return txt.count("(") == txt.count(")")

/F

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Mirco Wahab

Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
 Certainly true, and it always gives me a hard time because I don't know
 to which extend a regular expression nowadays might do the job because
 of these extensions. It was so much easier back in the old times

Right, in Perl this would be a no-brainer;
it's documented all over the place, like:

   my $re;

   $re = qr{
        (?:
          (?> [^\\()]+ | \\. )
          |
          \( (??{ $re }) \)
        )*
       }xs;

where you have a 'delayed execution'
of the

  (??{ $re })

which in the end makes the whole thing
a recursive one: it gets expanded and
executed if the match finds its way
to it.

The above regex will match balanced parens,
as in:

   my $good = 'a + (b / (c - 2)) * (d ^ (e+f))  ';
   my $bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  ';
   my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';

if you do:

   print "ok \n" if $good =~ /^$re$/;
   print "ok \n" if $bad1 =~ /^$re$/;
   print "ok \n" if $bad2 =~ /^$re$/;


This is documented in some depth, e.g. in
http://japhy.perlmonk.org/articles/tpj/2004-summer.html
(topic: Recursive Regexes)

Regards

M.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Diez B. Roggisch

Mirco Wahab wrote:
 Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
 Certainly true, and it always gives me a hard time because I don't know
 to which extend a regular expression nowadays might do the job because
 of these extensions. It was so much easier back in the old times
 
 Right, in perl, this would be a no-brainer,
 its documented all over the place, like:
 
my $re;
 
$re = qr{
 (?:
   (? [^\\()]+ | \\. )
|
  \( (??{ $re }) \)
 )*
 }xs;
 
 where you have a 'delayed execution'
 of the
 
   (??{ $re })
 
 which in the end makes the whole a thing
 recursive one, it gets expanded and
 executed if the match finds its way
 to it.
 
 Above regex will match balanced parens,
 as in:
 
my $good = 'a + (b / (c - 2)) * (d ^ (e+f))  ';
my $bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  ';
my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';
 
 if you do:
 
print ok \n if $good =~ /^$re$/;
print ok \n if $bad1 =~ /^$re$/;
print ok \n if $bad2 =~ /^$re$/;
 
 
 This in some depth documented e.g. in
 http://japhy.perlmonk.org/articles/tpj/2004-summer.html
 (topic: Recursive Regexes)

That clearly is a recursive grammar rule, and thus it can't be regular 
anymore :) But first of all, I find it ugly - the clean separation of 
lexical and syntactical analysis is better here, IMHO - and secondly, 
what are the properties of that parsing? Is it LL(k), LR(k), backtracking?

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread bearophileHUGS

Fredrik Lundh wrote:

 it's slightly faster, but both your alternatives are about 10x slower
 than a straightforward:
 def balanced(txt):
     return txt.count("(") == txt.count(")")

I know, but if you read my post again you see that I have shown those
solutions to mark )))((( as bad expressions. Just counting the parens
isn't enough.

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Roy Smith

Diez B. Roggisch [EMAIL PROTECTED] wrote:
 Certainly true, and it always gives me a hard time because I don't know
 to which extend a regular expression nowadays might do the job because
 of these extensions. It was so much easier back in the old times

What old times?  I've been working with regex for mumble years and there's 
always been the problem that every implementation supports a slightly 
different syntax.  Even back in the good old days, grep, awk, sed, and ed 
all had slightly different flavors.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-08 Thread Theerasak Photha

On 10/8/06, Roy Smith [EMAIL PROTECTED] wrote:
 Diez B. Roggisch [EMAIL PROTECTED] wrote:
  Certainly true, and it always gives me a hard time because I don't know
  to which extend a regular expression nowadays might do the job because
  of these extensions. It was so much easier back in the old times

 What old times?  I've been working with regex for mumble years and there's
 always been the problem that every implementation supports a slightly
 different syntax.  Even back in the good old days, grep, awk, sed, and ed
 all had slightly different flavors.

Which grep? Which awk? :)

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list

need some regular expression help

2006-10-07 Thread Chris

I need a pattern that  matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

Thanks for any help!

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-07 Thread Diez B. Roggisch


Chris wrote:
 I need a pattern that  matches a string that has the same number of '('
 as ')':
 findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
 '((2x+2)sin(x))', '(log(2)/log(5))' ]
 Can anybody help me out?

This is not possible with regular expressions - they can't remember
how many parens they already encountered.

You will need a real parser for this - pyparsing seems to be the most
popular choice today; I personally like spark. I'm sure you'll find an
example grammar that will parse simple arithmetical expressions like
the one above.

Diez

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-07 Thread John Machin

Chris wrote:
 I need a pattern that  matches a string that has the same number of '('
 as ')':
 findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
 '((2x+2)sin(x))', '(log(2)/log(5))' ]
 Can anybody help me out?


No, there is no such pattern. You will have to code up a function.

Consider what your spec really is: '42^((2x+2)sin(x)) +
(log(2)/log(5))' has the same number of left and right parentheses; so
does the zero-length string; so does ') + (' -- perhaps you need to add
'and starts with a ('

Consider what you are going to do with input like this:

print '(' + some_text + ')'

Maybe you need to do some lexical analysis and work at the level of
tokens rather than individual characters.
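
If the text being checked is Python source, as in the print example above, the standard tokenize module can do that lexical step: parentheses inside string literals arrive as part of STRING tokens and drop out of the count. A minimal Python 3 sketch, assuming the input is a complete, tokenizable snippet (tokenize may raise on malformed input); the function name is hypothetical:

```python
import io
import tokenize


def paren_depth(source):
    """Net '(' minus ')' over operator tokens only, so parens in strings don't count."""
    depth = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string in ("(", ")"):
            depth += 1 if tok.string == "(" else -1
    return depth


src = "x = ')' + y"
print(src.count("(") - src.count(")"))   # -1: the character count is fooled
print(paren_depth(src))                  # 0: the ')' lives inside a string token
```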

Which then raises the usual question: you have a perception that
regular expressions are the solution -- to what problem??

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: need some regular expression help

2006-10-07 Thread hanumizzle

On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch [EMAIL PROTECTED] wrote:

 Chris wrote:
  I need a pattern that  matches a string that has the same number of '('
  as ')':
  findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
  '((2x+2)sin(x))', '(log(2)/log(5))' ]
  Can anybody help me out?

 This is not possible with regular expressions - they can't remember
 how many parens they already encountered.

Remember that regular expressions are used to represent regular
grammars. Most regex engines actually aren't strictly regular, in that
they support fancy extras like backreferences, look-behind/ahead and
capture groups... IIRC, some of these (backreferences in particular)
cannot be expressed by a true regular expression.

With that said, the quote-unquote regexes in Lua have a special
feature, %b(), that matches balanced expressions. I believe Python has a
PCRE lib somewhere; you may be able to use a recursive construct in
that case (Perl's experimental (??{ }), or PCRE's (?R)).
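Python's standard re module has neither Lua's %b() nor a recursive construct (the third-party regex module does support recursive patterns, but it is not shown here). What stdlib re can do, consistent with the point about true regular expressions, is match parentheses nested up to a fixed depth by expanding the pattern once per level. A minimal sketch; the helper name `nested_parens_pattern` is hypothetical, and anything nested deeper than `depth` is not matched as a whole.

```python
import re


def nested_parens_pattern(depth):
    """Build a regex matching a '(...)' group nested at most `depth` levels."""
    pattern = r'\([^()]*\)'  # depth 1: no parentheses inside
    for _ in range(depth - 1):
        # Each extra level allows any mix of non-paren characters and
        # complete groups from the previous level inside the parentheses.
        pattern = r'\((?:[^()]|' + pattern + r')*\)'
    return pattern


expr = '42^((2x+2)sin(x)) + (log(2)/log(5))'
print(re.findall(nested_parens_pattern(1), expr))  # ['(2x+2)', '(x)', '(2)', '(5)']
print(re.findall(nested_parens_pattern(2), expr))  # ['((2x+2)sin(x))', '(log(2)/log(5))']
```

The pattern grows with every level and still fails on deeper input, which is why a counter or a parser is the usual answer.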

-- Theerasak

Re: need some regular expression help

2006-10-07 Thread Roy Smith

In article [EMAIL PROTECTED],
 Chris [EMAIL PROTECTED] wrote:

 I need a pattern that  matches a string that has the same number of '('
 as ')':
 findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
 '((2x+2)sin(x))', '(log(2)/log(5))' ]
 Can anybody help me out?
 
 Thanks for any help!

Why does it need to be a regex?  There is a very simple and well-known 
algorithm which does what you want.

Start with i=0.  Walk the string one character at a time, incrementing i 
each time you see a '(', and decrementing it each time you see a ')'.  At 
the end of the string, the count should be back to 0.  If at any time 
during the process, the count goes negative, you've got mis-matched 
parentheses.
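Roy's description translates almost line for line into Python. A minimal sketch (the function name `parens_balanced` is hypothetical):

```python
def parens_balanced(text):
    """Return True if every ')' closes an earlier '(' and none are left open."""
    i = 0  # current nesting depth
    for ch in text:
        if ch == '(':
            i += 1
        elif ch == ')':
            i -= 1
            if i < 0:  # a ')' with no matching '('
                return False
    return i == 0


print(parens_balanced('42^((2x+2)sin(x)) + (log(2)/log(5))'))  # True
print(parens_balanced(')))((('))  # False
```

This answers whether the whole string is balanced; it does not by itself extract the groups Chris wanted, although remembering where the count leaves 0 and returns to 0 would give you those.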

The algorithm runs in O(n), same as a regex.

Regex is a wonderful tool, but it's not the answer to all problems.

Re: need some regular expression help

2006-10-07 Thread Tim Chase

 Why does it need to be a regex?  There is a very simple and well-known 
 algorithm which does what you want.
 
 Start with i=0.  Walk the string one character at a time, incrementing i 
 each time you see a '(', and decrementing it each time you see a ')'.  At 
 the end of the string, the count should be back to 0.  If at any time 
 during the process, the count goes negative, you've got mis-matched 
 parentheses.
 
 The algorithm runs in O(n), same as a regex.
 
 Regex is a wonderful tool, but it's not the answer to all problems.

Following Roy's suggestion, one could use something like:

>>> s = '42^((2x+2)sin(x)) + (log(2)/log(5))'
>>> d = {'(':1, ')':-1}
>>> sum(d.get(c, 0) for c in s)
0


If you get a sum() > 0, then you have too many '(' characters, and if you
have sum() < 0, you have too many ')' characters.  A sum() of 0
means there's the same number of parens.  It still doesn't solve
the aforementioned problem of things like ')))(((' which is
balanced, but psychotic. :)

-tkc





Re: Regular Expression help

2006-04-28 Thread Kent Johnson

Edward Elliott wrote:
 [EMAIL PROTECTED] wrote:
 If you are parsing HTML, it may make more sense to use a package
 designed especially for that purpose, like Beautiful Soup.
 
 I don't know Beautiful Soup, but one advantage regexes have over some
 parsers is handling malformed html. 

Beautiful Soup is intended to handle malformed HTML and seems to do 
pretty well.

Kent

Regular Expression help

2006-04-27 Thread RunLevelZero

I have some data and I need to put it in a list in a particular way.  I
have that figured out, but there is stuff in the data that I don't
want.

Example:

10:00am - 11:00am:/b a
href=/tvpdb?d=tvpid=167540528cf=0lineup=us_KS57836dchannels=us_KCTVchspid=166030466chname=CBSprogutn=114615.intl=usThe
Price Is Right/aem

All I want is "Price Is Right".

Here is the re.

findshows = re.compile(r'(\d\d:\d\d\D\D\s-\s\d\d:\d\d\D\D:*.*/aem)')

I have used a for loop to remove the extra data, but then it ruins the
list that I am building.  Basically I want the list to be something
like this:

[["Government Access"], ["Price Is Right", "Guiding Light", "Another show"]]

The for loop just comma-delimits all of them, so I lose the list-in-a-list
structure that I need.  I hope I have explained this well enough.  Any help
or ideas would be appreciated.
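Without seeing the loop it is hard to be sure, but a common cause of this symptom is extending one flat list (extend() or +=) instead of appending one sub-list per group. A minimal sketch, assuming the page has already been split into one HTML chunk per channel; the chunks, the variable names and the link-text pattern are illustrative assumptions, not the poster's code:

```python
import re

link_text = re.compile(r'<a[^>]*>(.*?)</a>')

# Hypothetical input: one HTML fragment per channel.
channels = [
    '<a href="/1">Government Access</a>',
    '<a href="/2">Price Is Right</a> <a href="/3">Guiding Light</a>',
]

shows = []
for chunk in channels:
    # append() adds the whole per-channel list as one element;
    # extend() or += would flatten everything into a single list.
    shows.append(link_text.findall(chunk))

print(shows)  # [['Government Access'], ['Price Is Right', 'Guiding Light']]
```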

TIA


Re: Regular Expression help

2006-04-27 Thread Edward Elliott

RunLevelZero wrote:
 10:00am - 11:00am:/b a href=/tvpdb?d=tvpid=167540528[snip]The
 Price Is Right/aem
 
 All I want is "Price Is Right".
 
 Here is the re.
 
 findshows = re.compile(r'(\d\d:\d\d\D\D\s-\s\d\d:\d\d\D\D:*.*/aem)')

1. A regex remembers everything it matches -- no need to wrap the entire
thing in parens.  Just call group() on the returned MatchObject.

2. If all you want is the link text, you don't need to do so much matching. 
If you don't need the time, don't match it in the first place.  If you're
using it as a marker, try matching each time with r'[\d:]{4,5}[ap]m'.  Not
as exact but a bit simpler.  Or just r'[\d:apm]{6,7}'

3. To grab what's inside the link: r'<a[^>]*>(.*?)</a>'

4. If the link text itself contains html tags, you'll have to strip those
off separately.  Extracting the text from arbitrarily nested html tags in
one shot requires a parser, not a regex.

5. If you're just going to run this regex repeatedly on an html doc and make
a list of the results, it's easier to read the whole doc into a string and
then use re.findall.
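Points 1, 3 and 5 above might combine as in the following sketch. The HTML string is a made-up stand-in for the listing page, so the exact tags are an assumption:

```python
import re

# Hypothetical listing HTML, loosely modelled on the post above.
html = ('10:00am - 11:00am:</b> <a href="/tvpdb?id=1">The Price Is Right</a><em>'
        '11:00am - 12:00pm:</b> <a href="/tvpdb?id=2">Guiding Light</a><em>')

# Point 1: group() with no argument returns the whole match,
# so the pattern needs no outer parentheses.
m = re.search(r'[\d:]{4,5}[ap]m', html)
print(m.group())  # 10:00am

# Point 3: capture only the link text, non-greedily.
link_text = re.compile(r'<a[^>]*>(.*?)</a>')

# Point 5: findall over the whole document returns the captured group
# for every match.
print(link_text.findall(html))  # ['The Price Is Right', 'Guiding Light']
```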


 I have used a for loop to remove the extra data, but then it ruins the
 list that I am building.  Basically I want the list to be something
 like this:
 
 [["Government Access"], ["Price Is Right", "Guiding Light", "Another show"]]
 
 The for loop just comma-delimits all of them, so I lose the list-in-a-list
 structure that I need.  I hope I have explained this well enough.  Any help
 or ideas would be appreciated.

No one can help with that unless you show us how you're building your list.



Re: Regular Expression help

2006-04-27 Thread RunLevelZero

Great, I will test this out once I have the time... thanks for the quick
response.


Re: Regular Expression help

2006-04-27 Thread johnzenger

If you are parsing HTML, it may make more sense to use a package
designed especially for that purpose, like Beautiful Soup.


Re: Regular Expression help

2006-04-27 Thread RunLevelZero

I considered that but what I need is simple and I don't want to use
another library for something so simple but thank you.  Plus I don't
understand them all that well :)


Re: Regular Expression help

2006-04-27 Thread johnzenger

If what you need is simple, regular expressions are almost never the
answer.  And how simple can it be if you are posting here?  :)

BeautifulSoup isn't all that hard.  Observe:

 from BeautifulSoup import BeautifulSoup
 html = '10:00am - 11:00am:/b a 
 href=/tvpdb?d=tvpid=167540528[snip]The Price Is Right/aem'
 soup = BeautifulSoup(html)
 soup('a')
[a href=/tvpdb?d=tvpid=167540528ThePrice Is Right/a]
 for show in soup('a'):
print show.contents[0]


The Price Is Right



RunLevelZero wrote:
 I considered that but what I need is simple and I don't want to use
 another library for something so simple but thank you.  Plus I don't
 understand them all that well :)


Re: Regular Expression help

2006-04-27 Thread RunLevelZero

r'<a[^>]*>(.*?)</a>'

With a slight modification, that did exactly what I wanted. And yes,
findall was the only way to get everything I needed, since I read the
whole page into a buffer.

Thanks a bunch.


Re: Regular Expression help

2006-04-27 Thread RunLevelZero

Interesting... thank you.


Re: Regular Expression help

2006-04-27 Thread Edward Elliott

[EMAIL PROTECTED] wrote:
 If you are parsing HTML, it may make more sense to use a package
 designed especially for that purpose, like Beautiful Soup.

I don't know Beautiful Soup, but one advantage regexes have over some
parsers is handling malformed html.  Omitted closing tags can wreak havoc. 
Regexes can also help if you only want elements preceded/followed by a
certain sibling or cousin in the parse tree.  It all depends on what you're
trying to accomplish.  In general though, yes parsers are better suited to
extracting from markup.


Re: Regular Expression help

2006-04-27 Thread John Bokma

Edward Elliott [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] wrote:
 If you are parsing HTML, it may make more sense to use a package
 designed especially for that purpose, like Beautiful Soup.
 
 I don't know Beautiful Soup, but one advantage regexes have over some
 parsers is handling malformed html.  Omitted closing tags can wreak
 havoc. Regexes can also help if you only want elements
 preceded/followed by a certain sibling or cousin in the parse tree. 
 It all depends on what you're trying to accomplish.  In general
 though, yes parsers are better suited to extracting from markup.

A parser can be written in such a way that it doesn't give up on malformed 
HTML. Probably less hard than coming up with regexes that handle HTML 
that's well-formed. (and that coming from a Perl programmer ;-) )
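As an illustration of a parser that does not give up on malformed HTML: Python 3's standard html.parser.HTMLParser reports tags and text as it meets them, without building a tree or requiring closing tags. A minimal sketch (the class name `LinkTextCollector` and the sample markup are hypothetical) that collects link text, including from a link whose </a> is missing:

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collect the text inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.links.append("")  # start a new link's text

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text from nested tags like <i> is appended to the current link.
        if self.in_link:
            self.links[-1] += data


collector = LinkTextCollector()
collector.feed('10:00am</b> <a href="/x">The <i>Price</i> Is Right</a><em>'
               '<a href="/y">Guiding Light')  # stray </b>, unclosed <em> and <a>
collector.close()
print(collector.links)  # ['The Price Is Right', 'Guiding Light']
```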

-- 
John   MexIT: http://johnbokma.com/mexit/
   personal page:   http://johnbokma.com/
Experienced programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html