Regular Expression Help

2009-04-11 Thread Jean-Claude Neveu

Hello,

I was wondering if someone could tell me where 
I'm going wrong with my regular expression. I'm 
trying to write a regexp that identifies whether 
a string contains a correctly-formatted currency 
amount. I want to support dollars, UK pounds and 
Euros, but the example below deliberately omits 
Euros in case the Euro symbol get mangled 
anywhere in email or listserver processing. I 
also want people to be able to omit the currency symbol if they wish.


My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"

Here's how I think it should work (but clearly 
I'm wrong, because it does not actually work):


^\$\£?  Require zero or one instance of $ or £ at the start of the string.
d{0,10} Next, require between zero and ten alpha characters.
(\.\d{2})?  Optionally, two characters can 
follow. They must be preceded by a decimal point.


Examples of acceptable input should be:

$12.42
$12
£12.42
$12,482.96  (now I think about it, I have not catered for this in my regexp)

And unacceptable input would be:

$12b.42
blah
$blah
etc


Here is my Python script:

#
import re

def is_currency(str):
    rex = "^\$\£?\d{0,10}(\.\d{2})?$"
    if re.match(rex, str):
        return 1
    else:
        return 0

def test_match(str):
    if is_currency (str):
        print str + " is a match"
    else:
        print str + " is not a match"

# All should match except the last two
test_match("$12.47")
test_match("12.47")
test_match("£12.47")
test_match("£12")
test_match("$12")
test_match("$12588.47")
test_match("$12,588.47")
test_match("£12588.47")
test_match("12588.47")
test_match("£12588")
test_match("$12588")
test_match("blah")
test_match("$12b.56")


AND HERE IS THE OUTPUT FROM THE ABOVE SCRIPT:
$12.47 is a match
12.47 is not a match
£12.47 is not a match
£12 is not a match
$12 is a match
$12588.47 is a match
$12,588.47 is not a match
£12588.47 is not a match
12588.47 is not a match
£12588 is not a match
$12588 is a match
blah is not a match
$12b.56 is not a match

Many thanks in advance. Regular expressions are not my strong suit :)

J-C

--
http://mail.python.org/mailman/listinfo/python-list


regular expression help

2010-11-29 Thread goldtech
Hi,

say:
>>> import re
>>> m="cccvlvlvlvnnnflfllffccclfnnnooo"
>>> p=re.compile(r'ccc.*nnn')
>>> rtt=p.sub("||",m)
>>> rtt
'||ooo'

The regex is eating up too much. What I want is every non-overlapping
occurrence I think.

so rtt would be:

'||flfllff||ooo'

just like findall acts but in this case I want sub to act like that.

Thanks



Regular Expression help

2006-04-27 Thread RunLevelZero
I have some data and I need to put it in a list in a particular way.  I
have that figured out but there is " stuff " in the data that I don't
want.

Example:

10:00am - 11:00am: The
Price Is Right

All I want is " Price Is Right "

Here is the re.

findshows = re.compile(r'(\d\d:\d\d\D\D\s-\s\d\d:\d\d\D\D:*.*)')

I have used a for loop to remove the extra data but then it ruins the
list that I am building.  Basically I want the list to be something
like this.

[[Government Access], [Price Is Right, Guiding Light, Another show]]

the for loop just comma deliminates all of them so I lose the list in a
list that I need.  I hope I have explained this well enough.  Any help
or ideas would be appreciated.

TIA



regular expression, help

2009-01-27 Thread Vincent Davis
I think there are two parts to this question, and I am sure there is a lot I am
missing. I am hoping an example will help me. I have an HTML doc that I am
trying to use regular expressions to get a value out of.
Here is an example of the line:
Parcel ID: 39-034-15-009 
I want to get the number "39-034-15-009" after "Parcel ID:" The number will
be different each time but always the same format.
I think I can match "Parcel ID:" but not sure how to get the number after.
"Parcel ID:" only occurs once in the document.

is this how i need to start?
pid = re.compile('Parcel ID: ')

Basically I am completely lost and am not finding examples I find helpful.

I am getting the html using myurl=urllib.urlopen().
Can I use RE like this
thenum=pid.match(myurl)


I think the two key things I need to know are
1, how do I get the text after a match?
2, when I use myurl=urllib.urlopen(http://...). can I use the myurl as
the string in a RE, thenum=pid.match(myurl)

Thanks
Vincent


Regular Expression Help

2008-02-26 Thread Lythoner
Hi All,

I have a Python utility which helps generate an Excel file for
language translation. For any new language, we generate an Excel
file which has the English text and a column for the target
language. The translator provides the translated strings,
and a second Python utility reads the target-language strings from
the Excel file and updates/generates the resource file and database
records. Our application is a VC++ application; we use an MS Access db.


We have string table like this.

"STRINGTABLE
BEGIN
IDS_CONTEXT_API_ "API Totalizer Control Dialog"
IDS_CONTEXT "Gas Analyzer"
END

STRINGTABLE
BEGIN
ID_APITOTALIZER_CONTROL
"Start, stop, and reset API volume flow
\nTotalizer Control"
END
"
this repeats.


I read the file line by line and pick the contents inside the
STRINGTABLE.


I want to use a regular expression which should give me all the
entries within
STRINGTABLE
BEGIN
<>
END


I tried a little bit, but no luck. Note that there are multi-line string
entries which we cannot put on a single line.


Regards,

Krish


Regular Expression Help

2008-03-16 Thread santhosh kumar
Hi all,
 I have text like ,
STRINGTABLE
BEGIN
ID_NEXT_PANE"Cambiar a la siguiente sección de laventana
\nSiguiente sección"
ID_PREV_PANE"Regresar a la sección anterior de
laventana\nSección anterior"
END
STRINGTABLE
BEGIN
ID_VIEW_TOOLBAR "Mostrar u ocultar la barra de
herramientas\nMostrar/Ocultar la barra de herramientas"
ID_VIEW_STATUS_BAR  "Mostrar u ocultar la barra de
estado\nMostrar/Ocultar la barra de estado"
END


..

and I need to parse from STRINGTABLE to END as a list object. What kind of
regular expression should I write?

-- 
Regards,
Santhoshkumar.S

Regular expression help

2008-07-18 Thread nclbndk759
Hello,

I am new to Python, with a background in scientific computing. I'm
trying to write a script that will take a file with lines like

c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
3pv=0

extract the values of afrac and etot and plot them. I'm really
struggling with getting the values of afrac and etot. So far I have
come up with (small snippet of script just to get the energy, etot):

def get_data_points(filename):
    file = open(filename,'r')
    data_points = []
    while 1:
        line = file.readline()
        if not line: break
        energy = get_total_energy(line)
        data_points.append(energy)
    return data_points

def get_total_energy(line):
    rawstr = r"""(?P.*?)=(?P.*?)\s"""
    p = re.compile(rawstr)
    return p.match(line,5)

What is being stored in energy is '<_sre.SRE_Match object at
0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
regular expressions for two days now, with no luck. Could someone
please put me out of my misery and give me a clue as to what's going
on? Apologies if it's blindingly obvious or if this question has been
asked and answered before.

Thanks,

Nicole


Re: Regular Expression Help

2009-04-11 Thread rurpy
On Apr 11, 9:42 pm, Jean-Claude Neveu 
wrote:

> My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
>
> Here's how I think it should work (but clearly
> I'm wrong, because it does not actually work):
>
> ^\$\£?  Require zero or one instance of $ or £ at the start of the string.

The "or" in "$ or £" above is a vertical bar.  You
want ^(\$|£)? here.

> d{0,10} Next, require between zero and ten alpha characters.
> (\.\d{2})?  Optionally, two characters can
> follow. They must be preceded by a decimal point.


Re: Regular Expression Help

2009-04-11 Thread John Machin
On Apr 12, 2:19 pm, ru...@yahoo.com wrote:
> On Apr 11, 9:42 pm, Jean-Claude Neveu 
> wrote:
>
> > My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
>
> > Here's how I think it should work (but clearly
> > I'm wrong, because it does not actually work):
>
> > ^\$\£?      Require zero or one instance of $ or £ at the start of the 
> > string.
>
> The "or" in "$ or £" above is a vertical bar.  You
> want ^(\$|£)? here.

Best not to use a capturing group (blah) when you don't need to
capture ... use (?:blah) instead.

When the alternatives are all single characters, for greater typing
efficiency and computing efficiency use a character class:

^[\$£]?


Re: Regular Expression Help

2009-04-12 Thread Graham Breed

Jean-Claude Neveu wrote:

> Hello,
>
> I was wondering if someone could tell me where I'm going wrong with my
> regular expression. I'm trying to write a regexp that identifies whether
> a string contains a correctly-formatted currency amount. I want to
> support dollars, UK pounds and Euros, but the example below deliberately
> omits Euros in case the Euro symbol get mangled anywhere in email or
> listserver processing. I also want people to be able to omit the
> currency symbol if they wish.


If Euro symbols can get mangled, so can Pound signs. 
They're both outside ASCII.



> My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
>
> Here's how I think it should work (but clearly I'm wrong, because it
> does not actually work):
>
> ^\$\£?  Require zero or one instance of $ or £ at the start of the
> string.


^[$£]? is correct.  And, as you're using re.match, the ^ is 
superfluous.  (A previous message suggested ^[\$£]? which 
will also work.  You generally need to escape a Dollar sign 
but not here.)


You should also think about the encoding.  In my terminal, 
"£" is identical to '\xc2\xa3'.  That is, two bytes for a 
UTF-8 code point.  If you assume this encoding, it's best to 
make it explicit.  And if you don't assume a specific 
encoding it's best to convert to unicode to do the 
comparisons, so for 2.x (or portability) your string should 
start u"



> d{0,10} Next, require between zero and ten alpha characters.


There's a backslash missing, but not from your original 
expression.  Digits are not "alpha characters".


> (\.\d{2})?  Optionally, two characters can follow. They must be preceded
> by a decimal point.


That works.  Of course, \d{2} is longer than the simpler \d\d

Note that you can comment the original expression like this:

rex = u"""(?x)
^[$£]?      # Zero or one instance of $ or £
            # at the start of the string.
\d{0,10}    # Between zero and ten digits
(\.\d{2})?  # Optionally, two digits.
            # They must be preceded by a decimal point.
$           # End of line
"""

Then anybody (including you) who comes to read this in the 
future will have some idea what you were trying to do.


> Examples of acceptable input should be:
>
> $12.42
> $12
> £12.42
> $12,482.96  (now I think about it, I have not catered for this in my
> regexp)


Yes, you need to think about that.
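
One way to cater for the thousands separators, the optional currency symbol and the optional pence/cents is sketched below. It is only a sketch under stated assumptions: commas, when present, must group digits in threes; at least one digit is required; and the names CURRENCY and is_currency are illustrative. It is written for Python 3, where string patterns are Unicode by default.

```python
import re

# Assumption: commas are optional, but if used they must split the
# integer part into groups of three digits (12,482 but not 1,23).
CURRENCY = re.compile(r"""
    [$£]?                   # optional currency symbol
    (?:
        \d{1,3}(?:,\d{3})+  # digits grouped in thousands, e.g. 12,482
      | \d+                 # or a plain run of digits, e.g. 12482
    )
    (?:\.\d{2})?            # optional decimal point plus two digits
    \Z                      # nothing may follow
""", re.VERBOSE)


def is_currency(text):
    """Return True if text looks like a currency amount."""
    return CURRENCY.match(text) is not None


for sample in ["$12.47", "12.47", "£12", "$12,588.47", "$12b.56", "blah"]:
    print(sample, is_currency(sample))
```

Because re.match already anchors at the start of the string, no leading ^ is needed; \Z is used instead of $ so that a trailing newline is not silently accepted.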


   Graham



Python's regular expression help

2010-04-29 Thread goldtech
Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:
>>> import re
>>> p = re.compile('(ab*)(sss)')
>>> m = p.match( 'absss' )
>>> m.group(0)
'absss'
>>> m.group(1)
'ab'
>>> m.group(2)
'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:

>>> f=r'abss'
>>> f
'abss'
>>> m = p.match( f )
>>> m.group(0)
Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

How do I implement a regex on a multiline string?  I thought this
might work but there's problem:

>>> p = re.compile('(ab*)(sss)', re.S)
>>> m = p.match( 'ab\nsss' )
>>> m.group(0)
Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'
>>>

Thanks for the newbie regex help, Lee
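
Both tracebacks above have the same cause: match() returns None when the pattern does not match at the start of the string, and calling .group() on None raises AttributeError. 'abss' has only two "s" characters where the pattern needs three, and in 'ab\nsss' nothing in the pattern can match the newline (re.S only changes what "." matches). A minimal sketch, using a hypothetical helper groups_or_none that is not part of the re module:

```python
import re

p = re.compile(r"(ab*)(sss)")


def groups_or_none(pattern, text):
    """Return the captured groups, or None when there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None


print(groups_or_none(p, "absss"))    # the pattern matches
print(groups_or_none(p, "abss"))     # only two 's': no match
print(groups_or_none(p, "ab\nsss"))  # newline between the groups: no match

# Allowing optional whitespace between the two groups lets the
# multiline string match as well.
multi = re.compile(r"(ab*)\s*(sss)")
print(groups_or_none(multi, "ab\nsss"))
```

Checking the result of match() before calling group() is the usual way to avoid the AttributeError.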


Re: regular expression help

2010-11-29 Thread Yingjie Lan
--- On Tue, 11/30/10, goldtech  wrote:

> From: goldtech 
> Subject: regular expression help
> To: python-list@python.org
> Date: Tuesday, November 30, 2010, 9:17 AM
> The regex is eating up too much. What I want is every
> non-overlapping
> occurrence I think.
> 
> so rtt would be:
> 
> '||flfllff||ooo'
> 

Hi, I'll just let Python do most of the talk here.

>>> import re
>>> m="cccvlvlvlvnnnflfllffccclfnnnooo"
>>> p=re.compile(r'ccc.*?nnn')
>>> p.sub("||", m)
'||flfllff||ooo'

Cheers,

Yingjie


  


Re: regular expression help

2010-11-29 Thread Tim Harig
On 2010-11-30, goldtech  wrote:
> Hi,
>
> say:
> >>> import re
> >>> m="cccvlvlvlvnnnflfllffccclfnnnooo"
> >>> p=re.compile(r'ccc.*nnn')
> >>> rtt=p.sub("||",m)
> >>> rtt
> '||ooo'
>
> The regex is eating up too much. What I want is every non-overlapping
> occurrence I think.
>
> so rtt would be:
>
> '||flfllff||ooo'

Python 3.1.2 (r312:79147, Oct  9 2010, 00:16:06)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m="cccvlvlvlvnnnflfllffccclfnnnooo"
>>> pattern = re.compile(r'ccc[^n]*nnn')
>>> pattern.sub("||", m)
'||flfllff||ooo'
>>>


Re: regular expression help

2010-11-29 Thread Tim Harig
Python 3.1.2 (r312:79147, Oct  9 2010, 00:16:06)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m="cccvlvlvlvnnnflfllffccclfnnnooo"
>>> pattern = re.compile(r'ccc[^n]*nnn')
>>> pattern.sub("||", m)
'||flfllff||ooo'
>>> # or, assuming that the middle sequence might contain singular or
>>> # double 'n's
>>> pattern = re.compile(r'ccc.*?nnn')
>>> pattern.sub("||", m)
'||flfllff||ooo'
>>>
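
The difference between the three patterns discussed in this thread can be seen side by side in the sketch below. The second input string, tricky, is a made-up example (not from the thread) in which the middle section contains a single 'n'; it shows where the negated character class and the non-greedy quantifier part ways.

```python
import re

text = "cccvlvlvlvnnnflfllffccclfnnnooo"

greedy = re.compile(r"ccc.*nnn")       # .* runs to the LAST nnn
lazy = re.compile(r"ccc.*?nnn")        # .*? stops at the FIRST nnn
negated = re.compile(r"ccc[^n]*nnn")   # no 'n' allowed before nnn

print(greedy.sub("||", text))    # ||ooo
print(lazy.sub("||", text))      # ||flfllff||ooo
print(negated.sub("||", text))   # ||flfllff||ooo

# Hypothetical input with a lone 'n' inside the middle section.
tricky = "cccanbnnnxyz"
print(lazy.sub("||", tricky))     # ||xyz
print(negated.sub("||", tricky))  # cccanbnnnxyz (no match, unchanged)
```

The non-greedy form is the more forgiving choice when the text between the markers may itself contain the letter 'n'.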


Re: regular expression help

2010-11-29 Thread goldtech
 .*?  fixed it. Every occurrence of the pattern is now affected, which
is what I want.

Thank you very much.


python regular expression help

2007-04-11 Thread Qilong Ren
Hi, everyone,

I am extracting some information from a given string using python RE. The 
string is ,for example,
   s = 'a = 4 b =3.4 5.4 c = 4.5'
What I want is :
   a = 4
   b = 3.4 5.4
   c = 4.5
Right now I use : 
   pattern = re.compile(r'\w+\s*=\s*.*?\s+')
   lists = pattern.findall(s)
It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with 
strings like 'a=4 b=3.4 5.4 c = 4.5'

Any suggestion?
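
One possible approach, offered as a sketch rather than a definitive answer: let the value run lazily until a lookahead sees either the next "name =" or the end of the string. It assumes that names consist of word characters and that a value never itself contains something that looks like "name =". The names PAIR and split_assignments are illustrative.

```python
import re

# (\w+)        the name
# \s*=\s*      the equals sign with optional spaces around it
# (.*?)        the value, as short as possible
# \s*(?=...)   stop just before the next "name =" or at end of string
PAIR = re.compile(r"(\w+)\s*=\s*(.*?)\s*(?=\w+\s*=|$)")


def split_assignments(s):
    """Return a list of (name, value) tuples found in s."""
    return PAIR.findall(s)


for name, value in split_assignments("a = 4 b =3.4 5.4 c = 4.5"):
    print(name, "=", value)
```

The lookahead does not consume the next name, so each assignment is found in turn without overlapping the one before it.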

Thanks,Qilong




   


Re: Regular Expression help

2006-04-27 Thread Edward Elliott
RunLevelZero wrote:
> 10:00am - 11:00am:  
> Here is the re.
> 
> findshows =
> re.compile(r'(\d\d:\d\d\D\D\s-\s\d\d:\d\d\D\D:*.*)')

1. A regex remembers everything it matches -- no need to wrap the entire
thing in parens.  Just call group() on the returned MatchObject.

2. If all you want is the link text, you don't need to do so much matching. 
If you don't need the time, don't match it in the first place.  If you're
using it as a marker, try matching each time with r'[\d:]{4,5}[ap]m'.  Not
as exact but a bit simpler.  Or just r'[\d:apm]{6,7}'

3. To grab what's inside the link: r'<a[^>]*>(.*?)</a>'

4. If the link text itself contains html tags, you'll have to strip those
off separately.  Extracting the text from arbitrarily nested html tags in
one shot requires a parser, not a regex.

5. If you're just going to run this regex repeatedly on an html doc and make
a list of the results, it's easier to read the whole doc into a string and
then use re.findall.
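
Points 3 and 5 can be combined into a short sketch. It assumes the show titles sit inside <a> tags, as the mention of "the link" suggests; the sample HTML and its href values are invented for illustration, and the code is written for Python 3.

```python
import re

# Hypothetical sample; the real page layout is not shown in the thread.
html = (
    '10:00am - 11:00am: <a href="/show?id=1">The Price Is Right</a>\n'
    '11:00am - 12:00pm: <a href="/show?id=2">Guiding Light</a>\n'
)

# Capture only the text between the opening and closing anchor tags.
link_text = re.compile(r"<a[^>]*>(.*?)</a>", re.DOTALL)

# findall returns just the captured group for every match in the document.
shows = link_text.findall(html)
print(shows)
```

Running findall once over the whole document avoids the per-line loop and yields a flat list that can then be grouped however the nested list requires.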


> I have used a for loop to remove the extra data but then it ruins the
> list that I am building.  Basically I want the list to be something
> like this.
> 
> [[Government Access], [Price Is Right, Guiding Light, Another show]]
> 
> the for loop just comma deliminates all of them so I lose the list in a
> list that I need.  I hope I have explained this well enough.  Any help
> or ideas would be appreciated.

No one can help with that unless you show us how you're building your list.




Re: Regular Expression help

2006-04-27 Thread RunLevelZero
Great I will test this out once I have the time... thanks for the quick
response



Re: Regular Expression help

2006-04-27 Thread johnzenger
If you are parsing HTML, it may make more sense to use a package
designed especially for that purpose, like Beautiful Soup.



Re: Regular Expression help

2006-04-27 Thread RunLevelZero
I considered that but what I need is simple and I don't want to use
another library for something so simple but thank you.  Plus I don't
understand them all that well :)



Re: Regular Expression help

2006-04-27 Thread johnzenger
If what you need is "simple," regular expressions are almost never the
answer.  And how simple can it be if you are posting here?  :)

BeautifulSoup isn't all that hard.  Observe:

>>> from BeautifulSoup import BeautifulSoup
>>> html = '10:00am - 11:00am: <a href="/tvpdb?d=tvp&id=167540528&[snip]>The Price Is Right</a>'
>>> soup = BeautifulSoup(html)
>>> soup('a')
[<a href="/tvpdb?d=tvp&id=167540528&[snip]>The Price Is Right</a>]
>>> for show in soup('a'):
...     print show.contents[0]


The Price Is Right



RunLevelZero wrote:
> I considered that but what I need is simple and I don't want to use
> another library for something so simple but thank you.  Plus I don't
> understand them all that well :)



Re: Regular Expression help

2006-04-27 Thread RunLevelZero
r'<a[^>]*>(.*?)</a>'

With a slight modification that did exactly what I wanted, and yes the
findall was the only way to get all that I needed as I buffered all the
read.

Thanks a bunch.



Re: Regular Expression help

2006-04-27 Thread RunLevelZero
Interesting... thank you.



Re: Regular Expression help

2006-04-27 Thread Edward Elliott
[EMAIL PROTECTED] wrote:
> If you are parsing HTML, it may make more sense to use a package
> designed especially for that purpose, like Beautiful Soup.

I don't know Beautiful Soup, but one advantage regexes have over some
parsers is handling malformed html.  Omitted closing tags can wreak havoc. 
Regexes can also help if you only want elements preceded/followed by a
certain sibling or cousin in the parse tree.  It all depends on what you're
trying to accomplish.  In general though, yes parsers are better suited to
extracting from markup.



Re: Regular Expression help

2006-04-27 Thread John Bokma
Edward Elliott <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
>> If you are parsing HTML, it may make more sense to use a package
>> designed especially for that purpose, like Beautiful Soup.
> 
> I don't know Beautiful Soup, but one advantage regexes have over some
> parsers is handling malformed html.  Omitted closing tags can wreak
> havoc. Regexes can also help if you only want elements
> preceded/followed by a certain sibling or cousin in the parse tree. 
> It all depends on what you're trying to accomplish.  In general
> though, yes parsers are better suited to extracting from markup.

A parser can be written in such a way that it doesn't give up on malformed
HTML. Probably less hard than coming up with regexes that handle HTML
that's well-formed. (And that coming from a Perl programmer ;-) )

-- 
John   MexIT: http://johnbokma.com/mexit/
   personal page:   http://johnbokma.com/
Experienced programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html


Re: Regular Expression help

2006-04-28 Thread Kent Johnson
Edward Elliott wrote:
> [EMAIL PROTECTED] wrote:
>> If you are parsing HTML, it may make more sense to use a package
>> designed especially for that purpose, like Beautiful Soup.
> 
> I don't know Beautiful Soup, but one advantage regexes have over some
> parsers is handling malformed html. 

Beautiful Soup is intended to handle malformed HTML and seems to do 
pretty well.

Kent


Re: regular expression, help

2009-01-27 Thread Vincent Davis
Is BeautifulSoup really better? Since I don't know either, I would prefer to
learn only one for now.
Thanks
Vincent Davis



On Tue, Jan 27, 2009 at 10:39 AM, MRAB  wrote:

> Vincent Davis wrote:
>
>> I think there are two parts to this question and I am sure lots I am
>> missing. I am hoping an example will help me
>> I have a html doc that I am trying to use regular expressions to get a
>> value out of.
>> here is an example or the line
>> Parcel ID: 39-034-15-009 
>> I want to get the number "39-034-15-009" after "Parcel ID:" The number
>> will be different each time but always the same format.
>> I think I can match "Parcel ID:" but not sure how to get the number after.
>> "Parcel ID:" only occurs once in the document.
>>
>> is this how i need to start?
>> pid = re.compile('Parcel ID: ')
>>
>> Basically I am completely lost and am not finding examples I find helpful.
>>
>> I am getting the html using myurl=urllib.urlopen(). Can I use RE like this
>> thenum=pid.match(myurl)
>>
>> I think the two key things I need to know are
>> 1, how do I get the text after a match?
>> 2, when I use myurl=urllib.urlopen(http://...). can I use the myurl
>> as the string in a RE, thenum=pid.match(myurl)
>>
>>  Something like:
>
> pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
> myurl = urllib.urlopen(url)
> text = myurl.read()
> myurl.close()
> thenum = pid.search(text).group(1)
>
> Although BeautifulSoup is the preferred solution.
> --
> http://mail.python.org/mailman/listinfo/python-list
>


Re: regular expression, help

2009-01-27 Thread MRAB

Vincent Davis wrote:
> I think there are two parts to this question and I am sure lots I am
> missing. I am hoping an example will help me
> I have a html doc that I am trying to use regular expressions to get a
> value out of.
>
> here is an example or the line
> Parcel ID: 39-034-15-009
> I want to get the number "39-034-15-009" after "Parcel ID:" The number
> will be different each time but always the same format.
> I think I can match "Parcel ID:" but not sure how to get the number
> after. "Parcel ID:" only occurs once in the document.
>
> is this how i need to start?
> pid = re.compile('Parcel ID: ')
>
> Basically I am completely lost and am not finding examples I find helpful.
>
> I am getting the html using myurl=urllib.urlopen().
> Can I use RE like this
> thenum=pid.match(myurl)
>
> I think the two key things I need to know are
> 1, how do I get the text after a match?
> 2, when I use myurl=urllib.urlopen(http://...). can I use the myurl
> as the string in a RE, thenum=pid.match(myurl)



Something like:

pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
myurl = urllib.urlopen(url)
text = myurl.read()
myurl.close()
thenum = pid.search(text).group(1)

Although BeautifulSoup is the preferred solution.
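
The same idea as a self-contained sketch that runs without a network connection. It answers both questions: the regex must be applied to the text read from the URL object (not to the object itself), and the capturing group holds the text after the label. The sample string and the helper name find_parcel_id are assumptions made for illustration; the code is written for Python 3.

```python
import re

# The parentheses capture the dash-separated number after the label.
PARCEL_ID = re.compile(r"Parcel ID:\s*(\d+(?:-\d+)*)")


def find_parcel_id(text):
    """Return the parcel number found in text, or None if absent."""
    m = PARCEL_ID.search(text)   # search() scans the whole string
    return m.group(1) if m else None


# Hypothetical stand-in for the page contents returned by .read().
sample = "<td>Parcel ID: 39-034-15-009 </td>"
print(find_parcel_id(sample))
```

search() is used rather than match() because the label is not at the very start of the document.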


Re: Regular Expression Help

2008-02-26 Thread John Machin
On Feb 27, 6:28 am, [EMAIL PROTECTED] wrote:
> Hi All,
>
> I have a python utility which helps to generate an excel file for
> language translation. For any new language, we will generate the excel
> file which will have the English text and column for interested
> translation language. The translator  will provide the language string
> and again I will have python utility to read the excel file target
> language string and update/generate the resource file & database
> records. Our application is VC++ application, we use MS Access db.
>
> We have string table like this.
>
> "STRINGTABLE
> BEGIN
> IDS_CONTEXT_API_ "API Totalizer Control Dialog"
> IDS_CONTEXT "Gas Analyzer"
> END
>
> STRINGTABLE
> BEGIN
> ID_APITOTALIZER_CONTROL
> "Start, stop, and reset API volume flow
> \nTotalizer Control"
> END
> "
> this repeats.
>
> I read the file line by line and pick the contents inside the
> STRINGTABLE.
>
> I want to use the regular expression while should give me all the
> entries with in
> STRINGTABLE
> BEGIN
> <>
> END
>
> I tried little bit, but no luck. Note that it is multi-line string
> entries which we cannot make as single line
>

Looks to me like you have a very simple grammar:
entry ::= id quoted_string

id is matched by r'[A-Z]+[A-Z_]+'
quoted_string is matched by r'"[^"]*"'

So a pattern which will pick out one entry would be something like
r'([A-Z]+[A-Z_]+)\s+("[^"]*")'
Note that using \s+ (whitespace) allows for having \n etc between id
and quoted_string.

You need to build a string containing all the lines between BEGIN and
END, and then use re.findall.
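
A minimal sketch of that approach, using the sample data from the question. The block pattern that isolates the text between BEGIN and END is an addition for illustration (the names BLOCK, ENTRY and tables are made up); the entry pattern is the one given above. Written for Python 3.

```python
import re

# Sample input from the question. A raw string keeps the "\n" in the
# resource text as a literal backslash followed by 'n'.
rc_text = r'''STRINGTABLE
BEGIN
IDS_CONTEXT_API_ "API Totalizer Control Dialog"
IDS_CONTEXT "Gas Analyzer"
END

STRINGTABLE
BEGIN
ID_APITOTALIZER_CONTROL
"Start, stop, and reset API volume flow
\nTotalizer Control"
END
'''

# Everything between a BEGIN line and the next END line of a STRINGTABLE.
BLOCK = re.compile(r'^STRINGTABLE\s*\nBEGIN\s*\n(.*?)^END\s*$',
                   re.MULTILINE | re.DOTALL)
# One entry: an id followed by a quoted string, possibly on the next line.
ENTRY = re.compile(r'([A-Z]+[A-Z_]+)\s+("[^"]*")')

# One list of (id, quoted_string) tuples per STRINGTABLE block.
tables = [ENTRY.findall(body) for body in BLOCK.findall(rc_text)]

for table in tables:
    for ident, quoted in table:
        print(ident, "->", quoted)
```

Because "[^"]*" happily spans line breaks, the multi-line entry in the second table is captured as a single string.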

If you still can't get it to work, ask again -- but do show the code
from your best attempt, and reduce ambiguity by showing your test
input as a Python expression e.g.
test1_in = """\
ID_F "fough"
ID_B_
"barre"
ID__Z
"zotte start
  zotte end"
"""


Re: Regular Expression Help

2008-03-16 Thread Duncan Booth
"santhosh kumar" <[EMAIL PROTECTED]> wrote:

>  I have text like ,
> STRINGTABLE
> BEGIN
> ID_NEXT_PANE"Cambiar a la siguiente sección de laventana
> \nSiguiente sección"
> ID_PREV_PANE"Regresar a la sección anterior de
> laventana\nSección anterior"
> END
> STRINGTABLE
> BEGIN
> ID_VIEW_TOOLBAR "Mostrar u ocultar la barra de
> herramientas\nMostrar/Ocultar la barra de herramientas"
> ID_VIEW_STATUS_BAR  "Mostrar u ocultar la barra de
> estado\nMostrar/Ocultar la barra de estado"
> END
> 
> 
> ..
> and i need to parse from STRINGTABLE to END as a list object. whatkind of
> regular expression should i write.
> 

I doubt very much whether you want any regular expressions at all. I'd do 
something along these lines:

find a line=="STRINGTABLE"
assert the next line=="BEGIN"
then until we find a line=="END":
    idvalue = line.strip().split(None,1)
    assert len(idvalue)==2
    result.append(idvalue)
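
The outline above could be turned into runnable Python roughly as follows. This is a sketch that assumes each entry (id plus quoted string) sits on a single line; entries wrapped across two lines, as in the quoted sample, would need to be joined first. The function name parse_string_tables and the shortened sample strings are illustrative.

```python
def parse_string_tables(lines):
    """Return one list of [id, value] pairs per STRINGTABLE block."""
    tables = []
    it = iter(lines)
    for line in it:
        if line.strip() != "STRINGTABLE":
            continue
        if next(it, "").strip() != "BEGIN":
            raise ValueError("expected BEGIN after STRINGTABLE")
        entries = []
        for line in it:              # keeps consuming the same iterator
            stripped = line.strip()
            if stripped == "END":
                break
            if not stripped:         # tolerate blank lines
                continue
            idvalue = stripped.split(None, 1)
            if len(idvalue) != 2:
                raise ValueError("malformed entry: %r" % line)
            entries.append(idvalue)
        tables.append(entries)
    return tables


sample = [
    "STRINGTABLE",
    "BEGIN",
    'ID_VIEW_TOOLBAR "Mostrar u ocultar la barra de herramientas"',
    'ID_VIEW_STATUS_BAR  "Mostrar u ocultar la barra de estado"',
    "END",
]
print(parse_string_tables(sample))
```

Sharing one iterator between the outer and inner loops means the outer loop resumes right after each END line.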



Re: Regular expression help

2008-07-18 Thread Russell Blau
<[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
>
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> 3pv=0
>
> extract the values of afrac and etot and plot them.
...

> What is being stored in energy is '<_sre.SRE_Match object at
> 0x2a955e4ed0>', not '-11.020107'. Why?

because the re.match() method returns a match object, as documented at 
http://www.python.org/doc/current/lib/match-objects.html

But this looks like a problem where regular expressions are overkill. 
Assuming all your lines are formatted as in the example above (every value 
you are interested in contains an equals sign and is surrounded by spaces), 
you could do this:

values = {}
for expression in line.split(" "):
    if "=" in expression:
        name, val = expression.split("=")
        values[name] = val

I'd wager that this will run a fair bit faster than any regex-based 
solution.  Then you just use values['afrac'] and values['etot'] when you 
need them.
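
A variant of the loop above as a small function, sketched under two assumptions: every value is numeric, and lines may end with a newline (splitting on any whitespace rather than on a single space takes care of that). The name parse_line is illustrative.

```python
def parse_line(line):
    """Map each name=value token on the line to a float."""
    values = {}
    for token in line.split():            # any whitespace, incl. newline
        name, sep, val = token.partition("=")
        if sep:                           # skip tokens without '='
            values[name] = float(val)
    return values


line = ("c afrac=.7 mmom=0 sev=-9.56646 erep=0 "
        "etot=-11.020107 emad=-3.597647 3pv=0\n")
values = parse_line(line)
print(values["afrac"], values["etot"])
```

Converting to float at parse time means the results can be handed straight to a plotting routine.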

And when you get to be a really hard-core Pythonista, you could write the 
whole routine above in one line, but this seems clearer.  ;-)

Russ





Re: Regular expression help

2008-07-18 Thread Brad

[EMAIL PROTECTED] wrote:

> Hello,
>
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
>
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> 3pv=0
>
> extract the values of afrac and etot...


Why not just split them out instead of using REs?

fp = open("test.txt")
lines = fp.readlines()
fp.close()

for line in lines:
    split = line.split()
    for pair in split:
        pair_split = pair.split("=")
        if len(pair_split) == 2:
            try:
                print pair_split[0], "is", pair_split[1]
            except:
                pass

Results:

IDLE 1.2.2   No Subprocess 
>>>
afrac is .7
mmom is 0
sev is -9.56646
erep is 0
etot is -11.020107
emad is -3.597647
3pv is 0
>>>


Re: Regular expression help

2008-07-18 Thread Gerard flanagan

[EMAIL PROTECTED] wrote:

> Hello,
>
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
>
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> 3pv=0
>
> extract the values of afrac and etot and plot them. I'm really
> struggling with getting the values of efrac and etot. So far I have
> come up with (small snippet of script just to get the energy, etot):
>
> def get_data_points(filename):
>     file = open(filename,'r')
>     data_points = []
>     while 1:
>         line = file.readline()
>         if not line: break
>         energy = get_total_energy(line)
>         data_points.append(energy)
>     return data_points
>
> def get_total_energy(line):
>     rawstr = r"""(?P.*?)=(?P.*?)\s"""
>     p = re.compile(rawstr)
>     return p.match(line,5)
>
> What is being stored in energy is '<_sre.SRE_Match object at
> 0x2a955e4ed0>', not '-11.020107'. Why?




1. Consider using the 'split' method on each line rather than regexes.
2. In your code you are compiling the regex for every line in the file;
you should lift it out of the 'get_total_energy' function so that the
compilation is only done once.
3. A Match object has a 'groups' method, which is what you need to
retrieve the data.

4. Also look at the findall method:

data = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0 '


import re

rx = re.compile(r'(\w+)=(\S+)')

data = dict(rx.findall(data))

print data
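
Points 2 and 3 together might look like the sketch below: the pattern is compiled once at module level, and the function returns the captured text converted to a number instead of the Match object itself. The group name "energy" and the pattern are illustrative choices (the original named groups are not legible in the quoted code); written for Python 3.

```python
import re

# Compiled once, outside the function, so it is not rebuilt per line.
ETOT = re.compile(r"\betot=(?P<energy>\S+)")


def get_total_energy(line):
    """Return the etot value on the line as a float, or None."""
    m = ETOT.search(line)
    if m is None:
        return None
    return float(m.group("energy"))   # the captured text, not the Match


line = ("c afrac=.7 mmom=0 sev=-9.56646 erep=0 "
        "etot=-11.020107 emad=-3.597647 3pv=0")
print(get_total_energy(line))
```

Returning None for lines without an etot value lets the caller decide whether to skip them or treat them as errors.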

hth

G.



Re: Regular expression help

2008-07-18 Thread Nick Dumas
I think you're over-complicating this. I'm assuming that you're going to
do a line graph of some sort, and each new line of the file contains a
new set of data.

The problem you mentioned with your regex returning a match object
rather than a string is because you're simply using a re function that
doesn't return strings. re.findall() is what you want. That being said,
here is working code to mine data from your file.

[code]
import re

line = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0'

energypat = r'\betot=(-?\d*?[.]\d*)'

#Note: To change the data grabbed from the line, you can change the
#'etot' to 'afrac' or 'emad' or anything that doesn't contain a regex
#special character.

energypat = re.compile(energypat)

re.findall(energypat, line)# returns a LIST of strings: ['-11.020107']

[/code]

This returns a list of strings, each of which is easy enough to convert to
a float. After that, you can datapoints.append() to your heart's content.
Good luck with your work.
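
As a sketch of how the original get_total_energy could hand back a number instead of a match object (Python 3 syntax; the helper name get_value and its pattern are assumptions made for illustration, not the poster's code): search for the field, then take group(1) of the match and convert it.

```python
import re

def get_value(line, name):
    """Return the value of 'name=<number>' on the line as a float, or None."""
    # re.escape guards against field names containing regex special characters.
    pattern = r'\b' + re.escape(name) + r'=(-?\d*\.?\d+)'
    m = re.search(pattern, line)
    if m is None:
        return None
    # m is a match object; the captured text lives in its groups.
    return float(m.group(1))

line = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647 3pv=0'
print(get_value(line, 'etot'))
print(get_value(line, 'afrac'))
```

The original code stored the result of match() itself, which is why a '<_sre.SRE_Match object ...>' ended up in the list.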

[EMAIL PROTECTED] wrote:
> Hello,
> 
> I am new to Python, with a background in scientific computing. I'm
> trying to write a script that will take a file with lines like
> 
> c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> 3pv=0
> 
> extract the values of afrac and etot and plot them. I'm really
> struggling with getting the values of efrac and etot. So far I have
> come up with (small snippet of script just to get the energy, etot):
> 
> def get_data_points(filename):
> file = open(filename,'r')
> data_points = []
> while 1:
> line = file.readline()
> if not line: break
> energy = get_total_energy(line)
> data_points.append(energy)
> return data_points
> 
> def get_total_energy(line):
> rawstr = r"""(?P.*?)=(?P.*?)\s"""
> p = re.compile(rawstr)
> return p.match(line,5)
> 
> What is being stored in energy is '<_sre.SRE_Match object at
> 0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
> regular expressions for two days now, with no luck. Could someone
> please put me out of my misery and give me a clue as to what's going
> on? Apologies if it's blindingly obvious or if this question has been
> asked and answered before.
> 
> Thanks,
> 
> Nicole
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular expression help

2008-07-18 Thread nclbndk759
On Jul 18, 3:35 pm, Nick Dumas <[EMAIL PROTECTED]> wrote:
>
> I think you're over-complicating this. I'm assuming that you're going to
> do a line graph of some sorta, and each new line of the file contains a
> new set of data.
>
> The problem you mentioned with your regex returning a match object
> rather than a string is because you're simply using a re function that
> doesn't return strings. re.findall() is what you want. That being said,
> here is working code to mine data from your file.
>
> [code]
> line = 'c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107
> mad=-3.597647 3pv=0'
>
> energypat = r'\betot=(-?\d*?[.]\d*)'
>
> #Note: To change the data grabbed from the line, you can change the
> #'etot' to 'afrac' or 'emad' or anything that doesn't contain a regex
> #special character.
>
> energypat = re.compile(energypat)
>
> re.findall(energypat, line)# returns a STRING containing '-12.020107'
>
> [/code]
>
> This returns a string, which is easy enough to convert to an int. After
> that, you can datapoints.append() to your heart's content. Good luck
> with your work.
>
>
>
> [EMAIL PROTECTED] wrote:
> > Hello,
>
> > I am new to Python, with a background in scientific computing. I'm
> > trying to write a script that will take a file with lines like
>
> > c afrac=.7 mmom=0 sev=-9.56646 erep=0 etot=-11.020107 emad=-3.597647
> > 3pv=0
>
> > extract the values of afrac and etot and plot them. I'm really
> > struggling with getting the values of efrac and etot. So far I have
> > come up with (small snippet of script just to get the energy, etot):
>
> > def get_data_points(filename):
> >     file = open(filename,'r')
> >     data_points = []
> >     while 1:
> >         line = file.readline()
> >         if not line: break
> >         energy = get_total_energy(line)
> >         data_points.append(energy)
> >     return data_points
>
> > def get_total_energy(line):
> >     rawstr = r"""(?P.*?)=(?P.*?)\s"""
> >     p = re.compile(rawstr)
> >     return p.match(line,5)
>
> > What is being stored in energy is '<_sre.SRE_Match object at
> > 0x2a955e4ed0>', not '-11.020107'. Why? I've been struggling with
> > regular expressions for two days now, with no luck. Could someone
> > please put me out of my misery and give me a clue as to what's going
> > on? Apologies if it's blindingly obvious or if this question has been
> > asked and answered before.
>
> > Thanks,
>
> > Nicole
>

Thanks guys :-)
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular expression help

2008-07-18 Thread Marc 'BlackJack' Rintsch
On Fri, 18 Jul 2008 10:04:29 -0400, Russell Blau wrote:

> values = {}
> for expression in line.split(" "):
> if "=" in expression:
> name, val = expression.split("=")
> values[name] = val
> […]
>
> And when you get to be a really hard-core Pythonista, you could write
> the whole routine above in one line, but this seems clearer.  ;-)

I know it's a matter of taste but I think the one liner is still clear
(enough)::

  values = dict(s.split('=') for s in line.split() if '=' in s)

Ciao,
Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list

Re: Python's regular expression help

2010-04-29 Thread Dodo

On 29/04/2010 20:00, goldtech wrote:

Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )
m.group(0)

'absss'

m.group(1)

'ab'

m.group(2)

'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:


f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'




Thanks for the newbie regex help, Lee


for multiline, I use re.DOTALL

I do not know match(), findall is pretty efficient :
my = "LINK"
res = re.findall(">(.*?)<",my)
>>> res
['LINK']

Dorian
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python's regular expression help

2010-04-29 Thread MRAB

goldtech wrote:

Hi,
Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )
m.group(0)

'absss'

m.group(1)

'ab'

m.group(2)

'sss'
...
But two questions:

How can I operate a regex on a string variable?
I'm doing something wrong here:


f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Look closely: the regex contains 3 letter 's', but the string referred
to by f has only 2.


How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
  File "", line 1, in 
m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Thanks for the newbie regex help, Lee


The string contains a newline between the 'b' and the 's', but the regex
isn't expecting any newline (or any other character) between the 'b' and
the 's', hence no match.
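
Both tracebacks come from calling group() on None, which is what match() returns when the pattern does not match. A minimal sketch in Python 3 syntax, using the pattern from the question (the helper name describe is hypothetical), checks the result before using it:

```python
import re

p = re.compile('(ab*)(sss)')

def describe(text):
    """Return the captured groups, or None if the pattern does not match."""
    m = p.match(text)
    if m is None:
        # No match: there is no match object to call group() on.
        return None
    return m.groups()

print(describe('absss'))    # ('ab', 'sss')
print(describe('abss'))     # None
print(describe('ab\nsss'))  # None
```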
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python's regular expression help

2010-04-29 Thread Tim Chase

On 04/29/2010 01:00 PM, goldtech wrote:

Trying to start out with simple things but apparently there's some
basics I need help with. This works OK:

import re
p = re.compile('(ab*)(sss)')
m = p.match( 'absss' )



f=r'abss'
f

'abss'

m = p.match( f )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


'absss' != 'abss'

Your regexp looks for 3 "s", your "f" contains only 2.  So the 
regexp object doesn't, well, match.  Try


  f = 'absss'

and it will work.  As an aside, using raw-strings for this text 
doesn't change anything, but if you want, you _can_ write it as


  f = r'absss'

if it will make you feel better :)


How do I implement a regex on a multiline string?  I thought this
might work but there's problem:


p = re.compile('(ab*)(sss)', re.S)
m = p.match( 'ab\nsss' )
m.group(0)

Traceback (most recent call last):
   File "", line 1, in
 m.group(0)
AttributeError: 'NoneType' object has no attribute 'group'


Well, it depends on what you want to do -- regexps are fairly 
precise, so if you want to allow whitespace between the two, you 
can use


  r = re.compile(r'(ab*)\s*(sss)')

If you want to allow whitespace anywhere, it gets uglier, and 
your capture/group results will contain that whitespace:


  r'(a\s*b*)\s*(s\s*s\s*s)'

Alternatively, if you don't want to allow arbitrary whitespace 
but only newlines, you can use "\n*" instead of "\s*"
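
The two variants described above can be compared side by side. This is a small sketch in Python 3 syntax; the names loose and newline_only are made up for the example.

```python
import re

# Any whitespace (spaces, tabs, newlines) allowed between the two groups.
loose = re.compile(r'(ab*)\s*(sss)')

# Only newlines allowed between the two groups.
newline_only = re.compile(r'(ab*)\n*(sss)')

print(loose.match('ab\nsss').groups())         # ('ab', 'sss')
print(loose.match('ab   sss').groups())        # ('ab', 'sss')
print(newline_only.match('ab\nsss').groups())  # ('ab', 'sss')
print(newline_only.match('ab sss'))            # None
```

Because the whitespace sits outside both capture groups, it does not show up in group(1) or group(2), only in group(0).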


-tkc



--
http://mail.python.org/mailman/listinfo/python-list


Re: Python's regular expression help

2010-04-29 Thread goldtech
On Apr 29, 11:49 am, Tim Chase  wrote:
> On 04/29/2010 01:00 PM, goldtech wrote:
>
> > Trying to start out with simple things but apparently there's some
> > basics I need help with. This works OK:
>  import re
>  p = re.compile('(ab*)(sss)')
>  m = p.match( 'absss' )
>
>  f=r'abss'
>  f
> > 'abss'
>  m = p.match( f )
>  m.group(0)
> > Traceback (most recent call last):
> >    File "", line 1, in
> >      m.group(0)
> > AttributeError: 'NoneType' object has no attribute 'group'
>
> 'absss' != 'abss'
>
> Your regexp looks for 3 "s", your "f" contains only 2.  So the
> regexp object doesn't, well, match.  Try
>
>    f = 'absss'
>
> and it will work.  As an aside, using raw-strings for this text
> doesn't change anything, but if you want, you _can_ write it as
>
>    f = r'absss'
>
> if it will make you feel better :)
>
> > How do I implement a regex on a multiline string?  I thought this
> > might work but there's problem:
>
>  p = re.compile('(ab*)(sss)', re.S)
>  m = p.match( 'ab\nsss' )
>  m.group(0)
> > Traceback (most recent call last):
> >    File "", line 1, in
> >      m.group(0)
> > AttributeError: 'NoneType' object has no attribute 'group'
>
> Well, it depends on what you want to do -- regexps are fairly
> precise, so if you want to allow whitespace between the two, you
> can use
>
>    r = re.compile(r'(ab*)\s*(sss)')
>
> If you want to allow whitespace anywhere, it gets uglier, and
> your capture/group results will contain that whitespace:
>
>    r'(a\s*b*)\s*(s\s*s\s*s)'
>
> Alternatively, if you don't want to allow arbitrary whitespace
> but only newlines, you can use "\n*" instead of "\s*"
>
> -tkc

Yes, most of my problem is w/my patterns not w/any python re syntax.

I thought re.S will take a multiline string with any spaces or
newlines and make it appear as one line to the regex. Make "\n" be
ignored in a way...still playing w/it. Thanks for the help!
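
For reference, a short sketch (Python 3 syntax) of what re.S actually changes: it is the same flag as re.DOTALL, and its only effect is to let '.' match a newline. It does not make the regex skip newlines in the string.

```python
import re

text = 'ab\nsss'

# The pattern has no '.', so re.S makes no difference here: still no match.
no_dot = re.match('(ab*)(sss)', text, re.S)

# With a '.' between the groups, the flag decides whether '\n' can be matched.
dot_default = re.match('(ab*).(sss)', text)
dot_dotall = re.match('(ab*).(sss)', text, re.S)

print(no_dot, dot_default)   # None None
print(dot_dotall.groups())   # ('ab', 'sss')
```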
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-11 Thread liupeng
pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
lists = pattern.findall(s)
print lists
['a=4 ', 'b=3.4 ', 'c=4.5']
On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
> Hi, everyone,
> 
> I am extracting some information from a given string using python RE. The
> string is ,for example,
>s = 'a = 4 b =3.4 5.4 c = 4.5'
> What I want is :
>a = 4
> b = 3.4 5.4
>c = 4.5
> Right now I use :
>pattern = re.compile(r'\w+\s*=\s*.*?\s+')
>lists = pattern.findall(s)
> It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
> strings like 'a=4 b=3.4 5.4 c = 4.5'
> 
> Any suggestion?
> 
> Thanks,Qilong
> 

> -- 
> http://mail.python.org/mailman/listinfo/python-list


-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread Qilong Ren
Hi,

Thanks for reply. That actually is not what I want. Strings I am dealing with 
may look like this:
 s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
What I want is
 a = 4.5
 b = 'h' 'd'
 c = 4.5 3.5


- Original Message 
From: liupeng <[EMAIL PROTECTED]>
To: python-list@python.org
Sent: Wednesday, April 11, 2007 6:41:30 PM
Subject: Re: python regular expression help

pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
lists = pattern.findall(s)
print lists
['a=4 ', 'b=3.4 ', 'c=4.5']
On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
> Hi, everyone,
> 
> I am extracting some information from a given string using python RE. The
> string is ,for example,
>s = 'a = 4 b =3.4 5.4 c = 4.5'
> What I want is :
>a = 4
> b = 3.4 5.4
>c = 4.5
> Right now I use :
>pattern = re.compile(r'\w+\s*=\s*.*?\s+')
>lists = pattern.findall(s)
> It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
> strings like 'a=4 b=3.4 5.4 c = 4.5'
> 
> Any suggestion?
> 
> Thanks,Qilong
> 

> -- 
> http://mail.python.org/mailman/listinfo/python-list

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-11 Thread 7stud
On Apr 11, 7:41 pm, liupeng <[EMAIL PROTECTED]> wrote:
> pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
> lists = pattern.findall(s)
> print lists
> ['a=4 ', 'b=3.4 ', 'c=4.5']
>
> On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
> > Hi, everyone,
>
> > I am extracting some information from a given string using python RE. The
> > string is ,for example,
> >    s = 'a = 4 b =3.4 5.4 c = 4.5'
> > What I want is :
> >    a = 4
> >     b = 3.4 5.4
> >    c = 4.5
> > Right now I use :
> >    pattern = re.compile(r'\w+\s*=\s*.*?\s+')
> >    lists = pattern.findall(s)
> > It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
> > strings like 'a=4 b=3.4 5.4 c = 4.5'
>
> > Any suggestion?
>
> > Thanks,Qilong
>
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>

Try this:

import re

s = 'a = 4 b =3.4 5.4 c = 4.5'
r = re.compile("[a-z]+.*?(?=[a-z]|$)" )
l = r.findall(s)
print l
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: python regular expression help

2007-04-11 Thread Gabriel Genellina
En Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren <[EMAIL PROTECTED]>  
escribió:

> Thanks for reply. That actually is not what I want. Strings I am dealing  
> with may look like this:
>  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
> What I want is
>  a = 4.5
>  b = 'h' 'd'
>  c = 4.5 3.5

That's a bit tricky. You have LHS = RHS where RHS includes all the  
following text *except* the very next word before the following = (which  
is the LHS of the next expression). Or something like that :)

py> import re
py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
py> for item in r.findall(s):
...   print item
...
a = 4.5
b = 'h'  'd'
c = 4.5 3.5
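
The same lookahead idea can be sketched in Python 3 syntax with capture groups, so that findall yields (name, value) pairs directly. The extra groups and the dict are additions made for illustration, not part of the pattern above.

```python
import re

s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"

# (\w+)   the left-hand side
# (.*?)   the right-hand side, as short as possible...
# (?=...) ...but stopping just before the next "name =" or at end of string
r = re.compile(r"(\w+)\s*=\s*(.*?)\s*(?=\w+\s*=|$)")

pairs = dict(r.findall(s))
for name, value in pairs.items():
    print(name, '=', value)
```

One limitation to keep in mind: a right-hand side that itself contains a word followed by '=' would be split at that point.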

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-11 Thread Qilong Ren

Hi,

I don't quite understand the regular expression:
  re.compile("[a-z]+.*?(?=[a-z]|$)" )
I tried it, and in some cases it works. But if the string looks like:
   s = 'a = 3.4 b = 4.5 5.6 c = "h","d"'
it fails.

What I came up with is:
 names = re.compile(r'(\w+)\s*=').findall(s)
and for the corresponding values:
 values = re.split(r'\w+\s*=',s)[1:]
It does not look good but it works. What do you think?
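
A sketch of how the two lists could be paired up (Python 3 syntax; the strip() call and the zip into a dict are additions for illustration):

```python
import re

s = 'a = 3.4 b = 4.5 5.6 c = "h","d"'

# Every word that is followed by '=' is a name.
names = re.findall(r'(\w+)\s*=', s)

# Splitting on "name =" leaves the values; the first piece is the (empty)
# text before the first name, so it is dropped.
values = [v.strip() for v in re.split(r'\w+\s*=', s)[1:]]

pairs = dict(zip(names, values))
print(pairs)
```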

Thanks,Qilong



- Original Message 
From: 7stud <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, April 11, 2007 8:27:57 PM
Subject: Re: python regular expression help

On Apr 11, 7:41 pm, liupeng <[EMAIL PROTECTED]> wrote:
> pattern = re.compile(r'\w+\s*=\s*[0-9]*.[0-9]*\s*')
> lists = pattern.findall(s)
> print lists
> ['a=4 ', 'b=3.4 ', 'c=4.5']
>
> On Wed, Apr 11, 2007 at 06:10:07PM -0700, Qilong Ren wrote:
> > Hi, everyone,
>
> > I am extracting some information from a given string using python RE. The
> > string is ,for example,
> >s = 'a = 4 b =3.4 5.4 c = 4.5'
> > What I want is :
> >a = 4
> > b = 3.4 5.4
> >c = 4.5
> > Right now I use :
> >pattern = re.compile(r'\w+\s*=\s*.*?\s+')
> >lists = pattern.findall(s)
> > It works for the string like 'a = 4 b = 3.4 c = 4.5', but does not work with
> > strings like 'a=4 b=3.4 5.4 c = 4.5'
>
> > Any suggestion?
>
> > Thanks,Qilong
>
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>

Try this:

import re

s = 'a = 4 b =3.4 5.4 c = 4.5'
r = re.compile("[a-z]+.*?(?=[a-z]|$)" )
l = r.findall(s)
print l
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-11 Thread 7stud
On Apr 11, 10:50 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
> En Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren <[EMAIL PROTECTED]>  
> escribió:
>
> > Thanks for reply. That actually is not what I want. Strings I am dealing  
> > with may look like this:
> >  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
> > What I want is
> >  a = 4.5
> >  b = 'h' 'd'
> >  c = 4.5 3.5

I suppose next you'll post your strings can also  look like this:

"[EMAIL PROTECTED]@[EMAIL PROTECTED]@%12341234qeerasdfdae"

and you want "A = 3"


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-11 Thread attn . steven . kuo
On Apr 11, 9:50 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
> En Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren <[EMAIL PROTECTED]>
> escribió:
>
> > Thanks for reply. That actually is not what I want. Strings I am dealing
> > with may look like this:
> >  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
> > What I want is
> >  a = 4.5
> >  b = 'h' 'd'
> >  c = 4.5 3.5
>
> That's a bit tricky. You have LHS = RHS where RHS includes all the
> following text *except* the very next word before the following = (which
> is the LHS of the next expression). Or something like that :)
>
> py> import re
> py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
> py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
> py> for item in r.findall(s):
> ...   print item
> ...
> a = 4.5
> b = 'h'  'd'
> c = 4.5 3.5
>


Another way is to use split:

import re

lhs = re.compile(r'\s*(\b\w+\s*=)')
for s in [ "a = 4 b =3.4 5.4 c = 4.5",
           "a = 4.5 b = 'h'  'd' c = 4.5 3.5"]:
    tokens = lhs.split(s)
    results = [tokens[_] + tokens[_+1] for _ in range(1,len(tokens), 2)]
    print s
    print results

--
Regards,
Steven


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-11 Thread Paul McGuire
On Apr 11, 11:50 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
wrote:
> En Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren <[EMAIL PROTECTED]>  
> escribió:
>
> > Thanks for reply. That actually is not what I want. Strings I am dealing  
> > with may look like this:
> >  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
> > What I want is
> >  a = 4.5
> >  b = 'h' 'd'
> >  c = 4.5 3.5
>
> That's a bit tricky. You have LHS = RHS where RHS includes all the  
> following text *except* the very next word before the following = (which  
> is the LHS of the next expression). Or something like that :)
>
> py> import re
> py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
> py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
> py> for item in r.findall(s):
> ...   print item
> ...
> a = 4.5
> b = 'h'  'd'
> c = 4.5 3.5
>
> --
> Gabriel Genellina

The pyparsing version is a bit more readable, probably simpler to come
back later to expand definition of varName, for example.

from pyparsing import Word,alphas,nums,FollowedBy,sglQuotedString,OneOrMore

realNum = Word(nums,nums+".").setParseAction(lambda t:float(t[0]))
varName = Word(alphas)
LHS = varName + FollowedBy("=")
RHSval = sglQuotedString | realNum | varName
RHS = OneOrMore( ~LHS + RHSval )
assignment = LHS.setResultsName("LHS") + '=' + RHS.setResultsName("RHS")

s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
for a in assignment.searchString(s):
    print a.LHS, '=', a.RHS

prints:
['a'] = [4.5]
['b'] = ["'h'", "'d'"]
['c'] = [4.5, 3.5]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-12 Thread 7stud
On Apr 11, 11:15 pm, [EMAIL PROTECTED] wrote:
> On Apr 11, 9:50 pm, "Gabriel Genellina" <[EMAIL PROTECTED]>
> lhs = re.compile(r'\s*(\b\w+\s*=)')
> for s in [ "a = 4 b =3.4 5.4 c = 4.5",
> "a = 4.5 b = 'h'  'd' c = 4.5 3.5"]:
> tokens = lhs.split(s)
> results = [tokens[_] + tokens[_+1] for _ in range(1,len(tokens),

The only thing I can think when I look at that is: what a syntactic
abomination.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python regular expression help

2007-04-12 Thread Qilong Ren
Hi,

Yeah, a little bit tricky. Actually it is part of some Fortran input file.

Thanks for suggestion! It helps a lot!

Thanks,Qilong

- Original Message 
From: Gabriel Genellina <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, April 11, 2007 9:50:00 PM
Subject: Re: python regular expression help

En Wed, 11 Apr 2007 23:14:01 -0300, Qilong Ren <[EMAIL PROTECTED]>  
escribió:

> Thanks for reply. That actually is not what I want. Strings I am dealing  
> with may look like this:
>  s = 'a = 4.5 b = 'h'  'd' c = 4.5 3.5'
> What I want is
>  a = 4.5
>  b = 'h' 'd'
>  c = 4.5 3.5

That's a bit tricky. You have LHS = RHS where RHS includes all the  
following text *except* the very next word before the following = (which  
is the LHS of the next expression). Or something like that :)

py> import re
py> s = "a = 4.5 b = 'h'  'd' c = 4.5 3.5"
py> r = re.compile(r"\w+\s*=\s*.*?(?=\w+\s*=|$)")
py> for item in r.findall(s):
...   print item
...
a = 4.5
b = 'h'  'd'
c = 4.5 3.5

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list


need some regular expression help

2006-10-07 Thread Chris
I need a pattern that  matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

Thanks for any help!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread Diez B. Roggisch

Chris wrote:
> I need a pattern that  matches a string that has the same number of '('
> as ')':
> findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
> '((2x+2)sin(x))', '(log(2)/log(5))' ]
> Can anybody help me out?

This is not possible with regular expressions - they can't "remember"
how many parens they already encountered.

You will need a real parser for this - pyparsing seems to be the most
popular choice today, I personally like spark. I'm sure you'll find an
example grammar that parses simple arithmetical expressions like
the one above.

Diez

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread John Machin
Chris wrote:
> I need a pattern that  matches a string that has the same number of '('
> as ')':
> findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
> '((2x+2)sin(x))', '(log(2)/log(5))' ]
> Can anybody help me out?
>

No, there is no such pattern. You will have to code up a function.

Consider what your spec really is: '42^((2x+2)sin(x)) +
(log(2)/log(5))' has the same number of left and right parentheses; so
does the zero-length string; so does ') + (' -- perhaps you need to add
'and starts with a "("'

Consider what you are going to do with input like this:

print '(' + some_text + ')'

Maybe you need to do some lexical analysis and work at the level of
tokens rather than individual characters.

Which then raises the usual question: you have a perception that
regular expressions are the solution -- to what problem??

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread hanumizzle
On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch <[EMAIL PROTECTED]> wrote:
>
> Chris wrote:
> > I need a pattern that  matches a string that has the same number of '('
> > as ')':
> > findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
> > '((2x+2)sin(x))', '(log(2)/log(5))' ]
> > Can anybody help me out?
>
> This is not possible with regular expressions - they can't "remember"
> how many parens they already encountered.

Remember that regular expressions are used to represent regular
grammars. Most regex engines actually aren't regular in that they
support fancy things like look-behind/ahead and capture groups...IIRC,
these cannot be part of a true regular expression library.

With that said, the quote-unquote regexes in Lua have a special
feature that supports balanced expressions. I believe Python has a
PCRE lib somewhere; you may be able to use the experimental ??{ }
construct in that case.

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread Roy Smith
In article <[EMAIL PROTECTED]>,
 "Chris" <[EMAIL PROTECTED]> wrote:

> I need a pattern that  matches a string that has the same number of '('
> as ')':
> findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
> '((2x+2)sin(x))', '(log(2)/log(5))' ]
> Can anybody help me out?
> 
> Thanks for any help!

Why does it need to be a regex?  There is a very simple and well-known 
algorithm which does what you want.

Start with i=0.  Walk the string one character at a time, incrementing i 
each time you see a '(', and decrementing it each time you see a ')'.  At 
the end of the string, the count should be back to 0.  If at any time 
during the process, the count goes negative, you've got mis-matched 
parentheses.

The algorithm runs in O(n), same as a regex.

Regex is a wonderful tool, but it's not the answer to all problems.
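
The counter walk described above can be extended to return the outermost parenthesised pieces, which is the result the original question asked findall for. This is a sketch in Python 3 syntax; the function name top_level_groups and the choice to raise ValueError on mismatched input are assumptions made for the example.

```python
def top_level_groups(text):
    """Return each outermost parenthesised substring of text.

    Raises ValueError if the parentheses are not balanced.
    """
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i          # an outermost group opens here
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:          # a ')' with nothing left to close
                raise ValueError("unmatched ')' at index %d" % i)
            if depth == 0:         # the outermost group just closed
                groups.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unmatched '('")
    return groups

print(top_level_groups('42^((2x+2)sin(x)) + (log(2)/log(5))'))
# ['((2x+2)sin(x))', '(log(2)/log(5))']
```

It makes a single pass over the string, so it is O(n) like the plain counter.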
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread Tim Chase
> Why does it need to be a regex?  There is a very simple and well-known 
> algorithm which does what you want.
> 
> Start with i=0.  Walk the string one character at a time, incrementing i 
> each time you see a '(', and decrementing it each time you see a ')'.  At 
> the end of the string, the count should be back to 0.  If at any time 
> during the process, the count goes negative, you've got mis-matched 
> parentheses.
> 
> The algorithm runs in O(n), same as a regex.
> 
> Regex is a wonderful tool, but it's not the answer to all problems.

Following Roy's suggestion, one could use something like:

 >>> s = '42^((2x+2)sin(x)) + (log(2)/log(5))'
 >>> d = {'(':1, ')':-1}
 >>> sum(d.get(c, 0) for c in s)
0


If you get a sum() > 0, then you have too many "(", and if you 
have sum() < 0, you have too many ")" characters.  A sum() of 0 
means there's the same number of parens.  It still doesn't solve 
the aforementioned problem of things like ')))((('  which is 
balanced, but psychotic. :)

-tkc




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Diez B. Roggisch

hanumizzle wrote:
> On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch <[EMAIL PROTECTED]> wrote:
> >
> > Chris wrote:
> > > I need a pattern that  matches a string that has the same number of '('
> > > as ')':
> > > findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
> > > '((2x+2)sin(x))', '(log(2)/log(5))' ]
> > > Can anybody help me out?
> >
> > This is not possible with regular expressions - they can't "remember"
> > how many parens they already encountered.
>
> Remember that regular expressions are used to represent regular
> grammars. Most regex engines actually aren't regular in that they
> support fancy things like look-behind/ahead and capture groups...IIRC,
> these cannot be part of a true regular expression library.

Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times.

> With that said, the quote-unquote regexes in Lua have a special
> feature that supports balanced expressions. I believe Python has a
> PCRE lib somewhere; you may be able to use the experimental ??{ }
> construct in that case.

Even if it has - I'm not sure if it really does you good, for several
reasons:

 - regexes - even enhanced ones - don't build trees. But that is what
   you ultimately want from an expression like sin(log(x))

 - even if they are more powerful these days, the theory of context
   free grammars still applies. So if what you need isn't LL(k) but
   LR(k), how do you specify that to the regex engine?

 - the regexes are useful because of their compact notations, parsers
   allow for better structured outcome


Diez

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Theerasak Photha
On 8 Oct 2006 01:49:50 -0700, Diez B. Roggisch <[EMAIL PROTECTED]> wrote:

> Even if it has - I'm not sure if it really does you good, for several
> reasons:
>
>  - regexes - even enhanced ones - don't build trees. But that is what
> you ultimately want
>from an expression like sin(log(x))
>
>  - even if they are more powerful these days, the theory of context
> free grammars still applies.
>so if what you need isn't LL(k) but LR(k), how do you specify that
> to the regex engine?
>
>  - the regexes are useful because of their compact notations, parsers
> allow for better structured outcome

Just wait for Perl 6 :D

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread bearophileHUGS
Tim Chase:
> It still doesn't solve the aforementioned problem
> of things like ')))((('  which is balanced, but psychotic. :)

This may solve the problem:

def balanced(txt):
    d = {'(':1, ')':-1}
    tot = 0
    for c in txt:
        tot += d.get(c, 0)
        if tot < 0:
            return False
    return tot == 0

print balanced("42^((2x+2)sin(x)) + (log(2)/log(5))") # True
print balanced("42^((2x+2)sin(x) + (log(2)/log(5))") # False
print balanced("42^((2x+2)sin(x))) + (log(2)/log(5))") # False
print balanced(")))(((") # False

A possible alternative for Py 2.5. The dict solution looks better, but
this may be faster:

def balanced2(txt):
    tot = 0
    for c in txt:
        tot += 1 if c=="(" else (-1 if c==")" else 0)
        if tot < 0:
            return False
    return tot == 0

Bye,
bearophile



Re: need some regular expression help

2006-10-08 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

 > The dict solution looks better, but this may be faster:

it's slightly faster, but both your alternatives are about 10x slower 
than a straightforward:

def balanced(txt):
    return txt.count("(") == txt.count(")")





Re: need some regular expression help

2006-10-08 Thread Mirco Wahab
Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
> Certainly true, and it always gives me a hard time because I don't know
> to what extent a regular expression nowadays might do the job because
> of these extensions. It was so much easier back in the old times

Right, in perl, this would be a no-brainer,
it's documented all over the place, like:

   my $re;

   $re = qr{
     (?:
       (?> [^\\()]+ | \\. )
     |
       \( (??{ $re }) \)
     )*
   }xs;

where you have a 'delayed execution'
of the

   (??{ $re })

which in the end makes the whole thing
a recursive one: it gets expanded and
executed if the match finds its way
to it.

Above regex will match balanced parens,
as in:

   my $good = 'a + (b / (c - 2)) * (d ^ (e+f))  ';
   my $bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  ';
   my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';

if you do:

   print "ok \n" if $good =~ /^$re$/;
   print "ok \n" if $bad1 =~ /^$re$/;
   print "ok \n" if $bad2 =~ /^$re$/;


This is documented in some depth, e.g. in
http://japhy.perlmonk.org/articles/tpj/2004-summer.html
(topic: Recursive Regexes)
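
Python's standard re module has no counterpart to (??{ }), so the
closest plain-Python equivalent is a hand-written scan. The sketch below
is an assumption about what the pattern above accepts when anchored with
^ and $ (balanced parens, with a backslash escaping the character after
it); the function name balanced_with_escapes is invented here:

```python
def balanced_with_escapes(text):
    """Return True if parens nest correctly; a backslash escapes the next char."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text):
                return False      # dangling backslash: nothing left to escape
            i += 2                # skip the backslash and the escaped character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False      # a ")" with no "(" open
        i += 1
    return depth == 0

good = 'a + (b / (c - 2)) * (d ^ (e+f))  '
bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  '
bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )'

print(balanced_with_escapes(good))   # True
print(balanced_with_escapes(bad1))   # False
print(balanced_with_escapes(bad2))   # False
```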

Regards

M.


Re: need some regular expression help

2006-10-08 Thread Diez B. Roggisch
Mirco Wahab wrote:
> Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
>> Certainly true, and it always gives me a hard time because I don't know
>> to what extent a regular expression nowadays might do the job because
>> of these extensions. It was so much easier back in the old times
>
> Right, in perl, this would be a no-brainer,
> it's documented all over the place, like:
>
>    my $re;
>
>    $re = qr{
>      (?:
>        (?> [^\\()]+ | \\. )
>      |
>        \( (??{ $re }) \)
>      )*
>    }xs;
>
> where you have a 'delayed execution'
> of the
>
>    (??{ $re })
>
> which in the end makes the whole thing
> a recursive one: it gets expanded and
> executed if the match finds its way
> to it.
>
> Above regex will match balanced parens,
> as in:
>
>    my $good = 'a + (b / (c - 2)) * (d ^ (e+f))  ';
>    my $bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  ';
>    my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';
>
> if you do:
>
>    print "ok \n" if $good =~ /^$re$/;
>    print "ok \n" if $bad1 =~ /^$re$/;
>    print "ok \n" if $bad2 =~ /^$re$/;
>
> This is documented in some depth, e.g. in
> http://japhy.perlmonk.org/articles/tpj/2004-summer.html
> (topic: Recursive Regexes)

That clearly is a recursive grammar rule, and thus it can't be regular 
anymore :) But first of all, I find it ugly - the clean separation of 
lexical and syntactical analysis is better here, IMHO - and secondly, 
what are the properties of that parsing? Is it LL(k), LR(k), backtracking?

Diez


Re: need some regular expression help

2006-10-08 Thread bearophileHUGS
Fredrik Lundh wrote:

> it's slightly faster, but both your alternatives are about 10x slower
> than a straightforward:
> def balanced(txt):
>  return txt.count("(") == txt.count(")")

I know, but if you read my post again you will see that I showed those
solutions precisely because they mark ")))(((" as a bad expression. Just
counting the parens isn't enough.
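
To make the difference concrete, here is a small side-by-side sketch. The
names count_only and running_total are made up for this comparison; the
first mirrors the count-based check, the second the running-total idea:

```python
def count_only(txt):
    # Compares totals only; the order of the parens is ignored.
    return txt.count("(") == txt.count(")")

def running_total(txt):
    # Tracks nesting depth left to right; a ")" with nothing open fails at once.
    depth = 0
    for c in txt:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

for s in ["(a(b))", ")))(((", "(()"]:
    print("%-8r count_only=%-5s running_total=%s"
          % (s, count_only(s), running_total(s)))
```

The two checks agree on "(a(b))" and "(()" and differ only on ")))(((",
where the totals match but the order is wrong.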

Bye,
bearophile



Re: need some regular expression help

2006-10-08 Thread Roy Smith
"Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> Certainly true, and it always gives me a hard time because I don't know
> to what extent a regular expression nowadays might do the job because
> of these extensions. It was so much easier back in the old times

What old times?  I've been working with regex for mumble years and there's 
always been the problem that every implementation supports a slightly 
different syntax.  Even back in the "good old days", grep, awk, sed, and ed 
all had slightly different flavors.


Re: need some regular expression help

2006-10-08 Thread Theerasak Photha
On 10/8/06, Roy Smith <[EMAIL PROTECTED]> wrote:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> > Certainly true, and it always gives me a hard time because I don't know
> > to what extent a regular expression nowadays might do the job because
> > of these extensions. It was so much easier back in the old times
>
> What old times?  I've been working with regex for mumble years and there's
> always been the problem that every implementation supports a slightly
> different syntax.  Even back in the "good old days", grep, awk, sed, and ed
> all had slightly different flavors.

Which grep? Which awk? :)

-- Theerasak


Regular Expression help for parsing html tables

2006-10-28 Thread steve551979
Hello,

I am having some difficulty creating a regular expression for the
following string situation in html. I want to find a table that has
specific text in it and then extract the html just for that immediate
table.

the string would look something like this:

...stuff here...
<table>
...stuff here...
<table>
...stuff here...
<table>
...
text i'm searching for
...
</table>
...stuff here...
</table>
...stuff here...
</table>
...stuff here...


My question: is there a way in RE to say: 'when I find the text I'm
looking for, search backwards and find the nearest instance of the
string "<table>", and then search forwards and find the nearest
instance of the string "</table>"'?

any help is appreciated.

Steve.



Re: Regular Expression help for parsing html tables

2006-10-28 Thread Stefan Behnel
Hi Steve,

[EMAIL PROTECTED] wrote:
> I am having some difficulty creating a regular expression for the
> following string situation in html. I want to find a table that has
> specific text in it and then extract the html just for that immediate
> table.

Any reason why you can't use a real HTML parser and API (e.g. the one provided
by lxml)? That can really make things easier here.

http://codespeak.net/lxml/
http://codespeak.net/lxml/api.html#parsers
http://codespeak.net/lxml/api.html#trees-and-documents
http://effbot.org/zone/element-index.htm
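
A rough sketch of the parser-based route, using the ElementTree API from
the last link as shipped in the Python 3 standard library. It assumes the
markup is well-formed (real-world HTML usually is not, which is where
lxml's HTML parser helps) and that the text occurs only once. The
function name innermost_table is made up for this example:

```python
import xml.etree.ElementTree as ET

def innermost_table(markup, needle):
    """Return the markup of the deepest <table> whose text contains needle."""
    root = ET.fromstring(markup)
    found = None
    for table in root.iter("table"):          # document order: outer before inner
        if needle in "".join(table.itertext()):
            found = table                      # keep the latest, i.e. deepest, match
    if found is None:
        return None
    found.tail = None                          # do not serialize text after </table>
    return ET.tostring(found, encoding="unicode")

doc = ("<html><body><table><tr><td>outer"
       "<table><tr><td>text to find</td></tr></table>"
       "</td></tr></table></body></html>")
print(innermost_table(doc, "text to find"))
```

Working on the element tree means nesting is handled by the parser, so no
backwards or forwards searching is needed.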

Stefan


Re: Regular Expression help for parsing html tables

2006-10-29 Thread Odalrick

[EMAIL PROTECTED] wrote:

> Hello,
>
> I am having some difficulty creating a regular expression for the
> following string situation in html. I want to find a table that has
> specific text in it and then extract the html just for that immediate
> table.
>
> the string would look something like this:
>
> ...stuff here...
> 
> ...stuff here...
> 
> ...stuff here...
> 
> ...
> text i'm searching for
> ...
> 
> ...stuff here...
> 
> ...stuff here...
> 
> ...stuff here...
>
>
> My question: is there a way in RE to say: 'when I find the text I'm
> looking for, search backwards and find the nearest instance of the
> string "<table>", and then search forwards and find the nearest
> instance of the string "</table>"'?
>
> any help is appreciated.
>
> Steve.

It would have been easier if you'd said what the text you are looking
for is, but I think:

regex = re.compile( r'(<table>.*?text you are looking for.*?</table>)',
                    re.DOTALL )
match = regex.search( html_string )
found_table = match.group( 1 )

would work.
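
The backwards-then-forwards search described in the question can also be
sketched with plain string methods. This assumes the two strings in the
question are the lowercase tags "<table" and "</table>", and the function
name enclosing_table is invented for the example:

```python
def enclosing_table(html, needle):
    """Return the <table>...</table> text immediately around needle, or None."""
    pos = html.find(needle)
    if pos == -1:
        return None
    start = html.rfind("<table", 0, pos)     # nearest opening tag before the text
    end = html.find("</table>", pos)         # nearest closing tag after the text
    if start == -1 or end == -1:
        return None
    return html[start:end + len("</table>")]

page = "a<table>b<table>c<table>x needle y</table>d</table>e</table>f"
print(enclosing_table(page, "needle"))       # <table>x needle y</table>
```

One limitation: if a complete nested table sits between the enclosing
opening tag and the text, rfind picks up that nested table's opening tag
instead, so a real parser is the safer choice for arbitrary pages.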

/Odalrick



Re: Regular Expression help for parsing html tables

2006-10-29 Thread Paddy

[EMAIL PROTECTED] wrote:
> Hello,
>
> I am having some difficulty creating a regular expression for the
> following string situation in html. I want to find a table that has
> specific text in it and then extract the html just for that immediate
> table.
>
> the string would look something like this:
>
> ...stuff here...
> 
> ...stuff here...
> 
> ...stuff here...
> 
> ...
> text i'm searching for
> ...
> 
> ...stuff here...
> 
> ...stuff here...
> 
> ...stuff here...
>
>
> My question: is there a way in RE to say: 'when I find the text I'm
> looking for, search backwards and find the nearest instance of the
> string "<table>", and then search forwards and find the nearest
> instance of the string "</table>"'?
>
> any help is appreciated.
>
> Steve.

Might searching the output of BeautifulSoup(html).prettify() make
things easier?

http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing%20HTML

- Paddy



Regular expression help: unable to search ' # ' character in the file

2008-09-27 Thread dudeja . rajat
Hi,

Can someone help me with a regular expression? I'm trying to search for
the # character in my file.

My file has contents:

###

Hello World

###

length = 10
breadth = 20
height = 30

###



###

Hello World

###

length = 20
breadth = 30
height = 40

###


I used the following search :

import re

fd = open(file, 'r')
line = fd.readline
pat1 = re.compile("\#*")
while(line):
    mat1 = pat1.search(line)
    if mat1:
        print line
    line = fd.readline()


But the above prints the whole file instead of the hash lines only.


Please help


Regards,
Rajat

Re: Regular expression help: unable to search ' # ' character in the file

2008-09-27 Thread Fredrik Lundh

[EMAIL PROTECTED] wrote:


> import re
>
> fd = open(file, 'r')
> line = fd.readline
> pat1 = re.compile("\#*")
> while(line):
>     mat1 = pat1.search(line)
>     if mat1:
>         print line
>     line = fd.readline()


I strongly doubt that this is the code you used.


> But the above prints the whole file instead of the hash lines only.


"*" means zero or more matches.  all lines in a file contain zero or 
more # characters.
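
A quick way to see this, as a small sketch (the sample lines are taken
from the file in the question; the names star and plus are just for this
demo):

```python
import re

star = re.compile(r"#*")   # zero or more '#': can match the empty string
plus = re.compile(r"#+")   # one or more '#': needs at least one '#'

for line in ["###", "Hello World", "length = 10"]:
    print("%-14r star=%-5s plus=%s"
          % (line, bool(star.search(line)), bool(plus.search(line))))
```

With "#*" every line matches, because search() is satisfied by an empty
match at position 0; with "#+" only the hash line does.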


but using a RE is overkill in this case, of course.  to check for a 
character or substring, use the "in" operator:


for line in open(file):
    if "#" in line:
        print line





Re: Regular expression help: unable to search ' # ' character in the file

2008-09-27 Thread dudeja . rajat
On Sat, Sep 27, 2008 at 1:58 PM, Fredrik Lundh <[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] wrote:
>
>> import re
>>
>> fd = open(file, 'r')
>> line = fd.readline
>> pat1 = re.compile("\#*")
>> while(line):
>>     mat1 = pat1.search(line)
>>     if mat1:
>>         print line
>>     line = fd.readline()
>
> I strongly doubt that this is the code you used.
>
>> But the above prints the whole file instead of the hash lines only.
>
> "*" means zero or more matches.  all lines in a file contain zero or more #
> characters.
>
> but using a RE is overkill in this case, of course.  to check for a
> character or substring, use the "in" operator:
>
>     for line in open(file):
>         if "#" in line:
>             print line

Thanks Fredrik, this works. Indeed, it is a much better and cleaner
approach.

-- 
Regards,
Rajat