Vincent Davis wrote:
I think there are two parts to this question, and I am sure there is a lot I am missing. I am hoping an example will help me. I have an HTML doc that I am trying to use regular expressions to get a value out of.
Here is an example of the line:
<td colspan='2'>Parcel ID: 39-034-15-009 </td>
I want to get the number "39-034-15-009" after "Parcel ID:". The number will be different each time but always in the same format. I think I can match "Parcel ID:" but I am not sure how to get the number after it. "Parcel ID:" only occurs once in the document.

Is this how I need to start?
pid = re.compile('Parcel ID: ')

Basically I am completely lost and am not finding any helpful examples.

I am getting the HTML using myurl=urllib.urlopen(). Can I use an RE like this: thenum=pid.match(myurl)?

I think the two key things I need to know are:
1. How do I get the text after a match?
2. When I use myurl=urllib.urlopen(http://.......), can I use myurl as the string in an RE, as in thenum=pid.match(myurl)?

Something like:

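import re
import urllib
# Capture the dash-separated digits that follow the label.
pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')
myurl = urllib.urlopen(url)  # Python 2 call; url is the address of the page
text = myurl.read()  # a regex needs the page text, not the file-like object
myurl.close()
# search() scans the whole text; match() would only try the very start.
thenum = pid.search(text).group(1)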
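
To see why search() and a capturing group answer the first question, here is a small sketch that runs the same pattern against the sample line from the question, with no network access. It is written for Python 3, and the helper name find_parcel_id is made up for illustration. It also returns None when the label is missing, rather than raising AttributeError as the one-liner above would.

```python
import re

# Same pattern as above: capture the dash-separated digits after the label.
pid = re.compile(r'Parcel ID: (\d+(?:-\d+)*)')

# Sample line taken from the question; a real page has more text around it.
text = "<td colspan='2'>Parcel ID: 39-034-15-009 </td>"

def find_parcel_id(html):
    """Return the parcel number as a string, or None if the label is absent."""
    m = pid.search(html)  # search() looks for the pattern anywhere in the string
    return m.group(1) if m else None

print(find_parcel_id(text))        # 39-034-15-009
print(pid.match(text))             # None, because match() only tries position 0
print(find_parcel_id("no label"))  # None
```

group(0) would return the whole matched text including "Parcel ID: ", while group(1) returns only the part inside the parentheses.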

Although for pulling values out of HTML, a parser such as BeautifulSoup is the preferred solution.
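
As a rough illustration of the parser-based approach, here is a sketch that uses the standard library's html.parser in place of BeautifulSoup, so that it runs without installing anything. This is a stand-in, not BeautifulSoup's API. It targets Python 3, the names ParcelIdParser and parcel_id_from_html are hypothetical, and it assumes the label and number sit together in one td cell as in the sample line.

```python
from html.parser import HTMLParser

class ParcelIdParser(HTMLParser):
    """Remembers the text following 'Parcel ID:' in the first <td> containing it."""
    LABEL = "Parcel ID:"

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.parcel_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        # Only look at text inside table cells, and keep the first hit.
        if self.in_td and self.parcel_id is None and self.LABEL in data:
            self.parcel_id = data.split(self.LABEL, 1)[1].strip()

def parcel_id_from_html(html):
    """Parse the HTML text and return the parcel ID, or None if not found."""
    parser = ParcelIdParser()
    parser.feed(html)
    parser.close()
    return parser.parcel_id

sample = "<table><tr><td colspan='2'>Parcel ID: 39-034-15-009 </td></tr></table>"
print(parcel_id_from_html(sample))
```

Unlike the bare regex, this only looks at text inside table cells, so the same label appearing in a script or comment elsewhere on the page would be ignored.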
--
http://mail.python.org/mailman/listinfo/python-list
