Parsing HTML

mtuller Thu, 08 Feb 2007 11:41:03 -0800

I am trying to parse a web page and extract information from it using
pyparsing. Here is what I have:


from pyparsing import *
import urllib

# pattern: opening span, everything up to the closing tags, closing tags
spanStart = Literal('<span class=\"hpPageText\">')

spanEnd = Literal('</span></td>')

printCount = spanStart + SkipTo(spanEnd) + spanEnd

# fetch the printer's usage page (URL rejoined after mail line-wrapping)
printerURL = "http://printer.mydomain.com/hp/device/this.LCDispatcher?nav=hp.Usage"
printerListPage = urllib.urlopen(printerURL)
printerListHTML = printerListPage.read()
printerListPage.close()    # close() must be called, not just referenced

# scanString yields (tokens, start, end) for every match in the page
for srvrtokens, startloc, endloc in printCount.scanString(printerListHTML):
    print srvrtokens

# show the expression that is being matched
print printCount


I added the last print statement to check what is being sent, because I
am getting nothing back. It prints:
{"<span class="hpPageText">" SkipTo:("</span></td>") "</span></td>"}
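The printed expression shows that the double quotes did make it into the literal: inside a single-quoted Python string, `\"` is simply an escaped `"`, so the backslashes are harmless. A minimal check, written for Python 3, confirms this and also illustrates how strict an exact literal is. The tag variants are hypothetical markup, not taken from the printer's page, and `exact_literal_present` is an illustrative helper that only approximates how `scanString` looks for a `Literal` (the same characters, in the same case, at some position).

```python
# Inside a single-quoted Python string, \" is just an escaped double quote,
# so the backslashes in the original pattern change nothing.
escaped = '<span class=\"hpPageText\">'
plain = '<span class="hpPageText">'
assert escaped == plain


def exact_literal_present(markup, literal=plain):
    """Rough stand-in for a pyparsing Literal search: the same characters,
    in the same case, must appear somewhere in the markup."""
    return literal in markup


# Hypothetical spellings of "the same" tag that a browser treats alike.
variants = [
    '<span class="hpPageText">',
    "<span class='hpPageText'>",
    "<span class=hpPageText>",
    '<span  class="hpPageText" >',
    '<SPAN CLASS="hpPageText">',
]

for tag in variants:
    print(exact_literal_present(tag), tag)
```

Only the first spelling matches. If the real page writes the tag in any of the other ways, an exact literal would never match it, which would be consistent with getting nothing back.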

If I take "hpPageText" out of the pattern, I get results back, but more
than I want. I know it has something to do with escaping the quotation
marks, but I am puzzled as to how to do it.
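One way around exact-text matching is to let a real HTML parser normalize the markup first. The sketch below swaps in Python 3's standard-library `html.parser.HTMLParser` in place of pyparsing (pyparsing's own `makeHTMLTags` helper targets the same problem). It collects the text of every `<span>` whose class list contains `hpPageText`, whatever the attribute quoting, spacing, or tag case, and it does not depend on the `</span></td>` pairing. The class `SpanTextCollector`, the function `span_texts`, and the sample markup are hypothetical; the real page's structure is not known here.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of <span> elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching span (counts nested spans)
        self.buffer = []    # text pieces of the span currently being read
        self.results = []   # finished span texts, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and strips quotes,
        # so <SPAN CLASS=hpPageText> arrives here like <span class="hpPageText">.
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # nested span inside a matching one
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def span_texts(html, css_class="hpPageText"):
    """Return the stripped text of every span whose class list includes css_class."""
    parser = SpanTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup mixing quoting styles and tag case.
sample = (
    "<table><tr><td>Total</td>"
    "<td><span class='hpPageText'>12345</span></td></tr>"
    "<tr><td><SPAN CLASS=hpPageText>678</SPAN></td></tr>"
    '<tr><td><span class="other">ignore me</span></td></tr></table>'
)
print(span_texts(sample))  # ['12345', '678']
```

On Python 3 the page would be fetched with `urllib.request.urlopen` and its bytes decoded (for example with `.decode("utf-8")`, assuming that is the page's encoding) before being passed to `span_texts`.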


Thanks,

Mike
