"Paul McGuire" <[EMAIL PROTECTED]> writes:
> Here's a pyparsing program that reads my personal web page, and spits
> out HTML with all of the HREF's reversed.

Parsing HTML isn't easy, which makes me wonder how good this solution
really is. That's not meant as a comment on the quality of this code or
PyParsing, but as curiosity from someone who does a lot of [X]HTML
herding.

> -- Paul
> (Download pyparsing at http://pyparsing.sourceforge.net.)

If it were in the ports tree, I'd have grabbed it and tried it myself.
But it isn't, so I'm going to be lazy and ask. If PyParsing really makes
dealing with HTML this easy, I may package it as a port myself.

> from pyparsing import Literal, quotedString
> import urllib
>
> LT = Literal("<")
> GT = Literal(">")
> EQUALS = Literal("=")
> htmlAnchor = LT + "A" + "HREF" + EQUALS + \
>     quotedString.setResultsName("href") + GT
>
> def convertHREF(s,l,toks):
>     # do HREF conversion here - for demonstration, we will just reverse them
>     print toks.href
>     return "<A HREF=%s>" % toks.href[::-1]
>
> htmlAnchor.setParseAction( convertHREF )
>
> inputURL = "http://www.geocities.com/ptmcg"
> inputPage = urllib.urlopen(inputURL)
> inputHTML = inputPage.read()
> inputPage.close()
>
> print htmlAnchor.transformString( inputHTML )

How well does it deal with other attributes in front of the href, like
<A onClick="..." href="...">? How about if my HTML has things that look
like HTML in attributes, like <TAG ATTRIBUTE="stuff<A HREF=stuff">?

    Thanks,
    <mike
--
Mike Meyer <[EMAIL PROTECTED]>          http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
--
http://mail.python.org/mailman/listinfo/python-list
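
P.S. For anyone who wants to try the two awkward inputs above, here is a rough sketch of them as test cases. It is not the pyparsing grammar from the quoted post: it assumes Python 3 and the standard library's html.parser, and the names HrefCollector and collect_hrefs, as well as the example.com URL, are made up for illustration. It only shows what a tag-aware parser reports as an href in each case, which gives something to compare a grammar-based approach against.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def collect_hrefs(html):
    """Return the href values found on anchor start tags in html."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Case 1: another attribute in front of the href.
case1 = '<A onClick="go()" href="http://example.com/">link</A>'
# Case 2: something that looks like an anchor, inside an attribute value.
case2 = '<TAG ATTRIBUTE="stuff<A HREF=stuff">'

print(collect_hrefs(case1))
print(collect_hrefs(case2))
```

The first case should report the href even though onClick comes first; the second should report nothing, since the anchor-like text is only the value of ATTRIBUTE.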