[EMAIL PROTECTED] wrote:
> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> '<a href="web reference"> underlined reference</a>'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
>
> I am asking if there is some alternative (more pythonic...):

Sure. The htmllib module provides HTMLParser. Here's an example: run it
with your HTML file as argument and you'll see a list of all the hrefs
in the document.

#------------------------------------------------
#!/usr/bin/python
# Python 2 only: htmllib was removed in Python 3.
import sys
import formatter
import htmllib

def test():
    filename = sys.argv[1]
    # Read the whole HTML file into memory.
    f = open(filename, 'r')
    data = f.read()
    f.close()
    # NullFormatter discards rendered output; only the anchors matter here.
    p = htmllib.HTMLParser(formatter.NullFormatter())
    p.feed(data)
    p.close()  # flush any buffered data before reading the results
    # anchorlist holds the href of each <a href=...> tag the parser saw.
    for a_link in p.anchorlist:
        print a_link

test()
#------------------------------------------------

I'm sure that this is far more Pythonic!

Bernhard
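A note for Python 3: htmllib no longer exists there, so the script above runs only under Python 2. The same idea can probably be sketched with the standard html.parser module, roughly as follows. LinkCollector and extract_links are names made up for this sketch, and the anchorlist attribute is reused only to mirror the htmllib version; html.parser does not provide it on its own.

```python
#!/usr/bin/env python3
import sys
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        # Named after the htmllib attribute by choice, not by API.
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.anchorlist.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()  # flush any buffered data
    return parser.anchorlist


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage: script.py page.html
    with open(sys.argv[1], 'r') as f:
        for a_link in extract_links(f.read()):
            print(a_link)
```

As with the htmllib version, the parser does the tokenizing, so there is no need to split the text on 'href' by hand.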