Re: Splitting on a word

Bernhard Holzmayer Thu, 14 Jul 2005 03:55:29 -0700

[EMAIL PROTECTED] wrote:

> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> '<a href="web reference"> underlined reference</a>'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
> 
> I am asking if there is some alternative (more pythonic...):


Sure. The htmllib module provides HTMLParser.
Here's an example: run it with your HTML file as the argument
and you'll see a list of all the hrefs in the document.

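#------------------------------------------------
#!/usr/bin/python
import sys
import formatter
import htmllib

def test():
        # Read the whole HTML file named on the command line.
        filename = sys.argv[1]
        infile = open(filename, 'r')
        data = infile.read()
        infile.close()

        # NullFormatter throws away the rendered text; we only want the parse.
        fmt = formatter.NullFormatter()
        p = htmllib.HTMLParser(fmt)
        p.feed(data)

        # anchorlist collects the href value of each <a> tag the parser saw.
        for a_link in p.anchorlist:
                print a_link

        p.close()

test()
#------------------------------------------------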

I'm sure that this is far more Pythonic!
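
For anyone reading this on Python 3, where htmllib is no longer
available, roughly the same idea can be sketched with the standard
html.parser module. This is only a sketch under assumptions: the class
name AnchorCollector and the helper extract_links are made up for
illustration, and the anchorlist attribute is just named after the one
htmllib exposes.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        # Named after htmllib's attribute; not part of html.parser itself.
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and names are lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = AnchorCollector()
    parser.feed(html_text)
    parser.close()
    return parser.anchorlist


if __name__ == "__main__":
    import sys

    # Same usage as the script above: pass the HTML file as the argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "r") as infile:
            for link in extract_links(infile.read()):
                print(link)
```

Either way, the parser does the work of finding the tags, so there is
no need to split the text on 'href' by hand.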

Bernhard
-- 
http://mail.python.org/mailman/listinfo/python-list
