elsa wrote:
> I'm new to both this forum and Python, and I've got a bit stuck trying
> to learn how to parse HTML...

If what you want to do is *parse* the HTML instead of trying to *learn* how
to parse it, you might want to give the existing (external) HTML parser
libraries a try. There's lxml.html (extremely fast and fixes up broken HTML),
html5lib (very slow, but very browser-like parse results) and BeautifulSoup
(slow, but good encoding detection if you need that).

Here are a couple of (only slightly biased) comparisons:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

> python sgmllib.py "path/to/my/file.html" .... example (1)
>
> this doesn't work for me. I think I have figured out the problem -
> the error says
>
> "/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/
> Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno
> 2] No such file or directory"
>
> the problem is that this path is wrong. My sgmllib.py is in:
>
> "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/
> python2.5/sgmllib.py"

You can use "python -m sgmllib" to call a module from the stdlib (or the
PYTHONPATH, to be more accurate). But note that sgmllib is a particularly
cumbersome way to deal with HTML.

Stefan

--
http://mail.python.org/mailman/listinfo/python-list
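
To make the difference between the two invocations concrete, here is a minimal sketch (not part of the original exchange). It assumes a current Python 3, where sgmllib no longer exists, so "html.parser" stands in as the stdlib module being looked up. The helper name locate_module is made up for illustration.

```python
import importlib.util
import os


def locate_module(name):
    """Return the file a plain module resolves to on the module search path.

    Hypothetical helper: this is roughly the lookup "python -m <name>" does
    for a plain module. Returns None if the module cannot be found.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# "python sgmllib.py" treats the argument as a file path, so it only works
# if that file exists relative to the current directory.
print("sgmllib.py in current directory:", os.path.exists("sgmllib.py"))

# "python -m <module>" searches the module search path (sys.path) instead,
# which is why the stdlib copy is found without spelling out its location.
# html.parser is a stand-in here, since sgmllib is Python 2 only.
print("html.parser found at:", locate_module("html.parser"))
```

The first line printed depends on where you run it. The second shows the full path inside the Python installation, the same kind of location as the ".../lib/python2.5/sgmllib.py" path quoted above.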