[EMAIL PROTECTED] wrote:
Hi everyone,
I am using Python's re module to extract some data from html. The
following code never returns, and I was wondering if someone can
explain to me why. Is this a problem with my regexp (I tried really
hard to find it?)?
[snip] html/xml string
regexp = re.compile("<li class=\"post\".*?<h4 class=\"desc\"><a href=
\"(.*?)\" rel=\"nofollow\">(.*?)</a>.*?</div>\s*(?:<p class=\"notes
\">(.*?)</p>)?.*?<div class=\"meta\">(?:to ((?:<a class=\"tag\".*?> )
+))*.*?<span class=\"date\" title=\"(.*?)\">.*?</span>\s*</div>.*?</
li>", re.DOTALL)
re.findall(regexp, s)
Python have several modules for parsing and working with xml. Do you
not know of them or is there some reason they won't work?
--
http://mail.python.org/mailman/listinfo/python-list