In article <[EMAIL PROTECTED]>, anjesh <[EMAIL PROTECTED]> wrote:
>On Apr 2, 12:54 am, "Dotan Cohen" <[EMAIL PROTECTED]> wrote:
>> On 1 Apr 2007 07:56:04 -0700, Ulysse <[EMAIL PROTECTED]> wrote:
>>
>> > I have seen the Beautiful Soup online help and tried to apply that to
>> > my problem. But it seems to be a little bit hard. I will rather try to
>> > do this with regular expressions...
>>
>> If you think that Beautiful Soup is difficult, then wait till you try
>> to do this with regexes. Granted, knowing the exact format of the HTML
>> you are scraping will help, but if you ever need to parse HTML from an
>> unknown source, then Beautiful Soup is the only way to go. Not all HTML
>> authors close their td and tr tags, and sometimes there are attributes
>> on those tags. If you plan on ever reusing the code, or the format of
>> the HTML may change, then you are best off sticking with Beautiful
>> Soup.
>>
>> Dotan Cohen
>>
>> http://lyricslist.com/
>> http://what-is-what.com/
>
>
>Have you tried HTMLParser? It can do the task you want to perform:
>http://docs.python.org/lib/module-HTMLParser.html
>
>-anjesh
>
Yes, except that these last two follow-ups UNDERstate the
difficulty--in fact, the impossibility--of achieving adequate
results on this problem with regular expressions. We'll help
with the documentation for HTMLParser and BeautifulSoup. REs
are an invitation to madness.

<URL: http://www.unixreview.com/documents/s=10121/ur0702e/ >
might amuse those who want to think more about REs.
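
To make the HTMLParser suggestion concrete, here is a minimal sketch of
pulling cell text out of a table, including the case mentioned upthread
where td and tr tags are never closed and carry attributes. It assumes
Python 3, where the module lives at html.parser rather than HTMLParser;
the names TableExtractor and extract_rows are made up for this example
and are not part of the standard library. It is a starting point under
those assumptions, not a complete scraper (nested tables, for instance,
are not handled).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the open cell, if any, and store its stripped text.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the open row, if any, closing a dangling cell first.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored; only the tag name matters here.
        if tag == "tr":
            self._close_row()      # a new <tr> implicitly ends the old one
            self._row = []
        elif tag in ("td", "th"):
            if self._row is None:  # cell with no enclosing <tr>
                self._row = []
            self._close_cell()     # a new cell implicitly ends the old one
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()          # flush anything left open at end of input


def extract_rows(html):
    """Return the table contents of `html` as a list of rows of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sloppy = '<table><tr><td class="a">one<td>two<tr><td>three</td><td>four</table>'
    print(extract_rows(sloppy))
```

The point of the design is that the parser, not a pattern, decides where
a cell ends: a new td or tr start tag closes whatever was left open,
which is the case that tends to defeat a regex written against one
known page layout.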