[Tutor] htmllib vs re question
I want to parse some text from an HTML file that contains blocks of pre-formatted text. All I'm after is what's between the <pre> and </pre> tags. My first thought was to use re for this, but looking through the Library Reference, I see the htmllib module. Is htmllib overkill for this job? The HTML file size varies, but I don't expect the size to exceed 150-200k. Speed is not a big concern. What is the Pythonic way and why? Any recommendations or comments?

Thanks,
Terry tvbareATsocketDOTnet
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
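For reference, the re approach mentioned in the question might look roughly like the sketch below. It assumes the pre blocks are not nested and that every opening tag has a matching closing tag; the names PRE_BLOCK, pre_blocks_re and sample are made up for illustration. Entities such as &lt; are returned as-is, and any inline tags inside a block are kept in the result.

```python
import re

# Non-greedy match so each <pre> pairs with the nearest </pre>.
# DOTALL lets "." span newlines; IGNORECASE accepts <PRE> as well.
# [^>]* tolerates attributes on the opening tag.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre\s*>", re.DOTALL | re.IGNORECASE)


def pre_blocks_re(html):
    """Return the raw text between each <pre> and </pre> pair."""
    return PRE_BLOCK.findall(html)


# Hypothetical sample input, not taken from the original post.
sample = (
    "<html><body><p>intro</p>"
    "<pre>line 1\nline 2</pre>"
    "<PRE class='x'>more</PRE>"
    "</body></html>"
)
print(pre_blocks_re(sample))
```

This is fine for files that are known to be tidy. It will quietly give wrong results on unclosed or nested tags, which is the usual argument for a real parser.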
Re: [Tutor] htmllib vs re question
-Terry- wrote:
> I want to parse some text from an HTML file that contains blocks of pre-formatted text. All I'm after is what's between the <pre> and </pre> tags. The HTML file size varies, but I don't expect the size to exceed 150-200k. Speed is not a big concern. What is the Pythonic way and why? Any recommendations or comments?

Try Beautiful Soup:
http://www.crummy.com/software/BeautifulSoup/

Kent
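A parser-based version of the same job can be sketched with the standard library alone. Note that this is not Beautiful Soup, which is a third-party package, and it is not htmllib either, which was removed in Python 3; the sketch uses html.parser, the standard-library HTML parser in Python 3. The class name PreExtractor, the helper pre_blocks and the sample string are hypothetical.

```python
from html.parser import HTMLParser


class PreExtractor(HTMLParser):
    """Collect the text inside every <pre>...</pre> block."""

    def __init__(self):
        # convert_charrefs=True turns entities like &lt; into "<" in the data.
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished blocks, in document order
        self._depth = 0    # greater than 0 while inside a <pre>
        self._parts = []   # text pieces of the block being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <PRE> arrives as "pre".
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        # Keep text only while inside a <pre>; inline tags are dropped.
        if self._depth:
            self._parts.append(data)


def pre_blocks(html):
    parser = PreExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical sample input.
sample = "<p>intro</p><pre>a &lt; b\n<b>bold</b> text</pre><p>skip</p><PRE>second</PRE>"
print(pre_blocks(sample))
```

Compared with a regular expression, this decodes entities and ignores markup inside the block, at the cost of a few more lines.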
Re: [Tutor] htmllib
You're like some kind of god! That's exactly what I need.

Thanks

Ed

On 05/10/05, Kent Johnson wrote:
> Ed Singleton wrote:
> > I want to dump an HTML file into a Python object. Each nested tag would be a sub-object, attributes would be properties. So that I can use Python in a similar way to the way I use JavaScript within a web page.
>
> I don't know of a way to run Python from within a web page. But if you want to fetch an HTML page from a server and work with it (for example a web-scraping app), many people use BeautifulSoup for this. If you have well-formed HTML or XHTML you can use an XML parser as well, but BS has the advantage of coping with badly-formed HTML.
>
> http://www.crummy.com/software/BeautifulSoup/
>
> Kent
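To make the "nested tags as sub-objects, attributes as properties" idea concrete, here is a rough standard-library sketch. It is not how BeautifulSoup is implemented, and Node, TreeBuilder and parse_html are names invented for this example. It assumes reasonably well-formed input; an attribute called tag, attrs, children or text would be shadowed by the node's own fields, and attributes that are Python keywords, such as class, have to be read through node.attrs.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, so they must not stay open on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []
        self.text = ""

    def __getattr__(self, name):
        # Only called when normal lookup fails: try an HTML attribute
        # first, then the first child tag with that name.
        if name.startswith("_"):
            raise AttributeError(name)
        attrs = self.__dict__.get("attrs", {})
        if name in attrs:
            return attrs[name]
        for child in self.__dict__.get("children", []):
            if child.tag == name:
                return child
        raise AttributeError(name)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self._stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open tag with this name; ignore strays.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].text += data


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Hypothetical sample page.
doc = parse_html('<html><body><a href="http://example.com/" id="home">Home</a>'
                 '<br><p>Hi</p></body></html>')
print(doc.html.body.a.href)
print(doc.html.body.p.text)
```

Unlike BeautifulSoup, this makes no attempt to repair badly-formed HTML, so it is only a starting point for pages you control.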