[Tutor] htmllib vs re question

2006-03-09 Thread -Terry-

I want to parse some text from an HTML file that contains
blocks of pre-formatted text. All I'm after is what's between
the pre and /pre tags.

My first thought was to use re for this, but looking through
the Library Reference, I see the htmllib module. Is htmllib
overkill for this job?

The HTML file size varies, but I don't expect the size to exceed
150-200k. Speed is not a big concern.

What is the Pythonic way and why?

Any recommendations or comments?

Thanks,
-- 
 Terry tvbareATsocketDOTnet

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] htmllib vs re question

2006-03-09 Thread Kent Johnson

-Terry- wrote:
> I want to parse some text from an HTML file that contains
> blocks of pre-formatted text. All I'm after is what's between
> the pre and /pre tags.
>
> The HTML file size varies, but I don't expect the size to exceed
> 150-200k. Speed is not a big concern.
>
> What is the Pythonic way and why?
>
> Any recommendations or comments?

Try Beautiful Soup
http://www.crummy.com/software/BeautifulSoup/

Kent
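
For comparison with the standard-library route asked about above, here
is a minimal sketch of pulling out the text between pre and /pre tags
with a real parser instead of re. It assumes Python 3, where htmllib no
longer exists and html.parser is the stdlib HTML parser. The names
PreExtractor and extract_pre are made up for this illustration; they
are not part of any library. Tags nested inside a pre block are dropped
and only their text is kept.

```python
from html.parser import HTMLParser


class PreExtractor(HTMLParser):
    """Collect the text found inside each <pre>...</pre> block.

    Hypothetical helper written for this example only.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished <pre> blocks, in document order
        self._depth = 0    # greater than 0 while inside a <pre>
        self._parts = []   # text pieces of the block being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <PRE> arrives as "pre".
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        # Keep text only while we are inside a <pre> block.
        if self._depth:
            self._parts.append(data)


def extract_pre(html_text):
    """Return a list with the text of every <pre> block in html_text."""
    parser = PreExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return parser.blocks
```

Calling extract_pre() on the contents of the HTML file gives one string
per pre block. A parser-based approach like this tends to cope with
upper-case tags, attributes on the tag, and character entities, which a
quick regular expression often gets wrong. Beautiful Soup is still the
more forgiving choice when the markup is badly formed.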

Re: [Tutor] htmllib

2005-10-05 Thread Ed Singleton

You're like some kind of god!

That's exactly what I need.

Thanks

Ed

On 05/10/05, Kent Johnson [EMAIL PROTECTED] wrote:
> Ed Singleton wrote:
>> I want to dump a html file into a python object.  Each nested tag
>> would be a sub-object, attributes would be properties.  So that I can
>> use Python in a similar way to the way I use JavaScript within a web
>> page.
>
> I don't know of a way to run Python from within a web page. But if you want
> to fetch an HTML page from a server and work with it (for example a
> web-scraping app), many people use BeautifulSoup for this. If you have
> well-formed HTML or XHTML you can use an XML parser as well but BS has the
> advantage of coping with badly-formed HTML.
> http://www.crummy.com/software/BeautifulSoup/
>
> Kent