Re: [Tutor] Titles from a web page
"louis leichtnam" wrote

> I'm trying to write a program that looks in a webpage and finds the titles
> of a subsection of the page:
>
> Can you help me out? I tried using regular expressions but I keep hitting
> walls and I don't know what to do...

Regular expressions are the wrong tool for parsing HTML unless you are
searching for something very simple. There is an HTML parser in the Python
standard library (*) that you can use if the HTML is reasonably well formed.
If it's sloppy you would be better off with something like BeautifulSoup or
lxml. If the page is written in XHTML then you could also use the ElementTree
module, which is designed for XML parsing.

(*) In fact there are two: htmllib and HTMLParser. The former is more
powerful but more complex.

Some brief examples can be found in my tutor here:

http://www.alan-g.me.uk/tutor/tutwebc.htm

Note: the topic is not complete; the last few sections are placeholders only...

HTH,

Alan G.

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
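A minimal sketch of the standard-library approach Alan describes, written for Python 3, where the old HTMLParser module lives at html.parser (htmllib was removed in Python 3). It assumes, purely for illustration, that the "World" section is a <div id="world"> and that each story title is an <h2>; the class name SectionTitleParser and both defaults are hypothetical and need adjusting to the real page's markup. A depth counter tracks nested <div>s so the section ends at its own closing tag.

```python
from html.parser import HTMLParser


class SectionTitleParser(HTMLParser):
    """Collect the text of title tags inside one section <div>.

    Hypothetical defaults: the section is <div id="world"> and each
    title is an <h2>. Inspect the real page and change them to match.
    """

    def __init__(self, section_id="world", title_tag="h2"):
        super().__init__()
        self.section_id = section_id
        self.title_tag = title_tag
        self.div_depth = 0      # > 0 while inside the target <div>
        self.in_title = False   # True while inside a title tag in the section
        self.current = []       # text fragments of the title being read
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1          # nested <div> inside the section
            elif dict(attrs).get("id") == self.section_id:
                self.div_depth = 1           # entered the target section
        elif tag == self.title_tag and self.div_depth:
            self.in_title = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1              # reaches 0 when the section closes
        elif tag == self.title_tag and self.in_title:
            self.in_title = False
            self.titles.append("".join(self.current).strip())

    def handle_data(self, data):
        # Text inside nested tags such as <a> still arrives here.
        if self.in_title:
            self.current.append(data)


page = """
<div id="sports"><h2>Cup final tonight</h2></div>
<div id="world">
  <h2><a href="/a">Election results announced</a></h2>
  <p>Summary...</p>
  <div class="more"><h2>Storm hits coast</h2></div>
</div>
"""
parser = SectionTitleParser()
parser.feed(page)
print(parser.titles)  # ['Election results announced', 'Storm hits coast']
```

For a live page you would first fetch the HTML (for example with urllib.request.urlopen(url).read() and decode it with the page's charset) and pass the string to feed(). This approach only holds up if the markup is reasonably well formed, which is Alan's point about switching to a more forgiving parser for sloppy pages.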
Re: [Tutor] Titles from a web page
On May 5, 2011, at 07:16, James Mills wrote:

> On Thu, May 5, 2011 at 1:52 PM, Modulok wrote:
>> You might look into the third-party module, 'BeautifulSoup'. It's designed to
>> help you interrogate markup (even poor markup), extracting nuggets of data
>> based on various criteria.
>
> lxml is also worth looking into; it provides similar functionality.

For especially broken markup you might even consider version 3.07a of
BeautifulSoup. The parser in later versions got slightly less forgiving.

Greetings,

-- 
"Control over the use of one's ideas really constitutes control over other
people's lives; and it is usually used to make their lives more difficult."
- Richard Stallman
Re: [Tutor] Titles from a web page
On Thu, May 5, 2011 at 1:52 PM, Modulok wrote:
> You might look into the third-party module, 'BeautifulSoup'. It's designed to
> help you interrogate markup (even poor markup), extracting nuggets of data
> based on various criteria.

lxml is also worth looking into; it provides similar functionality.

-- 
James Mills
--
"Problems are solved by method"
Re: [Tutor] Titles from a web page
You might look into the third-party module, 'BeautifulSoup'. It's designed to
help you interrogate markup (even poor markup), extracting nuggets of data
based on various criteria.

-Modulok-

On 5/4/11, louis leichtnam wrote:
> Hello Everyone,
>
> I'm trying to write a program that looks in a webpage and finds the titles
> of a subsection of the page.
>
> For example, a newspaper page lists the titles of its stories under the
> section "World", and when you click on a title you get the entire story.
>
> I want a program that prints only the titles in this particular section of
> the page.
>
> Can you help me out? I tried using regular expressions but I keep hitting
> walls and I don't know what to do...
>
> Thank you
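A rough sketch of Modulok's BeautifulSoup suggestion applied to the original question. Note that it uses the current bs4 package (pip install beautifulsoup4), not the BeautifulSoup 3 releases discussed in this 2011 thread. As before, the section id "world", the <h2> title tag, and the helper name section_titles are hypothetical assumptions about the page, not anything from the thread.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def section_titles(html, section_id="world", title_tag="h2"):
    """Return the text of every title_tag inside the element with id=section_id.

    section_id and title_tag are hypothetical defaults; inspect the real
    page to find the markup that wraps the section you want.
    """
    soup = BeautifulSoup(html, "html.parser")   # stdlib parser backend
    section = soup.find(id=section_id)          # first element with that id
    if section is None:
        return []
    # get_text(strip=True) also picks up text nested in links such as <a>.
    return [h.get_text(strip=True) for h in section.find_all(title_tag)]


page = """
<div id="sports"><h2>Cup final tonight</h2></div>
<div id="world">
  <h2><a href="/a">Election results announced</a></h2>
  <h2>Storm hits coast</h2>
</div>
"""
print(section_titles(page))  # ['Election results announced', 'Storm hits coast']
```

Compared with a hand-written HTMLParser subclass, the search is a couple of method calls, and BeautifulSoup copes better with the sloppy markup mentioned earlier in the thread. If lxml is installed, passing "lxml" instead of "html.parser" selects a faster parser backend.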