Re: Parsing HTML
Thanks, that's good to know. I'm just a few months into using Python (and weeks with Django), hence my familiarity with that one book but no real-world experience just yet.

On Jan 28, 9:45 am, Masklinn wrote:
> On 2012-01-27, at 23:40 , jondbaker wrote:
>
> > Chapter 8 of Dive Into Python demonstrates what you're describing
> > using sgmllib.
> > http://www.diveintopython.net/
>
> None of these libraries is very good at parsing "real-world" (broken) HTML,
> though; for that you'd better go with html5lib, lxml.html or BeautifulSoup
> (in decreasing order of recommendation; lxml.html is probably the fastest, but
> I don't think it implements the HTML5 parsing rules).

-- 
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: Parsing HTML
On 2012-01-27, at 23:40 , jondbaker wrote:

> Chapter 8 of Dive Into Python demonstrates what you're describing
> using sgmllib.
> http://www.diveintopython.net/

None of these libraries is very good at parsing "real-world" (broken) HTML, though; for that you'd better go with html5lib, lxml.html or BeautifulSoup (in decreasing order of recommendation; lxml.html is probably the fastest, but I don't think it implements the HTML5 parsing rules).
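A quick way to see the caveat about broken HTML is to feed ElementTree (the standard-library option mentioned in this thread for well-formed HTML) a clean document and a typical sloppy one. This is an illustrative Python 3 sketch, not code from the thread: `paragraph_texts` is a hypothetical helper name, and returning `None` on a parse failure is just one possible design choice. It sticks to the standard library, so it does not demonstrate html5lib, lxml.html or BeautifulSoup.

```python
import xml.etree.ElementTree as ET


def paragraph_texts(markup):
    """Return the text of every <p> element, or None if the markup is not well-formed.

    Hypothetical helper: ElementTree is an XML parser, so it only
    accepts markup that happens to be well-formed XML.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Unclosed tags, bare <br>, or HTML-only entities such as &nbsp;
        # all end up here.
        return None
    # itertext() joins the text of a <p> with the text of its children (<b>, <a>, ...).
    return ["".join(p.itertext()) for p in root.iter("p")]
```

The unclosed `<p>`, the bare `<br>` and the HTML-only `&nbsp;` entity are each enough to make the XML parser give up, which is why the more tolerant parsers named above are the better fit for scraped pages.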
Re: Parsing HTML
Chapter 8 of Dive Into Python demonstrates what you're describing using sgmllib.
http://www.diveintopython.net/

On Jan 27, 3:31 pm, Dennis Lee Bieber wrote:
> On Fri, 27 Jan 2012 13:35:42 +0700, ddtopgun wrote:
>
> >I'm new to Django and I want to try to get the content of an HTML page.
> >Can someone help me get the content of the HTML?
>
> Django is meant to generate HTML pages, not parse HTML content.
>
> >f=urllib.request.urlopen("http://site_name.com")
> >s=f.read()
> >f.close()
>
> >But this code displays all of the HTML code. I just want to take the
> >contents of the HTML tags.
>
> You'll have to define "contents" better. Only stuff inside
> tags (and you then may have to worry about old HTML that doesn't
> use closing tags)? Is an image reference (<img src="somefile.name">)
> content, or only the text between the tags?
>
> If the HTML is well-formed, you might be able to use ElementTree to
> traverse the nodes. Or define callbacks for HTMLParser or htmllib (see
> section 19 [for Python 2.7]: Structured Markup Processing in the
> Standard Library reference manual) to capture the portion in which you
> are interested.
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlfr...@ix.netcom.com HTTP://wlfraed.home.netcom.com/
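The callback approach translates to Python 3 as a subclass of `html.parser.HTMLParser` (the Python 3 name of the 2.7 `HTMLParser` module; `sgmllib` and `htmllib` no longer exist in Python 3). The sketch below picks one possible reading of "contents": visible text between tags, with `<script>`/`<style>` bodies skipped and `<img src>` values collected separately. `TextExtractor` and `extract_content` are hypothetical names, and both choices about what counts as content are assumptions, not something settled in the thread.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    # Tags whose bodies are code rather than readable text (an assumption
    # about what "contents" should exclude).
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns references like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.text_chunks = []
        self.image_sources = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased; attrs is a list of (name, value) pairs.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>.
        text = data.strip()
        if text and not self._skip_depth:
            self.text_chunks.append(text)


def extract_content(markup):
    """Return (visible text, list of <img> src values) for an HTML string."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of input
    return " ".join(parser.text_chunks), parser.image_sources
```

Unlike ElementTree, this parser does not require closing tags, so old-style HTML without `</p>` still yields its text; whether image references belong in the result is left to the caller.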
Parsing HTML
I'm new to Django and I want to try to get the content of an HTML page. Can someone help me get the content of the HTML?

    import urllib.request

    f = urllib.request.urlopen("http://site_name.com")
    s = f.read()  # s holds the raw bytes of the whole page, markup included
    f.close()

But this code displays all of the HTML code. I just want to take the contents of the HTML tags.

Thanks.
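The snippet shows all the markup because `read()` returns the raw bytes of the whole response; nothing has been parsed yet. Below is a sketch of a slightly more careful fetch, assuming Python 3 and assuming UTF-8 as the fallback when the server does not announce a charset. `fetch_html` is a hypothetical helper name, not part of urllib.

```python
import urllib.request


def fetch_html(url, default_charset="utf-8"):
    """Return the body of `url` decoded to a str (hypothetical helper).

    This only downloads and decodes; pulling out the contents of
    particular tags is a separate parsing step.
    """
    # The with-block closes the response even if read() raises,
    # replacing the explicit f.close() call.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes: the complete document, tags and all
        # Prefer the charset from the Content-Type header; the fallback
        # is an assumption, not something every server guarantees.
        charset = response.headers.get_content_charset() or default_charset
    # errors="replace" keeps a mislabelled page from raising UnicodeDecodeError.
    return raw.decode(charset, errors="replace")
```

Calling `fetch_html("http://site_name.com")` gives a `str` that can then be handed to an HTML parser to extract just the parts of interest.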