Re: Parsing HTML

2012-01-28 Thread jondbaker
Thanks, that's good to know. I'm just a few months into using Python
(and weeks into Django), hence my familiarity with that one book rather
than with real-world applications just yet.

On Jan 28, 9:45 am, Masklinn  wrote:
> On 2012-01-27, at 23:40 , jondbaker wrote:
>
> > Chapter 8 of Dive Into Python demonstrates what you're describing
> > using sgmllib.
> > http://www.diveintopython.net/
>
> None of these libraries is very good at parsing "real-world" (broken) HTML,
> though. For that you'd be better off with html5lib, lxml.html or BeautifulSoup
> (in decreasing order of recommendation; lxml.html is probably the fastest, but
> I don't think it implements the HTML5 parsing rules).

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: Parsing HTML

2012-01-28 Thread Masklinn
On 2012-01-27, at 23:40 , jondbaker wrote:
> Chapter 8 of Dive Into Python demonstrates what you're describing
> using sgmllib.
> http://www.diveintopython.net/

None of these libraries is very good at parsing "real-world" (broken) HTML,
though. For that you'd be better off with html5lib, lxml.html or BeautifulSoup
(in decreasing order of recommendation; lxml.html is probably the fastest, but
I don't think it implements the HTML5 parsing rules).
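To illustrate why a strict parser struggles with "real-world" HTML, here is a small standard-library-only sketch. ElementTree is an XML parser, so markup that is perfectly legal HTML but has unclosed `<p>` and `<br>` tags is rejected outright. The sample markup and the `try_strict_parse` helper are made up for illustration; lenient parsers such as html5lib, lxml.html or BeautifulSoup exist precisely to accept this kind of input.

```python
import xml.etree.ElementTree as ET

# Typical "real-world" HTML: <p> and <br> are never closed, which HTML allows
# but XML does not.
broken_html = "<html><body><p>First paragraph<p>Second<br>line</body></html>"


def try_strict_parse(markup):
    """Return the root element, or None if the strict XML parser rejects it."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as exc:
        # Here the error is a "mismatched tag", because </body> arrives
        # while <br> (and both <p> elements) are still open.
        print("ElementTree rejected the markup:", exc)
        return None


root = try_strict_parse(broken_html)
```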




Re: Parsing HTML

2012-01-28 Thread jondbaker
Chapter 8 of Dive Into Python demonstrates what you're describing
using sgmllib.
http://www.diveintopython.net/
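Note that sgmllib exists only in Python 2; it was removed in Python 3, which the `urllib.request` code in the question implies. A rough Python 3 analogue of the same callback style uses `html.parser.HTMLParser`. This sketch assumes the goal is to collect link targets, and `LinkLister` is a hypothetical class name, not something from the book.

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and passes attrs
        # as a list of (name, value) pairs.
        if tag == "a":
            self.urls.extend(value for name, value in attrs if name == "href")


parser = LinkLister()
parser.feed('<p>See <a href="http://www.diveintopython.net/">Dive Into Python</a>'
            ' and <A HREF="/docs/">the docs</A>.</p>')
parser.close()
print(parser.urls)
```

Because names are lowercased before the callback runs, the uppercase `<A HREF=...>` link is picked up as well.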

On Jan 27, 3:31 pm, Dennis Lee Bieber  wrote:
> On Fri, 27 Jan 2012 13:35:42 +0700, ddtopgun  wrote:
> >I'm new to Django and I'd like to get the content of an HTML page.
> >Can someone help me with how to do that?
>
>         Django is meant to generate HTML pages, not parse HTML content.
>
> >f=urllib.request.urlopen("http://site_name.com")
> >s=f.read()
> >f.close()
>
> >But this code displays all of the HTML source. I just want to take the
> >contents of the HTML tags.
>
>         You'll have to define "contents" more precisely. Only the stuff
> inside certain tags (and then you may have to worry about old HTML that
> doesn't use closing tags)? Is an image reference ( <img src="somefile.name"> )
> content, or only the text between the tags?
>
>         If the HTML is well-formed, you might be able to use ElementTree to
> traverse the nodes. Or define callbacks for HTMLParser or htmllib (see
> section 19 [for Python 2.7]: Structured Markup Processing in the
> Standard Library reference manual) to capture the portion in which you
> are interested.
> --
>         Wulfraed                 Dennis Lee Bieber         AF6VN
>         wlfr...@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
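The ElementTree route mentioned above only works when the markup is well-formed. Here is a minimal sketch, assuming XHTML-style input where every tag is closed; the sample page is made up. Because ElementTree is an XML parser, it also rejects HTML-only entities such as `&nbsp;`, so most real-world pages will not parse this way.

```python
import xml.etree.ElementTree as ET

# Well-formed (XHTML-style) markup: every tag is closed and properly nested.
page = """<html>
  <head><title>Example</title></head>
  <body>
    <p>Hello <b>world</b>.</p>
    <img src="somefile.name" alt="an image"/>
  </body>
</html>"""

root = ET.fromstring(page)  # root is the <html> element

# Text of <title>, found by a path relative to <html>.
title = root.findtext("head/title")
# Full text of each <p>, including text inside nested tags such as <b>.
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]
# Attribute values, e.g. the image references Dennis asks about.
images = [img.get("src") for img in root.iter("img")]

print(title, paragraphs, images)
```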




Parsing HTML

2012-01-27 Thread ddtopgun

I'm new to Django and I'd like to get the content of an HTML page.
Can someone help me with how to do that?

import urllib.request

with urllib.request.urlopen("http://site_name.com") as f:
    s = f.read()

But this code displays all of the HTML source. I just want to take the
contents of the HTML tags.
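One possible reading of "just the contents" is the visible text of the page. Here is a minimal sketch under that assumption, using only the standard library's `html.parser`. `TextExtractor` and `html_to_text` are hypothetical names. The function works on a `str`, so the bytes returned by `f.read()` would need decoding first. UTF-8 is an assumption; the real encoding depends on the page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep character data, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs defaults to True, so &amp; and similar entities
        # arrive in handle_data already decoded.
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore text inside <script>/<style> and whitespace-only runs.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


# With the code from the question, s is bytes, e.g. html_to_text(s.decode("utf-8")).
sample = ("<html><head><title>Hi</title><style>p {}</style></head>"
          "<body><p>Fish &amp; chips</p><script>var x;</script></body></html>")
print(html_to_text(sample))
```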


Thanks.
