Re: [BangPypers] HTML Parsing in python

Puneet Aggarwal Thu, 10 Sep 2009 07:19:51 -0700

Hi Dhananjay,

My requirement is simple: I need to extract information from a page. But the
pages can be malformed HTML or arbitrary junk HTML, so tolerance is
required.
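
To make the requirement concrete, here is a minimal sketch of the kind of
tolerant extraction I mean. It assumes Python 3's standard-library
html.parser (the successor to the HTMLParser module mentioned in this thread);
the class name, the helper function, and the sample markup are made up for
illustration, and a third-party parser such as lxml may cope better with
really bad pages.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Hypothetical helper: collects href values and visible text.

    It never checks that tags are balanced, so unclosed or stray
    tags in the input do not stop the extraction.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def extract(html):
    """Return (links, text_parts) found in a possibly malformed page."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return parser.links, parser.text_parts


# Made-up junk input: unclosed <b> and <p>, an unquoted attribute value,
# and stray closing tags that were never opened.
junk = (
    "<html><body><p>Hello <b>world<p>"
    "<a href='http://example.com/a'>first</a>"
    "<a href=b.html>second</td></div>"
)

links, text = extract(junk)
print(links)
print(text)
```

The sketch only reacts to start tags and text, so it does not care whether
the document tree is well formed. If the tree structure itself matters, a
parser that repairs the markup would be the better fit.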


Thanks,
Puneet


On Thu, Sep 10, 2009 at 7:33 PM, Dhananjay Nene <[email protected]> wrote:

> Do you require tolerance for non-well-formed XML/HTML? If yes, you may
> consider sgmlop: http://effbot.org/zone/sgmlop-index.htm
>
>
> On Thu, Sep 10, 2009 at 7:07 PM, Baishampayan Ghose <[email protected]> wrote:
>
>> > Can anyone suggest a good library for HTML parsing in Python?
>> > I googled and found a few libraries: BeautifulSoup, HTMLParser, SGMLParser,
>> > etc.
>> >
>> > Can anyone suggest which one I should go for, based on your experience?
>>
>> BeautifulSoup was OK, but now it's broken. Use lxml, it's very good.
>>
>> http://codespeak.net/lxml/
>>
>> Regards,
>> BG
>>
>>
>> --
>> Baishampayan Ghose
>> b.ghose at gmail.com
>> _______________________________________________
>> BangPypers mailing list
>> [email protected]
>> http://mail.python.org/mailman/listinfo/bangpypers
>>
>
>
>
> --
> --------------------------------------------------------
> blog: http://blog.dhananjaynene.com
> twitter: http://twitter.com/dnene
>
>
>

