Rama Vadakattu wrote:
> Is there any Python library to solve the below problem?
>
> For the below URL:
> --------------------------
> http://tinyurl.com/dzcwbg
>
> Summarized text is :
> ---------------------------
> By Roy Mark With sales plummeting and its smart phones failing to woo
> new customers, Sony Ericsson follows its warning that first quarter
> sales will be disappointing with the announcement that Najmi Jarwala,
> president of Sony Ericsson USA and head of ...
>
> ~~~~~~~~~~~~~~
> Usually the summarized text is a 2 to 3 line description of the URL, which
> we obtain by fetching that HTML page, examining the content,
> and working out a short description from the HTML markup.
> ~~~~~~~~~~~~~
>
> Are there any Python libraries which give summarized text for a given
> URL?
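
BeautifulSoup makes it easy to access parts of a web page. For example, to
print the page's meta description:

    # Python 2 with BeautifulSoup 3
    import urllib2
    from BeautifulSoup import BeautifulSoup

    # Fetch the page and parse it.
    data = urllib2.urlopen("http://tinyurl.com/dzcwbg").read()
    bs = BeautifulSoup(data)
    # Print the content of <meta name="description" ...>.
    print bs.find("meta", dict(name="description"))["content"]
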
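Note that find() returns None when a page has no such tag, so indexing the
result with ["content"] raises a TypeError on those pages. A minimal sketch of
a guarded lookup, assuming Python 3 and only the standard library's
html.parser instead of BeautifulSoup (MetaDescriptionParser and
meta_description are names made up for illustration):

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attrs = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), hence the "or".
        if (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")


def meta_description(html):
    """Return the meta description of an HTML string, or None if absent."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return parser.description
```

The caller can then test the result for None before printing it.
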
> It is OK even if the library just gives the initial two lines of text
> from the given URL instead of a summarization.

The problem is how you identify the summary. Different web sites will put it
in different places using different markup.
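
One way to cope is to try the likely places in order and fall back to the
first bit of paragraph text, which also covers the "initial two lines" case.
A rough sketch, assuming Python 3 and the standard library only; the names
SummaryParser and summarize, the choice of places to look, and the
200-character limit are all arbitrary picks for illustration, not a general
solution:

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collect <meta> name/property values and the text of <p> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # e.g. {"description": "..."}
        self.paragraphs = []  # text of each <p>, in document order
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            if key and attrs.get("content"):
                # Keep the first value seen for each key.
                self.meta.setdefault(key, attrs["content"])
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def summarize(html, limit=200):
    """Return a short description of an HTML string, or None if none is found."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    # 1. Explicit descriptions supplied by the site.
    for key in ("description", "og:description"):
        if key in parser.meta:
            return parser.meta[key].strip()
    # 2. Fallback: leading paragraph text with whitespace collapsed.
    chunks = [" ".join(p.split()) for p in parser.paragraphs]
    text = " ".join(c for c in chunks if c)
    if len(text) > limit:
        text = text[:limit].rstrip() + "..."
    return text or None
```

Pages that keep their teaser somewhere else (a particular div, say) would
still need site-specific rules.
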
Peter