Dear Pravin,

Source code will answer everybody's questions, so you will not have to
explain it.
Please share it. If you don't want to share, remember that this is a FOSS
mailing list, not a place to advertise your work.

Regards,
Chandan

On Mon, May 21, 2012 at 2:47 PM, pravin joshi <prav...@gmail.com> wrote:

> Answers to your questions Prakash:
> >> Are you using your own crawler/spider?
> Initially I used Scrapy. Now it is just a combination of urllib2 and
> BeautifulSoup. BeautifulSoup scrapes the first page and gets the URLs. Then
> urllib2 fetches the individual pages, and BeautifulSoup scrapes the main
> text from the returned pages.
> >> Is it search by url or keyword?
> Get the headlines from the main page, find the associated URL for each, and
> then get the main content.
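
For readers following the thread, here is a minimal sketch of the headline-to-URL step described above. It is not Pravin's code. To keep it dependency-free it swaps in the standard library's html.parser and urllib.request for BeautifulSoup and urllib2. The assumption that headlines are links inside <h2> tags is hypothetical, as are the names HeadlineParser, extract_headlines, and fetch; the real markup of the sites is not given here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class HeadlineParser(HTMLParser):
    """Collect (headline, url) pairs from <a> tags nested inside <h2> tags.

    The <h2><a href=...> layout is an assumption made for illustration.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.headlines = []   # list of (text, absolute_url)
        self._in_h2 = False
        self._href = None     # href of the headline link currently open
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and resolve relative links.
            title = " ".join("".join(self._text).split())
            self.headlines.append((title, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "h2":
            self._in_h2 = False


def extract_headlines(html, base_url):
    """Return the (headline, url) pairs found in one page of HTML."""
    parser = HeadlineParser(base_url)
    parser.feed(html)
    return parser.headlines


def fetch(url):
    """Download a page and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

A caller would run fetch() on the front page, pass the result to extract_headlines(), and then fetch() each returned URL to get the article body.
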
> >> How do you extract the news title? Are you using Named Entity
> Extraction to extract some main info?
> Titles are already given on the eKantipur and Nagarik sites; I use
> BeautifulSoup to extract them. Words for a title can also be generated by
> doing a frequency count of the words in the article and combining the most
> frequently used words (excluding stopwords). One caution though: this
> sometimes gives funny results compared to the titles from the site itself,
> as titles on the site may be based on rare words.
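
The frequency-count idea above could look roughly like the sketch below. The tiny STOPWORDS set, the regex tokenizer, and the name title_keywords are placeholders chosen for illustration; a real run would need a proper stopword list for the language of the articles.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "as", "at", "in", "is", "of", "on", "the", "to"}


def title_keywords(article_text, n=5):
    """Return the n most frequent non-stopword words in the article."""
    words = re.findall(r"[a-z']+", article_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]
```

As the caution above says, words picked this way can differ a lot from an editor's headline, because the headline may hinge on a word that appears only once in the body.
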
> >> What is the basis of the summary?
> Sentence clustering around the title.
>
> >> Do you use any classification or clustering technique for grouping the
> past similar news?
> Find the MASI distance between one headline and another. I use 0.65 as the
> threshold, based on trial and error.
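
For anyone unfamiliar with MASI distance, here is a standalone sketch of the measure on sets of headline words. The weights 1, 0.67, 0.33, and 0 follow the formulation I understand NLTK's nltk.metrics.distance.masi_distance to use. How the 0.65 value is applied is not stated above, so treating "distance below 0.65" as "similar" is an assumption, as are the lowercase whitespace tokenization and the name similar_headlines.

```python
def masi_distance(a, b):
    """MASI distance between two sets: 1 - Jaccard * monotonicity weight."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    inter = a & b
    if a == b:
        m = 1.0      # identical sets
    elif len(inter) == min(len(a), len(b)):
        m = 0.67     # one set is a subset of the other
    elif inter:
        m = 0.33     # partial overlap
    else:
        m = 0.0      # disjoint sets
    return 1.0 - (len(inter) / len(union)) * m


def similar_headlines(h1, h2, threshold=0.65):
    """Assumed rule: headlines are similar when the distance is below threshold."""
    words1 = set(h1.lower().split())
    words2 = set(h2.lower().split())
    return masi_distance(words1, words2) < threshold
```

One thing worth noting under this formulation: two headlines that only partially overlap always score at least 0.67, so a 0.65 cutoff on distance would only match headlines where one word set contains the other.
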
> >> Have you made any corpus for it?
> Why would you need a corpus for it?
>
> Pravin
>
>  --
> FOSS Nepal mailing list: foss-nepal@googlegroups.com
> http://groups.google.com/group/foss-nepal
> To unsubscribe, e-mail: foss-nepal+unsubscr...@googlegroups.com
>
> Mailing List Guidelines:
> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
> Community website: http://www.fossnepal.org/
>
