Dear Pravin,

The source code will answer everybody's questions; you will not have to explain it. Share it. If you don't want to share it, remember that this is a FOSS mailing list, not a place to advertise your work.
Regards,
Chandan

On Mon, May 21, 2012 at 2:47 PM, pravin joshi <prav...@gmail.com> wrote:
> Answers to your questions, Prakash:
>
>> Are you using your own crawler/spider?
> Initially I used Scrapy. Now it's just a combination of urllib2 and
> BeautifulSoup. BeautifulSoup scrapes the front page and gets the URLs. Then
> urllib2 fetches the individual pages. Then BeautifulSoup scrapes the main
> text from the returned pages.
>
>> Is it search by URL or keyword?
> Get the headlines from the main page, find the associated URL for each, and
> then get the main content.
>
>> How do you extract the news title? Are you using Named Entity Extraction
>> for extraction of some main info?
> Titles are already given on the eKantipur and Nagarik sites; I use
> BeautifulSoup to extract them. Words for the title can also be generated by
> doing a frequency count of the words in the article and taking a combination
> of the most frequently used words (excluding stopwords). One caution, though:
> this sometimes gives funny results compared to the titles on the site
> itself, since those titles may be based on rare words.
>
>> What is the basis of the summary?
> Sentence clustering around the title.
>
>> Do you use any classification or clustering technique for grouping
>> similar past news?
> I find the MASI distance between one headline and another, with a
> threshold of 0.65 chosen by trial and error.
>
>> Have you made any corpus for it?
> Why would you need a corpus for it?
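The scraping pipeline described above (parse the front page for headline links, fetch each article, then pull out the main text) can be sketched roughly as follows. This is a minimal sketch, not Pravin's code: it swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages, uses Python 3's urllib.request in place of urllib2, and assumes hypothetical markup rules (headline links carry a `headline` CSS class; the article text lives in `<p>` elements). The real eKantipur and Nagarik pages would need their own selectors.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class HeadlineLinks(HTMLParser):
    """Collect (headline text, absolute URL) from <a> tags with a given class.

    The class name "headline" is a hypothetical placeholder; each site needs its own rule.
    """

    def __init__(self, base_url, css_class="headline"):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and attrs.get("href") and self.css_class in classes:
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((" ".join("".join(self._text).split()), self._href))
            self._href = None


class ParagraphText(HTMLParser):
    """Collect the text of every <p> element (a hypothetical 'main text' rule)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p, self._buf = True, []

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._in_p = False


def fetch(url, timeout=10):
    """Download a page; urllib.request is Python 3's successor to urllib2."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def headline_links(front_page_html, base_url):
    parser = HeadlineLinks(base_url)
    parser.feed(front_page_html)
    parser.close()
    return parser.links


def main_text(article_html):
    parser = ParagraphText()
    parser.feed(article_html)
    parser.close()
    return "\n".join(parser.paragraphs)


def crawl(front_url):
    """Front page -> headline URLs -> article pages -> main text (needs network access)."""
    for title, url in headline_links(fetch(front_url), front_url):
        yield title, url, main_text(fetch(url))
```

With BeautifulSoup the link step would shrink to something like `soup.find_all("a", class_="headline")`; the shape of the pipeline stays the same.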
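The frequency-count fallback for titles can be sketched like this, assuming whitespace tokenization and a small, hypothetical English stopword list (Nepali-language articles would need a Nepali list). It also makes the caveat above concrete: a title built from a rare word can never win a frequency count.

```python
import string
from collections import Counter

# A tiny illustrative stopword list (assumption); real use needs a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "has", "in",
    "is", "it", "of", "on", "that", "the", "this", "to", "was", "with",
}
# ASCII punctuation plus the Devanagari danda and double danda.
PUNCT = string.punctuation + "\u0964\u0965"


def frequency_title(text, n_words=4, stopwords=STOPWORDS):
    """Join the n most frequent non-stopword words of `text` into a pseudo-title."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in stopwords)
    return " ".join(word for word, _ in counts.most_common(n_words))
```

`Counter.most_common` orders equal counts by first appearance, so ties go to whichever word the article uses first.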
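The post does not spell out what "sentence clustering around the title" involves. One possible reading, sketched here as an assumption rather than Pravin's method, is to score each sentence by its word overlap (Jaccard similarity) with the title and keep the top k sentences in their original order. The sentence splitter is deliberately naive.

```python
import re
import string

PUNCT = string.punctuation + "\u0964\u0965"  # ASCII punctuation plus Devanagari dandas


def word_set(text):
    """Lower-cased words with surrounding punctuation stripped."""
    return {w.strip(PUNCT).lower() for w in text.split()} - {""}


def split_sentences(text):
    """Naive splitter: break after '.', '?', '!' or a danda followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!\u0964])\s+", text) if s.strip()]


def summarize(title, text, k=2):
    """Keep the k sentences most similar (Jaccard) to the title, in original order."""
    title_words = word_set(title)
    sentences = split_sentences(text)

    def score(sentence):
        words = word_set(sentence)
        union = words | title_words
        return len(words & title_words) / len(union) if union else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))
```

Dropping stopwords before scoring, as in the title heuristic, would stop words like "the" from inflating overlap.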
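Grouping similar headlines by MASI distance can be sketched as below. MASI (Measuring Agreement on Set-valued Items) distance is 1 minus the Jaccard similarity scaled by a weight M that depends on whether the sets are equal, nested, overlapping, or disjoint. NLTK provides it as `nltk.metrics.distance.masi_distance`; the function here re-implements the same weights so the example needs no third-party package. The 0.65 threshold comes from the post; treating "distance at or below 0.65" as "similar", and the greedy first-match grouping, are assumptions.

```python
import string

PUNCT = string.punctuation + "\u0964\u0965"  # ASCII punctuation plus Devanagari dandas


def word_set(headline):
    return frozenset(w.strip(PUNCT).lower() for w in headline.split()) - {""}


def masi_distance(a, b):
    """1 - Jaccard(a, b) * M, with the M weights used by NLTK's masi_distance."""
    inter, union = len(a & b), len(a | b)
    if union == 0:
        return 0.0  # two empty sets: treat as identical
    if a == b:
        m = 1.0
    elif inter == min(len(a), len(b)):
        m = 0.67  # one set is a subset of the other
    elif inter > 0:
        m = 0.33  # partial overlap
    else:
        m = 0.0  # disjoint
    return 1 - (inter / union) * m


def group_headlines(headlines, threshold=0.65):
    """Greedy grouping: a headline joins the first group whose first member is
    within `threshold` MASI distance, otherwise it starts a new group."""
    groups = []  # list of (representative word set, member headlines)
    for h in headlines:
        ws = word_set(h)
        for rep, members in groups:
            if masi_distance(ws, rep) <= threshold:
                members.append(h)
                break
        else:
            groups.append((ws, [h]))
    return [members for _, members in groups]
```

For example, "Parliament passed budget" is a subset of "Budget passed by parliament" (Jaccard 0.75, M = 0.67), giving a distance of about 0.50, so the two land in the same group, while a disjoint headline has distance 1 and starts its own.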
>
> Pravin
>
> --
> FOSS Nepal mailing list: foss-nepal@googlegroups.com
> http://groups.google.com/group/foss-nepal
> To unsubscribe, e-mail: foss-nepal+unsubscr...@googlegroups.com
>
> Mailing List Guidelines:
> http://wiki.fossnepal.org/index.php?title=Mailing_List_Guidelines
> Community website: http://www.fossnepal.org/