My shneckel:
1. Have a simple cull list (take the 5 minutes to write it, and it will do 80% of the work).
2. Use TF/IDF.

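To make the two suggestions concrete, here is a rough sketch in plain Python. It is not Whoosh's API, just the standard library. The CULL set, the function names, and the sample titles are all made up for illustration; the real input would be the title and summary fields of the feed items.

```python
import math
import re
from collections import Counter

# Hypothetical cull list: extend it with whatever noise shows up in your counts.
CULL = {"have", "your", "us", "some", "like", "the", "a", "to",
        "and", "is", "for", "in", "with", "my"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop culled words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in CULL]


def tfidf_tags(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    n_docs = len(docs)
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Score = (relative term frequency) * log(N / document frequency).
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results


# Made-up sample titles, standing in for the real feed items.
docs = [
    "Django forms tutorial: custom django forms validation",
    "Deploying a django site with buildout",
    "Django south migrations for your project",
]

for title, tags in zip(docs, tfidf_tags(docs)):
    print(title, "->", tags)
```

Note that a word appearing in every document (like 'django' here) gets an IDF of log(1) = 0, so it drops to the bottom of the ranking without having to be in the cull list.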
On Mon, Jul 13, 2009 at 7:02 PM, Refael <[email protected]> wrote:

> I've run the data through Whoosh, and now the hardest part is to cull
> the words.
> For example, these are the top 10 word counts:
> (u'django', 15051),
> (u'have', 4066),
> (u'your', 3770),
> (u'us', 3311),
> (u'python', 2738),
> (u'some', 2713),
> (u'site', 2501),
> (u'code', 2359),
> (u'like', 2335),
> (u'project', 2327),
>
> Any ideas how to sort out relevant tags?
>
> On Jun 25, 4:36 pm, benny daon <[email protected]> wrote:
> > Hi all,
> > I've got a project going with the aim of improving djangoproject.com.
> > So far I've forked the original code, cleaned it up, added buildout so
> > installation will be a breeze, and added django-south so we can easily
> > upgrade the database.
> > Jacob KM sent me a link to a dump of the current database, which I
> > included in the migration script so the code pulls the dump and uses it
> > to create the database and add all the rows. There are almost 5000 rows
> > in the model, pointing to Django-related posts. The next step is to
> > extract common tags from the title and summary fields of the FeedItem.
> > A friend recommended I use Solr or Lucene for this job, which makes
> > sense. My issue is that I have never used them before. If you know what
> > needs to be done and have some time, please assign this ticket -
> > http://bitbucket.org/daonb/django-website/issue/3/ - to yourself, fork
> > the code, do it, and send me a 'pull request'.
> >
> > Thanks,
> >
> > Benny.
> >
> > BTW - there's much more to do in this project. Please feel free to open
> > tickets with suggestions/bugs or, better yet, send code. Jacob said he
> > will use it in the live site.

--
Imri Goldberg
--------------------------------------
www.algorithm.co.il/blogs/
--------------------------------------
-- insert signature here ----
