My shneckel:
1. Have a simple cull list (take the 5 minutes to write it, and it will do
80% of the work)
2. Use TF/IDF
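
To make both suggestions concrete, here is a rough sketch in plain Python (standard library only, not Whoosh's API). Everything in it is an assumption for illustration: the contents of CULL, the function names tokenize and tfidf_tags, and the idea that each document is the title plus summary text of one FeedItem. Treat it as a starting point, not a finished tagger.

```python
import math
import re
from collections import Counter

# Hypothetical cull list: extend it by hand with whatever noise the counts show.
CULL = {"have", "your", "us", "some", "like", "the", "a", "to", "and", "for", "in", "of"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop culled words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in CULL]


def tfidf_tags(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF words for each document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: the number of documents each word appears in.
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency times inverse document frequency.
        scores = {w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_n])
    return result


if __name__ == "__main__":
    sample = ["Django forms tutorial", "Django admin tips", "Django forms validation"]
    for doc, tags in zip(sample, tfidf_tags(sample, top_n=2)):
        print(doc, "->", tags)
```

A word that shows up in every document (like 'django' in these counts) gets a score of zero, since log(n / df) is log(1), so it drops to the bottom of the ranking without having to be in the cull list.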

On Mon, Jul 13, 2009 at 7:02 PM, Refael <[email protected]> wrote:

>
> I've run the data through Whoosh, and now the hardest part is to cull
> the words.
> For example these are the top 10 word counts:
> (u'django', 15051),
> (u'have', 4066),
> (u'your', 3770),
> (u'us', 3311),
> (u'python', 2738),
> (u'some', 2713),
> (u'site', 2501),
> (u'code', 2359),
> (u'like', 2335),
> (u'project', 2327),
>
> Any ideas how to sort out relevant tags?
>
>
>
> On Jun 25, 4:36 pm, benny daon <[email protected]> wrote:
> > Hi all,
> > I've got a project going with the aim of improving djangoproject.com.
> > So far I've forked the original code, cleaned it up, added buildout so
> > installation will be a breeze, and added django-south so we can easily
> > upgrade the database.
> > Jacob KM sent me a link to a dump of the current database, which I included
> > in the migration script, so the code pulls the dump and uses it to create the
> > database and add all the rows. There are almost 5000 rows in the model,
> > pointing to Django-related posts. The next step is to extract common tags
> > from the title and summary fields of the FeedItem.
> > A friend recommended I use Solr or Lucene for this job, which makes sense.
> > My issue is that I have never used them before. If you know what needs to be
> > done and have some time, please assign this ticket -
> > http://bitbucket.org/daonb/django-website/issue/3/ - to yourself, fork the
> > code, do it, and send me a 'pull request'.
> >
> > Thanks,
> >
> > Benny.
> >
> > BTW - there's much more to do in this project. Please feel free to open
> > tickets with suggestions/bugs or better yet - send code. Jacob said he will
> > use it in the live site.
> >
>


-- 
Imri Goldberg
--------------------------------------
www.algorithm.co.il/blogs/
--------------------------------------
-- insert signature here ----

