A couple of questions about term frequencies and stemming:

- What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example, given the stem 'walk', I'd like to find that 'walking' is the most common full word in the index.

- Is there a way to get a list of all the terms in the index (or maybe just the top n) ordered by descending frequency of usage? I imagine it's related to docFreq, but I can't see how to enumerate the terms across all documents.

I'm using PyLucene and Solr, so if there are easy solutions in either of those that would be ideal.
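
To make the two requests concrete, here is a rough pure-Python sketch of the behaviour I'm after. It does not touch Lucene or Solr at all: `doc_freq` is a made-up dict of unstemmed term to document frequency (which assumes the unstemmed forms are available somewhere, e.g. in a separate unstemmed field), and `toy_stem` is a hypothetical stand-in for the real Porter stemmer, not an implementation of it.

```python
def toy_stem(word):
    # Hypothetical stand-in for a Porter stemmer: strips a few suffixes only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def top_terms(doc_freq, n):
    # Top n (term, frequency) pairs by descending frequency;
    # ties are broken alphabetically so the result is deterministic.
    return sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def most_common_form(doc_freq, stem, stem_fn=toy_stem):
    # Among all terms that reduce to `stem`, pick the most frequent one.
    candidates = {t: f for t, f in doc_freq.items() if stem_fn(t) == stem}
    if not candidates:
        return None
    return min(candidates, key=lambda t: (-candidates[t], t))


# Made-up counts, purely for illustration.
doc_freq = {"walking": 12, "walked": 7, "walks": 5, "walk": 9, "talking": 3}

print(most_common_form(doc_freq, "walk"))
print(top_terms(doc_freq, 2))
```

What I'm missing is how to get the equivalent of `doc_freq` out of the index in the first place.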

Thanks,
alf.


