Ulrich Mayring wrote:

Hello,

does anyone know of good stopword lists for use with Lucene? I'm interested in English and German lists.

What does ``good'' mean? It depends on your corpus, IMHO. The best way to get a ``good'' stop list is an analysis based on idf (inverse document frequency): index your documents, list all the terms with low idf, save them in a file, and use that file as the stop list in the next indexing round.
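To make the idea concrete, here is a minimal sketch in plain Python rather than Lucene's own API. The function name low_idf_terms, the whitespace tokenizer, the toy documents, and the threshold value are all assumptions for illustration; a real run would read document frequencies from the index and pick the cutoff by inspecting the list.

```python
import math
from collections import Counter


def low_idf_terms(documents, max_idf):
    """Return terms whose idf is at or below max_idf, lowest idf first.

    Hypothetical helper: idf is computed as log(N / df), where N is the
    number of documents and df is the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for text in documents:
        # A set counts each term once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    candidates = [term for term, value in idf.items() if value <= max_idf]
    # Sort by idf, then alphabetically so the order is deterministic.
    return sorted(candidates, key=lambda term: (idf[term], term))


# Toy corpus (assumed data, stands in for the indexed documents).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
    "the tree fell on the car",
]

# The threshold is an assumption; tune it per corpus.
stopwords = low_idf_terms(docs, max_idf=0.1)
```

The resulting list can then be written to a file, one term per line, and passed to the analyzer as its stop set for the next indexing round. Raising the threshold pulls in more terms, so it is worth reviewing the candidates by hand before using them.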


Just a thought...

-g-





