Hello,
does anyone know of good stopword lists for use with Lucene? I'm interested in English and German lists.
What does mean ``good''? It depends on your corpus IMHO. The best way, how one can get a ``good'' stop-list, is an analysis that's based on idf. Thus, index your documents, list all the terms with low idf out, save them in a file and use them in next indexing round.
Just a thought...
-g-
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]