There is a handy class in contrib/misc.../ that will show you the most frequent terms in an index. Handy dandy.
Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share ----- Original Message ---- From: Lukas Vlcek <[EMAIL PROTECTED]> To: [email protected] Sent: Thursday, May 10, 2007 2:39:35 PM Subject: Stop words (how to create ideal set of stop words?) Hi, Can anybody point me to some references how to create an ideal set of stop words? I konw that this is more like a theoretical question but how do Luceners determine which words shuold be excluded when creating Analyzers for a new languages? And which technique was used for validation of stop word lists in current Analyzers? More specificaly I am interested in situations when there is a need to build a search engine around specific corpus (for example when we need to search set of articles related to programming languages only). Given a specific corpus is there any recommended technique of stop words derivation? Thanks, Lukas --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
