Hi,

Can anybody point me to some references on how to create an ideal set of stop
words? I know that this is more of a theoretical question, but how do
Luceners determine which words should be excluded when creating Analyzers
for a new language? And which technique was used to validate the stop
word lists in the current Analyzers?

More specifically, I am interested in situations where there is a need to build
a search engine around a specific corpus (for example, when we need to search
a set of articles related to programming languages only). Given a specific
corpus, is there any recommended technique for deriving stop words?

Thanks,
Lukas
