Not sure, you might want to ask on Nutch. From a strict language standpoint, the notion of a stopword in my mind is a bit dubious. If the word really has no meaning, then why does the language have it to begin with? In a search context, it has been treated as of minimal use in the early days mostly because of space and memory considerations. Now a days, as we get more sophisticated in our search capabilities, I think it can be useful for doing better phrase matching, etc. as well as more advanced NLP search. Now it seems like the general response is disk is cheap, why throw away information?
To limit writing on disk, to simplify merge ?

I don't know the ratio of stop word in current texts.

M.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to