Yes, this makes sense to me. I think I'll just keep all words, including stop words, and if performance ever becomes an issue, I'll look at bigrams again. But I think there's a good chance that I'll never see significant impact either way.

Thanks guys!

Grant Ingersoll wrote:
Yep, still good reasons like I said, but becoming less important as the hardware, etc. gets faster and cheaper, IMO, especially in the context of more advanced search capabilities.

On Mar 3, 2008, at 10:49 AM, Mathieu Lecarme wrote:


Not sure, you might want to ask on Nutch. From a strict language standpoint, the notion of a stopword is, in my mind, a bit dubious. If the word really has no meaning, then why does the language have it to begin with? In a search context, stopwords were treated as being of minimal use in the early days, mostly because of space and memory considerations. Nowadays, as we get more sophisticated in our search capabilities, I think they can be useful for doing better phrase matching, etc., as well as more advanced NLP search. Now it seems like the general response is: disk is cheap, why throw away information?
To limit writes to disk, to simplify merges?

I don't know the ratio of stop words in current texts.

M.
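
For anyone who wants a rough number for the stop word ratio asked about above, here is a minimal sketch in plain Python. It is not Lucene code: the stop list, the regex tokenizer, and the name stop_word_ratio are assumptions made for illustration, and a real analyzer would tokenize differently.

```python
import re

# Assumed, illustrative English stop list; substitute the list your analyzer uses.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def stop_word_ratio(text, stop_words=STOP_WORDS):
    """Return the fraction of tokens in `text` that appear in `stop_words`."""
    # Naive tokenizer: lowercase, then runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in stop_words)
    return hits / len(tokens)


if __name__ == "__main__":
    # One of four tokens ("the") is in the stop list.
    print(stop_word_ratio("the quick brown fox"))  # 0.25
```

Running it over a sample of your own documents gives an estimate of how many token occurrences dropping stop words would remove. That is only a rough proxy for index size, and a query such as "to be or not to be" consists entirely of words on this list.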

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



