[freenet-dev] Should the spider ignore common words?

Mike Bush Wed, 10 Jun 2009 11:49:07 +0100

XMLLibrarian doesn't currently support searching for phrases or ranking
results by word proximity, so I don't think common words could be of any
use in searches right now.


Also, I'm not sure, but I think the current index doesn't include words
under 4 letters at all.
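
To make the effect concrete, here is a rough sketch of that kind of
filtering. It is not the actual spider or XMLLibrarian code: the function
name index_terms, the stopword list and the cutoff of 4 letters are all
assumptions for illustration (the cutoff is only my guess from above).

```python
import re

# Hypothetical stopword list for illustration; not the spider's real list.
STOPWORDS = {"the", "be", "to", "of", "and", "a", "in", "i"}

# Assumed minimum word length, based on the unconfirmed "under 4 letters" guess.
MIN_LENGTH = 4


def index_terms(text, stopwords=STOPWORDS, min_length=MIN_LENGTH):
    """Return the lowercase words from text that would be kept for indexing."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_length and w not in stopwords]


print(index_terms("The Doctor and the Tardis"))  # ['doctor', 'tardis']
print(index_terms("doctor who"))                 # ['doctor']
```

With a length cutoff like this, "who" is dropped whether or not it is on a
stopword list, so a search for "doctor who" can only match on "doctor".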



2009/6/10 Matthew Toseland <toad at amphibian.dyndns.org>:
> On Wednesday 10 June 2009 06:54:03 Daniel Cheng wrote:
>> On Wed, Jun 10, 2009 at 12:02 PM, Evan Daniel <evanbd at gmail.com> wrote:
>> > On my (incomplete) spider index, the index file for the word "the" (it
>> > indexes no other words) is 17MB.  This seems rather large.  It might
>> > make sense to have the spider not even bother creating an index on a
>> > handful of very common words (the, be, to, of, and, a, in, I, etc).
>> > Of course, this presents the occasional difficulty:
>> > http://bash.org/?514353  I think I'm in favor of not indexing common
>> > words even so.
>>
>> Yes, it should ignore common words.
>> These are called "stopwords" in search engine terminology.
>
> How do you propose to implement a search for "doctor who" if "who" is a 
> stopword?
>
> _______________________________________________
> Devl mailing list
> Devl at freenetproject.org
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>
