On my (incomplete) spider index, the index file for the word "the" (the file contains no other words) is 17MB. This seems rather large. It might make sense to have the spider not bother creating an index entry at all for a handful of very common words (the, be, to, of, and, a, in, I, etc.). Of course, this presents the occasional difficulty (see http://bash.org/?514353), but even so I think I'm in favor of not indexing common words.
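As a rough illustration of the idea, here is a minimal Python sketch of a stop-word filter applied before indexing. It is not the spider's actual code: the names `STOP_WORDS` and `indexable_words` are made up for this example, and the word list is just the handful mentioned above.

```python
# Hypothetical stop-word list: only the examples named in the comment above.
STOP_WORDS = frozenset({"the", "be", "to", "of", "and", "a", "in", "i"})


def indexable_words(words):
    """Yield the words that should get an index entry, skipping stop words.

    The comparison is case-insensitive, so "The" and "the" are both skipped.
    """
    for word in words:
        if word.lower() not in STOP_WORDS:
            yield word


page_words = ["The", "spider", "is", "in", "a", "loop"]
print(list(indexable_words(page_words)))  # ['spider', 'is', 'loop']
```

The trade-off is the one noted above: a query made up entirely of stop words would find nothing.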
Also, on a related note, the index splitting policy should be a bit more sophisticated. In an attempt to fit within the configured max index size, it split all the way down to index_8fc42.xml. As a result, the file index_8fc4b.xml sits all by itself at 3KiB, containing just the two words "vergessene" and "txjmnsm". I suspect it would have reliability issues should anyone actually want to search for either of those. It would make more sense to have all of index_8fc4 in one file, since it would be only trivially larger. (I have a patch that I thought did that, but it has a bug; I'll test once my indexwriter is finished writing, since I don't want to interrupt it by reloading the plugin.) Evan Daniel
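
One possible shape for such a policy, sketched in Python. This is not the patch mentioned above. It assumes that index files are named by a hex prefix of each word's hash, and that the deep split is forced by one oversized entry that no amount of splitting can shrink. The function `plan_files`, the `slack` parameter, and all hashes and sizes below are invented for illustration.

```python
from collections import defaultdict


def plan_files(entries, max_size, slack, prefix=""):
    """Return {file_prefix: total_size} for the index files to write.

    entries maps each word's hex hash to the size of its index entry
    (all hashes are assumed to have the same length). A bucket is split
    on the next hex digit only if it exceeds max_size AND the split would
    move more than `slack` bytes out of the largest child.
    """
    total = sum(entries.values())
    if total <= max_size:
        return {prefix: total}

    # Group the entries by the next hex digit of their hash.
    children = defaultdict(dict)
    for word_hash, size in entries.items():
        if len(word_hash) > len(prefix):
            children[word_hash[:len(prefix) + 1]][word_hash] = size

    child_totals = [sum(child.values()) for child in children.values()]
    # One child would hold nearly everything: splitting only creates
    # tiny straggler files, so keep the bucket together.
    if not child_totals or total - max(child_totals) <= slack:
        return {prefix: total}

    plan = {}
    for child_prefix, child in children.items():
        plan.update(plan_files(child, max_size, slack, child_prefix))
    return plan


# Made-up hashes and sizes, loosely modelled on the situation described.
entries = {
    "8fc42": 17_000_000,  # one huge entry
    "8fc4b": 3_000,       # tiny neighbour under the same 8fc4 prefix
    "8fc90": 500_000,
    "8f100": 600_000,
    "8a000": 700_000,
    "10000": 800_000,
}
plan = plan_files(entries, max_size=1_000_000, slack=64_000)
print(plan)
```

With these numbers the 8fc4 bucket stays in one file of 17,003,000 bytes instead of being split into a 17,000,000 byte file and a 3,000 byte one, while the other prefixes (1, 8a, 8f1, 8fc9) each get their own file as before. The oversized file still exceeds the max, so this only addresses the stragglers, not the size of the big entry itself.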
