I'd like to try some experiments to see if I can improve search
performance by changing analysis (e.g. adding/removing word bigrams or
commongrams), or by changing how I map my source records into Lucene
documents. The problem is that my index is currently about 1 TB in
size and takes about 2-3 weeks to build, so if I have to rebuild the entire
index in order to test each potential improvement, then I'm going to
be waiting around a lot.

One option is to test potential performance improvements by building
indexes not for the full dataset, but rather for, say, a 1% sample of
the full dataset. (That is, I'll just index 1% of the source records.)
I would build one small control index, and then n small test indexes,
one for each intervention I wish to try. The hope would be that, if an
indexing intervention significantly improves performance for the small
indexes, then it would also significantly improve performance of the
full dataset. (Similarly, you'd hope that if an intervention *didn't*
significantly improve performance on the small indexes, then it would
*not* significantly improve performance of the full dataset.) This
would allow me to quickly accept or reject interventions (at least
provisionally), and only apply the most obviously promising
ones to the full dataset.
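(For what it's worth, the way I'm thinking of drawing the sample is to
hash each record's stable ID rather than sampling randomly, so the
control index and every test index are built from exactly the same 1%
of records. A rough sketch of what I mean -- the class and method names
here are just made up for illustration:)

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: decide deterministically whether a record
// belongs to the sample by hashing its stable ID. Because the decision
// depends only on the ID, every index built this way (control and each
// test variant) sees the identical subset of source records.
public class SampleFilter {
    public static boolean inSample(String recordId, double fraction) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] h = md.digest(recordId.getBytes(StandardCharsets.UTF_8));
            // Interpret the first 4 hash bytes as an unsigned 32-bit
            // bucket value, uniformly distributed over [0, 2^32).
            long bucket = ((h[0] & 0xFFL) << 24) | ((h[1] & 0xFFL) << 16)
                        | ((h[2] & 0xFFL) << 8)  |  (h[3] & 0xFFL);
            // Keep the record if its bucket falls in the bottom
            // `fraction` of the range, e.g. fraction = 0.01 for 1%.
            return bucket < (long) (fraction * 4294967296L);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always available
        }
    }
}
```

(Then the indexing loop would just skip any record for which
`inSample(id, 0.01)` is false. The nice side effect is that if I later
want a 2% sample, the 1% sample is a strict subset of it.)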

Any thoughts on how naive this is? Does it sound more like a way to
save time, or like a way to waste time misleading myself?

Cheers,
Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
