We periodically optimize large indexes (100-200 GB) by calling IndexWriter.optimize(). It takes a heck of a long time, and I'm wondering if a more efficient approach might be the following:
- Create a new empty index on a different filesystem.
- Set a merge policy for the new index so it puts everything into one giant segment (not sure how to do this off-hand, but I assume it's possible).
- Enumerate all documents in the unoptimized index and add them to the new index.

Having the reads and writes happen on different disks obviously helps. But I don't know whether merging is inherently much more efficient than just adding new docs -- if so, that could outweigh the I/O gains.

Thanks,
Chris
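[For reference, a rough sketch of the copy-then-optimize idea above, assuming Lucene 3.x APIs (optimize() was replaced by forceMerge() in 4.0); the paths, analyzer, and merge-factor value are placeholders, not a tested recipe:]

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class RewriteIndex {
    public static void main(String[] args) throws Exception {
        // Source and destination on different filesystems/disks.
        FSDirectory source = FSDirectory.open(new File("/disk1/oldIndex"));
        FSDirectory dest   = FSDirectory.open(new File("/disk2/newIndex"));

        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));

        // A high merge factor discourages intermediate merges while the
        // documents are being copied; the final optimize() call then does
        // one big merge down to a single segment.
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setMergeFactor(30);
        cfg.setMergePolicy(mp);

        IndexWriter writer = new IndexWriter(dest, cfg);
        IndexReader reader = IndexReader.open(source);
        try {
            // addIndexes(IndexReader...) re-adds the documents through the
            // writer (roughly the enumerate-and-add idea), as opposed to
            // addIndexes(Directory...), which copies raw segment files.
            writer.addIndexes(reader);
            writer.optimize(); // merge the new index to one segment
        } finally {
            reader.close();
            writer.close();
        }
    }
}
```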