On Sat, 2009-06-27 at 00:00 +0200, Marcus Herou wrote:
> We currently have about 90M documents and it is increasing rapidly so
> getting into the G+ document range is not going to be too far away.

We've performed fairly extensive tests regarding hardware for searches
and some minor tests on hardware for indexing. The tests were primarily
with regard to cores, RAM and storage (so no focus on CPU-speed or
bus-speed). Our "standard" index was 37GB with 9 million documents,
although we did try our hand at running 40 million documents on a
single machine.

You might want to take a look at some unordered notes and graphs from
our tests: http://wiki.statsbiblioteket.dk/summa/Hardware

> 2. What is the most important hardware aspect when it comes to searching
> documents in my setup ? (result-set is limited to return only the top 10
> matches with page handling)
> 2.1 Is it disk read throughput ? (sequential or random-io ?)
> 2.2 Is it RAM ?
> 2.3 Is is CPU ?

For searches, random access is king, so go for Solid State Drives. As
there is a lot of crap out there, be sure to read some reviews. The
Intel X25 seems like a safe bet right now.

While not quite on par with holding the full index in RAM, SSDs come
quite close (744 searches/second vs. 951 searches/second in one of our
tests with a standard RAMDirectory). The same test for 2 * 15,000 RPM
conventional hard disks in RAID 1 gave us ~200 searches/second. This is
of course highly dependent on the index.

As opposed to conventional hard disks, SSDs aren't nearly as reliant on
RAM for caching. On the other hand, SSDs are capable of serving larger
indexes than conventional hard disks and as such, more RAM will be
needed for the JVM with the Lucene searcher.

Our pick for the 50 million documents, 150-200GB of indexes per machine
range was 4-core Intel Xeons, 16GB RAM, 4*64GB SSDs for the index
(RAID0ing them does not change the speed significantly, we just do it
to get a single volume) and conventional hard disks for storage.

Just as Eric Bowman discovered, processing power easily becomes the
bottleneck when switching to SSDs.
This happened for us too and triggered a great deal of profiling
(VisualVM is free, _very_ easy to use and helps tremendously with this)
to pinpoint where the CPUs spent their time.

Regards,
Toke Eskildsen
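
For anyone who wants to repeat a searches/second comparison like the one above on their own hardware, here is a minimal sketch of a throughput harness in plain Java. It is not the code used in the tests described in this mail; the class name SearchThroughput, its methods and the Consumer-based search callback are all made up for illustration. The callback is assumed to wrap whatever call runs one query against the Lucene searcher and fetches the top 10 hits.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical helper: measures how many searches per second a callback sustains. */
public class SearchThroughput {

    /**
     * Every thread runs the full query list once through the supplied callback.
     * Returns completed searches divided by elapsed wall-clock seconds.
     */
    public static double searchesPerSecond(List<String> queries, int threads,
                                           Consumer<String> search) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (String query : queries) {
                    search.accept(query);        // assumed to execute one search
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all worker threads to finish their query list.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while measuring", e);
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return completed.get() * 1e9 / elapsedNanos;
    }

    /** Relative throughput of one setup against another, e.g. SSD vs. RAM. */
    public static double ratio(double searchesPerSecondA, double searchesPerSecondB) {
        return searchesPerSecondA / searchesPerSecondB;
    }
}
```

With the figures quoted above, ratio(744, 951) is roughly 0.78, i.e. the SSD setup reached about 78% of the RAMDirectory throughput in that test. For a meaningful measurement, the query list should come from real query logs and the index should be warmed up first, since results are highly dependent on the index.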