Toke, this is *super* juicy information, very useful and educational. Please do put this on the Wiki. There doesn't seem to be a benchmarking page on the Wiki yet, so I suggest you go to http://wiki.apache.org/lucene-java/LuceneBenchmarks, create that page, and put everything you want and can share there.
Thanks!
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Toke Eskildsen <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, March 13, 2008 7:03:44 AM
Subject: Solid State Drives vs. RAMDirectory

Time for another dose of inspiration for investigating Solid State Drives. And no, I don't get percentages from the chip manufacturers :-)

This time I'll argue that there's little gain in using a RAMDirectory over SSDs when performing searches, at least in our setting.

We've taken our production index of about 10 million documents / 37GB and reduced it to 14GB by removing documents uniformly across the index. A test with fairly simple searches was performed, using logged queries from our production system (see the thread "Multiple Searchers" on this mailing list for details) and extracting the content of a stored field for the first 20 hits of each search.

On a dual-core Xeon machine with 24GB of RAM, the full index can be loaded into RAM with a RAMDirectory. The following is the average speed over 340,000 queries. In the log names, t2 signifies 2 threads with a shared searcher, and t2u signifies 2 threads with separate searchers.

metis_RAM_24GB_i14_v23_t1_l23.log 530.0 q/sec
metis_RAM_24GB_i14_v23_t2_l23.log 888.2 q/sec
metis_RAM_24GB_i14_v23_t2u_l23.log 983.9 q/sec
metis_RAM_24GB_i14_v23_t3_l23.log 843.1 q/sec
metis_RAM_24GB_i14_v23_t3u_l23.log 996.1 q/sec
metis_RAM_24GB_i14_v23_t4_l23.log 869.8 q/sec
metis_RAM_24GB_i14_v23_t4u_l23.log 943.4 q/sec

As can be seen, the best performing configuration was 3 threads with separate searchers. The time for loading the index into RAM was ignored.

Now for the interesting part: reducing the amount of available RAM to 3GB and using SSDs instead.
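To make the t2 versus t2u distinction concrete, here is a minimal Java sketch of a throughput harness of that kind. It is not the harness used for these numbers: SearchThroughput and its Searcher interface are hypothetical stand-ins (a real test would presumably wrap a Lucene searcher and read the stored field of the first 20 hits), and the queries are simply handed out to the threads from a shared counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical harness for the "tN" vs. "tNu" setups: N threads sharing one
 * searcher, or N threads with one searcher each. Searcher is a stand-in
 * interface, not a Lucene class.
 */
final class SearchThroughput {

    /** Hypothetical stand-in for a searcher that executes one query. */
    interface Searcher {
        void search(String query);
    }

    /**
     * Runs every query once, spread over the given number of threads, and
     * returns the measured queries/second. Searchers are created before the
     * clock starts, so setup time is ignored.
     */
    static double queriesPerSecond(List<String> queries, int threads, boolean shared,
                                   Supplier<Searcher> factory) {
        // One instance for all threads ("tN"), or one per thread ("tNu").
        List<Searcher> searchers = new ArrayList<>();
        Searcher common = shared ? factory.get() : null;
        for (int t = 0; t < threads; t++) {
            searchers.add(shared ? common : factory.get());
        }

        AtomicInteger next = new AtomicInteger(); // index of the next unclaimed query
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (Searcher s : searchers) {
            Thread worker = new Thread(() -> {
                int i;
                while ((i = next.getAndIncrement()) < queries.size()) {
                    s.search(queries.get(i));
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return queries.size() / seconds;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            queries.add("query" + i);
        }
        // Dummy searcher that only does a trivial amount of work per query.
        Supplier<Searcher> dummy = () -> q -> Integer.toString(q.hashCode()).length();
        System.out.printf("t2:  %.1f q/sec%n", queriesPerSecond(queries, 2, true, dummy));
        System.out.printf("t2u: %.1f q/sec%n", queriesPerSecond(queries, 2, false, dummy));
    }
}
```

Creating the searchers before starting the clock mirrors the choice of ignoring the index load time; the dummy searcher in main only exercises the harness and says nothing about real search speed.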
metis_MTRONSSD_RAID0_3GB_i14_v23_t1_l23.log 433.7 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t2_l23.log 573.4 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t2u_l23.log 783.4 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t3_l23.log 459.7 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log 808.5 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t4_l23.log 455.3 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log 809.0 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t5_l23.log 454.4 q/sec

In comparison, the same test with 3GB of RAM on 15,000 RPM hard disks in RAID 1 gave these numbers:

metis_15000RPM_RAID1_3GB_i14_v23_t1_l23.log 176.6 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t2_l23.log 188.6 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t2u_l23.log 247.1 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3_l23.log 178.4 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log 276.1 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4_l23.log 177.8 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log 259.3 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t5_l23.log 178.5 q/sec

SSDs do not equal RAMDirectory in speed for this setup, but the fastest SSD configuration reaching 81% of the fastest RAMDirectory configuration (809.0 vs. 996.1 q/sec) is not bad, especially when compared to the 28% (276.1 q/sec) for conventional hard disks.
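The percentages are simply the fastest throughput of a configuration divided by the fastest RAMDirectory throughput. A small sketch of that arithmetic (RelativeSpeed is just an illustrative name):

```java
/**
 * Expresses a configuration's throughput as a rounded percentage of the
 * RAMDirectory baseline.
 */
final class RelativeSpeed {

    static long percentOf(double qps, double baselineQps) {
        return Math.round(100.0 * qps / baselineQps);
    }

    public static void main(String[] args) {
        double ram = 996.1; // fastest RAMDirectory run (t3u, 24GB)
        System.out.println("SSD, 3GB:        " + percentOf(809.0, ram) + "%"); // 81%
        System.out.println("15,000 RPM, 3GB: " + percentOf(276.1, ram) + "%"); // 28%
    }
}
```

The same function reproduces the warm-up figures further down, for example 663.2 against 867.3 q/sec gives 76%.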
Performing the same tests with 8GB of available RAM on the machine gave the following results:

metis_MTRONSSD_RAID0_8GB_i14_v23_t1_l23.log 431.9 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t2_l23.log 594.3 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t2u_l23.log 807.7 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t3_l23.log 472.3 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t3u_l23.log 817.6 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t4_l23.log 464.4 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log 828.8 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t5_l23.log 471.2 q/sec

metis_15000RPM_RAID1_8GB_i14_v23_t1_l23.log 199.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t2_l23.log 220.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t2u_l23.log 312.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t3_l23.log 203.8 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t3u_l23.log 370.9 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4_l23.log 203.1 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log 408.1 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t5_l23.log 202.5 q/sec

Switching to 12GB...
metis_MTRONSSD_RAID0_12GB_i14_v23_t1_l23.log 438.8 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t2_l23.log 587.8 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t2u_l23.log 819.9 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t3_l23.log 476.4 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t3u_l23.log 833.7 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t4_l23.log 465.4 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t4u_l23.log 835.2 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t5_l23.log 467.1 q/sec

metis_15000RPM_RAID1_12GB_i14_v23_t1_l23.log 198.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t2_l23.log 219.1 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t2u_l23.log 309.4 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t3_l23.log 204.1 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t3u_l23.log 362.4 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4_l23.log 202.3 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log 406.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t5_l23.log 201.2 q/sec

Extracting the fastest configurations for the different RAM amounts:

RAMDirectory (24GB of RAM):
metis_RAM_24GB_i14_v23_t3u_l23.log 996.1 q/sec

3GB of RAM:
metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log 809.0 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log 276.1 q/sec

8GB of RAM:
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log 828.8 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log 408.1 q/sec

12GB of RAM:
metis_MTRONSSD_RAID0_12GB_i14_v23_t4u_l23.log 835.2 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log 406.6 q/sec

As can be seen, the SSDs benefit somewhat from running with 8GB (809.0 to 828.8 q/sec), while the hard disks benefit a lot (276.1 to 408.1 q/sec). Plotting queries/second over time clearly shows that the performance of the hard disks relative to the RAM speed climbs steadily, while the SSD speed does not (or only very little). This tells me that the speed of SSD-stored indexes is fairly independent of the amount of RAM available for cache. Upping the amount to 12GB doesn't change much. Clearly 8GB is "enough" for our 14GB index with our queries.
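Picking the fastest run per storage type and RAM amount can be automated by parsing the log names. The sketch below assumes lines of the form "<logname> <value> q/sec" and a naming scheme inferred from the names in this post (metis_<storage>_<N>GB_..._tN[u]_....log); only the storage and RAM parts are interpreted, the i14/v23/l23 tokens are ignored, and FastestConfig is a hypothetical helper rather than part of the original tooling. It also computes the relative gain between two runs, e.g. the 3GB to 8GB step for the fastest SSD (809.0 to 828.8 q/sec) and hard disk (276.1 to 408.1 q/sec) runs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fastest log per storage/RAM combination. */
final class FastestConfig {

    // Assumed scheme: metis_<storage>_<N>GB_<other tokens>.log
    private static final Pattern NAME = Pattern.compile("metis_(.+?)_(\\d+GB)_.+\\.log");

    /** Maps "storage RAM" (e.g. "MTRONSSD_RAID0 3GB") to the name of its fastest log. */
    static Map<String, String> fastest(String[] lines) {
        Map<String, String> bestName = new LinkedHashMap<>();
        Map<String, Double> bestQps = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+"); // [logname, value, "q/sec"]
            if (parts.length < 2) continue;
            Matcher m = NAME.matcher(parts[0]);
            if (!m.matches()) continue;
            String key = m.group(1) + " " + m.group(2);
            double qps = Double.parseDouble(parts[1]);
            Double previous = bestQps.get(key);
            if (previous == null || qps > previous) {
                bestQps.put(key, qps);
                bestName.put(key, parts[0]);
            }
        }
        return bestName;
    }

    /** Rounded percentage gain when going from one throughput to another. */
    static long gainPercent(double before, double after) {
        return Math.round(100.0 * (after / before - 1.0));
    }

    public static void main(String[] args) {
        String[] log = {
            "metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log 808.5 q/sec",
            "metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log 809.0 q/sec",
            "metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log 276.1 q/sec",
            "metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log 259.3 q/sec",
        };
        fastest(log).forEach((k, v) -> System.out.println(k + ": " + v));
        System.out.println("SSD, 3GB -> 8GB: +" + gainPercent(809.0, 828.8) + "%"); // +2%
        System.out.println("HDD, 3GB -> 8GB: +" + gainPercent(276.1, 408.1) + "%"); // +48%
    }
}
```

The reluctant (.+?) group lets storage names with or without a RAID suffix (RAM, MTRONSSD_RAID0, 15000RPM_RAID1) be captured by the same pattern.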
At the risk of making all this unclear, let's try to ignore the first 5,000 queries and cut off the statistics after 50,000 queries. This mimics a setting with warm-up and a not-so-stale index that gets replaced once in a while. Extracting the fastest configurations for the different RAM amounts gives us:

RAMDirectory (24GB of RAM):
metis_RAM_24GB_i14_v23_t2u_l23.log 867.3 q/sec

3GB of RAM:
metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log 663.2 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log 163.4 q/sec

8GB of RAM:
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log 653.6 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log 163.4 q/sec

12GB of RAM:
metis_MTRONSSD_RAID0_12GB_i14_v23_t3u_l23.log 653.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log 163.4 q/sec

Yes, the three identical 163.4 figures are a funny coincidence. I double-checked and looked at the graphs: up till about 60,000 queries, the graphs for the 15,000 RPM disks are virtually identical; then the one for 3GB of RAM stabilizes, while the ones for 8GB and 12GB stay virtually identical and keep climbing. For SSDs, the graph for 3GB is a bit higher than the other ones until about 50,000-60,000 queries, then a bit lower for the rest.

For this scenario, the speed of SSDs compared to RAMDirectory drops to 75-76%, while the speed of hard disks drops to 19%, fairly independent of RAM. In other words: upping the amount of RAM does not help us when the index is replaced before we pass 50,000 queries.

Another observation: the more often we replace our index, the better SSDs look compared to hard disks. On the flip side, for long run-times with an unchanged index, hard disks seem the better choice, at least from an economic point of view.

Grand conclusion? Getting 3/4 of the performance of RAMDirectory by using SSDs on a machine with much less RAM seems like a good deal if high performance per machine is needed. Remember, these are all searches against an optimized index.
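The warm-up variant can be computed from per-query completion times: order them, skip the first 5,000 completions and measure up to the 50,000th. This is a minimal sketch assuming such timestamps (in milliseconds) are available for every query in a run; WindowedThroughput and its input format are hypothetical and not taken from the original logs.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: throughput over a window of completed queries, e.g.
 * skipping the first 5,000 (warm-up) and stopping after 50,000.
 */
final class WindowedThroughput {

    /**
     * Queries/second for completions skip+1 .. stopAfter (1-based), measured
     * from the moment the skip-th query completed.
     */
    static double windowQps(long[] completedAtMillis, int skip, int stopAfter) {
        if (skip < 1 || stopAfter <= skip || stopAfter > completedAtMillis.length) {
            throw new IllegalArgumentException("need 1 <= skip < stopAfter <= #queries");
        }
        // With several threads, completions are not ordered by query index,
        // so sort a copy by completion time first.
        long[] t = completedAtMillis.clone();
        Arrays.sort(t);
        long elapsedMillis = t[stopAfter - 1] - t[skip - 1];
        return (stopAfter - skip) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // Synthetic run: 100,000 queries, 10 ms each for the first 5,000
        // (warm-up), then one query completing every 2 ms.
        long[] done = new long[100_000];
        long now = 0;
        for (int i = 0; i < done.length; i++) {
            now += i < 5_000 ? 10 : 2;
            done[i] = now;
        }
        System.out.printf("whole run:    %.1f q/sec%n", done.length * 1000.0 / done[done.length - 1]);
        System.out.printf("5,000-50,000: %.1f q/sec%n", windowQps(done, 5_000, 50_000));
    }
}
```

Measuring from the completion of the 5,000th query, rather than from the start of the run, keeps the warm-up time out of the denominator as well as out of the count.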
This is on the corpus from the Danish State and University Library and should be seen as nothing more than inspiration.

Still pending are experiments with updating large indexes on SSDs. My guess is that there won't be anywhere near the same speed increase as for the pure searches. It'll have to wait a bit, though, as it requires Real Work, as opposed to just starting a script.

NB: I'd like to post my findings on the Lucene wiki, but I have been unable to locate the appropriate page. Could someone please point me in the right direction?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]