RE: Hardware Specs Question
From: Dennis Gearon [gear...@sbcglobal.net]:
> I wouldn't have thought that CPU was a big deal, with the speed/cores of CPUs continuously growing according to Moore's law and disk speed barely changing 50% in 15 years. Must have a lot to do with caching.

I am not sure I follow you. When seek times are suddenly 100 times faster (slight exaggeration, but only slight), why wouldn't the bottleneck move? Yes, CPUs have increased tremendously in speed, but so have our processing needs. Lucene (and by extension Solr) was made with long seek times in mind, and looking at the current market, it makes sense to continue supporting this for some years. If the software were optimized for sub-ms seek times, it might lower CPU usage or at the very least lower the need for caching (internal as well as external).

> What size indexes are you working with?

Around 40GB for our primary index. 9 million documents, AFAIR.

> Are you saying you can get the whole thing in memory?

No. For that test we had to reduce the index to 14GB on our 24GB test machine with Lucene's RAMDirectory. In order to avoid the "everything is cached and thus everything is the same speed" problem, we lowered the amount of available memory to 3GB when we measured hard disk and SSD speed against the 14GB index. The Cliff Notes version: hard disks 200 raw queries/second, SSDs 774 q/s and RAM 952 q/s, but as always it is not so simple to extract a single number for performance when warm-up and caching come into play. Let me be quick to add that this was with Lucene + custom code, not with Solr.

> That would negate almost any disk benefits.

That depends very much on your setup. It takes a fair amount of time to copy 14GB from storage into RAM, so an index fully in RAM would either be very static or require some logic to handle updates and sync data in case of outages. I know there's some interesting work being done with this, but as SSDs are a lot cheaper than RAM and fulfill our needs, it is not something we pursue.
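For readers who want to reproduce the shape of the RAMDirectory comparison described above, here is a minimal sketch. It is not the benchmark code from this thread: it uses a Lucene 3.x-style API (class names and signatures shift between Lucene versions), and the index path, field name and query terms are invented placeholders.

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamDirBench {
    public static void main(String[] args) throws Exception {
        // Copy the on-disk index into RAM; only feasible if it fits in the heap.
        FSDirectory onDisk = FSDirectory.open(new File("/path/to/index")); // hypothetical path
        RAMDirectory inRam = new RAMDirectory(onDisk);

        IndexReader reader = IndexReader.open(inRam);
        IndexSearcher searcher = new IndexSearcher(reader);

        String[] terms = {"solr", "lucene", "ssd", "raid", "cache"}; // made-up query terms
        int iterations = 10000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            TermQuery q = new TermQuery(new Term("body", terms[i % terms.length]));
            searcher.search(q, 10); // raw query, no stored-field retrieval
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.0f raw queries/second%n", iterations / seconds);

        searcher.close();
        reader.close();
    }
}

A real comparison would also run the same loop against the FSDirectory with the OS page cache deliberately constrained, as described above, since a fully warmed disk cache hides most of the difference.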
RE: Hardware Specs Question
Very interesting stuff! I'm pretty sure everything will be non-hard-disk for front-line use in intense applications within 10 years or sooner, with hard disks kept for backup/boot-up.

Dennis Gearon
Re: Hardware Specs Question
If you really want to see performance, try external DRAM disks. Whew! 800X faster than a disk.

Dennis Gearon
Re: Hardware Specs Question
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote:
> How does it compare to six SATA drives in a Dell hardware RAID10?

I'll have to extrapolate a lot here (also known as guessing). You don't mention what kind of harddrives you're using, so let's say 15,000 RPM to err on the high-end side.

Compared to the 2 drives @ 15,000 RPM in RAID 1 we've experimented with, the difference is that the striping allows for concurrency when the different reads are on different physical drives (sorry if this is basic, I'm just trying to establish a common understanding here). The chance for 2 concurrent reads to be on different drives with 3 harddrives is 5/6, the chance for 3 concurrent reads is 1/6 and the chance for 3 concurrent reads to be on at least 2 drives is 5/6. For the sake of argument, let's say that the 3x striping gives us double the concurrent I/O.

Taking my old measurements at face value and doubling the numbers for the 15,000 RPM measurements, this would bring six 15,000 RPM SATA drives in RAID 10 up to a throughput that is 1/3 - 2/3 of the SSD, depending on how we measure.

Some general observations:
- With long runtimes, the throughput of harddisks rises relative to the SSD as the disk cache gets warmed.
- If there are frequent index updates with deletions, the SSD gains more ground as it is not nearly as dependent on the disk cache as harddisks are.
- With small indexes, the difference between harddisks and SSDs is relatively small as the disk cache quickly gets filled. Consequently the difference increases for large indexes.

One point to note about RAID is that it does not improve the speed of single searches on a single index: it does not lower the seek time for a single small I/O request, and searching a single index is done with a number of small successive requests. If the performance problem is long search time, RAID does not help (but in combination with sharding or similar it will). If the problem is the number of concurrent searches, RAID helps.
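The concurrency estimate above can be sanity-checked with a little arithmetic. A minimal sketch, assuming reads are spread uniformly over the physical drives (that uniform-spread model is my simplification; real stripe and mirror scheduling behaves differently):

public class ConcurrentReadOdds {
    // Probability that 'reads' concurrent reads all land on different drives
    // when each read independently hits one of 'drives' drives.
    static double allDifferent(int drives, int reads) {
        double p = 1.0;
        for (int i = 1; i < reads; i++) {
            p *= (double) (drives - i) / drives;
        }
        return p;
    }

    public static void main(String[] args) {
        // Six physical drives, as in a six-disk RAID 10 where either mirror can serve a read.
        System.out.println("2 reads on 2 drives: " + allDifferent(6, 2)); // 5/6
        System.out.println("3 reads on 3 drives: " + allDifferent(6, 3)); // ~0.56
    }
}

Under that simple model, two concurrent reads hit different drives 5/6 of the time, which matches the first figure quoted above.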
Re: Hardware Specs Question
On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote:
> If you really want to see performance, try external DRAM disks. Whew! 800X faster than a disk.

As sexy as they are, DRAM drives do not buy much extra performance. At least not at the search stage. For searching, SSDs are not that far from holding the index fully in RAM (about 3/4 the speed in our tests, but YMMV). The CPU is the bottleneck. That was with Lucene 2.4, so the relative numbers might have changed, but the old lesson still stands: a well balanced system is key.
Re: Hardware Specs Question
Well balanced system - agreed. We'll start a performance load test here this month. I've defined test criteria of QPS and worst-case RTpQ based on our use cases and past experience. Our goal is to pursue these criteria while adjusting the hardware and system configuration to find a well balanced, scalable Solr architecture. The past discussion in this thread has several good suggestions for our test. Thanks to all who provided their experience and suggestions.

Scott
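For what a first cut at such a load test might look like, here is a minimal sketch in plain Java. It fires sequential queries at Solr's standard /select handler and reports QPS and worst-case per-query response time; the URL, query terms and single-threaded design are assumptions for illustration - a real test would replay production query logs with many concurrent clients.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SolrLoadTest {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8983/solr/select?q="; // hypothetical Solr instance
        String[] terms = {"news", "taipei", "economy", "sports", "weather"}; // made-up queries
        int n = 1000;
        long worstNanos = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            String q = URLEncoder.encode(terms[i % terms.length], "UTF-8");
            long t0 = System.nanoTime();
            HttpURLConnection conn = (HttpURLConnection) new URL(base + q).openConnection();
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) { /* drain the response */ }
            }
            worstNanos = Math.max(worstNanos, System.nanoTime() - t0);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("QPS: %.1f, worst-case response time: %.1f ms%n",
                n / seconds, worstNanos / 1e6);
    }
}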
Re: Hardware Specs Question
I wouldn't have thought that CPU was a big deal, with the speed/cores of CPUs continuously growing according to Moore's law and disk speed barely changing 50% in 15 years. Must have a lot to do with caching. What size indexes are you working with? Are you saying you can get the whole thing in memory? That would negate almost any disk benefits. I'm guessing that keeping shards small enough to fit into memory must be one of the big tricks.

Dennis Gearon
Re: Hardware Specs Question
On 9/3/2010 3:39 AM, Toke Eskildsen wrote:
> I'll have to extrapolate a lot here (also known as guessing). You don't mention what kind of harddrives you're using, so let's say 15,000 RPM to err on the high-end side.

I actually didn't know that there were 15,000 RPM SATA drives until just now when I googled. I knew that Western Digital made some 10,000 RPM drives, but most SATA drives are 7200. Dell doesn't sell any SATA drives faster than 7200, and the 500GB drives in my servers are 7200. I'm using the maximum 1MB stripe size to increase the likelihood of concurrent reads. Our query rate is quite low (less than 1 per second), so any concurrency that's achieved will be limited to possibly allowing all three VMs on the server to access the disk at the exact same time. With three stripes and two copies of each of those stripes, the chance of that is fair to good. So with all that, I probably only see around a third (and possibly up to half) of the performance of SSDs. Thanks!
Re: Hardware Specs Question
On Thu, 2010-09-02 at 03:37 +0200, Lance Norskog wrote:
> I don't know how much SSD disks cost, but they will certainly cure the disk i/o problem.

We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15,000 RPM harddisks in RAID 1 vs. two 10,000 RPM harddisks in RAID 0). The harddisk setups never stood a chance for searching. With current SSDs being faster than harddisks for writes too, they'll also be better for index building, although not as impressively as for searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware

With consumer-level SSDs there is more bang for the buck than with RAIDing up high-end harddisks. They should be the first choice when I/O is an issue. There are of course opposing views on this. Some people think enterprise: expensive and very reliable systems where consumer hardware is a big no-no. The price point for pro SSDs might make them unfeasible in such a setup. Others go for cheaper setups and handle the reliability issues with redundancy. I'm firmly in the second camp, but it is obviously not an option for everybody.

A point of concern is writes. Current consumer SSDs use wear leveling and they can take a lot of punishment (as a rough measure: the amount of free space times 10,000). They might not be suitable for holding massive databases with thousands of writes/second, but they can surely handle the measly amount of writes required for Lucene index updating and searching.

A long story short: put a quality consumer SSD in each server and be happy.
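The free-space-times-10,000 rule of thumb above translates into a lot of headroom for index updates. A toy calculation, with invented numbers for free space and daily write volume:

public class SsdEnduranceEstimate {
    public static void main(String[] args) {
        double freeSpaceGB = 60;                  // invented: unused space kept on the SSD
        double writableGB = freeSpaceGB * 10000;  // rough heuristic quoted in the mail above
        double dailyWritesGB = 100;               // invented: merges/optimizes of a ~50GB index
        System.out.printf("~%.0f years of write headroom%n",
                writableGB / dailyWritesGB / 365);
    }
}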
Re: Hardware Specs Question
On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
> We've done a fair amount of experimentation in this area (1997-era SSDs vs. two 15,000 RPM harddisks in RAID 1 vs. two 10,000 RPM harddisks in RAID 0). The harddisk setups never stood a chance for searching.

How does it compare to six SATA drives in a Dell hardware RAID10? That's what my VM hosts have, which each run three large shards and a couple of supporting systems.
Re: Hardware Specs Question
I was just reading about configuring mass computation grids: hardware writes on 2 striped disks take 10% longer than writes on a single disk, because you have to wait for the slower disk to finish. So, single disks without RAID are faster.

I don't know how much SSD disks cost, but they will certainly cure the disk I/O problem.

-- Lance Norskog goks...@gmail.com
Re: Hardware Specs Question
In our current lab project we have already built a Chinese newspaper index with 18 million documents. The index size is around 51GB, so I am very concerned about the memory issues you mentioned. I also looked up the HathiTrust report on the SolrPerformanceData page: http://wiki.apache.org/solr/SolrPerformanceData. They said their main bottleneck is disk I/O even though they have 10 shards spread over 4 servers. Can you give me some helpful suggestions about hardware spec and memory configuration for our project? Thanks in advance.

Scott
Hardware Specs Question
Hi all, I am curious to get some opinions on at what point having more CPU cores shows diminishing returns in terms of QPS. Our index size is about 8GB and we have 16GB of RAM on a quad-core 4 x 2.4 GHz AMD Opteron 2216. Currently I have the heap set to 8GB. We are looking to get more servers to increase capacity, and because the warranty is set to expire on our old servers, I was curious, before asking for a certain spec, what others run and at what point having more cores ceases to matter. Mainly looking at somewhere between 4-12 cores per server. Thanks! Amit
Re: Hardware Specs Question
The price-performance knee for small servers is 32G RAM, 2-6 SATA disks in a RAID, 8/16 cores. You can buy these servers and half-fill them, leaving room for expansion. I have not done benchmarks on the max number of processors that can be kept busy during indexing or querying, or on the totals: QPS, response time averages and variability, etc.

If your index file size is 8G and your Java heap is 8G, you will do long garbage collection cycles. The operating system is very good at keeping your index in memory - better than Solr can.

Lance
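One way to see whether the heap is crowding out the OS disk cache is simply to compare the JVM's maximum heap with the on-disk index size; whatever RAM is not given to the heap is what the OS can use to cache index blocks. A small sketch (the index path is a placeholder):

import java.io.File;

public class HeapVsIndex {
    static long dirSize(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files == null) return 0;
        for (File f : files) {
            total += f.isDirectory() ? dirSize(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        long indexBytes = dirSize(new File("/path/to/solr/data/index")); // placeholder path
        System.out.printf("Max heap: %.1f GB, index: %.1f GB%n",
                maxHeap / 1e9, indexBytes / 1e9);
        // If heap plus index approach the machine's total RAM, the OS has little
        // room left to cache index blocks, and GC pauses tend to grow with the heap.
    }
}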
Re: Hardware Specs Question
Lance, thanks for your help. What do you mean by the OS keeping the index in memory better than Solr can? Do you mean that you should use another means to keep the index in memory (i.e. a ramdisk)? Is there a generally accepted heap size to index size ratio that you follow? Thanks, Amit
Re: Hardware Specs Question
It generally works best to tune the Solr caches and allocate enough RAM for Solr to run comfortably. Linux, Windows, et al. have their own cache of disk blocks, and they use very good algorithms for managing this cache. Also, they do not make long garbage collection passes.
Re: Hardware Specs Question
I am also curious, as Amit is. Can you give an example of the garbage collection problem you mentioned?
Re: Hardware Specs Question
Lance, that makes sense. I have heard about long GC times on large heaps, but I personally haven't experienced a slowdown - though that doesn't mean anything either :-). Agreed that tuning the Solr caching is the way to go. I haven't followed all the Solr/Lucene changes, but from what I remember there are synchronization points that could be a bottleneck where adding more cores won't help? Or am I completely missing something? Thanks again, Amit
Re: Hardware Specs Question
There are synchronization points, which become chokepoints at some number of cores. I don't know at what point they cause Lucene to top out. Lucene apps are generally disk-bound, not CPU-bound, but yours will be CPU-bound. There are so many variables that it's really not possible to give any numbers.

Lance