Re: SolrCloud loadbalancing, replication, and failover
On 7/31/2014 12:58 AM, shuss...@del.aithent.com wrote:
> Thanks for giving a great explanation about the memory requirements. Could
> you tell me what parameters I need to change in my solrconfig.xml to handle
> a large index size, and what the optimal values are.
>
> My indexed data size is 65 GB (for 8.6 million documents) and I have 48 GB
> of RAM on my server. Whenever I perform delta-indexing, the server becomes
> unresponsive while updating the index.
>
> Following are the changes that I did in solrconfig.xml after going through
> the net: [the XML tags were stripped by the archive; the surviving values,
> in order, were: 6, 256, false, 1000, 10, 10, 10, simple, true, 15000, true,
> ${solr.data.dir:}]
>
> So, please provide your valuable suggestions on this problem.

You replied directly to me, not to the list. I am redirecting this back to the list.

One of the first things that I would do is change openSearcher to false in your autoCommit settings. This will mean that you must take care of commits yourself when you index, to make documents visible. If you want any more suggestions, we'll need to see the entire solrconfig.xml file.

The fact that you don't have enough RAM to cache your whole index could be a problem. If 8.6 million documents results in 65GB of index, then your documents are probably quite large, and that can lead to other possible challenges, because it usually means that a lot of work must be done to index a single document. There are also probably a lot of terms to match when querying.

I do not know how much of your 48GB has been allocated to the Java heap, which takes away from memory that the operating system can use to cache index files.

Thanks,
Shawn
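Shawn's openSearcher advice, sketched as a solrconfig.xml fragment. The 15000 value matches the number visible in the poster's tag-stripped config; the exact element nesting shown here is the standard Solr layout, but treat it as a sketch rather than the poster's actual file:

```xml
<!-- Sketch: hard autoCommit without opening a new searcher.
     Documents become durable every 15 s, but only become visible
     when you issue an explicit (soft) commit yourself. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```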
Re: SolrCloud loadbalancing, replication, and failover
One note to add. There's been lots of discussion here about "index size", which is a slippery concept. To wit:

Look at your index directory, specifically the *.fdt and *.fdx files. That's where the verbatim copy of your data is held, i.e. whenever you specify 'stored="true"', and it is almost totally irrelevant to memory needs for searching; that data is accessed only after the final set of documents has been assembled and the fl list is being populated for them. So an index with 39G of stored data and 1G for the rest has much different memory requirements than one with 1G of stored data and 39G for the rest, where "the rest" == "the searchable part that can be held in RAM".

Then there's the fact that the actual data in the index doesn't include the dynamic structures required for navigating that data, so just because your non-stored data consumes 10G on your disk doesn't mean it'll actually all fit in 10G of memory.

Quick example. Each filter cache entry consists of a key (the filter query) plus maxDoc/8 bytes. So a single entry for an index with 64M docs will require 8M bytes (ignoring some overhead). Not bad so far. But now I keep issuing unfortunate filter queries that use NOW, so each one requires an additional 8M of memory. And this is a static index, so we never open new readers. And I've configured my filter cache to hold 1,000,000 entries (I have seen this). It works fine in my test environment where I'm bouncing the server pretty frequently, but when I put it in my production environment it starts blowing up with OOM errors after running for a while.

So try. Measure. Rinse, repeat.

Best
Erick

On Fri, Apr 19, 2013 at 10:33 PM, David Parks wrote:
> Again, thank you for this incredible information, I feel on much firmer
> footing now. I'm going to test distributing this across 10 servers,
> borrowing a Hadoop cluster temporarily, and see how it does with enough
> memory to have the whole index cached.
But I'm thinking that we'll try the > SSD route as our index will probably rest in the 1/2 terabyte range > eventually, there's still a lot of active development. > > I guess the RAM disk would work in our case also, as we only index in > batches, and eventually I'd like to do that off of Solr and just update the > index (I'm presuming this is doable in solr cloud, but I haven't put it to > task yet). If I could purpose Hadoop to index the shards, that would be > ideal, though I haven't quite figured out how to go about it yet. > > David > > > -Original Message----- > From: Shawn Heisey [mailto:s...@elyograg.org] > Sent: Friday, April 19, 2013 9:42 PM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud loadbalancing, replication, and failover > > On 4/19/2013 3:48 AM, David Parks wrote: >> The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has >> dark grey allocation of 602MB, and light grey of an additional 108MB, >> for a JVM total of 710MB allocated. If I understand correctly, Solr >> memory utilization is >> *not* for caching (unless I configured document caches or some of the >> other cache options in Solr, which don't seem to apply in this case, >> and I haven't altered from their defaults). > > Right. Solr does have caches, but they serve specific purposes. The OS is > much better at general large-scale caching than Solr is. Solr caches get > cleared (and possibly re-warmed) whenever you issue a commit on your index > that makes new documents visible. > >> So assuming this box was dedicated to 1 solr instance/shard. What JVM >> heap should I set? Does that matter? 24GB JVM heap? Or keep it lower >> and ensure the OS cache has plenty of room to operate? (this is an >> Ubuntu 12.10 server instance). > > The JVM heap to use is highly dependent on the nature of your queries, the > number of documents, the number of unique terms, etc. 
The best thing to do > is try it out with a relatively large heap, see how much memory actually > gets used inside the JVM. The jvisualvm and jconsole tools will give you > nice graphs of JVM memory usage. The jstat program will give you raw > numbers on the commandline that you'll need to add to get the full picture. > Due to the garbage collection model that Java uses, what you'll see is a > sawtooth pattern - memory usage goes up to max heap, then garbage collection > reduces it to the actual memory used. > Generally speaking, you want to have more heap available than the "low" > point of that sawtooth pattern. If that low point is around 3GB when you > are hitting your index hard with queries and updates, then you would want to > give Solr a heap of 4 to 6 GB. > >> Would I be wise to jus
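Erick's filter-cache arithmetic above can be checked with a quick back-of-the-envelope script. This is a sketch of the maxDoc/8-bytes-per-entry rule he describes; the 64M-document and 1,000,000-entry figures are just his example numbers, and per-entry key/overhead costs are ignored as in his message:

```python
# Each filterCache entry stores a bitset of maxDoc bits = maxDoc / 8 bytes
# (ignoring the query key and per-entry overhead, as Erick does).
def filter_cache_bytes(max_doc: int, num_entries: int) -> int:
    bytes_per_entry = max_doc // 8
    return bytes_per_entry * num_entries

MB = 1024 * 1024

max_doc = 64 * MB  # 64M documents, Erick's example
print(filter_cache_bytes(max_doc, 1) // MB)  # 8 -- 8 MB per cached filter

# Non-repeating NOW-based filter queries filling a 1,000,000-entry
# filterCache would need roughly 7.8 TB -> OOM long before that.
worst = filter_cache_bytes(max_doc, 1_000_000)
print(worst // (1024 * MB))  # 7812 (GB)
```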
RE: SolrCloud loadbalancing, replication, and failover
Again, thank you for this incredible information, I feel on much firmer footing now. I'm going to test distributing this across 10 servers, borrowing a Hadoop cluster temporarily, and see how it does with enough memory to have the whole index cached. But I'm thinking that we'll try the SSD route as our index will probably rest in the 1/2 terabyte range eventually, there's still a lot of active development. I guess the RAM disk would work in our case also, as we only index in batches, and eventually I'd like to do that off of Solr and just update the index (I'm presuming this is doable in solr cloud, but I haven't put it to task yet). If I could purpose Hadoop to index the shards, that would be ideal, though I haven't quite figured out how to go about it yet. David -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, April 19, 2013 9:42 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/19/2013 3:48 AM, David Parks wrote: > The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has > dark grey allocation of 602MB, and light grey of an additional 108MB, > for a JVM total of 710MB allocated. If I understand correctly, Solr > memory utilization is > *not* for caching (unless I configured document caches or some of the > other cache options in Solr, which don't seem to apply in this case, > and I haven't altered from their defaults). Right. Solr does have caches, but they serve specific purposes. The OS is much better at general large-scale caching than Solr is. Solr caches get cleared (and possibly re-warmed) whenever you issue a commit on your index that makes new documents visible. > So assuming this box was dedicated to 1 solr instance/shard. What JVM > heap should I set? Does that matter? 24GB JVM heap? Or keep it lower > and ensure the OS cache has plenty of room to operate? (this is an > Ubuntu 12.10 server instance). 
The JVM heap to use is highly dependent on the nature of your queries, the number of documents, the number of unique terms, etc. The best thing to do is try it out with a relatively large heap, see how much memory actually gets used inside the JVM. The jvisualvm and jconsole tools will give you nice graphs of JVM memory usage. The jstat program will give you raw numbers on the commandline that you'll need to add to get the full picture. Due to the garbage collection model that Java uses, what you'll see is a sawtooth pattern - memory usage goes up to max heap, then garbage collection reduces it to the actual memory used. Generally speaking, you want to have more heap available than the "low" point of that sawtooth pattern. If that low point is around 3GB when you are hitting your index hard with queries and updates, then you would want to give Solr a heap of 4 to 6 GB. > Would I be wise to just put the index on a RAM disk and guarantee > performance? Assuming I installed sufficient RAM? A RAM disk is a very good way to guarantee performance - but RAM disks are ephemeral. Reboot or have an OS crash and it's gone, you'll have to reindex. Also remember that you actually need at *least* twice the size of your index so that Solr (Lucene) has enough room to do merges, and the worst-case scenario is *three* times the index size. Merging happens during normal indexing, not just when you optimize. If you have enough RAM for three times your index size and it takes less than an hour or two to rebuild the index, then a RAM disk might be a viable way to go. I suspect that this won't work for you. Thanks, Shawn
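Shawn's heap-sizing rule of thumb, put numerically. The 4-6 GB band for a 3 GB sawtooth low is his example; the multipliers here are my reading of that range (roughly 1.33x to 2x the observed low), not an official formula:

```python
# Rule of thumb from the thread: watch the "low" point of the GC sawtooth
# (e.g. via jvisualvm or jstat), then size the heap with headroom above it.
def suggested_heap_gb(sawtooth_low_gb: float) -> tuple[float, float]:
    # Assumed multipliers, chosen so a 3 GB low maps onto
    # Shawn's suggested 4-6 GB heap range.
    return (sawtooth_low_gb * 4 / 3, sawtooth_low_gb * 2)

low, high = suggested_heap_gb(3.0)
print(f"heap: {low:.0f} to {high:.0f} GB")  # heap: 4 to 6 GB
```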
Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 3:48 AM, David Parks wrote: > The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has dark grey > allocation of 602MB, and light grey of an additional 108MB, for a JVM total > of 710MB allocated. If I understand correctly, Solr memory utilization is > *not* for caching (unless I configured document caches or some of the other > cache options in Solr, which don't seem to apply in this case, and I haven't > altered from their defaults). Right. Solr does have caches, but they serve specific purposes. The OS is much better at general large-scale caching than Solr is. Solr caches get cleared (and possibly re-warmed) whenever you issue a commit on your index that makes new documents visible. > So assuming this box was dedicated to 1 solr instance/shard. What JVM heap > should I set? Does that matter? 24GB JVM heap? Or keep it lower and ensure > the OS cache has plenty of room to operate? (this is an Ubuntu 12.10 server > instance). The JVM heap to use is highly dependent on the nature of your queries, the number of documents, the number of unique terms, etc. The best thing to do is try it out with a relatively large heap, see how much memory actually gets used inside the JVM. The jvisualvm and jconsole tools will give you nice graphs of JVM memory usage. The jstat program will give you raw numbers on the commandline that you'll need to add to get the full picture. Due to the garbage collection model that Java uses, what you'll see is a sawtooth pattern - memory usage goes up to max heap, then garbage collection reduces it to the actual memory used. Generally speaking, you want to have more heap available than the "low" point of that sawtooth pattern. If that low point is around 3GB when you are hitting your index hard with queries and updates, then you would want to give Solr a heap of 4 to 6 GB. > Would I be wise to just put the index on a RAM disk and guarantee > performance? Assuming I installed sufficient RAM? 
A RAM disk is a very good way to guarantee performance - but RAM disks are ephemeral. Reboot or have an OS crash and it's gone, you'll have to reindex. Also remember that you actually need at *least* twice the size of your index so that Solr (Lucene) has enough room to do merges, and the worst-case scenario is *three* times the index size. Merging happens during normal indexing, not just when you optimize. If you have enough RAM for three times your index size and it takes less than an hour or two to rebuild the index, then a RAM disk might be a viable way to go. I suspect that this won't work for you. Thanks, Shawn
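Shawn's RAM-disk caveat as arithmetic. The 2x/3x merge-headroom factors are the figures from his message; the 120 GB index size is David's, and the result excludes RAM for the OS and the JVM heap:

```python
# A RAM disk must hold not just the index but merge working space:
# at least 2x the index size, worst case 3x (per Shawn's message).
def ramdisk_needed_gb(index_gb: float) -> tuple[float, float]:
    return (2 * index_gb, 3 * index_gb)

minimum, worst_case = ramdisk_needed_gb(120)  # David's 120 GB index
print(minimum, worst_case)  # 240.0 360.0
```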
RE: SolrCloud loadbalancing, replication, and failover
Wow, thank you for those benchmarks Toke, that really gives me some firm footing to stand on in knowing what to expect and thinking out which path to venture down. It's tremendously appreciated!

Dave

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Friday, April 19, 2013 5:17 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover

On Fri, 2013-04-19 at 06:51 +0200, Shawn Heisey wrote:
> Using SSDs for storage can speed things up dramatically and may reduce
> the total memory requirement to some degree,

We have been using SSDs for several years in our servers. It is our clear experience that "to some degree" should be replaced with "very much" in the above. Our current SSD-equipped servers each hold a total of 127GB of index data spread over 3 instances. The machines each have 16GB of RAM, of which about 7GB are left for disk cache.

"We" are the State and University Library, Denmark, and our search engine is the primary (and arguably only) way to locate resources for our users. The average raw search time is 32ms for non-faceted queries and 616ms for heavy faceting (which is much too slow. Dang! I thought I fixed that).

> but even an SSD is slower than RAM. The transfer speed of RAM is
> faster, and from what I understand, the latency is at least an order
> of magnitude quicker - nanoseconds vs microseconds.

True, but you might as well argue that everyone should go for the fastest CPU possible, as it will be, well, faster than the slower ones. The question is almost never how to get the fastest possible, but how to get a good price/performance tradeoff. I would argue that SSDs fit that bill very well for a great deal of the "My search is too slow" threads that are spun on this mailing list. Especially for larger indexes.

Regards,
Toke Eskildsen
Re: SolrCloud loadbalancing, replication, and failover
On Fri, 2013-04-19 at 06:51 +0200, Shawn Heisey wrote:
> Using SSDs for storage can speed things up dramatically and may reduce
> the total memory requirement to some degree,

We have been using SSDs for several years in our servers. It is our clear experience that "to some degree" should be replaced with "very much" in the above. Our current SSD-equipped servers each hold a total of 127GB of index data spread over 3 instances. The machines each have 16GB of RAM, of which about 7GB are left for disk cache.

"We" are the State and University Library, Denmark, and our search engine is the primary (and arguably only) way to locate resources for our users. The average raw search time is 32ms for non-faceted queries and 616ms for heavy faceting (which is much too slow. Dang! I thought I fixed that).

> but even an SSD is slower than RAM. The transfer speed of RAM is faster,
> and from what I understand, the latency is at least an order of
> magnitude quicker - nanoseconds vs microseconds.

True, but you might as well argue that everyone should go for the fastest CPU possible, as it will be, well, faster than the slower ones. The question is almost never how to get the fastest possible, but how to get a good price/performance tradeoff. I would argue that SSDs fit that bill very well for a great deal of the "My search is too slow" threads that are spun on this mailing list. Especially for larger indexes.

Regards,
Toke Eskildsen
RE: SolrCloud loadbalancing, replication, and failover
Ok, I understand better now. The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has dark grey allocation of 602MB, and light grey of an additional 108MB, for a JVM total of 710MB allocated. If I understand correctly, Solr memory utilization is *not* for caching (unless I configured document caches or some of the other cache options in Solr, which don't seem to apply in this case, and I haven't altered from their defaults). So assuming this box was dedicated to 1 solr instance/shard. What JVM heap should I set? Does that matter? 24GB JVM heap? Or keep it lower and ensure the OS cache has plenty of room to operate? (this is an Ubuntu 12.10 server instance). Would I be wise to just put the index on a RAM disk and guarantee performance? Assuming I installed sufficient RAM? Dave -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, April 19, 2013 4:19 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/19/2013 2:15 AM, David Parks wrote: > Interesting. I'm trying to correlate this new understanding to what I > see on my servers. I've got one server with 5GB dedicated to solr, > solr dashboard reports a 167GB index actually. > > When I do many typical queries I see between 3MB and 9MB of disk reads > (watching iostat). > > But solr's dashboard only shows 710MB of memory in use (this box has > had many hundreds of queries put through it, and has been up for 1 > week). That doesn't quite correlate with my understanding that Solr > would cache the index as much as possible. There are two memory sections on the dashboard. The one at the top shows the operating system view of physical memory. That is probably showing virtually all of it in use. Most UNIX platforms will show you the same info with 'top' or 'free'. Some of them, like Solaris, require different tools. I assume you're not using Windows, because you mention iostat. 
The other memory section is for the JVM, and that only covers the memory used by Solr. The dark grey section is the amount of Java heap memory currently utilized by Solr and its servlet container. The light grey section represents the memory that the JVM has allocated from system memory. If any part of that bar is white, then Java has not yet requested the maximum configured heap. Typically a long-running Solr install will have only dark and light grey, no white. The operating system is what caches your index, not Solr. The bulk of your RAM should be unallocated. With your index size, the OS will use all unallocated RAM for the disk cache. If a program requests some of that RAM, the OS will instantly give it up. Thanks, Shawn
Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 2:15 AM, David Parks wrote: > Interesting. I'm trying to correlate this new understanding to what I see on > my servers. I've got one server with 5GB dedicated to solr, solr dashboard > reports a 167GB index actually. > > When I do many typical queries I see between 3MB and 9MB of disk reads > (watching iostat). > > But solr's dashboard only shows 710MB of memory in use (this box has had > many hundreds of queries put through it, and has been up for 1 week). That > doesn't quite correlate with my understanding that Solr would cache the > index as much as possible. There are two memory sections on the dashboard. The one at the top shows the operating system view of physical memory. That is probably showing virtually all of it in use. Most UNIX platforms will show you the same info with 'top' or 'free'. Some of them, like Solaris, require different tools. I assume you're not using Windows, because you mention iostat. The other memory section is for the JVM, and that only covers the memory used by Solr. The dark grey section is the amount of Java heap memory currently utilized by Solr and its servlet container. The light grey section represents the memory that the JVM has allocated from system memory. If any part of that bar is white, then Java has not yet requested the maximum configured heap. Typically a long-running Solr install will have only dark and light grey, no white. The operating system is what caches your index, not Solr. The bulk of your RAM should be unallocated. With your index size, the OS will use all unallocated RAM for the disk cache. If a program requests some of that RAM, the OS will instantly give it up. Thanks, Shawn
Re: SolrCloud loadbalancing, replication, and failover
On 4/19/2013 1:34 AM, John Nielsen wrote: > Well, to consume 120GB of RAM with a 120GB index, you would have to query > over every single GB of data. > > If you only actually query over, say, 500MB of the 120GB data in your dev > environment, you would only use 500MB worth of RAM for caching. Not 120GB What you are saying is essentially true, although I would not be surprised to learn that even a single query would read a few gigabytes from a 120GB index, assuming that you start after a server reboot. The next query would re-use a lot of the data cached by the first query and return much faster. > On Fri, Apr 19, 2013 at 7:55 AM, David Parks wrote: >> Question: if I had 1 server with 60GB of memory and 120GB index, would solr >> make full use of the 60GB of memory? Thus trimming disk access in half. Or >> is it an all-or-nothing thing? In a dev environment, I didn't notice SOLR >> consuming the full 5GB of RAM assigned to it with a 120GB index. Solr would likely cause the OS to use most or all of that memory. It's not an all or nothing thing. The first few queries will load a big chunk, then each additional query will load a little more. 60GB of RAM will be significantly better than 12GB. With only 12GB, it is extremely likely that a given query will read a section of the index that will push the data required for the next query out of the disk cache, so it will have to re-read it from the disk on the next query, and so on in a never-ending cycle. That is far less likely if you have enough RAM for half your index rather than a tenth. Operating system disk caches are pretty good at figuring out which data is needed frequently. If the cache is big enough, that data can be kept in the cache easily. An ideal setup would have enough RAM to cache the entire index. Depending on your schema, you may find that the disk cache in production only ends up caching somewhere between half and two thirds of your index. 
The 60GB figure you have quoted above *MIGHT* be enough to make things work really well with a 120GB index, but I always tell people that if they want top performance, they will buy enough RAM to cache the whole thing. You might have a combination of query pattern and data that results in more of the index needing cache than I have seen on my setup. You are likely to add documents continuously. You may learn that your schema doesn't cover your needs, so you have to modify it to tokenize more aggressively, or you may need to copy fields so you can analyze the same data more than one way. These things will make your index bigger. If your query volume grows or gets more varied, more of your index is likely to end up in the disk cache. I would not recommend going into production with an index that has no redundancy. If you buy quality hardware with redundancy in storage, dual power supplies, and ECC memory, catastrophic failures are rare, but they DO happen. The motherboard or an entire RAM chip could suddenly die. Someone might accidentally hit the power switch on the server and cause it to shut down. They might be working in the rack, fall down, and pull out both power cords in an attempt to catch themselves. The latter scenarios are a temporary problem, but your users will probably notice. Thanks, Shawn
RE: SolrCloud loadbalancing, replication, and failover
Interesting. I'm trying to correlate this new understanding to what I see on my servers. I've got one server with 5GB dedicated to solr, solr dashboard reports a 167GB index actually. When I do many typical queries I see between 3MB and 9MB of disk reads (watching iostat). But solr's dashboard only shows 710MB of memory in use (this box has had many hundreds of queries put through it, and has been up for 1 week). That doesn't quite correlate with my understanding that Solr would cache the index as much as possible. Should I be thinking that things aren't configured correctly here? Dave -Original Message- From: John Nielsen [mailto:j...@mcb.dk] Sent: Friday, April 19, 2013 2:35 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover Well, to consume 120GB of RAM with a 120GB index, you would have to query over every single GB of data. If you only actually query over, say, 500MB of the 120GB data in your dev environment, you would only use 500MB worth of RAM for caching. Not 120GB On Fri, Apr 19, 2013 at 7:55 AM, David Parks wrote: > Wow! That was the most pointed, concise discussion of hardware > requirements I've seen to date, and it's fabulously helpful, thank you > Shawn! We currently have 2 servers that I can dedicate about 12GB of > ram to Solr on (we're moving to these 2 servers now). I can upgrade > further if it's needed & justified, and your discussion helps me > justify that such an upgrade is the right thing to do. > > So... If I move to 3 servers with 50GB of RAM each, using 3 shards, I > should be in the free and clear then right? This seems reasonable and > doable. > > In this more extreme example the failover properties of solr cloud > become more clear. I couldn't possibly run a replica shard without > doubling the memory, so really replication isn't reasonable until I > have double the hardware, then the load balancing scheme makes perfect > sense. 
With 3 servers, 50GB of RAM and 120GB index I should just > backup the index directory I think. > > My previous though to run replication just for failover would have > actually resulted in LOWER performance because I would have halved the > memory available to the master & replica. So the previous question is > answered as well now. > > Question: if I had 1 server with 60GB of memory and 120GB index, would > solr make full use of the 60GB of memory? Thus trimming disk access in > half. Or is it an all-or-nothing thing? In a dev environment, I > didn't notice SOLR consuming the full 5GB of RAM assigned to it with a 120GB index. > > Dave > > > -Original Message----- > From: Shawn Heisey [mailto:s...@elyograg.org] > Sent: Friday, April 19, 2013 11:51 AM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud loadbalancing, replication, and failover > > On 4/18/2013 8:12 PM, David Parks wrote: > > I think I still don't understand something here. > > > > My concern right now is that query times are very slow for 120GB > > index (14s on avg), I've seen a lot of disk activity when running queries. > > > > I'm hoping that distributing that query across 2 servers is going to > > improve the query time, specifically I'm hoping that we can > > distribute that disk activity because we don't have great disks on there (yet). > > > > So, with disk IO being a factor in mind, running the query on one > > box, > vs. > > across 2 *should* be a concern right? > > > > Admittedly, this is the first step in what will probably be many to > > try to work our query times down from 14s to what I want to be around 1s. > > I went through my mailing list archive to see what all you've said > about your setup. One thing that I can't seem to find is a mention of > how much total RAM is in each of your servers. I apologize if it was > actually there and I overlooked it. > > In one email thread, you wanted to know whether Solr is CPU-bound or > IO-bound. 
Solr is heavily reliant on the index on disk, and disk I/O > is the slowest piece of the puzzle. The way to get good performance > out of Solr is to have enough memory that you can take the disk mostly > out of the equation by having the operating system cache the index in > RAM. If you don't have enough RAM for that, then Solr becomes > IO-bound, and your CPUs will be busy in iowait, unable to do much real > work. If you DO have enough RAM to cache all (or most) of your index, > then Solr will be CPU-bound. > > With 120GB of total index data on each server, you would want at least > 128GB of RAM per server, assuming you are only giving 8-16GB of RAM
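Shawn's total-RAM sizing rule from the quoted message, as a quick calculation. The 120 GB index and the "at least 128GB" target are his figures; the additive model (index size plus JVM heap, with everything unallocated left to the OS disk cache) is my reading of his advice:

```python
# Rule of thumb from the thread: RAM per server ~= index data on the box
# + Solr's JVM heap, since all unallocated RAM becomes OS disk cache.
def ram_needed_gb(index_gb: float, heap_gb: float) -> float:
    return index_gb + heap_gb

# 120 GB of index plus an 8 GB heap lands on Shawn's "at least 128GB".
print(ram_needed_gb(120, 8))  # 128.0
```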
Re: SolrCloud loadbalancing, replication, and failover
Well, to consume 120GB of RAM with a 120GB index, you would have to query over every single GB of data. If you only actually query over, say, 500MB of the 120GB data in your dev environment, you would only use 500MB worth of RAM for caching. Not 120GB On Fri, Apr 19, 2013 at 7:55 AM, David Parks wrote: > Wow! That was the most pointed, concise discussion of hardware requirements > I've seen to date, and it's fabulously helpful, thank you Shawn! We > currently have 2 servers that I can dedicate about 12GB of ram to Solr on > (we're moving to these 2 servers now). I can upgrade further if it's needed > & justified, and your discussion helps me justify that such an upgrade is > the right thing to do. > > So... If I move to 3 servers with 50GB of RAM each, using 3 shards, I > should > be in the free and clear then right? This seems reasonable and doable. > > In this more extreme example the failover properties of solr cloud become > more clear. I couldn't possibly run a replica shard without doubling the > memory, so really replication isn't reasonable until I have double the > hardware, then the load balancing scheme makes perfect sense. With 3 > servers, 50GB of RAM and 120GB index I should just backup the index > directory I think. > > My previous though to run replication just for failover would have actually > resulted in LOWER performance because I would have halved the memory > available to the master & replica. So the previous question is answered as > well now. > > Question: if I had 1 server with 60GB of memory and 120GB index, would solr > make full use of the 60GB of memory? Thus trimming disk access in half. Or > is it an all-or-nothing thing? In a dev environment, I didn't notice SOLR > consuming the full 5GB of RAM assigned to it with a 120GB index. 
> > Dave > > > -Original Message----- > From: Shawn Heisey [mailto:s...@elyograg.org] > Sent: Friday, April 19, 2013 11:51 AM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud loadbalancing, replication, and failover > > On 4/18/2013 8:12 PM, David Parks wrote: > > I think I still don't understand something here. > > > > My concern right now is that query times are very slow for 120GB index > > (14s on avg), I've seen a lot of disk activity when running queries. > > > > I'm hoping that distributing that query across 2 servers is going to > > improve the query time, specifically I'm hoping that we can distribute > > that disk activity because we don't have great disks on there (yet). > > > > So, with disk IO being a factor in mind, running the query on one box, > vs. > > across 2 *should* be a concern right? > > > > Admittedly, this is the first step in what will probably be many to > > try to work our query times down from 14s to what I want to be around 1s. > > I went through my mailing list archive to see what all you've said about > your setup. One thing that I can't seem to find is a mention of how much > total RAM is in each of your servers. I apologize if it was actually there > and I overlooked it. > > In one email thread, you wanted to know whether Solr is CPU-bound or > IO-bound. Solr is heavily reliant on the index on disk, and disk I/O is > the > slowest piece of the puzzle. The way to get good performance out of Solr is > to have enough memory that you can take the disk mostly out of the equation > by having the operating system cache the index in RAM. If you don't have > enough RAM for that, then Solr becomes IO-bound, and your CPUs will be busy > in iowait, unable to do much real work. If you DO have enough RAM to cache > all (or most) of your index, then Solr will be CPU-bound. 
> > With 120GB of total index data on each server, you would want at least > 128GB > of RAM per server, assuming you are only giving 8-16GB of RAM to Solr, and > that Solr is the only thing running on the machine. If you have more > servers and shards, you can reduce the per-server memory requirement > because > the amount of index data on each server would go down. I am aware of the > cost associated with this kind of requirement - each of my Solr servers has > 64GB. > > If you are sharing the server with another program, then you want to have > enough RAM available for Solr's heap, Solr's data, the other program's > heap, > and the other program's data. Some programs (like > MySQL) completely skip the OS disk cache and instead do that caching > themselves with heap memory that's actually allocated to the program. > If you're using a program like that, then you wouldn't need to count its > data.
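Shawn's sizing rule quoted above (on-node index data, plus the Solr heap, plus any co-resident program's needs) reduces to quick arithmetic. A sketch of the rule as stated in the thread, not an official formula; the numbers are the thread's own:

```python
def min_ram_gb(index_on_node_gb, solr_heap_gb, other_gb=0):
    """RAM needed so the OS can cache the whole on-node index:
    index data + Solr's Java heap + co-resident programs' heap/data."""
    return index_on_node_gb + solr_heap_gb + other_gb

# 120GB of index per node with an 8-16GB heap -> ~128-136GB,
# matching the "at least 128GB" figure above:
print(min_ram_gb(120, 8))      # 128
# Spreading the same index across 3 shards/nodes cuts the per-node need:
print(min_ram_gb(120 / 3, 8))  # 48.0
```

This also shows why adding shards reduces the per-server requirement: only the index data actually hosted on the node has to fit in its page cache.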
RE: SolrCloud loadbalancing, replication, and failover
Wow! That was the most pointed, concise discussion of hardware requirements I've seen to date, and it's fabulously helpful, thank you Shawn! We currently have 2 servers that I can dedicate about 12GB of ram to Solr on (we're moving to these 2 servers now). I can upgrade further if it's needed & justified, and your discussion helps me justify that such an upgrade is the right thing to do. So... If I move to 3 servers with 50GB of RAM each, using 3 shards, I should be in the free and clear then right? This seems reasonable and doable. In this more extreme example the failover properties of solr cloud become more clear. I couldn't possibly run a replica shard without doubling the memory, so really replication isn't reasonable until I have double the hardware, then the load balancing scheme makes perfect sense. With 3 servers, 50GB of RAM and 120GB index I should just back up the index directory I think. My previous thought to run replication just for failover would have actually resulted in LOWER performance because I would have halved the memory available to the master & replica. So the previous question is answered as well now. Question: if I had 1 server with 60GB of memory and 120GB index, would solr make full use of the 60GB of memory? Thus trimming disk access in half. Or is it an all-or-nothing thing? In a dev environment, I didn't notice SOLR consuming the full 5GB of RAM assigned to it with a 120GB index. Dave -----Original Message----- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Friday, April 19, 2013 11:51 AM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover On 4/18/2013 8:12 PM, David Parks wrote: > I think I still don't understand something here. > > My concern right now is that query times are very slow for 120GB index > (14s on avg), I've seen a lot of disk activity when running queries. 
> > I'm hoping that distributing that query across 2 servers is going to > improve the query time, specifically I'm hoping that we can distribute > that disk activity because we don't have great disks on there (yet). > > So, with disk IO being a factor in mind, running the query on one box, vs. > across 2 *should* be a concern right? > > Admittedly, this is the first step in what will probably be many to > try to work our query times down from 14s to what I want to be around 1s. I went through my mailing list archive to see what all you've said about your setup. One thing that I can't seem to find is a mention of how much total RAM is in each of your servers. I apologize if it was actually there and I overlooked it. In one email thread, you wanted to know whether Solr is CPU-bound or IO-bound. Solr is heavily reliant on the index on disk, and disk I/O is the slowest piece of the puzzle. The way to get good performance out of Solr is to have enough memory that you can take the disk mostly out of the equation by having the operating system cache the index in RAM. If you don't have enough RAM for that, then Solr becomes IO-bound, and your CPUs will be busy in iowait, unable to do much real work. If you DO have enough RAM to cache all (or most) of your index, then Solr will be CPU-bound. With 120GB of total index data on each server, you would want at least 128GB of RAM per server, assuming you are only giving 8-16GB of RAM to Solr, and that Solr is the only thing running on the machine. If you have more servers and shards, you can reduce the per-server memory requirement because the amount of index data on each server would go down. I am aware of the cost associated with this kind of requirement - each of my Solr servers has 64GB. If you are sharing the server with another program, then you want to have enough RAM available for Solr's heap, Solr's data, the other program's heap, and the other program's data. 
Some programs (like MySQL) completely skip the OS disk cache and instead do that caching themselves with heap memory that's actually allocated to the program. If you're using a program like that, then you wouldn't need to count its data. Using SSDs for storage can speed things up dramatically and may reduce the total memory requirement to some degree, but even an SSD is slower than RAM. The transfer speed of RAM is faster, and from what I understand, the latency is at least an order of magnitude quicker - nanoseconds vs microseconds. In another thread, you asked about how Google gets such good response times. Although Google's software probably works differently than Solr/Lucene, when it comes right down to it, all search engines do similar jobs and have similar requirements. I would imagine that Google gets incredible response time because they have incredible amounts of RAM at their disposal that keep the important bits of their index instantly available.
Re: SolrCloud loadbalancing, replication, and failover
On 4/18/2013 8:12 PM, David Parks wrote: > I think I still don't understand something here. > > My concern right now is that query times are very slow for 120GB index (14s > on avg), I've seen a lot of disk activity when running queries. > > I'm hoping that distributing that query across 2 servers is going to improve > the query time, specifically I'm hoping that we can distribute that disk > activity because we don't have great disks on there (yet). > > So, with disk IO being a factor in mind, running the query on one box, vs. > across 2 *should* be a concern right? > > Admittedly, this is the first step in what will probably be many to try to > work our query times down from 14s to what I want to be around 1s. I went through my mailing list archive to see what all you've said about your setup. One thing that I can't seem to find is a mention of how much total RAM is in each of your servers. I apologize if it was actually there and I overlooked it. In one email thread, you wanted to know whether Solr is CPU-bound or IO-bound. Solr is heavily reliant on the index on disk, and disk I/O is the slowest piece of the puzzle. The way to get good performance out of Solr is to have enough memory that you can take the disk mostly out of the equation by having the operating system cache the index in RAM. If you don't have enough RAM for that, then Solr becomes IO-bound, and your CPUs will be busy in iowait, unable to do much real work. If you DO have enough RAM to cache all (or most) of your index, then Solr will be CPU-bound. With 120GB of total index data on each server, you would want at least 128GB of RAM per server, assuming you are only giving 8-16GB of RAM to Solr, and that Solr is the only thing running on the machine. If you have more servers and shards, you can reduce the per-server memory requirement because the amount of index data on each server would go down. I am aware of the cost associated with this kind of requirement - each of my Solr servers has 64GB. 
If you are sharing the server with another program, then you want to have enough RAM available for Solr's heap, Solr's data, the other program's heap, and the other program's data. Some programs (like MySQL) completely skip the OS disk cache and instead do that caching themselves with heap memory that's actually allocated to the program. If you're using a program like that, then you wouldn't need to count its data. Using SSDs for storage can speed things up dramatically and may reduce the total memory requirement to some degree, but even an SSD is slower than RAM. The transfer speed of RAM is faster, and from what I understand, the latency is at least an order of magnitude quicker - nanoseconds vs microseconds. In another thread, you asked about how Google gets such good response times. Although Google's software probably works differently than Solr/Lucene, when it comes right down to it, all search engines do similar jobs and have similar requirements. I would imagine that Google gets incredible response time because they have incredible amounts of RAM at their disposal that keep the important bits of their index instantly available. They have thousands of servers in each data center. I once got a look at the extent of Google's hardware in one data center - it was HUGE. I couldn't get in to examine things closely, they keep that stuff very locked down. Thanks, Shawn
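The "order of magnitude" latency gap Shawn describes turns directly into query-time arithmetic. A sketch with ballpark latencies (the figures below are rough, typical values assumed for illustration, not measurements from this thread):

```python
# Rough effect of storage latency on a query that performs many random
# accesses. Ballpark latencies per random access: RAM ~100ns,
# SSD ~100us, spinning disk ~10ms.
LATENCY_S = {"ram": 100e-9, "ssd": 100e-6, "hdd": 10e-3}

def seek_time(num_random_reads, medium):
    """Seconds spent on random-access latency alone (ignores transfer)."""
    return num_random_reads * LATENCY_S[medium]

# 1,000 random reads for one query on each medium:
for medium in ("ram", "ssd", "hdd"):
    print(medium, seek_time(1000, medium))
```

With these numbers, 1,000 random reads cost on the order of 0.1ms from RAM, 0.1s from SSD, and 10s from a spinning disk — which is why a 120GB index that doesn't fit in the page cache can produce multi-second queries.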
RE: SolrCloud loadbalancing, replication, and failover
I think I still don't understand something here. My concern right now is that query times are very slow for 120GB index (14s on avg), I've seen a lot of disk activity when running queries. I'm hoping that distributing that query across 2 servers is going to improve the query time, specifically I'm hoping that we can distribute that disk activity because we don't have great disks on there (yet). So, with disk IO being a factor in mind, running the query on one box, vs. across 2 *should* be a concern right? Admittedly, this is the first step in what will probably be many to try to work our query times down from 14s to what I want to be around 1s. Dave -Original Message- From: Timothy Potter [mailto:thelabd...@gmail.com] Sent: Thursday, April 18, 2013 9:16 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover Hi Dave, This sounds more like a budget / deployment issue vs. anything architectural. You want 2 shards with replication so you either need sufficient capacity on each of your 2 servers to host 2 Solr instances or you need 4 servers. You need to avoid starving Solr of necessary RAM, disk performance, and CPU regardless of how you lay out the cluster otherwise performance will suffer. My guess is if each Solr had sufficient resources, you wouldn't actually notice much difference in query performance. Tim On Thu, Apr 18, 2013 at 8:03 AM, David Parks wrote: > But my concern is this, when we have just 2 servers: > - I want 1 to be able to take over in case the other fails, as you > point out. > - But when *both* servers are up I don't want the SolrCloud load > balancer to have Shard1 and Replica2 do the work (as they would both > reside on the same physical server). > > Does that make sense? I want *both* server1 & server2 sharing the > processing of every request, *and* I want the failover capability. > > I'm probably missing some bit of logic here, but I want to be sure I > understand the architecture. 
> > Dave > > > > -Original Message- > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] > Sent: Thursday, April 18, 2013 8:13 PM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud loadbalancing, replication, and failover > > Correct. This is what you want if server 2 goes down. > > Otis > Solr & ElasticSearch Support > http://sematext.com/ > On Apr 18, 2013 3:11 AM, "David Parks" wrote: > > > Step 1: distribute processing > > > > We have 2 servers in which we'll run 2 SolrCloud instances on. > > > > We'll define 2 shards so that both servers are busy for each request > > (improving response time of the request). > > > > > > > > Step 2: Failover > > > > We would now like to ensure that if either of the servers goes down > > (we're very unlucky with disks), that the other will be able to take > > over automatically. > > > > So we define 2 shards with a replication factor of 2. > > > > > > > > So we have: > > > > . Server 1: Shard 1, Replica 2 > > > > . Server 2: Shard 2, Replica 1 > > > > > > > > Question: > > > > But in SolrCloud, replicas are active right? So isn't it now > > possible that the load balancer will have Server 1 process *both* > > parts of a request, after all, it has both shards due to the replication, right? > > > > > >
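David's hope above — that splitting the index across 2 servers roughly halves the disk-bound part of a 14s query — can be sketched with a simple model: only the I/O-bound portion divides across shards (they search in parallel), while per-request overhead such as merging and network round-trips does not shrink, and in practice grows. The split between I/O and overhead below is an illustrative assumption, not a measurement:

```python
def est_query_time(io_bound_s, overhead_s, num_shards):
    """Disk-bound work divides across shards searched in parallel;
    coordination overhead does not (and really grows with shard count)."""
    return io_bound_s / num_shards + overhead_s

# If 13s of a 14s query were disk I/O and 1s were fixed overhead:
print(est_query_time(13.0, 1.0, 1))   # 14.0
print(est_query_time(13.0, 1.0, 2))   # 7.5
print(est_query_time(13.0, 1.0, 10))
```

The model also shows the limit of sharding alone: the fixed overhead puts a floor under response time, so getting from 14s to ~1s usually needs the RAM/caching fixes discussed elsewhere in the thread, not just more shards.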
Re: SolrCloud loadbalancing, replication, and failover
Hi Dave, This sounds more like a budget / deployment issue vs. anything architectural. You want 2 shards with replication so you either need sufficient capacity on each of your 2 servers to host 2 Solr instances or you need 4 servers. You need to avoid starving Solr of necessary RAM, disk performance, and CPU regardless of how you lay out the cluster otherwise performance will suffer. My guess is if each Solr had sufficient resources, you wouldn't actually notice much difference in query performance. Tim On Thu, Apr 18, 2013 at 8:03 AM, David Parks wrote: > But my concern is this, when we have just 2 servers: > - I want 1 to be able to take over in case the other fails, as you point > out. > - But when *both* servers are up I don't want the SolrCloud load balancer > to have Shard1 and Replica2 do the work (as they would both reside on the > same physical server). > > Does that make sense? I want *both* server1 & server2 sharing the > processing > of every request, *and* I want the failover capability. > > I'm probably missing some bit of logic here, but I want to be sure I > understand the architecture. > > Dave > > > > -Original Message- > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] > Sent: Thursday, April 18, 2013 8:13 PM > To: solr-user@lucene.apache.org > Subject: Re: SolrCloud loadbalancing, replication, and failover > > Correct. This is what you want if server 2 goes down. > > Otis > Solr & ElasticSearch Support > http://sematext.com/ > On Apr 18, 2013 3:11 AM, "David Parks" wrote: > > > Step 1: distribute processing > > > > We have 2 servers in which we'll run 2 SolrCloud instances on. > > > > We'll define 2 shards so that both servers are busy for each request > > (improving response time of the request). > > > > > > > > Step 2: Failover > > > > We would now like to ensure that if either of the servers goes down > > (we're very unlucky with disks), that the other will be able to take > > over automatically. 
> > > > So we define 2 shards with a replication factor of 2. > > > > > > > > So we have: > > > > . Server 1: Shard 1, Replica 2 > > > > . Server 2: Shard 2, Replica 1 > > > > > > > > Question: > > > > But in SolrCloud, replicas are active right? So isn't it now possible > > that the load balancer will have Server 1 process *both* parts of a > > request, after all, it has both shards due to the replication, right? > > > > > >
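Tim's layout (2 shards with replicationFactor=2, i.e. 4 cores spread over however many nodes you have) maps onto the SolrCloud Collections API CREATE action. A sketch that only builds the request URL — the host, port, and collection name are placeholders, and `maxShardsPerNode` is the parameter (in Solr of this era) that permits 2 cores per node on a 2-node cluster:

```python
from urllib.parse import urlencode

def create_collection_url(host, name, num_shards, replication_factor,
                          max_shards_per_node):
    """Build a Collections API CREATE request for the given layout."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    })
    return f"http://{host}/solr/admin/collections?{params}"

# 2 shards x 2 replicas on a 2-node cluster => 2 cores per node:
url = create_collection_url("localhost:8983", "products", 2, 2, 2)
print(url)
```

Whether those 4 cores live on 2 well-provisioned servers or 4 smaller ones is exactly the budget/deployment trade-off Tim describes: the layout is the same, only the resources behind each core differ.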
RE: SolrCloud loadbalancing, replication, and failover
But my concern is this, when we have just 2 servers: - I want 1 to be able to take over in case the other fails, as you point out. - But when *both* servers are up I don't want the SolrCloud load balancer to have Shard1 and Replica2 do the work (as they would both reside on the same physical server). Does that make sense? I want *both* server1 & server2 sharing the processing of every request, *and* I want the failover capability. I'm probably missing some bit of logic here, but I want to be sure I understand the architecture. Dave -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Thursday, April 18, 2013 8:13 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud loadbalancing, replication, and failover Correct. This is what you want if server 2 goes down. Otis Solr & ElasticSearch Support http://sematext.com/ On Apr 18, 2013 3:11 AM, "David Parks" wrote: > Step 1: distribute processing > > We have 2 servers in which we'll run 2 SolrCloud instances on. > > We'll define 2 shards so that both servers are busy for each request > (improving response time of the request). > > > > Step 2: Failover > > We would now like to ensure that if either of the servers goes down > (we're very unlucky with disks), that the other will be able to take > over automatically. > > So we define 2 shards with a replication factor of 2. > > > > So we have: > > . Server 1: Shard 1, Replica 2 > > . Server 2: Shard 2, Replica 1 > > > > Question: > > But in SolrCloud, replicas are active right? So isn't it now possible > that the load balancer will have Server 1 process *both* parts of a > request, after all, it has both shards due to the replication, right? > >
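David's worry can be made concrete by enumerating the routing choices: with a copy of each shard on both servers and each subquery sent to a uniformly random replica, 1 request in 4 lands entirely on server 1 (and 1 in 4 entirely on server 2) — but averaged over many requests, each server still handles half of all subqueries, so the load does balance. An illustrative sketch (the uniform-random routing is an assumption, not a statement about SolrCloud's actual replica-selection policy):

```python
from itertools import product

# Replica placement from the thread: each shard has a copy on both servers.
replicas = {"shard1": ["server1", "server2"],
            "shard2": ["server1", "server2"]}

# All ways a single request's two subqueries can be routed:
routings = list(product(replicas["shard1"], replicas["shard2"]))
both_on_s1 = sum(1 for r in routings if set(r) == {"server1"})
print(f"{both_on_s1}/{len(routings)} routings hit only server1")  # 1/4
```

So an individual request may occasionally be served entirely by one box, but over the request stream both servers share the work — and either one can serve everything alone if the other fails.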
Re: SolrCloud loadbalancing, replication, and failover
Correct. This is what you want if server 2 goes down. Otis Solr & ElasticSearch Support http://sematext.com/ On Apr 18, 2013 3:11 AM, "David Parks" wrote: > Step 1: distribute processing > > We have 2 servers in which we'll run 2 SolrCloud instances on. > > We'll define 2 shards so that both servers are busy for each request > (improving response time of the request). > > > > Step 2: Failover > > We would now like to ensure that if either of the servers goes down (we're > very unlucky with disks), that the other will be able to take over > automatically. > > So we define 2 shards with a replication factor of 2. > > > > So we have: > > . Server 1: Shard 1, Replica 2 > > . Server 2: Shard 2, Replica 1 > > > > Question: > > But in SolrCloud, replicas are active right? So isn't it now possible that > the load balancer will have Server 1 process *both* parts of a request, > after all, it has both shards due to the replication, right? > >
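Otis's point — that the 2-shard, replicationFactor=2 layout survives the loss of either server — can be checked mechanically: remove a node and verify every shard still has a live copy. A sketch using the server/shard names from the example above:

```python
# "Server 1: Shard 1, Replica 2" / "Server 2: Shard 2, Replica 1"
# means each server hosts a copy of both shards:
placement = {"server1": {"shard1", "shard2"},
             "server2": {"shard1", "shard2"}}
all_shards = {"shard1", "shard2"}

def covered_after_failure(placement, dead_server):
    """True if every shard still has a live replica with one node down."""
    live = set().union(*(shards for node, shards in placement.items()
                         if node != dead_server))
    return all_shards <= live

print(covered_after_failure(placement, "server2"))  # True
```

The same check fails for a layout without replication (one shard per server), which is the whole argument for paying the doubled memory cost once failover matters.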