Re: Approximately needed RAM for 5000 query/second at a Solr machine?
bq: disk space is three times

True, I keep forgetting about compound since I never use it...

On Wed, Apr 10, 2013 at 11:05 AM, Walter Underwood wun...@wunderwood.org wrote:
> Correct, except that the worst-case maximum for disk space is three times.
>
> --wunder
>
> On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:
>> You're mixing up disk and RAM requirements when you talk about having twice the disk size. Solr does _NOT_ require RAM equal to twice the index size to optimize; it requires twice the size on _DISK_. In terms of RAM requirements, you need to create an index, run realistic queries against the installation, and measure.
>>
>> Best
>> Erick
>>
>> On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:
>>> On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
>>>> These are really good metrics for me. You say that RAM size should be at least the index size, and that it is better to have RAM twice the index size (because of the worst-case scenario). On the other hand, let's assume that I have more RAM than twice the index size on the machine. Can Solr use that extra RAM, or is twice the index size an approximate upper limit?
>>>
>>> What we have been discussing is the OS cache, which is memory that is not used by programs. The OS uses that memory to make everything run faster, and it will instantly give that memory up if a program requests it. Solr is a Java program, and Java uses memory a little differently, so Solr most likely will NOT use more memory when it is available.
>>>
>>> In a normal directly executable program, memory can be allocated at any time and given back to the system at any time. With Java, you tell it the maximum amount of memory the program is ever allowed to use. Because of how memory is used inside Java, most long-running Java programs (like Solr) will allocate up to the configured maximum even if they don't really need that much memory, and most Java virtual machines will never give that memory back to the system even when it is not required.
>>>
>>> Thanks,
>>> Shawn
>>>
>>> Furkan KAMACI furkankam...@gmail.com writes:
>>>> I am sorry, but you said: "you need enough free RAM for the OS to cache the maximum amount of disk space all your indexes will ever use". Let's assume the indexes on my machine total 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will use RAM up to however much I allow it as a Java process. When we think about the indexes on storage being cached in RAM by the OS, is that what you are talking about: having more than 5 GB, or 10 GB, of RAM on my machine?
>>>
>>> 10 GB. Because when Solr shuffles the data around, it could use up to twice the size of the index in order to optimize the index on disk.
>>>
>>> -- Justin
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Hi Jack;

Since I am new to Solr, can you explain two things that you said?

1) "when most people say index size they are referring to all fields, collectively, not individual fields" (what do you mean by segments being on a per-field basis, versus "all fields" and "individual fields"?)

2) "more cores might make the worst case scenario worse since it will maximize the amount of data processed at a given moment"

2013/4/13 Erick Erickson erickerick...@gmail.com
> bq: disk space is three times
>
> True, I keep forgetting about compound since I never use it...
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Hi Walter;

Is there any document, or anything else, that says the worst case is three times the disk space? Twice versus three times is a real difference when we are talking about GBs of disk space.

2013/4/10 Walter Underwood wun...@wunderwood.org
> Correct, except that the worst-case maximum for disk space is three times.
>
> --wunder
RE: Approximately needed RAM for 5000 query/second at a Solr machine?
I've investigated this in the past. The worst case is 2*indexSize of additional disk space (3*indexSize total) during an optimize.

In our system we use LogByteSizeMergePolicy, and we used to have a mergeFactor of 10. We would see the worst case happen when there were exactly 20 segments (or some other multiple of 10, I believe) at the start of the optimize. IIRC, it would merge those 20 segments down to 2 segments, and then merge those 2 segments down to 1 segment. 1*indexSize of space was used by the original index (because there was still a reader open on it), 1*indexSize was used by the 2 segments, and 1*indexSize was used by the 1 segment. This is the worst case because there are two full additional copies of the index on disk. Normally, when the number of segments is not a multiple of the mergeFactor, some part of the index will not take part in both merges (and the part that is excluded usually consists of the largest segments).

We worked around this by doing multiple optimize passes, where the first pass merges down to between 2 and 2*mergeFactor-1 segments (based on a great tip from Lance Norskog on this mailing list a couple of years ago). I'm not sure whether the current merge policy implementations still have this issue.

-Michael

-----Original Message-----
From: Furkan KAMACI [mailto:furkankam...@gmail.com]
Sent: Thursday, April 11, 2013 2:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Approximately needed RAM for 5000 query/second at a Solr machine?

Hi Walter;

Is there any document, or anything else, that says the worst case is three times the disk space? Twice versus three times is a real difference when we are talking about GBs of disk space.
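Michael's worst case can be sketched numerically. This is a minimal model, not Lucene code, assuming 20 equal-sized segments and a reader that pins the original index on disk until the optimize finishes:

```python
def peak_disk_during_optimize(segment_sizes):
    """Model the two-pass optimize Michael describes: all segments merge
    down to 2 intermediate segments, which then merge down to 1. An open
    reader keeps the original index on disk until the optimize finishes,
    so at the peak the original, the intermediate copy, and the final
    copy all coexist."""
    index_size = sum(segment_sizes)
    intermediate = index_size  # pass 1: 20 segments -> 2 segments
    final = index_size         # pass 2: 2 segments -> 1 segment
    return index_size + intermediate + final

segments = [1] * 20  # 20 segments of 1 GB each: a 20 GB index
peak = peak_disk_during_optimize(segments)
print(peak / sum(segments))  # 3.0 -- two full extra copies on disk
```

When the segment count is not a multiple of the mergeFactor, the largest segments typically skip one of the passes, so the real peak is usually below this bound.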
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Here is the situation where merging can require 3X space. It can only happen if you force merge and then index with merging turned off, but we had Ultraseek customers do that.

* All documents are merged into a single segment.
* Without a merge, all documents are replaced.
* This results in one segment of deleted documents and one segment of new documents (2X).
* A merge takes place, creating a new segment of the same size, thus 3X.

For normal operation, 2X is plenty of room.

wunder

On Apr 11, 2013, at 6:46 AM, Michael Ryan wrote:
> I've investigated this in the past. The worst case is 2*indexSize of additional disk space (3*indexSize total) during an optimize.
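Walter's force-merge scenario can be sketched the same way; the 10 GB figure is purely illustrative:

```python
# Sketch of the 3X scenario: force-merge to one segment, replace every
# document with merging turned off, then let a merge run.
index_gb = 10  # one fully merged segment (illustrative size)

deleted_segment = index_gb   # old segment, now containing only deleted docs
new_docs_segment = index_gb  # replacement docs in a new segment (2X so far)
merged_segment = index_gb    # the merge writes a third, same-sized segment

peak = deleted_segment + new_docs_segment + merged_segment
print(peak / index_gb)  # 3.0 -- the 3X worst case
```

Once the merge commits and the old segments are dropped, usage falls back to 1X, which is why 2X headroom suffices for normal operation.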
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Thanks Walter, you guys gave me really good ideas about RAM approximation.

2013/4/11 Walter Underwood wun...@wunderwood.org
> Here is the situation where merging can require 3X space. It can only happen if you force merge and then index with merging turned off, but we had Ultraseek customers do that.
>
> * All documents are merged into a single segment.
> * Without a merge, all documents are replaced.
> * This results in one segment of deleted documents and one segment of new documents (2X).
> * A merge takes place, creating a new segment of the same size, thus 3X.
>
> For normal operation, 2X is plenty of room.
>
> wunder
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Segments are on a per-field basis... so doesn't it depend on how many fields are merged in parallel? I mean, when most people say "index size" they are referring to all fields collectively, not individual fields.

I'm just wondering how the number of processor cores might affect things (more cores might make the worst-case scenario worse, since they would maximize the amount of data being processed at any given moment). But I suppose in the final analysis it may all average out; it may not be exactly the worst case, but maybe close enough. And all of this depends on which merge policy you choose. With the default tiered merge policy, things shouldn't be as bad as the 3x worst case.

-- Jack Krupansky

-----Original Message-----
From: Walter Underwood
Sent: Thursday, April 11, 2013 10:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Approximately needed RAM for 5000 query/second at a Solr machine?

Here is the situation where merging can require 3X space. It can only happen if you force merge and then index with merging turned off, but we had Ultraseek customers do that.

* All documents are merged into a single segment.
* Without a merge, all documents are replaced.
* This results in one segment of deleted documents and one segment of new documents (2X).
* A merge takes place, creating a new segment of the same size, thus 3X.

For normal operation, 2X is plenty of room.

wunder
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
On 4/11/2013 7:46 AM, Michael Ryan wrote:
> In our system we use LogByteSizeMergePolicy, and we used to have a mergeFactor of 10. We would see the worst case happen when there were exactly 20 segments (or some other multiple of 10, I believe) at the start of the optimize. IIRC, it would merge those 20 segments down to 2 segments, and then merge those 2 segments down to 1 segment. 1*indexSize of space was used by the original index (because there was still a reader open on it), 1*indexSize was used by the 2 segments, and 1*indexSize was used by the 1 segment. This is the worst case because there are two full additional copies of the index on disk. Normally, when the number of segments is not a multiple of the mergeFactor, some part of the index will not take part in both merges (and the part that is excluded usually consists of the largest segments).
>
> We worked around this by doing multiple optimize passes, where the first pass merges down to between 2 and 2*mergeFactor-1 segments (based on a great tip from Lance Norskog on this mailing list a couple of years ago).

For optimizes that take multiple passes instead of building one segment from the start, TieredMergePolicy offers the maxMergeAtOnceExplicit parameter. In most situations, setting it to three times the value of maxMergeAtOnce and maxSegmentsPerTier (which are usually set the same) will probably result in all optimizes completing in one pass.

Thanks,
Shawn
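As a sketch, that tuning might look like the following in a Solr 4.x-era solrconfig.xml. The element names follow the TieredMergePolicy parameters of that era (Solr exposes the tier setting as segmentsPerTier), and the values are illustrative, applying Shawn's three-times rule to the common default of 10:

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- usually set to the same value -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- 3x the values above, so an explicit optimize can finish in one pass -->
    <int name="maxMergeAtOnceExplicit">30</int>
  </mergePolicy>
</indexConfig>
```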
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Thank you for your explanations; this will help me figure out my system.

2013/4/10 Shawn Heisey s...@elyograg.org
> On 4/9/2013 9:12 PM, Furkan KAMACI wrote:
>> I am sorry, but you said: "you need enough free RAM for the OS to cache the maximum amount of disk space all your indexes will ever use". Let's assume the indexes on my machine total 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will use RAM up to however much I allow it as a Java process. When we think about the indexes on storage being cached in RAM by the OS, is that what you are talking about: having more than 5 GB, or 10 GB, of RAM on my machine?
>
> If your index is 5GB, and you give 3GB of RAM to the Solr JVM, then you would want at least 8GB of total RAM for that machine: the 3GB given to Solr, plus the rest so the OS can cache the index in RAM. If you plan for double the cache memory, you'd need 13 to 14GB.
>
> Thanks,
> Shawn
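Shawn's arithmetic can be written out as a rule of thumb. The 5 GB index and 3 GB heap are his example figures; the cache_copies=2 case corresponds to budgeting double the cache for index growth during optimizes:

```python
def min_total_ram_gb(index_gb, jvm_heap_gb, cache_copies=1):
    """Total RAM = JVM heap + enough free memory for the OS page cache
    to hold the index. cache_copies=2 budgets for the index roughly
    doubling on disk during an optimize."""
    return jvm_heap_gb + cache_copies * index_gb

print(min_total_ram_gb(5, 3))                  # 8 GB minimum
print(min_total_ram_gb(5, 3, cache_copies=2))  # 13 GB with doubled cache budget
```

The heap term here is whatever you pass to the JVM (e.g. -Xmx3g); the rest is left free so the OS can cache the index files.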
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
>> These are really good metrics for me. You say that RAM size should be at least the index size, and that it is better to have RAM twice the index size (because of the worst-case scenario). On the other hand, let's assume that I have more RAM than twice the index size on the machine. Can Solr use that extra RAM, or is twice the index size an approximate upper limit?
>
> What we have been discussing is the OS cache, which is memory that is not used by programs. The OS uses that memory to make everything run faster, and it will instantly give that memory up if a program requests it. Solr is a Java program, and Java uses memory a little differently, so Solr most likely will NOT use more memory when it is available.
>
> Thanks,
> Shawn

Furkan KAMACI furkankam...@gmail.com writes:
> I am sorry, but you said: "you need enough free RAM for the OS to cache the maximum amount of disk space all your indexes will ever use". Let's assume the indexes on my machine total 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will use RAM up to however much I allow it as a Java process. When we think about the indexes on storage being cached in RAM by the OS, is that what you are talking about: having more than 5 GB, or 10 GB, of RAM on my machine?

10 GB. Because when Solr shuffles the data around, it could use up to twice the size of the index in order to optimize the index on disk.

-- Justin
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
You're mixing up disk and RAM requirements when you talk about having twice the disk size. Solr does _NOT_ require twice the index size in RAM to optimize; it requires twice the size on _DISK_. In terms of RAM requirements, you need to create an index, run realistic queries against the installation, and measure. Best, Erick

On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote: 10 GB. Because when Solr shuffles the data around, it could use up to twice the size of the index in order to optimize the index on disk.
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Correct, except the worst-case maximum for disk space is three times. --wunder

On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote: You're mixing up disk and RAM requirements when you talk about having twice the disk size. Solr does _NOT_ require twice the index size of RAM to optimize, it requires twice the size on _DISK_.

-- Walter Underwood wun...@wunderwood.org
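The disk-space rule of thumb from this exchange (roughly twice the index size free on disk during an optimize, three times in the worst case) can be sketched as a quick calculation. The function below is illustrative only; its name and interface are not anything from Solr itself:

```python
def worst_case_disk_gb(index_gb, worst_case=False):
    """Rough free disk space needed during an optimize (full merge).

    A full merge rewrites the index, so plan for about 2x the index
    size on disk; the worst case discussed in this thread is 3x.
    Back-of-the-envelope only.
    """
    factor = 3 if worst_case else 2
    return index_gb * factor

# A 5 GB index:
print(worst_case_disk_gb(5))                   # 10 GB
print(worst_case_disk_gb(5, worst_case=True))  # 15 GB
```

Note that this is disk, not RAM: the RAM discussion elsewhere in the thread is about the OS page cache, which is a separate budget.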
Approximately needed RAM for 5000 query/second at a Solr machine?
Is there anybody who can help me estimate the approximate RAM needed for 5000 queries/second on a Solr machine?
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
It all depends on the nature of your queries and the nature of the data in the index. Does returning results from a result cache count in your QPS? Not to mention how many cores, the CPU speed, and CPU caching as well. Not to mention network latency. The best way to answer is to do a proof-of-concept implementation and measure it yourself. -- Jack Krupansky
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Actually, I will be proposing a system and I need to figure out the machine specifications. There will be no faceting at first, just the simple search queries of a web search engine. Assume I will have a commodity server (I don't know whether there is any benchmark for a typical Solr machine). 2013/4/10 Jack Krupansky j...@basetechnology.com
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
On Apr 9, 2013, at 3:06 PM, Furkan KAMACI asked for help estimating the RAM needed for 5000 queries/second on a Solr machine.

No. That depends on the kind of queries you have, the size and content of the index, the required response time, how frequently the index is updated, and many more factors. So anyone who can guess that is wrong. You can only find that out by running your own benchmarks with your own queries against your own index. In our system, we can meet our response time requirements at a rate of 4000 queries/minute. We have several cores, but most traffic goes to a 3M document index. This index holds small documents, mostly titles and authors of books. We have no wildcard queries, and less than 5% of our queries use fuzzy matching. We update once per day and have cache hit rates of around 30%. We run new benchmarks twice each year, before our busy seasons. We use the current index and configuration and the queries from the busiest day of the previous season. Our key benchmark is the 95th percentile response time, but we also measure the median, 90th, and 99th percentiles. We are currently on Solr 3.3 with some customizations. We're working on transitioning to Solr 4. wunder -- Walter Underwood wun...@wunderwood.org
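The percentile benchmarks Walter describes can be computed from any list of measured response times. A minimal sketch using the nearest-rank method (the sample latencies are invented for illustration):

```python
def percentile(latencies_ms, pct):
    """Nearest-rank percentile of a list of latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    # Nearest-rank method: take the ceil(pct/100 * n)-th smallest value.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]

samples = [12, 15, 18, 20, 22, 25, 30, 45, 80, 250]  # hypothetical, in ms
for p in (50, 90, 95, 99):
    print(f"p{p}: {percentile(samples, p)} ms")
```

The high percentiles are dominated by the slowest requests, which is exactly why a key benchmark like the 95th percentile tells you more about user experience under load than the median does.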
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Hi Walter; Firstly, thanks for your detailed reply. I know that this is not a well-detailed question, but I don't have any metrics yet. Regarding your system, what is the average RAM size of your Solr machines? Maybe that can help me make a comparison. 2013/4/10 Walter Underwood wun...@wunderwood.org
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
We are using Amazon EC2 M1 Extra Large instances (m1.xlarge). http://aws.amazon.com/ec2/instance-types/ wunder

On Apr 9, 2013, at 3:35 PM, Furkan KAMACI wrote: If we talk about your system, what is the average RAM size of your Solr machines?

-- Walter Underwood wun...@wunderwood.org
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
Thanks for your answer. 2013/4/10 Walter Underwood wun...@wunderwood.org We are using Amazon EC2 M1 Extra Large instances (m1.xlarge).
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
On 4/9/2013 4:06 PM, Furkan KAMACI asked for help estimating the RAM needed for 5000 queries/second on a Solr machine.

You've already gotten some good replies, and I'm aware that they haven't really answered your question. This is the kind of question that cannot be answered precisely. The amount of RAM that you'll need for extreme performance actually isn't hard to figure out: you need enough free RAM for the OS to cache the maximum amount of disk space all your indexes will ever use. Normally this will be twice the size of all the indexes on the machine, because that's how much disk space will likely be used in a worst-case merge scenario (optimize). That's very expensive, so it is cheaper to budget for only the size of the index. A load of 5000 queries per second is pretty high, and probably something you will not achieve with a single-server (not counting backup) approach. All of the tricks that high-volume website developers use are also applicable to Solr. Once you have enough RAM, you need to worry more about the number of servers, the number of CPU cores in each server, and the speed of those CPU cores. Testing with actual production queries is the only way to find out what you really need. Beyond hardware design, making the requests as simple as possible and taking advantage of caches is important. Solr has caches for queries, filters, and documents. You can also put a caching proxy (something like Varnish) in front of Solr, but that would make NRT updates pretty much impossible, and that kind of caching can be difficult to get working right. Thanks, Shawn
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
These are really good metrics for me: you say that RAM size should be at least the index size, and it is better to have RAM twice the index size (because of the worst-case scenario). On the other hand, let's assume that I have more RAM than twice my index size on the machine. Can Solr use that extra RAM, or is twice the index size approximately the maximum useful amount? 2013/4/10 Shawn Heisey s...@elyograg.org
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
On 4/9/2013 7:03 PM, Furkan KAMACI wrote: Can Solr use that extra RAM, or is twice the index size approximately a maximum limit?

What we have been discussing is the OS cache, which is memory that is not used by programs. The OS uses that memory to make everything run faster, and it will instantly give that memory up if a program requests it. Solr is a Java program, and Java uses memory a little differently, so Solr most likely will NOT use more memory when it is available. In a normal natively executed program, memory can be allocated at any time and given back to the system at any time. With Java, you tell it the maximum amount of memory the program is ever allowed to use. Because of how memory is used inside Java, most long-running Java programs (like Solr) will allocate up to the configured maximum even if they don't really need that much memory, and most Java virtual machines will never give the memory back to the system even when it is not required. Thanks, Shawn
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
I am sorry, but you said: *you need enough free RAM for the OS to cache the maximum amount of disk space all your indexes will ever use*. I have made an assumption about the indexes on my machine: let's assume they total 5 GB. So is it better to have at least 5 GB of RAM? OK, Solr will use RAM up to however much I allow it as a Java process. When we think about the indexes on storage being cached in RAM by the OS, is that what you are talking about: having more than 5 GB - or - 10 GB of RAM for my machine? 2013/4/10 Shawn Heisey s...@elyograg.org
Re: Approximately needed RAM for 5000 query/second at a Solr machine?
On 4/9/2013 9:12 PM, Furkan KAMACI wrote: Let's assume that it is 5 GB. So it is better to have at least 5 GB RAM?

If your index is 5GB, and you give 3GB of RAM to the Solr JVM, then you would want at least 8GB of total RAM for that machine - the 3GB of RAM given to Solr, plus the rest so the OS can cache the index in RAM. If you plan for double the cache memory, you'd need 13 to 14GB. Thanks, Shawn
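Shawn's arithmetic above can be checked directly (all figures in GB; the gap between 13 and the quoted 13 to 14 GB is headroom for the OS itself):

```python
# Check of the sizing arithmetic discussed above (figures in GB).
index = 5
heap = 3

minimum = heap + index         # heap plus enough page cache to hold the index once
worst_case = heap + 2 * index  # double the cache budget for the worst-case merge

print(minimum)     # 8
print(worst_case)  # 13 (plus roughly 1 GB for the OS itself -> ~14)
```

The same two-term structure (JVM heap + page-cache budget) scales to any index size, which is why the thread keeps separating "RAM given to Solr" from "RAM left free for the OS."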