Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-13 Thread Erick Erickson
bq: disk space is three times

True, I keep forgetting about compound since I never use it...

On Wed, Apr 10, 2013 at 11:05 AM, Walter Underwood
wun...@wunderwood.org wrote:
 Correct, except the worst case maximum for disk space is three times. --wunder

 On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:

 You're mixing up disk and RAM requirements when you talk
 about having twice the disk size. Solr does _NOT_ require
 twice the index size of RAM to optimize, it requires twice
 the size on _DISK_.

 In terms of RAM requirements, you need to create an index,
 run realistic queries at the installation and measure.

 Best
 Erick

 On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:



 On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
 These are really good metrics for me:
 You say that RAM size should be at least the index size, and that it is
 better to have RAM twice the index size (because of the worst-case
 scenario).
 On the other hand, let's assume that I have more RAM than twice the
 index size on the machine. Can Solr use that extra RAM, or is twice the
 index size an approximate upper limit?
 What we have been discussing is the OS cache, which is memory that
 is not used by programs.  The OS uses that memory to make everything
 run faster.  The OS will instantly give that memory up if a program
 requests it.
 Solr is a Java program, and Java uses memory a little differently,
 so Solr most likely will NOT use more memory when it is available.
 In a normal directly executable program, memory can be allocated
 at any time, and given back to the system at any time.
 With Java, you tell it the maximum amount of memory the program is
 ever allowed to use.  Because of how memory is used inside Java,
 most long-running Java programs (like Solr) will allocate up to the
 configured maximum even if they don't really need that much memory.
 Most Java virtual machines will never give the memory back to the
 system even if it is not required.
 Thanks, Shawn


 Furkan KAMACI furkankam...@gmail.com writes:

 I am sorry but you said:

 *you need enough free RAM for the OS to cache the maximum amount of
 disk space all your indexes will ever use*

 Let me make an assumption about the indexes on my machine: say they
 total 5 GB. So it is better to have at least 5 GB of RAM? OK, Solr will
 use RAM up to whatever I define for the Java process. When we think
 about the indexes on storage being cached in RAM by the OS, is that
 what you are talking about: having more than 5 GB, or 10 GB of RAM, for
 my machine?

 2013/4/10 Shawn Heisey s...@elyograg.org


 10 GB.  Because when Solr shuffles the data around, it could use up to
 twice the size of the index in order to optimize the index on disk.

 -- Justin

 --
 Walter Underwood
 wun...@wunderwood.org





Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-13 Thread Furkan KAMACI
Hi Jack;

Since I am new to Solr, can you explain these two things that you said:

1) "when most people say index size they are referring to all fields,
collectively, not individual fields" (what do you mean by "segments are on
a per-field basis" versus all fields / individual fields?)
2) "more cores might make the worst case scenario worse since it will
maximize the amount of data processed at a given moment"




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Furkan KAMACI
Hi Walter;

Is there a document somewhere that says the worst case is three times the
disk space? Twice versus three times makes a real difference when we are
talking about many GB of disk space.








RE: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Michael Ryan
I've investigated this in the past. The worst case is 2*indexSize additional 
disk space (3*indexSize total) during an optimize.

In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of 
10. We would see the worst case happen when there were exactly 20 segments (or 
some other multiple of 10, I believe) at the start of the optimize. IIRC, it 
would merge those 20 segments down to 2 segments, and then merge those 2 
segments down to 1 segment. 1*indexSize space was used by the original index 
(because there is still a reader open on it), 1*indexSize space was used by the 2 
segments, and 1*indexSize space was used by the 1 segment. This is the worst 
case because there are two full additional copies of the index on disk. 
Normally, when the number of segments is not a multiple of the mergeFactor, 
there will be some part of the index that was not part of both merges (and this 
part that is excluded usually would be the largest segments).

We worked around this by doing multiple optimize passes, where the first pass 
merges down to between 2 and 2*mergeFactor-1 segments (based on a great tip 
from Lance Norskog on the mailing list a couple years ago).

I'm not sure if the current merge policy implementations still have this issue.
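As a rough sketch of the arithmetic above (a simplified model of the merge passes, not actual Lucene accounting; the segment counts and sizes are illustrative):

```python
def optimize_passes(num_segments, merge_factor):
    """Merge passes a full optimize takes under a simple model where
    each pass merges groups of merge_factor segments into one."""
    passes = 0
    while num_segments > 1:
        num_segments = -(-num_segments // merge_factor)  # ceiling division
        passes += 1
    return passes

# 20 segments with mergeFactor=10: pass 1 -> 2 segments, pass 2 -> 1.
assert optimize_passes(20, 10) == 2

# Worst case: the original index stays on disk (a reader holds it open)
# while the output of each pass is written, so peak usage is roughly
# (1 + passes) * indexSize when every pass rewrites the whole index.
index_size_gb = 5
peak_gb = index_size_gb * (1 + optimize_passes(20, 10))
print(peak_gb)  # 15, i.e. 3 * indexSize
```

The multiple-pass workaround helps because the intermediate pass is a separate optimize call, so its output can be committed and the old files released before the final merge runs, and the two full extra copies never coexist.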

-Michael







Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Walter Underwood
Here is the situation where merging can require 3X space. It can only happen if 
you force merge, then index with merging turned off, but we had Ultraseek 
customers do that.

* All documents are merged into a single segment.
* Without a merge, all documents are replaced.
* This results in one segment of deleted documents and one of new documents 
(2X).
* A merge takes place, creating a new segment of the same size, thus 3X.

For normal operation, 2X is plenty of room.
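Worked through with a hypothetical 5 GB index (illustrative numbers only):

```python
n = 5.0  # GB: size of the force-merged single segment

old_segment = n     # every document in it is now deleted
new_segments = n    # replacements indexed with merging off -> 2X on disk
merge_output = n    # the merge writes a new segment of the same size -> 3X

peak = old_segment + new_segments + merge_output
print(peak, peak / n)  # 15.0 3.0
```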

wunder


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Furkan KAMACI
Thanks Walter, you guys gave me really nice ideas about RAM approximation.


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Jack Krupansky
Segments are on a per-field basis... so doesn't it depend on how many fields 
are merged in parallel? I mean, when most people say index size they are 
referring to all fields, collectively, not individual fields. I'm just 
wondering how number of processor cores might affect things (more cores 
might make the worst case scenario worse since it will maximize the amount 
of data processed at a given moment.)


But, I suppose in the final analysis, it may all average out. It may not be 
exactly the worst case, but maybe close enough.


And all of this depends on which merge policy you choose. With the default 
tiered merge policy, things shouldn't be as bad as the 3x worst case.


-- Jack Krupansky


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-11 Thread Shawn Heisey



For optimizes that are taking multiple passes instead of just building 
one segment from the start, TieredMergePolicy offers the 
maxMergeAtOnceExplicit parameter.  In most situations, setting this to 
three times the value of maxMergeAtOnce and maxSegmentsPerTier (which 
are usually set the same) will probably result in all optimizes 
completing in one pass.
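As a sketch, that setting might look like this in solrconfig.xml (the values are illustrative and assume maxMergeAtOnce and maxSegmentsPerTier are both 10, as is common; verify against your own configuration):

```xml
<!-- Illustrative only: maxMergeAtOnceExplicit at 3x the other two
     values so an optimize can usually finish in a single pass. -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxSegmentsPerTier">10</int>
  <int name="maxMergeAtOnceExplicit">30</int>
</mergePolicy>
```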


Thanks,
Shawn



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-10 Thread Furkan KAMACI
Thank you for the explanations; this will help me figure out my system.

2013/4/10 Shawn Heisey s...@elyograg.org


 If your index is 5GB, and you give 3GB of RAM to the Solr JVM, then you
 would want at least 8GB of total RAM for that machine - the 3GB of RAM
 given to Solr, plus the rest so the OS can cache the index in RAM.  If
 you plan for double the cache memory, you'd need 13 to 14GB.
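A sketch of that arithmetic (the 5 GB index and 3 GB heap are the example numbers above, not recommendations):

```python
def min_total_ram_gb(index_gb, heap_gb, cache_factor=1.0):
    """JVM heap plus enough free RAM for the OS page cache to hold
    cache_factor times the on-disk index."""
    return heap_gb + cache_factor * index_gb

print(min_total_ram_gb(5, 3))                    # 8.0
print(min_total_ram_gb(5, 3, cache_factor=2.0))  # 13.0
```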

 Thanks,
 Shawn




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-10 Thread bigjust



 On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
  You say that RAM size should be at least the index size, and it is
  better to have RAM twice the index size (because of the worst-case
  scenario).
  On the other hand, let's assume that I have more RAM than twice the
  index size on the machine. Can Solr use that extra RAM, or is twice
  the index size an approximate maximum?
 What we have been discussing is the OS cache, which is memory that
 is not used by programs.  The OS uses that memory to make everything
 run faster.  The OS will instantly give that memory up if a program
 requests it.
 Solr is a Java program, and Java uses memory a little differently,
 so Solr most likely will NOT use more memory when it is available.
 In a normal directly executable program, memory can be allocated
 at any time, and given back to the system at any time.
 With Java, you tell it the maximum amount of memory the program is
 ever allowed to use.  Because of how memory is used inside Java,
 most long-running Java programs (like Solr) will allocate up to the
 configured maximum even if they don't really need that much memory.
 Most Java virtual machines will never give the memory back to the
 system even if it is not required.
 Thanks, Shawn


Furkan KAMACI furkankam...@gmail.com writes:

 I am sorry but you said:

 *you need enough free RAM for the OS to cache the maximum amount of
 disk space all your indexes will ever use*

 I have made an assumption about the indexes on my machine. Let's assume
 they total 5 GB. So is it better to have at least 5 GB of RAM? OK, Solr
 will use RAM up to the limit I define for it as a Java process. When we
 think about the indexes on storage and the OS caching them in RAM, are
 you talking about having more than 5 GB, or 10 GB of RAM, for my machine?

 2013/4/10 Shawn Heisey s...@elyograg.org


10 GB.  Because when Solr shuffles the data around, it could use up to
twice the size of the index in order to optimize the index on disk.

-- Justin


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-10 Thread Erick Erickson
You're mixing up disk and RAM requirements when you talk
about having twice the disk size. Solr does _NOT_ require
twice the index size of RAM to optimize, it requires twice
the size on _DISK_.

In terms of RAM requirements, you need to create an index,
run realistic queries at the installation and measure.

Best
Erick

On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:



 On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
  You say that RAM size should be at least the index size, and it is
  better to have RAM twice the index size (because of the worst-case
  scenario).
  On the other hand, let's assume that I have more RAM than twice the
  index size on the machine. Can Solr use that extra RAM, or is twice
  the index size an approximate maximum?
 What we have been discussing is the OS cache, which is memory that
 is not used by programs.  The OS uses that memory to make everything
 run faster.  The OS will instantly give that memory up if a program
 requests it.
 Solr is a Java program, and Java uses memory a little differently,
 so Solr most likely will NOT use more memory when it is available.
 In a normal directly executable program, memory can be allocated
 at any time, and given back to the system at any time.
 With Java, you tell it the maximum amount of memory the program is
 ever allowed to use.  Because of how memory is used inside Java,
 most long-running Java programs (like Solr) will allocate up to the
 configured maximum even if they don't really need that much memory.
 Most Java virtual machines will never give the memory back to the
 system even if it is not required.
 Thanks, Shawn


 Furkan KAMACI furkankam...@gmail.com writes:

 I am sorry but you said:

 *you need enough free RAM for the OS to cache the maximum amount of
 disk space all your indexes will ever use*

 I have made an assumption about the indexes on my machine. Let's assume
 they total 5 GB. So is it better to have at least 5 GB of RAM? OK, Solr
 will use RAM up to the limit I define for it as a Java process. When we
 think about the indexes on storage and the OS caching them in RAM, are
 you talking about having more than 5 GB, or 10 GB of RAM, for my machine?

 2013/4/10 Shawn Heisey s...@elyograg.org


 10 GB.  Because when Solr shuffles the data around, it could use up to
 twice the size of the index in order to optimize the index on disk.

 -- Justin


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-10 Thread Walter Underwood
Correct, except the worst-case maximum for disk space is three times. --wunder

On Apr 10, 2013, at 6:04 AM, Erick Erickson wrote:

 You're mixing up disk and RAM requirements when you talk
 about having twice the disk size. Solr does _NOT_ require
 twice the index size of RAM to optimize, it requires twice
 the size on _DISK_.
 
 In terms of RAM requirements, you need to create an index,
 run realistic queries at the installation and measure.
 
 Best
 Erick
 
 On Tue, Apr 9, 2013 at 10:32 PM, bigjust bigj...@lambdaphil.es wrote:
 
 
 
 On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
 These are really good metrics for me:
 You say that RAM size should be at least the index size, and it is
 better to have RAM twice the index size (because of the worst-case
 scenario).
 On the other hand, let's assume that I have more RAM than twice the
 index size on the machine. Can Solr use that extra RAM, or is twice
 the index size an approximate maximum?
 What we have been discussing is the OS cache, which is memory that
 is not used by programs.  The OS uses that memory to make everything
 run faster.  The OS will instantly give that memory up if a program
 requests it.
 Solr is a Java program, and Java uses memory a little differently,
 so Solr most likely will NOT use more memory when it is available.
 In a normal directly executable program, memory can be allocated
 at any time, and given back to the system at any time.
 With Java, you tell it the maximum amount of memory the program is
 ever allowed to use.  Because of how memory is used inside Java,
 most long-running Java programs (like Solr) will allocate up to the
 configured maximum even if they don't really need that much memory.
 Most Java virtual machines will never give the memory back to the
 system even if it is not required.
 Thanks, Shawn
 
 
 Furkan KAMACI furkankam...@gmail.com writes:
 
 I am sorry but you said:
 
 *you need enough free RAM for the OS to cache the maximum amount of
 disk space all your indexes will ever use*
 
 I have made an assumption about the indexes on my machine. Let's assume
 they total 5 GB. So is it better to have at least 5 GB of RAM? OK, Solr
 will use RAM up to the limit I define for it as a Java process. When we
 think about the indexes on storage and the OS caching them in RAM, are
 you talking about having more than 5 GB, or 10 GB of RAM, for my machine?
 
 2013/4/10 Shawn Heisey s...@elyograg.org
 
 
 10 GB.  Because when Solr shuffles the data around, it could use up to
 twice the size of the index in order to optimize the index on disk.
 
 -- Justin

--
Walter Underwood
wun...@wunderwood.org





Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Is there anybody who can help me roughly estimate the RAM needed for
5000 queries/second on a Solr machine?


Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Jack Krupansky
It all depends on the nature of your queries and the nature of the data in the 
index. Does returning results from a result cache count in your QPS? Not to 
mention the number of cores, CPU speed, and CPU caching as well. Not to mention 
network latency.


The best way to answer is to do a proof of concept implementation and 
measure it yourself.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Tuesday, April 09, 2013 6:06 PM
To: solr-user@lucene.apache.org
Subject: Approximately needed RAM for 5000 query/second at a Solr machine?

Is there anybody who can help me roughly estimate the RAM needed for
5000 queries/second on a Solr machine?



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Actually, I will propose a system, and I need to figure out the machine
specifications. There will be no faceting mechanism at first, just the
simple search queries of a web search engine. Assume that I will have a
commodity server (I don't know whether there is any benchmark for a typical
Solr machine)

2013/4/10 Jack Krupansky j...@basetechnology.com

 It all depends on the nature of your queries and the nature of the data in
 the index. Does returning results from a result cache count in your QPS?
 Not to mention the number of cores, CPU speed, and CPU caching as well. Not
 to mention network latency.

 The best way to answer is to do a proof of concept implementation and
 measure it yourself.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Tuesday, April 09, 2013 6:06 PM
 To: solr-user@lucene.apache.org
 Subject: Approximately needed RAM for 5000 query/second at a Solr machine?


 Is there anybody who can help me roughly estimate the RAM needed for
 5000 queries/second on a Solr machine?



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Walter Underwood
On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:

 Is there anybody who can help me roughly estimate the RAM needed for
 5000 queries/second on a Solr machine?

No.

That depends on the kind of queries you have, the size and content of the 
index, the required response time, how frequently the index is updated, and 
many more factors. So anyone who claims they can guess it is wrong.

You can only find that out by running your own benchmarks with your own queries 
against your own index.

In our system, we can meet our response time requirements at a rate of 4000 
queries/minute. We have several cores, but most traffic goes to a 3M document 
index. This index is small documents, mostly titles and authors of books. We 
have no wildcard queries and less than 5% of our queries use fuzzy matching. We 
update once per day and have cache hit rates of around 30%.

We run new benchmarks twice each year, before our busy seasons. We use the 
current index and configuration and the queries from the busiest day of the 
previous season.

Our key benchmark is the 95th percentile response time, but we also measure 
median, 90th, and 99th percentile.
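A percentile benchmark like this is a few lines to compute; here is a sketch using the nearest-rank method (the sample latencies are made up for illustration, not measurements from this thread):

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# hypothetical response times in milliseconds
times_ms = [12, 15, 18, 20, 22, 25, 30, 45, 80, 120]
for pct in (50, 90, 95, 99):
    print(pct, percentile(times_ms, pct))
```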

We are currently on Solr 3.3 with some customizations. We're working on 
transitioning to Solr 4.

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Hi Walter;

Firstly, thanks for your detailed reply. I know this is not a well-detailed
question, but I don't have any metrics yet. Regarding your system, what is
the average RAM size of your Solr machines? Maybe that can help me make a
comparison.

2013/4/10 Walter Underwood wun...@wunderwood.org

 On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:

  Is there anybody who can help me roughly estimate the RAM needed for
  5000 queries/second on a Solr machine?

 No.

 That depends on the kind of queries you have, the size and content of the
 index, the required response time, how frequently the index is updated, and
 many more factors. So anyone who can guess that is wrong.

 You can only find that out by running your own benchmarks with your own
 queries against your own index.

 In our system, we can meet our response time requirements at a rate of
 4000 queries/minute. We have several cores, but most traffic goes to a 3M
 document index. This index is small documents, mostly titles and authors of
 books. We have no wildcard queries and less than 5% of our queries use
 fuzzy matching. We update once per day and have cache hit rates of around
 30%.

 We run new benchmarks twice each year, before our busy seasons. We use the
 current index and configuration and the queries from the busiest day of the
 previous season.

 Our key benchmark is the 95th percentile response time, but we also
 measure median, 90th, and 99th percentile.

 We are currently on Solr 3.3 with some customizations. We're working on
 transitioning to Solr 4.

 wunder
 --
 Walter Underwood
 wun...@wunderwood.org






Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Walter Underwood
We are using Amazon EC2 M1 Extra Large instances (m1.xlarge).

http://aws.amazon.com/ec2/instance-types/

wunder

On Apr 9, 2013, at 3:35 PM, Furkan KAMACI wrote:

 Hi Walter;
 
 Firstly, thanks for your detailed reply. I know this is not a well-detailed
 question, but I don't have any metrics yet. Regarding your system, what is
 the average RAM size of your Solr machines? Maybe that can help me make a
 comparison.
 
 2013/4/10 Walter Underwood wun...@wunderwood.org
 
 On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:
 
 Is there anybody who can help me roughly estimate the RAM needed for
 5000 queries/second on a Solr machine?
 
 No.
 
 That depends on the kind of queries you have, the size and content of the
 index, the required response time, how frequently the index is updated, and
 many more factors. So anyone who can guess that is wrong.
 
 You can only find that out by running your own benchmarks with your own
 queries against your own index.
 
 In our system, we can meet our response time requirements at a rate of
 4000 queries/minute. We have several cores, but most traffic goes to a 3M
 document index. This index is small documents, mostly titles and authors of
 books. We have no wildcard queries and less than 5% of our queries use
 fuzzy matching. We update once per day and have cache hit rates of around
 30%.
 
 We run new benchmarks twice each year, before our busy seasons. We use the
 current index and configuration and the queries from the busiest day of the
 previous season.
 
 Our key benchmark is the 95th percentile response time, but we also
 measure median, 90th, and 99th percentile.
 
 We are currently on Solr 3.3 with some customizations. We're working on
 transitioning to Solr 4.
 
 wunder
 --
 Walter Underwood
 wun...@wunderwood.org
 
 
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
Thanks for your answer.

2013/4/10 Walter Underwood wun...@wunderwood.org

 We are using Amazon EC2 M1 Extra Large instances (m1.xlarge).

 http://aws.amazon.com/ec2/instance-types/

 wunder

 On Apr 9, 2013, at 3:35 PM, Furkan KAMACI wrote:

  Hi Walter;
 
  Firstly, thanks for your detailed reply. I know this is not a well-detailed
  question, but I don't have any metrics yet. Regarding your system, what is
  the average RAM size of your Solr machines? Maybe that can help me make a
  comparison.
 
  2013/4/10 Walter Underwood wun...@wunderwood.org
 
  On Apr 9, 2013, at 3:06 PM, Furkan KAMACI wrote:
 
  Is there anybody who can help me roughly estimate the RAM needed for
  5000 queries/second on a Solr machine?
 
  No.
 
  That depends on the kind of queries you have, the size and content of
 the
  index, the required response time, how frequently the index is updated,
 and
  many more factors. So anyone who can guess that is wrong.
 
  You can only find that out by running your own benchmarks with your own
  queries against your own index.
 
  In our system, we can meet our response time requirements at a rate of
  4000 queries/minute. We have several cores, but most traffic goes to a
 3M
  document index. This index is small documents, mostly titles and
 authors of
  books. We have no wildcard queries and less than 5% of our queries use
  fuzzy matching. We update once per day and have cache hit rates of
 around
  30%.
 
  We run new benchmarks twice each year, before our busy seasons. We use
 the
  current index and configuration and the queries from the busiest day of
 the
  previous season.
 
  Our key benchmark is the 95th percentile response time, but we also
  measure median, 90th, and 99th percentile.
 
  We are currently on Solr 3.3 with some customizations. We're working on
  transitioning to Solr 4.
 
  wunder
  --
  Walter Underwood
  wun...@wunderwood.org
 
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Shawn Heisey

On 4/9/2013 4:06 PM, Furkan KAMACI wrote:

Is there anybody who can help me roughly estimate the RAM needed for
5000 queries/second on a Solr machine?


You've already gotten some good replies, and I'm aware that they haven't 
really answered your question.  This is the kind of question that cannot 
be answered.


The amount of RAM that you'll need for extreme performance actually 
isn't hard to figure out - you need enough free RAM for the OS to cache 
the maximum amount of disk space all your indexes will ever use. 
Normally this will be twice the size of all the indexes on the machine, 
because that's how much disk space will likely be used in a worst-case 
merge scenario (optimize).  That's very expensive, so it is cheaper to 
budget for only the size of the index.


A load of 5000 queries per second is pretty high, and probably something 
you will not achieve with a single-server (not counting backup) 
approach.  All of the tricks that high-volume website developers use are 
also applicable to Solr.


Once you have enough RAM, you need to worry more about the number of 
servers, the number of CPU cores in each server, and the speed of those 
CPU cores.  Testing with actual production queries is the only way to 
find out what you really need.


Beyond hardware design, making the requests as simple as possible and 
taking advantage of caches is important.  Solr has caches for queries, 
filters, and documents.  You can also put a caching proxy (something 
like Varnish) in front of Solr, but that would make NRT updates pretty 
much impossible, and that kind of caching can be difficult to get 
working right.
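The three Solr caches named here live in the <query> section of solrconfig.xml; a minimal sketch, with sizes that are purely illustrative:

```xml
<query>
  <!-- caches filter queries (fq) as document-set bitsets -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <!-- caches ordered document-ID lists for query results -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <!-- caches stored fields for retrieved documents -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512"/>
</query>
```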


Thanks,
Shawn



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
These are really good metrics for me:

You say that RAM size should be at least the index size, and it is better to
have RAM twice the index size (because of the worst-case scenario).

On the other hand, let's assume that I have more RAM than twice the index
size on the machine. Can Solr use that extra RAM, or is twice the index size
an approximate maximum?


2013/4/10 Shawn Heisey s...@elyograg.org

 On 4/9/2013 4:06 PM, Furkan KAMACI wrote:

 Is there anybody who can help me roughly estimate the RAM needed for
 5000 queries/second on a Solr machine?


 You've already gotten some good replies, and I'm aware that they haven't
 really answered your question.  This is the kind of question that cannot be
 answered.

 The amount of RAM that you'll need for extreme performance actually isn't
 hard to figure out - you need enough free RAM for the OS to cache the
 maximum amount of disk space all your indexes will ever use. Normally this
 will be twice the size of all the indexes on the machine, because that's
 how much disk space will likely be used in a worst-case merge scenario
 (optimize).  That's very expensive, so it is cheaper to budget for only the
 size of the index.

 A load of 5000 queries per second is pretty high, and probably something
 you will not achieve with a single-server (not counting backup) approach.
  All of the tricks that high-volume website developers use are also
 applicable to Solr.

 Once you have enough RAM, you need to worry more about the number of
 servers, the number of CPU cores in each server, and the speed of those CPU
 cores.  Testing with actual production queries is the only way to find out
 what you really need.

 Beyond hardware design, making the requests as simple as possible and
 taking advantage of caches is important.  Solr has caches for queries,
 filters, and documents.  You can also put a caching proxy (something like
 Varnish) in front of Solr, but that would make NRT updates pretty much
 impossible, and that kind of caching can be difficult to get working right.

 Thanks,
 Shawn




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Shawn Heisey
On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
 These are really good metrics for me:
 
 You say that RAM size should be at least the index size, and it is better
 to have RAM twice the index size (because of the worst-case scenario).
 
 On the other hand, let's assume that I have more RAM than twice the index
 size on the machine. Can Solr use that extra RAM, or is twice the index
 size an approximate maximum?

What we have been discussing is the OS cache, which is memory that is
not used by programs.  The OS uses that memory to make everything run
faster.  The OS will instantly give that memory up if a program requests it.

Solr is a Java program, and Java uses memory a little differently, so
Solr most likely will NOT use more memory when it is available.

In a normal directly executable program, memory can be allocated at
any time, and given back to the system at any time.

With Java, you tell it the maximum amount of memory the program is ever
allowed to use.  Because of how memory is used inside Java, most
long-running Java programs (like Solr) will allocate up to the
configured maximum even if they don't really need that much memory.
Most Java virtual machines will never give the memory back to the system
even if it is not required.
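As a concrete illustration (the 3 GB figure and the start.jar path are hypothetical, not taken from this thread), that configured maximum is passed to the JVM with the -Xms/-Xmx flags when starting Solr:

```shell
# Hypothetical example: pin the Solr heap at 3 GB so the remaining RAM
# stays free for the OS page cache. Echoed here rather than executed.
SOLR_HEAP_OPTS="-Xms3g -Xmx3g"
echo java $SOLR_HEAP_OPTS -jar start.jar
```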

Thanks,
Shawn



Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Furkan KAMACI
I am sorry, but you said:

*you need enough free RAM for the OS to cache the maximum amount of disk
space all your indexes will ever use*

I have made an assumption about the indexes on my machine. Let's assume they
total 5 GB. So is it better to have at least 5 GB of RAM? OK, Solr will use
RAM up to the limit I define for it as a Java process. When we think about
the indexes on storage and the OS caching them in RAM, are you talking about
having more than 5 GB, or 10 GB of RAM, for my machine?

2013/4/10 Shawn Heisey s...@elyograg.org

 On 4/9/2013 7:03 PM, Furkan KAMACI wrote:
  These are really good metrics for me:
 
  You say that RAM size should be at least the index size, and it is better
  to have RAM twice the index size (because of the worst-case scenario).
 
  On the other hand, let's assume that I have more RAM than twice the index
  size on the machine. Can Solr use that extra RAM, or is twice the index
  size an approximate maximum?

 What we have been discussing is the OS cache, which is memory that is
 not used by programs.  The OS uses that memory to make everything run
 faster.  The OS will instantly give that memory up if a program requests
 it.

 Solr is a Java program, and Java uses memory a little differently, so
 Solr most likely will NOT use more memory when it is available.

 In a normal directly executable program, memory can be allocated at
 any time, and given back to the system at any time.

 With Java, you tell it the maximum amount of memory the program is ever
 allowed to use.  Because of how memory is used inside Java, most
 long-running Java programs (like Solr) will allocate up to the
 configured maximum even if they don't really need that much memory.
 Most Java virtual machines will never give the memory back to the system
 even if it is not required.

 Thanks,
 Shawn




Re: Approximately needed RAM for 5000 query/second at a Solr machine?

2013-04-09 Thread Shawn Heisey
On 4/9/2013 9:12 PM, Furkan KAMACI wrote:
 I am sorry but you said:
 
 *you need enough free RAM for the OS to cache the maximum amount of disk
 space all your indexes will ever use*
 
 I have made an assumption about the indexes on my machine. Let's assume
 they total 5 GB. So is it better to have at least 5 GB of RAM? OK, Solr
 will use RAM up to the limit I define for it as a Java process. When we
 think about the indexes on storage and the OS caching them in RAM, are
 you talking about having more than 5 GB, or 10 GB of RAM, for my machine?

If your index is 5GB, and you give 3GB of RAM to the Solr JVM, then you
would want at least 8GB of total RAM for that machine - the 3GB of RAM
given to Solr, plus the rest so the OS can cache the index in RAM.  If
you plan for double the cache memory, you'd need 13 to 14GB.

Thanks,
Shawn