RE: Hardware Specs Question

2010-09-06 Thread Toke Eskildsen
From: Dennis Gearon [gear...@sbcglobal.net]:
 I wouldn't have thought that CPU was a big deal with the speed/cores of CPUs
 continuously growing according to Moore's law and the change in disk speed
 barely changing 50% in 15 years. Must have a lot to do with caching.

I am not sure I follow you. When seek times are suddenly 100 times faster 
(slight exaggeration, but only slight), why wouldn't the bottleneck move? Yes, 
CPUs have increased tremendously in speed, but so have our processing needs. 
Lucene (and by extension Solr) was made with long seek times in mind and, 
looking at the current market, it makes sense to continue supporting this for 
some years. If the software were optimized for sub-ms seek times, it might 
lower CPU usage or at the very least reduce the need for caching (internal as 
well as external).

 What size indexes are you working with?

Around 40GB for our primary index. 9 million documents, AFAIR.

 Are you saying you can get the whole thing in memory?

No. For that test we had to reduce the index to 14GB on our 24GB test machine 
to use Lucene's RAMDirectory. In order to avoid the "everything is cached and 
thus everything is the same speed" problem, we lowered the amount of available 
memory to 3GB when we measured harddisk & SSD speed against the 14GB index. The 
Cliff's Notes version: harddisks 200 raw queries/second, SSDs 774 q/s and RAM 
952 q/s, but as always it is not so simple to extract a single number for 
performance when warm-up and caching come into play. Let me be quick to add 
that this was with Lucene + custom code, not with Solr.
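
For readers who want to reproduce this kind of comparison, here is a rough
sketch, not the custom code mentioned above: it opens the same index through
Lucene and times a fixed query set. Class names are from a current Lucene
release, and the index path, field name and queries are placeholders.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;
    import java.util.List;

    public class ThroughputCheck {
        public static void main(String[] args) throws Exception {
            // On-disk index; for the "index in RAM" run, copy it into a RAM-backed
            // Directory instead (RAMDirectory on Lucene versions that still ship it).
            Directory dir = FSDirectory.open(Paths.get("/data/index"));
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                QueryParser parser = new QueryParser("body", new StandardAnalyzer());
                List<String> queries = List.of("solr", "hardware specs", "ssd AND raid");

                long start = System.nanoTime();
                int executed = 0;
                for (int round = 0; round < 100; round++) {   // repeat to get past warm-up noise
                    for (String q : queries) {
                        searcher.search(parser.parse(q), 10); // raw query: no faceting, no highlighting
                        executed++;
                    }
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("%d queries in %.1fs = %.0f q/s%n",
                        executed, seconds, executed / seconds);
            }
        }
    }

As noted above, the OS disk cache blurs the storage difference unless available
memory is constrained, so the raw q/s numbers only mean something relative to
each other under the same conditions.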

 That would negate almost any disk benefits.

That depends very much on your setup. It takes a fair amount of time to copy 
14GB from storage into RAM so an index fully in RAM would either be very static 
or require some logic to handle updates and sync data in case of outages. I 
know there's some interesting work being done with this, but as SSDs are a lot 
cheaper than RAM and fulfill our needs, it is not something we pursue.


RE: Hardware Specs Question

2010-09-06 Thread Dennis Gearon
Very interesting stuff!

I'm pretty sure that for intense applications, front-line storage will be 
non-hard-disk within 10 years or sooner, with hard disks kept for backup/boot.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php



Re: Hardware Specs Question

2010-09-03 Thread Dennis Gearon
If you really want to see performance, try external DRAM disks. Whew! 800X 
faster than a disk.


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php




Re: Hardware Specs Question

2010-09-03 Thread Toke Eskildsen
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote:
 On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
  We've done a fair amount of experimentation in this area (1997-era SSDs
  vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
  RAID 0). The harddisk setups never stood a chance for searching. With
  current SSD's being faster than harddisks for writes too, they'll also
  be better for index building, although not as impressive as for
  searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware
 
 How does it compare to six SATA drives in a Dell hardware RAID10?  

I'll have to extrapolate a lot here (also known as guessing).

You don't mention what kind of harddrives you're using, so let's say
15.000 RPM to err on the high-end side. Compared to the 2 drives @
15.000 RPM in RAID 1 we've experimented with, the difference is that the
striping allows for concurrency when the different reads are on
different physical drives (sorry if this is basic, I'm just trying to
establish a common understanding here).

With the six drives seen as 3 mirrored stripes, the chance for 2 concurrent
reads to be on different physical drives is 5/6, the chance for 3 concurrent
reads is 1/6 and the chance for 3 concurrent reads to be on at least 2 drives
is 5/6. For the sake of argument, let's say that the 3-way striping gives us
double the concurrent I/O.
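
As a quick sanity check on the first of those figures, a tiny simulation (my
own illustration, not part of the original mail) treats the RAID 10 set as 3
mirrored pairs, lets each read pick a stripe and then either mirror, and counts
how often two concurrent reads land on different physical drives:

    import java.util.Random;

    public class Raid10ReadSim {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            int trials = 1_000_000, different = 0;
            for (int i = 0; i < trials; i++) {
                // Each read hits one of 3 stripes, then one of the 2 mirrors in that
                // stripe, i.e. effectively one of 6 physical drives.
                int driveA = rnd.nextInt(3) * 2 + rnd.nextInt(2);
                int driveB = rnd.nextInt(3) * 2 + rnd.nextInt(2);
                if (driveA != driveB) different++;
            }
            // Expected: 1 - 1/6 = 5/6, roughly 0.833
            System.out.printf("P(2 concurrent reads on different drives) = %.3f%n",
                    (double) different / trials);
        }
    }

It settles around 0.833, i.e. the 5/6 above.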

Taking my old measurements at face value and doubling the numbers for the
15.000 RPM measurements, this would bring six 15.000 RPM SATA drives in RAID 10
up to a throughput that is 1/3 - 2/3 of the SSD, depending on how we measure.


Some general observations:

With long runtimes, the throughput for harddisks rises relative to the
SSD as the disk cache gets warmed. If there are frequent index updates
with deletions, the SSD gains more ground, as it is not nearly as
dependent on the disk cache as harddisks are.

With small indexes, the difference between harddisks and SSD is
relatively small as the disk cache quickly gets filled. Consequently the
difference increases for large indexes.


One point to note about RAID is that it does not improve the speed of single
searches on a single index: it does not lower the seek time for a single small
I/O request, and searching a single index is done with a number of small
successive requests. If the performance problem is long search time, RAID does
not help (but in combination with sharding or similar it will). If the problem
is the number of concurrent searches, RAID helps.



Re: Hardware Specs Question

2010-09-03 Thread Toke Eskildsen
On Fri, 2010-09-03 at 11:07 +0200, Dennis Gearon wrote:
 If you really want to see performance, try external DRAM disks.
 Whew! 800X faster than a disk.

As sexy as they are, DRAM drives do not buy much extra performance. At least
not at the search stage. For searching, SSDs are not that far from holding the
index fully in RAM (about 3/4 the speed in our tests, but YMMV). The CPU is the
bottleneck.

That was with Lucene 2.4 so the relative numbers might have changed, but
the old lesson still stands: A well balanced system is key.



Re: Hardware Specs Question

2010-09-03 Thread scott chu

well balanced system
=====
Agree. Here we'll start a performance & load test this month. I've defined 
test criteria of 'QPS', 'RTpQ' & worst case according to our use case & past 
experience. Our goal is to meet these criteria & adjust hardware & system 
configuration to find a well balanced, scalable Solr architecture.
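
For a load test along those lines, a bare-bones sketch (my own illustration; it
needs Java 11+, and the Solr URL, core name, query list and thread count are
placeholders to adapt) that measures QPS and mean response time could look like:

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.concurrent.*;
    import java.util.concurrent.atomic.LongAdder;

    public class SolrLoadTest {
        public static void main(String[] args) throws Exception {
            String base = "http://localhost:8983/solr/mycore/select?q=";  // assumed URL/core
            List<String> queries = List.of("title:news", "body:taiwan", "*:*");
            int threads = 8, secondsToRun = 60;

            HttpClient http = HttpClient.newHttpClient();
            LongAdder done = new LongAdder();
            LongAdder totalMillis = new LongAdder();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long deadline = System.currentTimeMillis() + secondsToRun * 1000L;

            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    int i = 0;
                    while (System.currentTimeMillis() < deadline) {
                        String q = queries.get(i++ % queries.size());
                        HttpRequest req = HttpRequest.newBuilder(
                                URI.create(base + URLEncoder.encode(q, StandardCharsets.UTF_8)))
                                .GET().build();
                        long start = System.currentTimeMillis();
                        try {
                            http.send(req, HttpResponse.BodyHandlers.ofString());
                            totalMillis.add(System.currentTimeMillis() - start);
                            done.increment();
                        } catch (Exception e) {
                            // count errors separately in a real test
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(secondsToRun + 10, TimeUnit.SECONDS);
            System.out.printf("QPS: %.1f, mean RT: %.1f ms%n",
                    done.doubleValue() / secondsToRun,
                    totalMillis.doubleValue() / done.doubleValue());
        }
    }

For the worst-case criterion, recording every response time and reporting a
high percentile (95th or 99th) says more than the mean alone.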


However, the past discussion in this thread has several good suggestions for 
our test. Thanks to all who provided their experience & suggestions.


Scott


Re: Hardware Specs Question

2010-09-03 Thread Dennis Gearon
I wouldn't have thought that CPU was a big deal with the speed/cores of CPUs 
continuously growing according to Moore's law and the change in disk speed 
barely changing 50% in 15 years. Must have a lot to do with caching.

What size indexes are you working with? Are you saying you can get the whole 
thing in memory? That would negate almost any disk benefits.

I'm guessing that keeping shards small enough to fit into memory must be one of 
the big tricks.


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php



Re: Hardware Specs Question

2010-09-03 Thread Shawn Heisey

 On 9/3/2010 3:39 AM, Toke Eskildsen wrote:

I'll have to extrapolate a lot here (also known as guessing).
You don't mention what kind of harddrives you're using, so let's say
15.000 RPM to err on the high-end side. Compared to the 2 drives @
15.000 RPM in RAID 1 we've experimented with, the difference is that the
striping allows for concurrency when the different reads are on
different physical drives (sorry if this is basic, I'm just trying to
establish a common understanding here).

The chance for 2 concurrent reads to be on different drives with 3
harddrives is 5/6, the chance for 3 concurrent reads is 1/6 and the
chance for 3 concurrent reads to be on at least 2 drives is 5/6. For the
sake of argument, let's say that the 3 * striping gives us double the
concurrency I/O.


I actually didn't know that there were 15,000 RPM SATA drives until just now 
when I googled. I knew that Western Digital made some 10,000 RPM drives, but 
most SATA drives are 7200 RPM. Dell doesn't sell any SATA drives faster than 
7200, and the 500GB drives in my servers are 7200. I'm using the maximum 1MB 
stripe size to increase the likelihood of concurrent reads. Our query rate is 
quite low (less than 1 per second), so any concurrency that's achieved will be 
limited to possibly allowing all three VMs on the server to access the disk at 
the exact same time. With three stripes and two copies of each of those 
stripes, the chance of that is fair to good.


So with all that, I probably only see around a third (and possibly up to half) 
the performance of SSDs. Thanks!




Re: Hardware Specs Question

2010-09-02 Thread Toke Eskildsen
On Thu, 2010-09-02 at 03:37 +0200, Lance Norskog wrote:
 I don't know how much SSD disks cost, but they will certainly cure the
 disk i/o problem.

We've done a fair amount of experimentation in this area (1997-era SSDs
vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
RAID 0). The harddisk setups never stood a chance for searching. With
current SSD's being faster than harddisks for writes too, they'll also
be better for index building, although not as impressive as for
searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware

With consumer-level SSDs, there is more bang for the buck than RAIDing
up with high-end harddisks. They should be the first choice when IO is
an issue.


There are of course opposing views on this issue. Some people think
enterprise: Expensive and very reliable systems where consumer hardware
is a big no-no. The price point for pro SSDs might make them unfeasible
in such a setup. Others go for cheaper setups and handle the reliability
issues with redundancy. I'm firmly in the second camp, but it is
obviously not an option for all people.

A point of concern is writes. Current consumer SSDs use wear leveling
and can take a lot of punishment (as a rough measure: the amount of free
space times 10.000). They might not be suitable for holding massive
databases with thousands of writes/second, but they can surely handle
the measly amount of writes required for Lucene index updating and
searching.
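
As a back-of-envelope illustration of that rule of thumb (my numbers, not part
of the original mail): with 100GB of free space on the drive, free space times
10.000 gives on the order of 1PB of total writes. Rewriting a 40GB index from
scratch once a day would take roughly 70 years (1PB / 40GB per day is about
25,000 days) to burn through that, so index updates are nowhere near the limit.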

Long story short: put a quality consumer SSD in each server and be
happy.



Re: Hardware Specs Question

2010-09-02 Thread Shawn Heisey

 On 9/2/2010 2:54 AM, Toke Eskildsen wrote:

We've done a fair amount of experimentation in this area (1997-era SSDs
vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
RAID 0). The harddisk setups never stood a chance for searching. With
current SSD's being faster than harddisks for writes too, they'll also
be better for index building, although not as impressive as for
searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware


How does it compare to six SATA drives in a Dell hardware RAID10?  
That's what my VM hosts have, which each run three large shards and a 
couple of supporting systems.





Re: Hardware Specs Question

2010-09-01 Thread Lance Norskog
I was just reading about configuring mass computation grids: hardware
writes on 2 striped disks take 10% longer than writes on a single disk,
because you have to wait for the slower disk to finish. So, single
disks without RAID are faster.

I don't know how much SSD disks cost, but they will certainly cure the
disk i/o problem.

-- 
Lance Norskog
goks...@gmail.com


Re: Hardware Specs Question

2010-08-31 Thread 朱炎詹
In our current lab project, we have already built a Chinese newspaper index 
with 18 million documents. The index size is around 51GB, so I am very 
concerned about the memory issues you guys mentioned.


I also looked up the HathiTrust report on the SolrPerformanceData page: 
http://wiki.apache.org/solr/SolrPerformanceData. They said their main 
bottleneck is disk I/O even though they have 10 shards spread over 4 servers.


Can you guys give me some helpful suggestions about hardware spec & memory 
configuration for our project?


Thanks in advance.

Scott





Hardware Specs Question

2010-08-30 Thread Amit Nithian
Hi all,

I am curious to get some opinions on at what point having more CPU
cores shows diminishing returns in terms of QPS. Our index size is about 8GB
and we have 16GB of RAM on a quad-core 4 x 2.4 GHz AMD Opteron 2216.
Currently I have the heap set to 8GB.

We are looking to get more servers to increase capacity, and because the
warranty is set to expire on our old servers, I was curious, before asking for
a certain spec, what others run and at what point having more cores ceases to
matter. Mainly looking at somewhere between 4-12 cores per server.

Thanks!
Amit


Re: Hardware Specs Question

2010-08-30 Thread Lance Norskog
The price-performance knee for small servers is 32G RAM, 2-6 SATA
disks in a RAID, 8/16 cores. You can buy these servers and half-fill
them, leaving room for expansion.

I have not done benchmarks on the max # of processors that can be
kept busy during indexing or querying, or on the total numbers: QPS,
response time averages & variability, etc.

If your index file size is 8G and your Java heap is 8G, you will do
long garbage collection cycles. The operating system is very good at
keeping your index in memory, better than Solr can.

Lance
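
To make that point concrete with a sketch (mine, not Lance's; class names are
from a current Lucene release and the index path is made up): Lucene can
memory-map the index so that the OS page cache, rather than the Java heap,
holds the hot parts.

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.MMapDirectory;
    import java.nio.file.Paths;

    public class MMapExample {
        public static void main(String[] args) throws Exception {
            // The index stays on disk; mmap lets the OS page cache decide what is
            // resident, so the JVM heap (e.g. -Xmx4g on a 16GB box) can stay small
            // and GC pauses short.
            MMapDirectory dir = new MMapDirectory(Paths.get("/data/solr/index")); // made-up path
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                System.out.println("Searchable docs: " + searcher.getIndexReader().numDocs());
            }
        }
    }

Later Solr releases memory-map the index by default on 64-bit platforms; either
way, the practical takeaway is to keep the heap well below physical RAM so the
page cache has room.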


-- 
Lance Norskog
goks...@gmail.com


Re: Hardware Specs Question

2010-08-30 Thread Amit Nithian
Lance,

Thanks for your help. What do you mean by the OS being able to keep the index in
memory better than Solr? Do you mean that you should use another means to
keep the index in memory (e.g. a ramdisk)? Is there a generally accepted heap
size to index size ratio that you follow?

Thanks
Amit



Re: Hardware Specs Question

2010-08-30 Thread Lance Norskog
It generally works best to tune the Solr caches and allocate enough
RAM to run comfortably. Linux & Windows et al. have their own cache
of disk blocks. They use very good algorithms for managing this cache.
Also, they do not make long garbage collection passes.


-- 
Lance Norskog
goks...@gmail.com


Re: Hardware Specs Question

2010-08-30 Thread 朱炎詹
I am also curious, as Amit is. Can you give an example of the garbage 
collection problem you mentioned?






Re: Hardware Specs Question

2010-08-30 Thread Amit Nithian
Lance,

Makes sense. I have heard about the long GC times on large heaps, but I
personally haven't experienced a slowdown, though that doesn't mean anything
either :-). Agreed that tuning the Solr caching is the way to go.

I haven't followed all the Solr/Lucene changes, but from what I remember there
are synchronization points that could be a bottleneck where adding more cores
won't help. Or am I completely missing something?

Thanks again
Amit





Re: Hardware Specs Question

2010-08-30 Thread Lance Norskog
There are synchronization points, which become chokepoints at some
number of cores. I don't know where they cause Lucene to top out.
Lucene apps are generally disk-bound, not CPU-bound, but yours will
be CPU-bound. There are so many variables that it's really not possible
to give any numbers.

Lance


-- 
Lance Norskog
goks...@gmail.com