Although I've never indexed anything quite that large, I've had good
experiences with splitting the index across a cluster.  (For
example, a set that took about 4 seconds per complicated query on
one of our machines dropped to around a second when spread over 6.)  I
think this helps because query performance is bound by disk I/O
speed, as others have mentioned, and each additional disk array
adds to the effective disk bandwidth.
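The split-and-merge idea above can be sketched as a simple scatter-gather: each node searches its own slice of the index in parallel, and a coordinator merges the per-shard top hits by score. This is a generic illustration, not Lucene's own API (Lucene's MultiSearcher/RemoteSearchable classes serve this role); the `Shard` interface and the scores here are made up for the sketch.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical shard interface: each cluster node holds one slice of the index.
interface Shard {
    List<Hit> search(String query, int topK) throws Exception;
}

// A scored document; doc ids are assumed globally unique across shards.
record Hit(String docId, float score) {}

class ScatterGather {
    // Query every shard in parallel and merge the per-shard results
    // into a single global top-k, highest score first.
    static List<Hit> search(List<Shard> shards, String query, int topK)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard s : shards) {
                futures.add(pool.submit(() -> s.search(query, topK)));
            }
            // Min-heap of size topK: the lowest-scoring hit is evicted first.
            PriorityQueue<Hit> merged =
                new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
            for (Future<List<Hit>> f : futures) {
                for (Hit h : f.get()) {
                    merged.offer(h);
                    if (merged.size() > topK) merged.poll(); // drop lowest
                }
            }
            List<Hit> out = new ArrayList<>(merged);
            out.sort((a, b) -> Float.compare(b.score(), a.score()));
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard only has to score its own fraction of the documents against its own disks, which is where the wall-clock win comes from; the merge itself is cheap since only topK hits per shard cross the network.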

good luck
- andy g

On Fri, 07 May 2004 04:47:55 +0500, Will Allen <[EMAIL PROTECTED]> wrote:
> 
> Hi,
>         I am considering a project that would index 315+ million documents. I am 
> comfortable that the indexing will work well in creating an index ~800GB in size, 
> but am concerned about the query performance. (Is this a = bad
> assumption?)
> 
> What are the bottlenecks of performance as an index scales?  Memory?  = Cost is not 
> a concern, so what would be the shortcomings of a theoretical = machine with 16GB of 
> ram, 4-16 cpus and 1-2 terabytes of space?  Would it be = better to cluster machines 
> to break apart the query?
> 
> Thank you for your serious responses,
> Will Allen
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
>
