Re: Very large number of indices in distributed environment

Otis Gospodnetic Thu, 04 Aug 2005 10:04:04 -0700

Hi Benjamin,

I don't know what exactly your application is doing, but option 2)
sounds the most manageable to me.


Otis

--- Benjamin Reitzammer <[EMAIL PROTECTED]> wrote:

> Hi,
> we are in the process of planning a search feature of a product and
> we
> are having quite a hard time figuring out the "right" way to do it.
> 
> The requirements for our app are the following:
> 1) Large number of indices (at _least_ 10000)
> 2) The amount of data involved per index is not very high, but
> because
> of the number of indices involved the data set will be something
> about
> 500 - 1000 GB
> 3) The searching capabilities must be fail safe, while it's
> acceptable
> if deletes/updates can take some time.
> 4) The majority of operations will be searching the indices.
> 
> I've followed the mailing list intensively the last month and
> especially the "Best Practices for Distributing Lucene Indexing and
> Searching"
> (http://marc.theaimsgroup.com/?l=lucene-user&m=110971318020691&w=2)
> and "Real time indexing and distribution to lucene on separate boxes
> (long)"
> (http://marc.theaimsgroup.com/?l=lucene-user&m=107900097217474&w=2)
> threads provided some interesting insight.
> 
> But still our requirements are a bit different.
> 
> My thoughts how the above could be handled, so far are:
> 
> 1) Have one *really big* "master"  which handles all tasks related to
> index manipulation. Sync the indices according to Doug's tips
> http://marc.theaimsgroup.com/?l=lucene-user&m=110973989200204&w=2 out
> to a cluster of slaves that are responsible for searching.
> Problem: How to make sure that indices across  slaves are in sync. 
> Big Problem: Syncing of this large number of indices will cause a lot
> of traffic and cause already quite a load on the slaves (not to speak
> of the master)
> 
> 1.1) Is it safe to _search_ (only) an index mounted via NFS? If yes,
> then the search boxes could mount the indices on the master box. But
> this solution would probably lead to some serious perfomance issues
> because of the needed disk I/O on the master.
> Though I'd love to be proven wrong on this one.
> 
> 2) Split up index collection into smaller portions and distribute a
> certain number of indices (~ up to 1000 indices) into smaller
> autonomous clusters, that are completely responsible for their
> collection of indices.
> Problem: How do I keep index distribution dynamic so I don't have to
> hardcode where to look for a certain index (that's not a real lucene
> issue, but more one of distributed computing, but nevertheless I
> thought you guys might know a way to solve it).
> 
> Any ideas on this? 
> Has anyone ever worked with such a large number of Lucene indices
> (and
> the amount of data it involves)?
> 
> I appreciate your help very much.
> 
> Cheers
> 
> Benjamin
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Very large number of indices in distributed environment

Reply via email to