Paul - I'm doing the same (many smaller indices) for Simpy.com, for similar reasons (speed, independent and faster reindexing, etc.). Each index has its own IndexSearcher, and the searchers are kept in an LRU data structure. Before each search the index version is checked, and a new IndexSearcher is created if the index has changed.
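A minimal sketch of the scheme described above: one searcher per index, kept in an LRU structure, with a version check before each search. This is an illustration under assumptions, not Simpy's actual code. The names `Searcher`, `SearcherCache`, `opener` and `versionOf` are hypothetical; in a Lucene application the two functions would wrap opening an IndexSearcher and reading the index's current version, and they are left as plug-in functions here so nothing about the Lucene API is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** Hypothetical stand-in for a searcher that holds open files and must be closed. */
interface Searcher {
    void close();
}

/** LRU cache holding at most one open searcher per index path (illustrative only). */
class SearcherCache<S extends Searcher> {

    /** A cached searcher plus the index version it was opened against. */
    private static final class Slot<T> {
        final T searcher;
        final long version;

        Slot(T searcher, long version) {
            this.searcher = searcher;
            this.version = version;
        }
    }

    private final Function<String, S> opener;        // opens a searcher for an index path
    private final ToLongFunction<String> versionOf;  // reads the current index version
    private final LinkedHashMap<String, Slot<S>> cache;

    SearcherCache(final int maxOpen, Function<String, S> opener,
                  ToLongFunction<String> versionOf) {
        this.opener = opener;
        this.versionOf = versionOf;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, Slot<S>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Slot<S>> eldest) {
                if (size() > maxOpen) {
                    eldest.getValue().searcher.close(); // free resources on eviction
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached searcher, reopening it if the index version has changed. */
    synchronized S get(String indexPath) {
        long current = versionOf.applyAsLong(indexPath);
        Slot<S> slot = cache.get(indexPath);
        if (slot != null && slot.version == current) {
            return slot.searcher;
        }
        if (slot != null) {
            slot.searcher.close(); // stale: the index changed since it was opened
        }
        S fresh = opener.apply(indexPath);
        cache.put(indexPath, new Slot<S>(fresh, current));
        return fresh;
    }
}
```

One design caveat: this sketch closes a searcher as soon as it is evicted or found stale. If other threads may still be running a search on it, closing would have to be deferred (for example with reference counting), which is left out here.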
Otis

--- Sven Duzont <[EMAIL PROTECTED]> wrote:

> Hello,
>
> We are already using this design in production for an email job
> application system.
> Each client (company) has an account and may have multiple users.
> When a new client is created, a new Lucene index is automatically
> created when new job applications arrive for this account.
> Job applications are in principle owned by users, but sometimes they
> can be shared with other users in the same account, so the search can
> be user-independent.
> This design works fine for us, as the flow of job applications is not
> the same for different accounts: some Lucene indices are updated
> more often than others.
> It also lets us rebuild one client's index without impacting the
> others.
>
> We have only one problem: when the index is updated and searched at
> the same time, the index may be corrupted and an exception may be
> thrown by the indexer ("Read past EOF"; unfortunately I don't have
> the stack trace at hand right now). I think this is because
> searching and indexing are done in two different Java processes.
> We will rework the routines to lock out searching while indexing is
> running, and vice versa.
>
> --- sven
>
> Monday, 11 July 2005, 03:03:29, you wrote:
>
> PS> On 11/07/2005, at 10:43 AM, Chris Hostetter wrote:
>
> >>
> >> : > Generally speaking, you only ever need one active Searcher,
> >> : > which all of your threads should be able to use. (Of course,
> >> : > Nathan says that in his code base, doing this causes his JVM
> >> : > to freeze up, but I've never seen this myself).
> >> : >
> >> : Thanks for your response Chris. Do you think we are going down a
> >> : deadly path by having "many smaller" IndexSearchers open rather
> >> : than "one very large one"?
> >>
> >> I'm sorry ... I think I may have confused you, I forgot that this
> >> thread was regarding partitioning the index. I meant one searcher
> >> *per index* ...
> >> don't try to make a separate searcher per client, or have a pool
> >> of searchers, or anything like that. But if you have a need to
> >> partition your data into multiple indexes, then have one searcher
> >> per index.
>
> PS> Actually I think I confused you first, and then you confused me
> PS> back... Let me... uhh, clarify 'ourselves'.. :)
>
> PS> My use of the word 'pool' was an error on my part (and a very
> PS> silly one). I should really have said "LRU Cache".
>
> PS> We have recognized that there is a finite # of IndexSearchers
> PS> that can probably be open at one time. So we'll use an LRU cache
> PS> to make sure only the 'actively' in-use Searchers are open.
> PS> However, there will only be one IndexSearcher open for a given
> PS> physical Index directory at a time; we're just making sure only
> PS> the recently used ones are kept open, to keep memory use sane.
>
> >>
> >> now assume you partition your data into two separate indexes:
> >> unless the way you partition your data lets you split it cleanly,
> >> so that each of the two indexes contains only half the number of
> >> terms as if you had one big index, then sorting on a field in
> >> those two indexes will require more RAM than sorting on the same
> >> data in a single index.
> >>
>
> PS> Our data is logically segmented into Projects. Each Project can
> PS> contain Documents and Mail. So we currently have 2 physical
> PS> Indexes per Project. 90% of the time our users work within one
> PS> project at a time, and only work in "document mode" or "mail
> PS> mode". Every now and then they may need to do a general search
> PS> across all Entities and/or Projects they are involved in
> PS> (accomplished with MultiSearcher). Perhaps we should just put
> PS> Documents and Mail all in one Index for a project (i.e. have 1
> PS> Index per project)??
>
> PS> Part of the reason to partition is to make the cost of
> PS> rebuilding a given project cheaper.
> PS> It also reduces the risk of an Uber-Index being corrupted and
> PS> screwing all the users up. We can order the reindexing of
> PS> projects to make sure our more important customers get
> PS> re-indexed first if there is a serious issue.
>
> PS> I would have thought that partitioning indexes would have
> PS> performance benefits too: a lot less data to scan (most of the
> PS> data is already relevant).
>
> PS> Since this isn't in production yet, I'd rather be proven wrong
> PS> now rather than later! :)
>
> PS> Thanks for your input.
>
> PS> Paul
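For the problem Sven describes earlier in the thread (indexing and searching in two different Java processes, with a plan to lock one out while the other runs), one possible approach is an OS-level lock on a marker file next to the index. The sketch below is only an assumption about how such a guard could look: `IndexGuard` and the marker file are hypothetical names, and the code uses nothing but the standard `java.nio` file-locking API. Searchers take a shared lock so they can run concurrently with each other; the indexer takes an exclusive lock.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

/** Cross-process guard for one index, based on a lock on a marker file (illustrative). */
class IndexGuard implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileLock lock;

    private IndexGuard(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /** Blocks until no other process holds a conflicting lock on the marker file. */
    private static IndexGuard acquire(File marker, boolean shared) throws IOException {
        // "rw" allows both shared (needs read) and exclusive (needs write) locks.
        RandomAccessFile raf = new RandomAccessFile(marker, "rw");
        try {
            FileLock l = raf.getChannel().lock(0L, Long.MAX_VALUE, shared);
            return new IndexGuard(raf, l);
        } catch (IOException | RuntimeException e) {
            raf.close(); // do not leak the file handle if locking fails
            throw e;
        }
    }

    /** Exclusive lock: no other process may index or search meanwhile. */
    static IndexGuard forIndexing(File marker) throws IOException {
        return acquire(marker, false);
    }

    /** Shared lock: other searching processes may proceed, an indexer must wait. */
    static IndexGuard forSearching(File marker) throws IOException {
        return acquire(marker, true);
    }

    boolean isHeld() {
        return lock.isValid();
    }

    @Override
    public void close() throws IOException {
        try {
            lock.release();
        } finally {
            file.close();
        }
    }
}
```

A caller would wrap each operation in `try (IndexGuard g = IndexGuard.forSearching(marker)) { ... }`. Two caveats: these locks are held on behalf of the whole JVM, so they do not coordinate threads inside one process, and on platforms without shared locks a shared request may be treated as exclusive.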