> Otherwise it would be possible for the document IDs of the
> documents to change between the time the search is run and
> the time the document is referenced.
Well, I started coding to use Searcher#search_each and found myself recoding most of the infrastructure of Index#search_each (and its friends) simply to avoid its @dir.synchronize when what you were saying above started to sink in.

I.e., as I understand it, I can have concurrent searchers if the index is read-only but not if I have a writer. So while it's possible to have multiple readers, 1 writer, the 1 writer requirement forces use of synchronized, which means that the readers must be serialised and not concurrent - is this correct?

Kind Regards

Neville

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Balmain
Sent: Monday, 4 September 2006 2:05 PM
To: [email protected]
Subject: Re: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario

On 9/4/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> Thanks for your reply Dave,
>
> > The one situation where you might be better off using a single
> > IndexReader is when you are relying on caching.
> > Filters and Sorts are cached per IndexReader and Sorts in particular
> > can take up a fair chunk of memory so if you have a large index
> > (large as in number of documents, not size of data) then you may be
> > better off with a single IndexReader. IndexReader is thread-safe so
> > using it concurrently should be fine.
>
> Just to clarify, I'm using Ferret::Index::Index concurrently at the
> moment, and I'm not getting concurrent searches via #search_each. IE,
> if a slow wild-card search arrives first, all subsequent searches wait
> until the wild-card search completes.
>
> So I guess #search_each is "synchronised"?

That's correct. Otherwise it would be possible for the document IDs of the documents to change between the time the search is run and the time the document is referenced.

For the benefit of those who don't know this, document IDs are not constant. They represent the position of the document in the index. Think of it like an array.
Let's add 5 documents to the index.

    [0,1,2,3,4]

Now let's delete documents 1 and 2;

    [0,3,4]

So document 4 now has a doc_id of 2. If this happened in the middle of a search you'd have a problem. So instead we synchronize the Index#search and Index#search_each methods. Now this isn't the case for Searcher#search and Searcher#search_each since the IndexReader that Searcher uses remains consistent so you should be able to use Searcher concurrently.

> Therefore to have multiple searches on an index concurrently, I really
> need an IndexReader per thread and I would need to manage a pool of
> reusable IndexReaders?

Using Ferret::Index::Index this would be true. But if performance is a concern you should definitely use a Ferret::Search::Searcher object instead anyway and you'll be able to use it concurrently.

> Any pointers on how other web apps [not using Rails] handle multiple
> Ferret readers?

Let us know if using the Searcher object isn't adequate.

> > You can actually pass an array of readers as the first (only)
> > parameter to IndexReader.new.
> >
> >   reader = IndexReader.new([reader1, reader2, reader3])
> >
>
> Interesting ... I had a look, but I don't really understand what this
> does? Would you elaborate please :D

A MultiReader object was initially what was used to read and search multiple indexes at a time. This functionality is now simply handled by the IndexReader object. There are several uses for this. One was to store each model in a separate index and you could then offer search across multiple models using a MultiReader. Another use-case might be to have multiple indexes to speed up indexing. If for example you are scraping websites it is a very good idea to have multiple scraping processes. The best way to do this is to have each process indexing to its own index. You could then search all indexes at once using a MultiReader or you could also merge all indexes into a single index.

Hope that makes sense.
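The doc_id shift described above can be sketched in plain Ruby, with an Array standing in for the index. This is only a toy illustration and does not use Ferret at all: TinyIndex and its methods are made-up names, and real Ferret internals may well behave differently. It shows a doc_id going stale after deletes, a search_each that holds one lock for the whole search, and a frozen snapshot that stays consistent in the way the reader behind a Searcher is said to.

```ruby
require 'monitor'

# Hypothetical toy index (NOT part of Ferret): a document's ID is simply
# its position in the underlying array.
class TinyIndex
  include MonitorMixin

  def initialize
    super() # sets up the monitor provided by MonitorMixin
    @docs = []
  end

  def add(doc)
    synchronize { @docs << doc }
  end

  # Deleting by doc_id shifts every later document down by one position.
  def delete(doc_id)
    synchronize { @docs.delete_at(doc_id) }
  end

  def doc_id_of(doc)
    synchronize { @docs.index(doc) }
  end

  def [](doc_id)
    synchronize { @docs[doc_id] }
  end

  # The search and the document lookup share one lock, so no delete can
  # shift the IDs between finding a match and yielding it.
  def search_each(pattern)
    synchronize do
      @docs.each_with_index { |doc, id| yield id, doc if pattern.match?(doc) }
    end
  end

  # A frozen copy: later writes to the index do not change it.
  def snapshot
    synchronize { @docs.dup.freeze }
  end
end

index = TinyIndex.new
%w[doc0 doc1 doc2 doc3 doc4].each { |d| index.add(d) }

stale_id = index.doc_id_of("doc4") # => 4
snapshot = index.snapshot

index.delete(2) # remove doc2 first so doc1 keeps its position
index.delete(1) # remove doc1; the index is now [doc0, doc3, doc4]

index.doc_id_of("doc4") # => 2
index[stale_id]         # => nil, the old ID no longer points at doc4
snapshot[stale_id]      # => "doc4", the snapshot is unaffected
```

Because the snapshot never changes, any number of threads could read it at once without a lock, while everything going through TinyIndex itself is serialised.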
Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

