Re: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario

Neville Burnell Tue, 05 Sep 2006 22:11:54 -0700

> Otherwise it would be possible for the document IDs of the 
> documents to change between the time the search is run and
> the time the document is referenced.

Well, I started coding against Searcher#search_each and found myself
re-implementing most of the infrastructure of Index#search_each (and its
friends) simply to avoid its @dir.synchronize, when what you were saying
above started to sink in. I.e., as I understand it, I can have concurrent
searchers if the index is read-only, but not if I have a writer.

So while it's possible to have multiple readers and 1 writer, the
1-writer requirement forces use of synchronized, which means that the
readers must be serialised and not concurrent. Is this correct?

Kind Regards

Neville

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of David Balmain
Sent: Monday, 4 September 2006 2:05 PM
To: [email protected]
Subject: Re: [Ferret-talk] Help with Multiple Readers, 1 Writer scenario

On 9/4/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> Thanks for your reply Dave,
>
> > The one situation where you might be better off using a single 
> > IndexReader is when you are relying on caching.
> > Filters and Sorts are cached per IndexReader and Sorts in particular
> > can take up a fair chunk of memory so if you have a large index
> > (large as in number of documents, not size of data) then you may be 
> > better off with a single IndexReader. IndexReader is thread-safe so 
> > using it concurrently should be fine.
>
> Just to clarify, I'm using Ferret::Index::Index concurrently at the
> moment, and I'm not getting concurrent searches via #search_each. I.e.,
> if a slow wild-card search arrives first, all subsequent searches wait
> until the wild-card search completes.
>
> So I guess #search_each is "synchronised"?

That's correct. Otherwise it would be possible for the document IDs of
the documents to change between the time the search is run and the time
the document is referenced. For the benefit of those who don't know
this, document IDs are not constant. They represent the position of the
document in the index. Think of it like an array. Let's add 5 documents
to the index.

    [0,1,2,3,4]

Now let's delete documents 1 and 2:

    [0,3,4]

So document 4 now has a doc_id of 2. If this happened in the middle of a
search you'd have a problem. So instead we synchronize the Index#search
and Index#search_each methods. This isn't the case for Searcher#search
and Searcher#search_each: the IndexReader that Searcher uses remains
consistent, so you should be able to use Searcher concurrently.
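
To make the doc ID problem concrete, here is a minimal sketch in plain
Ruby. It does not use Ferret at all: the index is modelled as an array,
and "snapshot" is only a stand-in for the consistent view that a
Searcher's IndexReader is described as keeping. All names are made up
for illustration.

```ruby
# Toy model only: a doc ID is simply a position in this array.
docs = ["a", "b", "c", "d", "e"]   # doc IDs 0..4

# Stand-in for a reader whose view of the index stays consistent.
snapshot = docs.dup

# A search finds document "e" and reports its doc ID.
hit_id = docs.index("e")           # 4

# Meanwhile a writer deletes documents 1 and 2 (highest ID first,
# so the second deletion still targets the intended document).
[2, 1].each { |id| docs.delete_at(id) }

live_lookup     = docs[hit_id]      # nil: ID 4 no longer exists
snapshot_lookup = snapshot[hit_id]  # "e": the snapshot is unchanged
new_id          = docs.index("e")   # 2: the same document, new ID
```

Resolving the hit against the live array fails, while resolving it
against the unchanged copy still works, which is the difference between
the two access paths described above.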

> Therefore to have multiple searches on an index concurrently, I really
> need an IndexReader per thread and I would need to manage a pool of
> reusable IndexReaders?

Using Ferret::Index::Index this would be true. But if performance is a
concern, you should definitely use a Ferret::Search::Searcher object
instead anyway, and you'll be able to use it concurrently.
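
The effect of the lock can be illustrated without Ferret. The sketch
below is a toy model under stated assumptions: SynchronizedIndex,
SnapshotSearcher and InFlightCounter are hypothetical classes, and
sleep stands in for a slow search. It only shows that a lock shared by
every search allows one search at a time, while lock-free searches
overlap.

```ruby
# Toy model only: none of these classes belong to Ferret.

# Records how many searches were running at the same moment.
class InFlightCounter
  attr_reader :max

  def initialize
    @lock = Mutex.new
    @current = 0
    @max = 0
  end

  def track
    @lock.synchronize do
      @current += 1
      @max = @current if @current > @max
    end
    yield
  ensure
    @lock.synchronize { @current -= 1 }
  end
end

# Every search takes the same lock, like a synchronized search method.
class SynchronizedIndex
  def initialize(counter)
    @counter = counter
    @lock = Mutex.new
  end

  def search(delay)
    @lock.synchronize { @counter.track { sleep(delay) } }
  end
end

# No shared lock, like searching through a view that stays consistent.
class SnapshotSearcher
  def initialize(counter)
    @counter = counter
  end

  def search(delay)
    @counter.track { sleep(delay) }
  end
end

# Run several simulated searches at once; return the peak overlap seen.
def max_concurrency(klass, threads: 4, delay: 0.1)
  counter = InFlightCounter.new
  target = klass.new(counter)
  Array.new(threads) { Thread.new { target.search(delay) } }.each(&:join)
  counter.max
end

SYNCHRONIZED_MAX = max_concurrency(SynchronizedIndex)  # always 1
SNAPSHOT_MAX     = max_concurrency(SnapshotSearcher)   # more than 1 in practice
```

In the synchronized case a slow search holds up everything queued
behind it, which matches the behaviour reported with #search_each on
Ferret::Index::Index.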

> Any pointers on how other web apps [not using Rails] handle multiple 
> Ferret readers?

Let us know if using the Searcher object isn't adequate.

> > You can actually pass an array of readers as the first (only)
> > parameter to IndexReader.new.
> >
> >    reader = IndexReader.new([reader1, reader2, reader3])
> >
>
> Interesting ... I had a look, but I don't really understand what this 
> does? Would you elaborate please :D

A MultiReader object was initially what was used to read and search
multiple indexes at a time. This functionality is now simply handled by
the IndexReader object. There are several uses for this. One was to
store each model in a separate index and you could then offer search
across multiple models using a MultiReader. Another use-case might be to
have multiple indexes to speed up indexing. If for example you are
scraping websites it is a very good idea to have multiple scraping
processes. The best way to do this is to have each process indexing to
its own index. You could then search all indexes at once using a
MultiReader or you could also merge all indexes into a single index.
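
As a rough picture of what searching several indexes through one reader
involves, here is a toy sketch in plain Ruby. ToyReader and
ToyMultiReader are hypothetical and are not Ferret classes; the
assumption that the combined reader numbers documents by laying the
sub-indexes end to end is an illustration, not a statement about
Ferret's internals.

```ruby
# Toy model only: hypothetical classes, not Ferret's IndexReader.
class ToyReader
  def initialize(docs)
    @docs = docs
  end

  def max_doc
    @docs.size
  end

  def doc(id)
    @docs.fetch(id)
  end
end

# Presents several readers as one, numbering documents end to end.
class ToyMultiReader
  attr_reader :max_doc

  def initialize(readers)
    @readers = readers
    @starts = []
    offset = 0
    readers.each do |reader|
      @starts << offset          # first combined doc ID of this reader
      offset += reader.max_doc
    end
    @max_doc = offset
  end

  def doc(id)
    raise IndexError, "no such doc: #{id}" unless (0...@max_doc).cover?(id)

    # Last reader whose starting offset is <= id owns the document.
    i = @starts.rindex { |start| start <= id }
    @readers[i].doc(id - @starts[i])
  end
end

scraper_one = ToyReader.new(%w[a b])
scraper_two = ToyReader.new(%w[c d e])
combined = ToyMultiReader.new([scraper_one, scraper_two])

combined.max_doc  # 5
combined.doc(1)   # "b", from the first index
combined.doc(2)   # "c", the first document of the second index
```

Each scraping process would own one of the underlying indexes, and only
the combined reader needs to know how the pieces fit together.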

Hope that makes sense.

Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk