On 9/6/06, Neville Burnell <[EMAIL PROTECTED]> wrote:
> > Otherwise it would be possible for the document IDs of the
> > documents to change between the time the search is run and
> > the time the document is referenced.
>
> Well, I started coding to use Searcher#search_each and found myself
> recoding most of the infrastructure of Index#search_each (and its
> friends) simply to avoid its @dir.synchronize when what you were saying
> above started to sink in. I.e., as I understand it, I can have concurrent
> searchers if the index is read-only but not if I have a writer.
>
> So while it's possible to have multiple readers, 1 writer, the 1 writer
> requirement forces use of synchronized, which means that the readers
> must be serialised and not concurrent - is this correct?
Close. When you open an IndexReader on the index, it is opened on that particular version (or state) of the index. Any operations on the IndexReader (like searches) will only show what was in the index at the time you opened it. Any modifications to the index (usually through an IndexWriter) that occur after you open the IndexReader will not appear in your searches. So to keep searches up to date, you need to close and reopen your IndexReader every time you commit changes to the index.

So the writer doesn't force the use of synchronized. Rather, it forces you to decide whether searches need to return the most up-to-date results available, or whether there can be a short delay between changes being written to the index and those changes appearing in the search results. The Index class makes it as simple as possible to always search the latest index, but there is a performance hit. Most of the time performance should be fine. The Ferret C core has been highly optimized and will still beat most other solutions hands down, even when used in this way.

Now, if I were writing an application where search performance is a big issue (as it seems to be in your case), I would start by using the base classes like IndexReader and IndexWriter (as we've already discussed). As I just mentioned, you might allow a delay between the time the index is modified and the time those modifications appear in search results. This would allow you to update the IndexReader every minute/hour/day/week without regard to what the IndexWriter is doing. This solution works well when scraping webpages. Google's results, for example, aren't always completely up to date with the pages they index. If one of their results is a dead link, it isn't the end of the world. If, however, you are indexing data in a database, it often isn't this simple.
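To make the snapshot behaviour and the "reopen on every commit vs. reopen on a schedule" trade-off concrete, here is a small plain-Ruby model. It is only a sketch under stated assumptions: ToyIndex, ToyReader and DelayedSearcher are hypothetical classes invented for illustration, not Ferret's API, and the "index" is just an in-memory array of strings searched by substring.

```ruby
# Hypothetical in-memory stand-in for an index. NOT Ferret's API: it only
# models "commits create a new version; readers see one fixed version".
class ToyIndex
  def initialize
    @docs = []
    @version = 0
    @lock = Mutex.new # guards commits, loosely like Index's @dir.synchronize
  end

  # Roughly what an IndexWriter commit does: add documents, bump the version.
  def commit(new_docs)
    @lock.synchronize do
      @docs = @docs + new_docs
      @version += 1
    end
  end

  # Roughly what opening an IndexReader does: pin the current version.
  def open_reader
    @lock.synchronize { ToyReader.new(@docs.dup.freeze, @version) }
  end
end

# Hypothetical reader pinned to one version of ToyIndex. Its view never
# changes, so the ids it returns stay valid for as long as you hold it.
class ToyReader
  attr_reader :version

  def initialize(docs, version)
    @docs = docs
    @version = version
  end

  # Naive substring "search": ids of documents containing the term.
  def search(term)
    @docs.each_index.select { |id| @docs[id].include?(term) }
  end

  def [](id)
    @docs[id]
  end
end

# Hypothetical searcher that reopens its reader at most once every
# max_age seconds. max_age = 0 reopens on every search (always fresh,
# more work); a large max_age tolerates stale results in exchange
# for fewer reopens.
class DelayedSearcher
  def initialize(index, max_age)
    @index = index
    @max_age = max_age
    @lock = Mutex.new # only guards swapping the reader
    reopen
  end

  def search(term)
    reader = @lock.synchronize do
      reopen if now - @opened_at >= @max_age
      @reader
    end
    # The search runs outside the lock, so concurrent searches are not
    # serialised; ids are resolved against the same reader they came from.
    reader.search(term).map { |id| reader[id] }
  end

  private

  def reopen
    @reader = @index.open_reader
    @opened_at = now
  end

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

A reader opened before a commit keeps seeing the old version:

```ruby
index = ToyIndex.new
index.commit(["ruby ferret", "lucene java"])
reader = index.open_reader
index.commit(["ferret search"])
reader.search("ferret")            # => [0], still the first version
index.open_reader.search("ferret") # => [0, 2]
```

In this model only the reader swap is locked, which is why many searches can run at once alongside a writer. Setting max_age to 0 roughly approximates the always-fresh behaviour described for the Index class, at the cost of reopening on every search.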
If you use the previous solution with a database that allows deletes, then you need some way to handle results that reference objects that have been deleted from the database. Otherwise, you will need some way to synchronize on the index (probably on the Ferret::Store::Directory, as Ferret::Index::Index does) so that no searches are done while the deletion is committed to the index and the IndexReaders are updated.

Another solution, which I'm going to experiment with, is using the index as your database. You may still keep your original database, but store any data in the index that will be shown back to the user as the result of a search. That way you don't need to worry about synchronization with the database.

I don't think I've explained this very clearly here, so feel free to try and clarify. I will endeavor to write this all down in a clearer, more comprehensible manner so that everyone can work out the solution that best fits their needs.

Cheers,
Dave

PS: The ideal solution for me would be an object database with Ferret-like full-text search built in. I've been thinking about this a lot lately. It would certainly fit the style of development used in many Rails apps; that is to say, all access to the database must go through the model, as that is where all the validation is. If you are developing this way, why bother with the relational database and ORM solution? A good object database would serve the same purpose and would be a LOT more performant. Obviously this solution wouldn't be for everybody, so enterprise developers, feel free to ignore. ;-)

_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk

