On Sat, Sep 17, 2011 at 08:52:41AM +0200, goran kent wrote:
> I've been wondering (and I'll eventually get around to performing a
> comparative test sometime this weekend) about IO and search
> performance (ie, ignore OS caching).

Hmm, Lucy is actually designed to integrate with the system IO cache very
tightly! We exploit mmap so that all Searchers are backed by the same IO
cache memory pages. And if you have an Indexer going at the same time, the
new index data it just wrote is also in the IO cache, and so is available
immediately to a new Searcher. Very little gets read into process RAM when
you open a Searcher.

"The OS is our JVM." - Lucy developer Nate Kurz.

> What's the biggest cause of search degradation when Lucy is chugging
> through it's on-disk index?
>
> Physically *finding* data (ie, seeking and thrashing around the disk),
> waiting for data to *transfer* from the disk to CPU?

Well, the projects I've been involved with have taken the approach that
there should always be enough RAM on the box to fit the necessary index
files. "RAM is the new disk" as they say.

I can tell you that once an index is in RAM, we're CPU bound. I can't
provide you with analysis about performance characteristics when the index
is not yet in RAM, though. We don't like to be in that state for very
long. :)

> I'm quite interested to know whether using an SSD where seek time and
> other latency issues are almost zero would dramatically improve search
> times. I've seen vast improvements when using them in RDBMS', but
> this may not translate as well here.

I would speculate that with SSDs you'd get a more graceful performance
degradation as Lucy's RAM requirements start to exceed what the box can
provide. But I have no numbers to back that up.

Marvin Humphrey
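
To make the mmap point above concrete, here is a minimal sketch of the general technique. It is not Lucy's code: the file contents and the names open_readonly_mapping and demo are made up for illustration. Two independent read-only mappings of the same file are served from the same OS page cache, so opening a second "searcher" does not copy the file into process memory.

```python
import mmap
import os
import tempfile


def open_readonly_mapping(path):
    """Map a whole file read-only; reads are served from the OS page cache."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file". The mapping remains
        # valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo():
    # Hypothetical stand-in for an index file (not a real Lucy format).
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"term:lucy postings:1,5,9\n")

        # Two independent "searchers" map the same file.
        a = open_readonly_mapping(path)
        b = open_readonly_mapping(path)
        try:
            # Slicing copies only the bytes requested, not the whole file.
            return a[:9], b[:9], len(a)
        finally:
            a.close()
            b.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Data that was just written to the file is typically still in the page cache, which is why a newly opened mapping can read it without waiting on the disk.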
