Hi, like the sound of this. What I am not familiar with in terms of Lucene is how the index gets swapped in and out of memory. When it comes to database tables (non-partitionable tables at least), I know that one should have enough memory to fit the entire index into memory, to avoid file sorts for instance, which of course are slow.
If one operates mostly on the most recently added entries, will those always be mapped into a cache or something? How should I think? Hmmm... I think now that I need to create a rotation-partition scheme where I separate cold (old) entries from warm entries...

/M

On Tue, Jun 30, 2009 at 11:29 AM, Uwe Schindler <uschind...@pangaea.de> wrote:

> > On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> > > Index size (and growing): 16G x 8 = 128G
> > > Doc size (data): 20k
> > > Num docs: 90M
> > > Num users: Few hundred, but most critical is the admin staff, which is
> > > using the index all day long.
> > > Query types: Example: title:"Iphone" OR description:"Iphone" sorted by
> > > publishedDate... = Very simple, no fuzzy searches etc. However, since the
> > > dataset is large it will consume memory on sorting, I guess.
> > >
> > > Could not one draw any conclusions about best practice in terms of
> > > hardware given the above "specs"?
> >
> > Can you give us an estimate of the number of concurrent searches in
> > prime time and in what range a satisfactory response time would be?
> >
> > Going for a fully RAM-based search on a corpus of this size would mean
> > that each machine holds about 30GB of index (taken from your hardware
> > suggestion). I would expect that such a machine would be able to serve
> > something like 500-1000 searches/second (highly dependent on the index
> > and the searches, but what you're describing sounds simple enough) if we
> > just measure the raw search time and lookup of one or two fields for the
> > first 20 hits. Is that what you're aiming for?
> >
> > Wrapping in web services and such lowers the number of searches that can
> > be performed, which makes the RAM option even more expensive relative to
> > a harddisk or SSD solution.
>
> I would never say: "I copy an index into a RAMDirectory or something like
> that". I would buy enough RAM to fit as much as possible into RAM and (as
> we for sure are on a 64-bit platform) use MMapDirectory instead of
> SimpleFSDirectory or NIOFSDirectory (I am talking in Lucene 2.9 class
> names, where FSDirs can be simply instantiated, as you may have noticed).
> MMapDirectory uses the index like a swap file that is mapped into address
> space. The OS kernel will then use the index like RAM and map it into real
> RAM as needed. We have had this discussion many times on this mailing list
> (search for MMapDirectory in the archives). So the simple answer is always:
> if on a 64-bit platform with lots of RAM, use MMapDirectory. On Windows this
> is still buggy (but with 2.9 there is a workaround in MMapDirectory). When
> you warm your searchers beforehand (I think you will do...), the operating
> system kernel will "swap" in as much of the index as possible.
>
> Uwe

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
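A minimal sketch of the setup Uwe describes, assuming Lucene 2.9 APIs on a 64-bit JVM; the index path is a placeholder, and the warm-up query and the "title"/"publishedDate" fields are only taken from the example above, so adjust to your own schema:

import java.io.File;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.MMapDirectory;

public class MMapSearchSketch {
    public static void main(String[] args) throws Exception {
        // Memory-map the on-disk index instead of copying it into a RAMDirectory;
        // the OS kernel pages index files into real RAM as they are touched.
        MMapDirectory dir = new MMapDirectory(new File("/path/to/index"));

        // Read-only searcher over the mapped directory.
        IndexSearcher searcher = new IndexSearcher(dir, true);

        // Warm the searcher with a representative query so the hot parts of the
        // index are mapped in before real traffic arrives. Sorting assumes
        // publishedDate is indexed as a long-parsable, untokenized field.
        Sort byDate = new Sort(new SortField("publishedDate", SortField.LONG, true));
        TopDocs warmup = searcher.search(
                new TermQuery(new Term("title", "iphone")), null, 20, byDate);
        System.out.println("Warmed searcher, total hits: " + warmup.totalHits);

        searcher.close();
        dir.close();
    }
}

The first sorted search also fills the FieldCache for publishedDate, which is where most of the sort-related memory goes; after that, repeated queries mostly hit the memory-mapped postings.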