I'm a little late to this thread. But is there any performance difference between the compound index format and the multifile index format when *searching*? The Lucene book mentions a performance difference when *indexing*, but not when searching.
Monsur > -----Original Message----- > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > Sent: Wednesday, July 27, 2005 6:45 PM > To: java-user@lucene.apache.org > Subject: RE: Hardware Question > > Ah - my brain was off. :) > In the Lucene book we refer to that index format as "compound index > format", while the original format we call "multifile index format" > > http://www.lucenebook.com/search?query=compound+index > http://www.lucenebook.com/search?query=multifile+index > > Yes, the latter will give you a bit more juice. > > Otis > > > --- Mark Bennett <[EMAIL PROTECTED]> wrote: > > > My apologies Otis, I should have spelled that out. > > > > I'm going to take a stab at answering this. But please, others on > > the list, > > chime in with corrections / clarifications. > > > > CFS = "compact file system" or "consolidate file system" or > something > > like > > that. > > > > Essentially, each Lucene index segment is actually a set of files; > > files for > > a segment have a common file name and then a set of extensions; OR a > > segment > > is just stored as ONE file, with a .cfs extension. > > > > CFS means that the multiple files for that segment have been joined > > together > > into one physical file; inside there is actually the original set of > > logical > > files, but on your disk it's just one file and one set of file > > handles to > > open that segmgent. > > > > If you do a DIR / ls on your indexes, if you see a bunch of .cfs > > files, then > > you're using CFS. The default for the past version or so > is that you > > DO get > > CFS files unless you say otherwise. > > > > I think the idea is that, generally, having fewer physical files is > > better, > > in terms of file handles, etc. But for search performance, I'm not > > sure if > > that's always the best case; certainly for indexing it takes more > > work to > > create CFS files. > > > > > > -----Original Message----- > > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, July 27, 2005 3:20 PM > > To: java-user@lucene.apache.org; [EMAIL PROTECTED] > > Subject: RE: Hardware Question > > > > What's CFS? Cryptographic File System? I'm not being sarcastic > > here, > > I'm really curious about what you referring to. > > > > Otis > > > > --- Mark Bennett <[EMAIL PROTECTED]> wrote: > > > > > Also, non-hardware, have you considered turning off CFS? > > > > > > Our client told us this sped up their system. > > > > > > -----Original Message----- > > > From: Chris Lamprecht [mailto:[EMAIL PROTECTED] > > > Sent: Wednesday, July 27, 2005 11:52 AM > > > To: java-user@lucene.apache.org > > > Subject: Re: Hardware Question > > > > > > It depends on your usage. When you search, does your code also > > > retrieve the docs (using Searcher.document(n), for instance). If > > > your > > > index is 8GB, part of that is the "indexed" part (searchable), and > > > part is just "stored" document fields. > > > > > > It may be as simple as adding more RAM (try 4, 6, and 8GB) -- but > > not > > > for your java heap -- instead for the linux filesystem cache. > > > > > > I suggest first adding some simple timing output to your search. > > You > > > want to see how much time you are spending in the call to > search(), > > > and then how much time you're spending pulling the Documents from > > the > > > index (and how much time you're spending in other parts of your > > > search > > > application). The call to search() is typically CPU-intensive, > > > while > > > pulling Documents is I/O-bound. And RAM is about 5 or 6 orders of > > > magnitude faster than disk I/O. > > > > > > -chris > > > > > > On 7/27/05, Michael Celona <[EMAIL PROTECTED]> wrote: > > > > I am going over ways to increase overall search performance. > > > > > > > > > > > > > > > > Currently, I have a dual zeon with 2G of ram dedicated to java > > > searching > > > an > > > > 8G index on one 7200 rpm drive. > > > > > > > > > > > > > > > > Which will give the greatest payoff? > > > > > > > > > > > > > > > > 1) Going to 64bit server and giving more memory to java > > with > > > faster > > > > drives > > > > > > > > > > > > > > > > Or > > > > > > > > > > > > > > > > 2) Staying with 32bit server but going with faster drives > > and > > > > splitting the operating system from the index drive. > > > > > > > > > > > > > > > > > > > > > > > > Basically, what are the performance improvements from separating > > > the > > > > operation system form the index drive(s). > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Michael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]