"Mike Klaas" <[EMAIL PROTECTED]> wrote: > > My main comment is that the benefits of this change can be achieved by > using the non-compound index format. For people that care about the > difference in performance, it isn't difficult to configure your system > to mitigate the problems of the non-compound format, and they probably > have already done so. > > It would help the people who are file-descriptor conscious, but it > also increases lucene's fd footprint by a factor of four.
That's right - people worried about indexing performance can easily
apply setUseCompound(false). My guess, though, is that most people just
keep the default setting.

Large systems that maintain many indexes would be worried about the
number of file descriptors and would use the compound format. But it is
not clear to me what the preference would be in such systems: four times
the file descriptors, or twice the IO?

If such a third choice were supported - "semi compound" - how many
systems would {be able to / choose to} use it? That probably depends on
the specific system.

I verified the IO factor by counting bytes read in
FSIndexInput.readInternal(byte[],int,int) and written in
FSIndexOutput.flushBuffer(byte[],int):

round  vect   stor   cmpnd  runCnt  recsPerRun  rec/s  elapsedSec  write  read
    0  true   true   true        1      100000  153.4      651.74  2 GB   1.9 GB
    1  true   true   false       1      100000  169.5      589.82  1 GB   0.9 GB
    2  false  false  true        1      100000  151.4      660.41  2 GB   1.9 GB
    3  false  false  false       1      100000  168.0      595.37  1 GB   0.9 GB

Indeed, there is a factor of two for both read bytes and written bytes.
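
For anyone who wants to try this kind of byte counting outside of Lucene, here is a minimal sketch using plain java.io streams. It is not the instrumentation I used (that was done directly inside FSIndexInput.readInternal and FSIndexOutput.flushBuffer); the class IoByteCounter and its wrap/bytesRead/bytesWritten methods are hypothetical names made up for illustration and are not part of Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not a Lucene class): tallies bytes moved through wrapped streams. */
public class IoByteCounter {
    private final AtomicLong bytesRead = new AtomicLong();
    private final AtomicLong bytesWritten = new AtomicLong();

    /** Returns a stream that adds every byte actually read to the read tally. */
    public InputStream wrap(InputStream source) {
        return new FilterInputStream(source) {
            @Override public int read() throws IOException {
                int b = super.read();
                if (b >= 0) bytesRead.incrementAndGet();
                return b;
            }
            @Override public int read(byte[] buf, int off, int len) throws IOException {
                int n = super.read(buf, off, len);
                if (n > 0) bytesRead.addAndGet(n);
                return n;
            }
        };
    }

    /** Returns a stream that adds every byte written to the write tally. */
    public OutputStream wrap(OutputStream sink) {
        return new FilterOutputStream(sink) {
            @Override public void write(int b) throws IOException {
                out.write(b); // 'out' is the wrapped stream held by FilterOutputStream
                bytesWritten.incrementAndGet();
            }
            @Override public void write(byte[] buf, int off, int len) throws IOException {
                out.write(buf, off, len); // bulk write, counted once
                bytesWritten.addAndGet(len);
            }
        };
    }

    public long bytesRead() { return bytesRead.get(); }
    public long bytesWritten() { return bytesWritten.get(); }

    public static void main(String[] args) throws IOException {
        IoByteCounter counter = new IoByteCounter();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = counter.wrap(sink)) {
            out.write(new byte[1024]);
            out.write(7);
        }
        try (InputStream in = counter.wrap(new ByteArrayInputStream(sink.toByteArray()))) {
            byte[] buf = new byte[256];
            while (in.read(buf, 0, buf.length) != -1) {
                // drain the stream; only the byte counts matter here
            }
        }
        System.out.println("written=" + counter.bytesWritten() + " read=" + counter.bytesRead());
    }
}
```

Running the same workload once per setting and comparing the two tallies gives the kind of write/read ratio shown in the table. Counting at the lowest I/O layer matters: counting higher up would miss the extra copy made when segment files are packed into the compound file.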