"Mike Klaas" <[EMAIL PROTECTED]> wrote:
>
> My main comment is that the benefits of this change can be achieved by
> using the non-compound index format. For people that care about the
> difference in performance, it isn't difficult to configure your system
> to mitigate the problems of the non-compound format, and they probably
> have already done so.
>
> It would help the people who are file-descriptor conscious, but it
> also increases lucene's fd footprint by a factor of four.

That's right - people worried about indexing performance can easily call
setUseCompoundFile(false).
My guess, though, is that most people just keep the default setting.

Large systems that maintain many indexes would be worried about the number
of file descriptors and would use the compound format. But it is not clear to
me which such systems would prefer - four times the file
descriptors, or twice the IO? If a third choice were supported
- "semi-compound" - how many systems would {be able to / choose to} use
it? That probably depends on the specific system.

I verified the IO factor by counting bytes read in
FSIndexInput.readInternal(byte[],int,int) and written in
FSIndexOutput.flushBuffer(byte[],int):

round  vect   stor   cmpnd  runCnt  recsPerRun  rec/s  elapsedSec  write  read
0      true   true   true   1       100000      153.4  651.74      2 GB   1.9 GB
1      true   true   false  1       100000      169.5  589.82      1 GB   0.9 GB
2      false  false  true   1       100000      151.4  660.41      2 GB   1.9 GB
3      false  false  false  1       100000      168.0  595.37      1 GB   0.9 GB

Indeed, there is a factor of two for both read bytes and written bytes.
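
The counting approach can be sketched in plain Java. This is only an
illustration of the idea, assuming stream wrappers rather than the actual
changes made inside FSIndexInput and FSIndexOutput; the names
ByteCountingDemo, CountingOutputStream, CountingInputStream and roundTrip
are hypothetical and are not part of Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: counts bytes flowing through wrapped streams.
final class ByteCountingDemo {

    /** Adds up every byte handed to the wrapped output stream. */
    static class CountingOutputStream extends FilterOutputStream {
        private long count;

        CountingOutputStream(OutputStream out) { super(out); }

        @Override public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);   // forward the whole chunk at once
            count += len;
        }

        long getCount() { return count; }
    }

    /** Adds up every byte actually returned by the wrapped input stream. */
    static class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = in.read();
            if (b != -1) count++;
            return b;
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            int n = in.read(b, off, len);
            if (n > 0) count += n;    // n is -1 at end of stream
            return n;
        }

        long getCount() { return count; }
    }

    /** Writes payloadBytes bytes, reads them back, returns {written, read}. */
    static long[] roundTrip(int payloadBytes) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            CountingOutputStream out = new CountingOutputStream(sink);
            byte[] chunk = new byte[256];
            int remaining = payloadBytes;
            while (remaining > 0) {
                int n = Math.min(chunk.length, remaining);
                out.write(chunk, 0, n);
                remaining -= n;
            }
            out.flush();

            CountingInputStream in =
                new CountingInputStream(new ByteArrayInputStream(sink.toByteArray()));
            byte[] buf = new byte[128];
            while (in.read(buf, 0, buf.length) != -1) {
                // drain the stream; the wrapper does the counting
            }
            return new long[] { out.getCount(), in.getCount() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = roundTrip(1000);
        System.out.println("written=" + counts[0] + " read=" + counts[1]);
    }
}
```

Comparing the two totals across a compound and a non-compound run would
give the same kind of read/write ratio as in the table above.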