"Mike Klaas" <[EMAIL PROTECTED]> wrote:
>
> My main comment is that the benefits of this change can be achieved by
> using the non-compound index format.  For people that care about the
> difference in performance, it isn't difficult to configure your system
> to mitigate the problems of the non-compound format, and they probably
> have already done so.
>
> It would help the people who are file-descriptor conscious, but it
> also increases lucene's fd footprint by a factor of four.

That's right - people worried about indexing performance can easily call
setUseCompoundFile(false) on the IndexWriter.

My guess though is that most people just keep the default setting.

Large systems that maintain many indexes would worry about the number
of file descriptors and would use the compound format. But it is not clear to
me what such systems would prefer: four times the file descriptors, or
twice the IO?  If a third choice, "semi-compound", were supported, how many
systems would be able to, or choose to, use it? That probably depends on the
specific system.

I verified the IO factor, by counting bytes read in
FSIndexInput.readInternal(byte[],int,int) and written in
FSIndexOutput.flushBuffer(byte[],int):

 round  vect   stor   cmpnd  runCnt  recsPerRun  rec/s  elapsedSec  write  read
     0  true   true   true        1      100000  153.4      651.74   2 GB  1.9 GB
     1  true   true   false       1      100000  169.5      589.82   1 GB  0.9 GB
     2  false  false  true        1      100000  151.4      660.41   2 GB  1.9 GB
     3  false  false  false       1      100000  168.0      595.37   1 GB  0.9 GB

Indeed, the compound format costs a factor of two in both bytes read and
bytes written, regardless of the vect and stor settings.
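
The counting itself is simple. Below is a minimal sketch of the idea using
plain java.io wrappers rather than the Lucene classes. ByteCountDemo,
CountingInputStream and CountingOutputStream are hypothetical names, not part
of Lucene, and this is not the instrumentation that produced the numbers
above; it only illustrates tallying bytes at the lowest read/write call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for counting bytes in the low-level IO methods.
class ByteCountDemo {

    // Counts every byte passed down to the wrapped stream.
    static class CountingOutputStream extends FilterOutputStream {
        private long count;

        CountingOutputStream(OutputStream out) { super(out); }

        @Override public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);   // bypass the byte-at-a-time default
            count += len;
        }

        long getCount() { return count; }
    }

    // Counts every byte actually returned by the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = in.read();
            if (b != -1) count++;
            return b;
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            int n = in.read(b, off, len);
            if (n > 0) count += n;    // n is -1 at end of stream
            return n;
        }

        long getCount() { return count; }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingOutputStream out = new CountingOutputStream(sink);
        out.write(new byte[4096]);

        CountingInputStream in =
            new CountingInputStream(new ByteArrayInputStream(sink.toByteArray()));
        byte[] buf = new byte[1024];
        while (in.read(buf, 0, buf.length) != -1) { /* drain */ }

        System.out.println("written=" + out.getCount() + " read=" + in.getCount());
    }
}
```

Counting in the three-argument read and write methods mirrors hooking
readInternal and flushBuffer: buffered callers all funnel through them, so
each byte is counted once.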


