"Doug Cutting (JIRA)" <[EMAIL PROTECTED]> wrote on 26/11/2007 20:14:43:

> > I found out however that delaying the syncs (but intending to sync) also
> means keeping the file handles open [...]
>
> Not necessarily.  You could just queue the file names for sync,
> close them, and then have the background thread open, sync and
> close them.  The close could trigger the OS to sync things
> faster in the background.  Then the open/sync/close could
> mostly be a no-op.  Might be worth a try.

Good point. Actually, even with a background thread we must work
with file names rather than open handles, because otherwise there's
no control over the number of open file handles.
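
For concreteness, something along these lines is what I have in mind -
just a sketch, the class and method names are made up and error handling
is glossed over. At most one file is open at a time on the syncer side,
so the handle count stays bounded:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Writers close their files normally and just queue the file name;
 *  this thread later re-opens each file, syncs it, and closes it. */
class FileSyncer extends Thread {
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<String>();
  private volatile boolean done = false;

  /** Called by a writer after it has written and closed the file. */
  void syncLater(String fileName) {
    pending.add(fileName);
  }

  /** Called at commit time: no more names will be queued; drain and exit. */
  void finish() throws InterruptedException {
    done = true;
    join();
  }

  public void run() {
    while (!done || !pending.isEmpty()) {
      String name;
      try {
        name = pending.poll(100, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        continue;
      }
      if (name == null) {
        continue;               // nothing queued yet, re-check the done flag
      }
      try {
        RandomAccessFile f = new RandomAccessFile(name, "rw");
        try {
          f.getFD().sync();     // often nearly a no-op if the OS already flushed
        } finally {
          f.close();
        }
      } catch (IOException e) {
        // a real implementation would record this and fail the commit
      }
    }
  }
}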

In addition, my tests on XP indicated that this way many syncs
became effectively no-ops - i.e. close() followed later by
open+sync+close was faster than flush() followed later by sync+close.

On both XP and Linux, a background thread was faster than
a sync-at-end.

Some numbers (no-sync, immediate-sync, at-end, background):
100 files of 10K:
  Linux: 5.7, 5.8, 6.4, 5.9
     XP: 6.6, 11.1, 7.7, 6.8
1,000 files of 1K:
  Linux: 5.8, 13.8, 11.2, 6.0
     XP: 8.1, 44.5, 19.2, 15.0
10,000 files of 100 chars:
  Linux: 7.0, 89.9, 68.0, 60.3
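
For reference, the test is roughly of the following shape - a sketch
only, details of the real test (temp dirs, timing, data reuse) differ,
and the at-end variant here re-opens files rather than keeping handles
open. It uses the FileSyncer sketch above for the background mode:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class SyncTest {
  enum SyncMode { NO_SYNC, IMMEDIATE, AT_END, BACKGROUND }

  static void writeFiles(int numFiles, byte[] data, SyncMode mode,
                         FileSyncer syncer) throws IOException {
    List<String> names = new ArrayList<String>();
    for (int i = 0; i < numFiles; i++) {
      String name = "f" + i;
      RandomAccessFile f = new RandomAccessFile(name, "rw");
      f.write(data);
      if (mode == SyncMode.IMMEDIATE) {
        f.getFD().sync();           // sync each file before closing it
      }
      f.close();
      if (mode == SyncMode.AT_END) {
        names.add(name);            // remember it, sync everything afterwards
      } else if (mode == SyncMode.BACKGROUND) {
        syncer.syncLater(name);     // hand off to the background thread
      }
    }
    if (mode == SyncMode.AT_END) {
      for (String name : names) {   // one open+sync+close pass after all writes
        RandomAccessFile f = new RandomAccessFile(name, "rw");
        f.getFD().sync();
        f.close();
      }
    }
  }
}

In the background mode, the time until the syncer's finish() returns
would be part of the measurement as well.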

So, as much as I am not happy about adding a thread, it does seem
to be faster, at least in this synthetic test. I'm curious to see
Mike's actual Lucene numbers.

In any case we should not sync files saved during non-commit writes.
These account for most writes in large indexes with AutoCommit=false.

Doron

