"Doug Cutting (JIRA)" <[EMAIL PROTECTED]> wrote on 26/11/2007 20:14:43:
> > I found out however that delaying the syncs (but intending to sync)
> > also means keeping the file handles open [...]
>
> Not necessarily. You could just queue the file names for sync,
> close them, and then have the background thread open, sync and
> close them. The close could trigger the OS to sync things
> faster in the background. Then the open/sync/close could
> mostly be a no-op. Might be worth a try.
Good point. Actually, even with a background thread we must pass
file names (rather than open streams), because otherwise there is
no control over the number of open file handles.
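To make the idea concrete, here is a minimal sketch of such a background
syncer (class and method names are made up, not actual code from the
patch): a writer closes its files normally and only enqueues the file
names; a single thread later re-opens each file, syncs it and closes it,
so at most one extra handle is open at any time.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Re-opens queued files one at a time, syncs them to disk and closes them.
class BackgroundSyncer implements Runnable {

  private final BlockingQueue<String> pending = new LinkedBlockingQueue<String>();

  // Called by a writer after it has written and closed the file.
  public void requestSync(String fileName) {
    pending.add(fileName);
  }

  public void run() {
    try {
      while (true) {
        String name = pending.take();          // block until a file name arrives
        try {
          RandomAccessFile raf = new RandomAccessFile(new File(name), "rw");
          try {
            raf.getFD().sync();                // often near a no-op if the OS already flushed
          } finally {
            raf.close();
          }
        } catch (IOException e) {
          // a real implementation would record the failure and retry or report it
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();      // allow the syncer to be shut down
    }
  }
}

A writer would then call requestSync(fileName) right after closing each
file, and a commit would wait for the queue to drain before returning.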
In addition, my tests on XP indicated that this way many syncs were
effectively no-ops, i.e. close() followed later by open+sync+close was
faster than flush() followed later by sync+close.
On both XP and Linux, a background thread was faster than
a sync-at-end.
Some numbers (no-sync / immediate-sync / at-end / background):

  100 files of 10K:
    Linux:  5.7 /  5.8 /  6.4 /  5.9
    XP:     6.6 / 11.1 /  7.7 /  6.8
  1,000 files of 1K:
    Linux:  5.8 / 13.8 / 11.2 /  6.0
    XP:     8.1 / 44.5 / 19.2 / 15.0
  10,000 files of 100 chars:
    Linux:  7.0 / 89.9 / 68.0 / 60.3
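For reference, the test was roughly of the following shape (an assumed
sketch of the benchmark harness, not the actual test code): write N
files of a given size, apply one of the four strategies, and time the
whole run; "background" hands the file names to a syncer thread like
the one sketched above and waits for its queue to drain.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class SyncBenchmark {

  // Writes numFiles files of sizeBytes each under the given strategy and
  // prints the elapsed wall-clock time.
  static void run(File dir, int numFiles, int sizeBytes, String strategy)
      throws IOException {
    byte[] data = new byte[sizeBytes];
    List<File> written = new ArrayList<File>();
    long start = System.currentTimeMillis();

    for (int i = 0; i < numFiles; i++) {
      File f = new File(dir, "bench-" + i + ".bin");
      RandomAccessFile raf = new RandomAccessFile(f, "rw");
      raf.write(data);
      if ("immediate-sync".equals(strategy)) {
        raf.getFD().sync();                  // sync each file before closing it
      }
      raf.close();
      written.add(f);
    }

    if ("at-end".equals(strategy)) {         // re-open, sync and close at the end
      for (File f : written) {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        raf.getFD().sync();
        raf.close();
      }
    }
    // "no-sync" does nothing further; "background" would instead pass the
    // file names to a BackgroundSyncer and wait for it to finish.

    System.out.println(strategy + ": " + (System.currentTimeMillis() - start) + " ms");
  }
}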
So, as much as I dislike adding a thread, it does seem to be faster,
at least for this synthetic test. I'm curious to see Mike's actual
Lucene numbers.
In any case we should not sync files saved during non-commit writes.
These make up most of the writes for large indexes with AutoCommit=false.
Doron