"Doug Cutting (JIRA)" <[EMAIL PROTECTED]> wrote on 26/11/2007 20:14:43:
> > I found out however that delaying the syncs (but intending to sync)
> > also means keeping the file handles open [...]
>
> Not necessarily. You could just queue the file names for sync,
> close them, and then have the background thread open, sync and
> close them. The close could trigger the OS to sync things
> faster in the background. Then the open/sync/close could
> mostly be a no-op. Might be worth a try.

Good point. Actually, even with a background thread we must use file
names, because otherwise there's no control over the number of open file
handles. In addition, my tests on XP indicated that this way many syncs
were no-ops - i.e. close() and later open+sync+close was faster than
flush() and later sync+close.

On both XP and Linux, a background thread was faster than a sync-at-end.
Some numbers (no-sync, immediate-sync, at-end, background):

  100 files of 10K:
    Linux:  5.7,  5.8,  6.4,  5.9
    XP:     6.6, 11.1,  7.7,  6.8
  1,000 files of 1K:
    Linux:  5.8, 13.8, 11.2,  6.0
    XP:     8.1, 44.5, 19.2, 15.0
  10,000 files of 100 chars:
    Linux:  7.0, 89.9, 68.0, 60.3

So, as much as I am not happy about adding a thread, it seems to be
faster, at least for this synthetic test. I'm curious to see Mike's
actual Lucene numbers.

In any case we should not sync files saved during non-commit writes.
These are most writes for large indexes with AutoCommit=false.

Doron
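The queue-by-file-name scheme Doug describes (and the tests above measure) can be sketched roughly as below. This is only an illustration of the idea, not Lucene's actual code: the class name, the poison-pill shutdown, and the error handling are all my own choices. Writers close files normally and enqueue only the file *name*; a single background thread later re-opens each file, fsyncs it via FileDescriptor.sync(), and closes it again, so the number of open handles stays bounded and, if the OS already flushed the data after the writer's close(), the sync is nearly a no-op.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch of the background-sync idea: queue file names,
 * let one thread do open + sync + close in the background.
 */
public class BackgroundSyncer implements Runnable {
  private static final String POISON = "";  // sentinel telling the thread to exit
  private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

  /** Called by the writer right after it has written and closed a file. */
  public void enqueue(String fileName) throws InterruptedException {
    pending.put(fileName);
  }

  /** Ask the background thread to drain the queue and exit. */
  public void shutdown() throws InterruptedException {
    pending.put(POISON);
  }

  @Override
  public void run() {
    while (true) {
      String name;
      try {
        name = pending.take();
      } catch (InterruptedException e) {
        return;
      }
      if (name.equals(POISON)) {
        return;
      }
      // Re-open, sync, close. Only one handle is ever open here,
      // regardless of how many files are queued.
      try (RandomAccessFile raf = new RandomAccessFile(new File(name), "rw")) {
        raf.getFD().sync();
      } catch (Exception e) {
        // A real implementation would record the failure so a commit
        // does not report durability it doesn't have.
      }
    }
  }
}
```

A commit would then wait for the queue to drain (e.g. via shutdown() plus Thread.join()) before writing the new segments file, so nothing is referenced before it is durable.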