OK, I tested this approach. The code is not clean yet, just enough to check whether there is indeed a potential improvement here, and I think there is.
Performance results for the (short) tests I ran on my everyday machine
(read as: [oldTimeMillis] to [newTimeMillis] is [speed-up] for adding:
n docs, maxBuffered=x mergeFactor=y):

--- "new" runs before "old" ---
 3605 to  2964 is 17% for:   500 docs, buf=10   mrg=3
 2163 to  1923 is 11% for:  2000 docs, buf=100  mrg=4
 6990 to  5759 is 17% for:  8000 docs, buf=200  mrg=5
20529 to 18286 is 10% for: 32000 docs, buf=400  mrg=6
44444 to 39677 is 10% for: 64000 docs, buf=1000 mrg=7

--- "old" runs before "new" ---
 3926 to  2434 is 38% for:   500 docs, buf=10   mrg=3
 2233 to  1732 is 22% for:  2000 docs, buf=100  mrg=4
 6199 to  5678 is  8% for:  8000 docs, buf=200  mrg=5
20139 to 16955 is 15% for: 32000 docs, buf=400  mrg=6
42220 to 39507 is  6% for: 64000 docs, buf=1000 mrg=7

I will submit this in a Jira issue.
Thoughts, anyone? Are there any other particular settings you think should be tested?

- Doron

Doron Cohen/Haifa/[EMAIL PROTECTED] wrote on 18/10/2006 15:29:26:

>
> Currently IndexWriter.flushRamSegments() always merges all RAM segments to
> disk. Later it may merge more, depending on the maybe-merge algorithm. This
> happens when the index is closed and when the number of (1-doc) RAM segments
> exceeds max-buffered-docs.
>
> Can there be a performance penalty for always merging to disk first?
>
> Assume the following merges take place:
>   merging segments _ram_0 (1 docs) _ram_1 (1 docs) ... _ram_N (1 docs) into
> _a (N docs)
>   merging segments _6 (M docs) _7 (K docs) _8 (L docs) into _b (N+M+K+L
> docs)
>
> Alternatively, we could tell (compute) that this is going to happen, and
> have a single merge:
>   merging segments _ram_0 (1 docs) _ram_1 (1 docs) ... _ram_N (1 docs)
>                    _6 (M docs) _7 (K docs) _8 (L docs) into _b (N+M+K+L
> docs)
>
> This would save writing the segment of size N to disk and reading it again.
> For large enough N, is there really a potential saving here?
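The saving described in the quoted message can be put in rough numbers. The sketch below is only a back-of-the-envelope model, not Lucene code: it counts disk I/O in units of documents rather than bytes, ignores per-segment overhead and OS caching, and the function names are made up for illustration. It assumes the second merge in the two-step case reads _a together with _6, _7 and _8, since that is the only way the result _b can hold N+M+K+L docs.

```python
def two_step_io(ram_docs, disk_segments):
    """Docs moved to/from disk when RAM segments are flushed first, then merged.

    ram_docs      -- N, the number of buffered single-doc RAM segments
    disk_segments -- doc counts of the on-disk segments (M, K, L, ...)
    """
    on_disk = sum(disk_segments)
    flush_write = ram_docs             # write _a (N docs) to disk
    merge_read = ram_docs + on_disk    # read _a back, plus _6, _7, _8
    merge_write = ram_docs + on_disk   # write _b (N+M+K+L docs)
    return flush_write + merge_read + merge_write


def single_merge_io(ram_docs, disk_segments):
    """Docs moved to/from disk when RAM and disk segments are merged in one pass."""
    on_disk = sum(disk_segments)
    merge_read = on_disk               # RAM segments are read from memory, not disk
    merge_write = ram_docs + on_disk   # write _b (N+M+K+L docs)
    return merge_read + merge_write


def docs_saved(ram_docs, disk_segments):
    """Difference between the two strategies, in docs of disk I/O."""
    return two_step_io(ram_docs, disk_segments) - single_merge_io(ram_docs, disk_segments)


if __name__ == "__main__":
    # N = 100 buffered docs, three on-disk segments of 100 docs each.
    print(docs_saved(100, [100, 100, 100]))  # prints 200
```

In this model the difference is always 2N (one write and one read of the N-doc segment), regardless of M, K and L, so the relative gain shrinks as the on-disk segments grow compared to N.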
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
