> I haven't read the details, but should maxBufferedDocs be exposed in
> some subinterfaces instead of the MergePolicy interface?

I've been wondering about this, too, but haven't come to any strong opinions (yet). I figured maybe playing with a few merge policies might make things clearer.

maxBufferedDocs: is this truly an invariant of all merge policies? I don't know. But actually, I think a possible question is whether merge policies should have any role in this at all, or if IndexWriter should just do it itself. If we go forward with Mike's stuff about writing a segment w/multiple docs w/o a merge, it's sounding more like the buffering of docs is not actually a merge policy question.

maxMergeDocs: should all merge policies accept this?

> 1) A merge thread is started when an IndexWriter is created and
> stopped when the IndexWriter is closed. (A single merge thread is used
> for simplicity. Multiple merge threads could be used.)

I haven't looked at pooling of threads, whether it be one or more than one, but I agree it needs to be looked at. I've heard that threads can't be created willy-nilly in J2EE apps but instead have to be drawn from the J2EE pool, so I figured when we look at pooling, we might need to accommodate that kind of constrained environment.

> 2) The merge thread periodically checks if there is merge work to do.
> This is done by synchronously checking segmentInfos and producing a
> MergeSpecification if there is merge work to do.

It does this check via a synchronized call on IndexWriter, right?

> 3) If a MergeSpecification is produced, the merge thread goes ahead
> and does the merge. Importantly, documents in the segments being
> merged may be deleted while the concurrent merge is happening. These
> deletes have to be remembered.

Yup, and I haven't looked at that yet.

> I see you start a thread whenever there is merge work. Would it be
> hard to control system load?

I think it needs to be looked at. Since concurrent conflicting merges aren't allowed, there is a bound on concurrency, but it might be too loose a bound.
I'm setting up tests to start getting a feel for the dynamics. My strawman model was to start with as much concurrency as the data allowed, then scale it back as necessary.

My main interest is in reducing the latency of add docs. In the example in my head, I have segments on a number of levels. Let's say merges at the higher end are going to take 3 seconds, 3 hours, and 3 days. I'd like to launch the 3 day merge and let it run in the background. It should be a while before a 3 hour merge is required, but if one is required before the 3 day merge is complete, I'd like not to block in that case either. If load is an issue, the idea would be to lower the priority of, or suspend, the 3 day merge while the 3 hour merge is going.

My focus isn't on slowing things down, i.e., handling a system where you truly can't keep up, but on spreading out the big lumps of work, rather than putting them in the add doc control path.

It's possible that at some point you'll want to do a merge that includes segments that are being merged concurrently. In that case, the code currently blocks. There are alternatives, like allowing more than mergeFactor segments on a level, at least temporarily, but I haven't gone that way yet. So my way of keeping things simple (if any version of concurrent can be called simple) is not to make blocking impossible, but to make it less likely. In the serial case, it's a certainty.

The main thing I've been trying to understand up until now was the concurrency of IndexWriter#segmentInfos, given that multiple merges could be running. If you allow that merges could be running AND a merge might be blocked, you can't make a synchronized call on IndexWriter, because the blocked merge request holds that lock. But my most recent thinking has been that I've been going down the wrong path trying to separately synchronize segmentInfos. I think instead the merge threads can make a separate queue of merge results that IndexWriter can look at when it wants to.
I'm gonna look at that soon. Currently my concurrent stuff won't work because this part is incomplete.
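
To make the queue-of-merge-results idea a bit more concrete, here is a rough sketch of one way it could look. It is only an illustration under assumptions, not the actual patch and not Lucene API: Writer, MergeResult, publish() and applyFinishedMerges() are hypothetical names, and segments are reduced to plain strings. The point is that merge threads only append to a concurrent queue and never take the writer's lock, while the writer alone touches its segment list and drains the queue when it chooses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MergeResultQueueSketch {

    /** Hypothetical record of one finished merge: some input segments replaced by one new segment. */
    static final class MergeResult {
        final List<String> mergedSegments; // assumed non-empty
        final String newSegment;

        MergeResult(List<String> mergedSegments, String newSegment) {
            this.mergedSegments = mergedSegments;
            this.newSegment = newSegment;
        }
    }

    /** Hypothetical stand-in for IndexWriter; only this class reads or writes segmentInfos. */
    static final class Writer {
        private final List<String> segmentInfos = new ArrayList<>();
        private final ConcurrentLinkedQueue<MergeResult> results = new ConcurrentLinkedQueue<>();

        /** Called by merge threads. Does not synchronize on the writer, so it cannot block on it. */
        void publish(MergeResult result) {
            results.add(result);
        }

        synchronized void addSegment(String name) {
            segmentInfos.add(name);
        }

        /** Called by the writer whenever it wants to pick up finished merges. Returns how many were applied. */
        synchronized int applyFinishedMerges() {
            int applied = 0;
            MergeResult r;
            while ((r = results.poll()) != null) {
                int at = segmentInfos.indexOf(r.mergedSegments.get(0));
                if (at < 0) {
                    continue; // inputs no longer present; skip in this sketch
                }
                segmentInfos.removeAll(r.mergedSegments);
                // Put the merged segment roughly where its first input used to be.
                segmentInfos.add(Math.min(at, segmentInfos.size()), r.newSegment);
                applied++;
            }
            return applied;
        }

        synchronized List<String> segments() {
            return new ArrayList<>(segmentInfos);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Writer writer = new Writer();
        writer.addSegment("_a");
        writer.addSegment("_b");
        writer.addSegment("_c");

        // A background merge finishes and reports its result through the queue.
        Thread merger = new Thread(
            () -> writer.publish(new MergeResult(Arrays.asList("_a", "_b"), "_d")));
        merger.start();
        merger.join();

        writer.applyFinishedMerges();
        System.out.println(writer.segments()); // prints [_d, _c]
    }
}
```

This leaves out everything hard (deletes against segments being merged, conflicting merges, priorities), but it shows why a blocked merge request holding the writer's lock would not stop other merge threads from reporting their results.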