On Mon, May 10, 2010 at 3:08 PM, Shai Erera <[email protected]> wrote:
> That's still weird Mike - we call optimize in addIndexes to reduce the
> number of SRs, that's fair. So why don't we do that in addIndexesNoOpt?
I agree it's weird and inconsistent and all that :)

> There, we get a SR per SI. And the name of the method suggests optimize() is
> avoided on purpose ... it's as if addIndexesNoOpt should be called
> addDirectories, and we should let the caller decide whether to call
> optimize() on all IRs (including the local) before he calls addIndexes, or
> NoOpt.

Well... there used to be an addIndexes(Directory...) that did an optimize, I think both before and after. So addIndexesNoOpt was reacting to that.

> I mean, we give those methods confusing names, and don't follow the same
> approach when handling each ... I can live w/ addIndexes existing to take IR
> extensions, and w/ addDirectories if you don't need IR extensions. But
> calling/not-calling optimize() is inconsistent, and from what I understand,
> for no good reason?

It's for a good reason -- it's to attempt to ensure that the single .merge done by that method isn't insanely slow if your index has a lot of segments. But, really, those merges ought to go through a merge policy/scheduler, so we merge mergeFactor segments at a time, run up to N merges concurrently, etc. So I think this pre-optimize is a hack to keep the number of readers we merge at once contained.

> I'm asking these questions b/c someone asked me the other day when one
> should call each and what the hell that NoOpt is doing in the name ... I was
> confused when I was asked the question, and I'm confused now :).

I hear you...

> So how about if we:
> 1) Rename addIndexesNoOptimize to addDirectories

Hmm, addDirectories feels a bit too low level... why not call it addIndexes (it's a different signature since it accepts Dir not IR)?

> 2) Remove optimize() call from addIndexes

+1

But advertise this in back compat breaks. We could also preserve the old behavior under Version.

> 3) Document that clearly in both, w/ a recommendation to call optimize()
> before on any of the Directories/Indexes if it's a concern.

Good.
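To illustrate the point about merging mergeFactor segments at a time instead of doing one huge .merge, here is a toy sketch in plain Java (no Lucene dependency). The class MergeBatchSketch and its mergeSchedule method are hypothetical; the model treats every segment alike and ignores the size-based levels and the concurrency that a real merge policy and scheduler handle, so it is not how IndexWriter or LogMergePolicy actually behave.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeBatchSketch {

    // Hypothetical toy model: each segment is represented only by its doc count.
    // Repeatedly merges at most mergeFactor adjacent segments at a time until a
    // single segment remains, and records how many readers each merge opened.
    static List<Integer> mergeSchedule(List<Integer> segmentDocCounts, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> segments = new ArrayList<>(segmentDocCounts);
        List<Integer> readersPerMerge = new ArrayList<>();
        while (segments.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int i = 0; i < segments.size(); i += mergeFactor) {
                int end = Math.min(i + mergeFactor, segments.size());
                int mergedDocs = 0;
                for (int j = i; j < end; j++) {
                    mergedDocs += segments.get(j);
                }
                // A lone trailing segment is carried forward without a merge.
                if (end - i > 1) {
                    readersPerMerge.add(end - i);
                }
                next.add(mergedDocs);
            }
            segments = next;
        }
        return readersPerMerge;
    }

    public static void main(String[] args) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            segments.add(1000);
        }
        // Each element is the number of segment readers opened by one merge.
        System.out.println(mergeSchedule(segments, 10));
    }
}
```

With 25 equal segments and a mergeFactor of 10, main prints [10, 10, 5, 3]: no single merge opens more than 10 readers, whereas one big merge would open all 25 at once, which is the cost the pre-optimize hack tries to avoid.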
> That way, we maintain all the flexibility in the API - addIndexes allows for
> using IR extensions, addDirectories is considered more efficient, by
> allowing the merges to happen concurrently (depending on MS) and also
> factors in the MP. So unless you have an IR extension, addDirectories is
> really the one you should be using. And you have the freedom to call
> optimize() before each if you care about it, or don't if you don't care.
> Either way, incurring the cost of optimize() is entirely in your hands.

Good!

Mike
