What I would do is this: in the warm method, load a FieldCache for every field I was going to end up using a FieldCache for. If it's just for sorting, I might do a search with a sort on every field I was going to sort on. That will get the per-segment FieldCaches into RAM before the SegmentReader is put into use.
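A minimal, stdlib-only Java sketch of that idea, written so it runs without a Lucene jar. `Segment`, `SegmentFieldCache` and `SortFieldWarmer` are hypothetical stand-ins, not Lucene classes. In 2.9 the real hook is a subclass of `IndexWriter.IndexReaderWarmer` registered with `IndexWriter.setMergedSegmentWarmer`, and the real cache is `FieldCache.DEFAULT`. The model only shows the ordering that matters: the warmer touches the cache for each sort field of a newly merged segment, so the first search against that segment finds the entries already loaded.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one immutable index segment: raw per-document values per field. */
final class Segment {
    final String name;
    final Map<String, int[]> rawValues; // field name -> one value per document

    Segment(String name, Map<String, int[]> rawValues) {
        this.name = name;
        this.rawValues = rawValues;
    }
}

/** Hypothetical per-segment cache, keyed by segment identity and filled on first use. */
final class SegmentFieldCache {
    private final Map<Segment, Map<String, int[]>> entries = new IdentityHashMap<>();
    private int loads; // how many times the (expensive) load path ran

    synchronized int[] getInts(Segment segment, String field) {
        Map<String, int[]> perField = entries.computeIfAbsent(segment, s -> new HashMap<>());
        int[] values = perField.get(field);
        if (values == null) {
            int[] raw = segment.rawValues.get(field);
            if (raw == null) {
                throw new IllegalArgumentException("no such field: " + field);
            }
            loads++; // stands in for un-inverting the field from the index
            values = raw.clone();
            perField.put(field, values);
        }
        return values;
    }

    synchronized int loadCount() {
        return loads;
    }
}

/** Hypothetical warmer: loads every field we expect to sort on before the segment goes live. */
final class SortFieldWarmer {
    private final SegmentFieldCache cache;
    private final List<String> sortFields;

    SortFieldWarmer(SegmentFieldCache cache, List<String> sortFields) {
        this.cache = cache;
        this.sortFields = sortFields;
    }

    /** Receives one newly merged segment, mirroring how a merged-segment warmer is called. */
    void warm(Segment merged) {
        for (String field : sortFields) {
            cache.getInts(merged, field);
        }
    }
}
```

Because this model keys the cache on the segment object itself, warming one merged segment leaves every other segment's entries alone, so the warmer never needs more than the single segment it is handed.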
I might also do a search or two that hits a lot of terms, to get some of the index into RAM. Or maybe walk a TermEnum, or do anything else one normally does when warming Readers (as Solr does, or as many other home-grown solutions do). I don't think there is anything special about this case. You don't have to hit the reader with every unique search you expect it to see; you just get some key pieces (especially the FieldCaches) into RAM.

Don't give Mike a hard time about his valuable time - I'm sure he would have answered, but he's likely in bed (that cat wakes early, it seems). He's a lot nicer than I am ;)

John Wang wrote:
> No worries.
> Just trying to understand things.
>
> I wanted to double check but didn't want to write "My IDE told me that was the case" to sound pissy.
>
> I did look at the code, sometimes too much actually, but I never want to claim I understand the code 100%, hence going to the source is probably the best, even at the expense of sounding dumb; it is usually worth it ;)
>
> My question is more about how a person would do it at the public API level without having to hack into the source code.
>
> My main misunderstanding at this point is that I had thought IndexReaderWarmer could directly warm the field cache deterministically.
>
> Thanks
>
> -John
>
> On Wed, Sep 23, 2009 at 8:33 AM, Mark Miller <markrmil...@gmail.com> wrote:
>
> > Don't take me too seriously, John - I doubt anyone does :)
> >
> > And I wasn't implying Mike's time was more valuable than yours. I was being ... uh ... me :)
> >
> > And I don't claim that all of your many questions could have been answered in 5 seconds ;)
> >
> > Just the ones you were asking - it's very quick (at least with Eclipse) to see that there is no default impl. It's also very quick to see that a segment reader is passed to the warm method every time. I think it's just a generic IndexReader because you would warm a multi-reader the same way as a segmentreader.
> >
> > I was just suggesting you look at the code a bit, because I think it's fairly easy to figure out the basics of the warmer (hey, if I can do it ;) ).
> >
> > Again, don't take me too seriously. I send out my comments faster than I can think of them. And I've probably wasted more of Mike's time than anyone.
> >
> > The only way you will load the entire FieldCache is to use a top-level Reader outside of the core API - the core API works per segment now. And the IndexReaderWarmer is always passed a segmentreader from the readerPool.
> >
> > - Mark
> >
> > John Wang wrote:
> > > Mark:
> > >
> > > I did spend at least a quarter of an ounce. :) And I am sure Mike's time is more valuable than mine, but it was meant to be a "double-check".
> > >
> > > I was under the impression there is a default impl from previous email threads on how to handle field cache warming; perhaps I misunderstood.
> > >
> > > The real question here is "warms the reader". From a public API point of view, I wasn't sure if passing in an IndexReader impl is something we can do to avoid loading the entire field cache, e.g. would I need to downcast? Can it be a filtered reader? etc.
> > >
> > > If you think there is something I could have done within 5 sec, please point me in the right direction.
> > >
> > > Thanks
> > >
> > > -John
> > >
> > > On Wed, Sep 23, 2009 at 7:55 AM, Mark Miller <markrmil...@gmail.com> wrote:
> > >
> > > > Come on dude :) Spend a half ounce of effort first. Mike's time is too valuable!
> > > >
> > > > Luckily mine is not.
> > > >
> > > > There is no default impl - the class is dead simple (and the class has been pointed out like 3 times in this thread - I'm not even fully following and I know where to find it):
> > > >
> > > >   public static abstract class IndexReaderWarmer {
> > > >     public abstract void warm(IndexReader reader) throws IOException;
> > > >   }
> > > >
> > > > Now pass something in that warms the reader. Load a fieldcache - do a search. Do the hokey pokey and turn yourself around ...
> > > >
> > > > Investigation time: 5 seconds.
> > > >
> > > > John Wang wrote:
> > > > > Hi Michael:
> > > > >
> > > > > Thanks for the pointer!
> > > > >
> > > > > Pardon my ignorance, but I am still not seeing the connection between this API and per-segment loading of FieldCache (the API takes an IndexReader instead of, say, a SegmentReader[]).
> > > > >
> > > > > Can you maybe point me to the default impl of IndexReaderWarmer to help me understand?
> > > > >
> > > > > Thanks
> > > > >
> > > > > -John
> > > > >
> > > > > On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> > > > >
> > > > > > This is exactly why we added IndexWriter.setMergedSegmentWarmer -- you can warm the reader w/o blocking ongoing updates.
> > > > > >
> > > > > > Mike
> > > > > >
> > > > > > On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller <markrmil...@gmail.com> wrote:
> > > > > > > Right - when a large segment is invalidated, you will have a bigger fieldcache piece to reload - pre 2.9, you'd be reloading the *whole* field cache every time though.
> > > > > > > Sounds like you are trying to deal with those large segments changing anyway :) They are always an issue when doing RT, it seems.
> > > > > > >
> > > > > > > I don't believe deletes invalidate a field cache - terms from deleted docs stay in a field cache, and segmentreaders use their freqStream as the fieldcache key. Only when the deletes are merged out would they invalidate - but because you're writing a new segment anyway ...
> > > > > > >
> > > > > > > - Mark
> > > > > > >
> > > > > > > John Wang wrote:
> > > > > > > > I understand what you are saying. Let me detail what I am trying to say:
> > > > > > > >
> > > > > > > > When "currently processed segments" are flushed, a merge may happen. When merges happen, some of those "stable segments" will be invalidated, and so will the fieldcache data keyed by them.
> > > > > > > >
> > > > > > > > In a high-update environment, such scenarios can happen quite often.
> > > > > > > >
> > > > > > > > The way the default mergePolicy works is that small segments get merged into the larger segments. Eventually, what gets invalidated will be a large segment, and when that happens, a large chunk of the field cache will be invalidated.
> > > > > > > >
> > > > > > > > Furthermore, where there are many updates, the stable segments can be invalidated much sooner when there are deletes in those segments, and I would guess the corresponding FieldCache needs to be adjusted. Not sure how it is handled right now.
> > > > > > > >
> > > > > > > > Just my two cents, and of course when I find the time I will need to run some tests to see.
> > > > > > > >
> > > > > > > > -John
> > > > > > > >
> > > > > > > > On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > > > > > > >
> > > > > > > > > The NRT reader coming from IndexWriter.getReader() only has changes in the currently processed segments; the other segments stay stable (and so do their IndexReader keys used for the FieldCache). For the consumer it looks like a normal reader (it is in fact a ReadOnlyDirectoryReader) supporting getSequentialSubReaders() and so on.
> > > > > > > > >
> > > > > > > > > -----
> > > > > > > > > Uwe Schindler
> > > > > > > > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > > > > > > http://www.thetaphi.de
> > > > > > > > > eMail: u...@thetaphi.de
> > > > > > > > >
> > > > > > > > > ------------------------------------------------------------------------
> > > > > > > > >
> > > > > > > > > From: John Wang [mailto:john.w...@gmail.com]
> > > > > > > > > Sent: Tuesday, September 22, 2009 9:32 AM
> > > > > > > > > To: java-dev@lucene.apache.org
> > > > > > > > > Subject: Re: 2.9 NRT w.r.t. sorting and field cache
> > > > > > > > >
> > > > > > > > > Thanks Mark for the pointer!
> > > > > > > > >
> > > > > > > > > I guess my point is that with NRT, when segment files change often, this would be an issue, no?
> > > > > > > > >
> > > > > > > > > Anyway, I can run some tests.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > -John
> > > > > > > > >
> > > > > > > > > On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller <markrmil...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > 1483 - IndexSearcher pulls out a reader's subreaders (segmentreaders) and sends a collector over them one by one, rather than using the multireader. So only the fc for seg readers that change needs to be reloaded.
> > > > > > > > > >
> > > > > > > > > > - Mark
> > > > > > > > > >
> > > > > > > > > > http://www.lucidimagination.com (mobile)
> > > > > > > > > >
> > > > > > > > > > On Sep 22, 2009, at 1:27 AM, John Wang <john.w...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Yonik:
> > > > > > > > > > >
> > > > > > > > > > > Actually that is what I am looking for.
> > > > > > > > > > > Can you please point me to where/how sorting is done per-segment?
> > > > > > > > > > >
> > > > > > > > > > > When heavy indexing introduces or modifies segments, would it cause reloading of FieldCache at query time and thus impact search performance?
> > > > > > > > > > >
> > > > > > > > > > > thanks
> > > > > > > > > > >
> > > > > > > > > > > -John
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Tue, Sep 22, 2009 at 12:56 AM, John Wang <john.w...@gmail.com> wrote:
> > > > > > > > > > > > > Looking at the code, it seems there is a disconnect between how/when the field cache is loaded when IndexWriter.getReader() is called.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm not sure what you mean by "disconnect".
> > > > > > > > > > > >
> > > > > > > > > > > > > Is FieldCache updated?
> > > > > > > > > > > >
> > > > > > > > > > > > FieldCache entries are populated on demand, as they always have been.
> > > > > > > > > > > >
> > > > > > > > > > > > > Otherwise, are we reloading FieldCache for each reader instance?
> > > > > > > > > > > >
> > > > > > > > > > > > Searching/sorting is now per-segment, and so is the use of the FieldCache. Segments that don't change shouldn't have to reload their FieldCache entries.
> > > > > > > > > > > >
> > > > > > > > > > > > -Yonik
> > > > > > > > > > > > http://www.lucidimagination.com
> > > > > > > > > > > >
> > > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > > > > > > > > > > > For additional commands, e-mail: java-dev-h...@lucene.apache.org
> > > > > > >
> > > > > > > --
> > > > > > > - Mark
> > > > > > >
> > > > > > > http://www.lucidimagination.com
> > > >
> > > > --
> > > > - Mark
> > > >
> > > > http://www.lucidimagination.com
> >
> > --
> > - Mark
> >
> > http://www.lucidimagination.com

--
- Mark

http://www.lucidimagination.com
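The per-segment behaviour described in this thread (IndexSearcher running a collector over each segment reader in turn, FieldCache entries populated on demand per segment, and deletes alone not invalidating an entry) can be modelled in stdlib-only Java. This is a hypothetical sketch, not Lucene's implementation: `SegmentCore`, `SegmentView`, `CoreKeyedCache` and `PerSegmentSort` are invented for illustration, and keying the cache on a "core" object that a re-opened, deletion-updated view shares is an assumption standing in for the freqStream key Mark mentions.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical segment core: the part that stays the same when only deletions change. */
final class SegmentCore {
    final int[] values; // one sort value per document, for a single field

    SegmentCore(int... values) {
        this.values = values;
    }
}

/** Hypothetical point-in-time view of a segment: a shared core plus its own deletions. */
final class SegmentView {
    final SegmentCore core;
    final boolean[] deleted;

    SegmentView(SegmentCore core, boolean[] deleted) {
        if (deleted.length != core.values.length) {
            throw new IllegalArgumentException("one deletion flag per document");
        }
        this.core = core;
        this.deleted = deleted;
    }

    int maxDoc() {
        return core.values.length; // deleted docs still occupy doc ids
    }
}

/** Hypothetical cache keyed by core identity, so new deletions do not force a reload. */
final class CoreKeyedCache {
    private final Map<SegmentCore, int[]> entries = new IdentityHashMap<>();
    private int loads;

    synchronized int[] get(SegmentView view) {
        int[] cached = entries.get(view.core);
        if (cached == null) {
            loads++; // only segments with a core not seen before pay the load cost
            cached = view.core.values.clone();
            entries.put(view.core, cached);
        }
        return cached;
    }

    synchronized int loadCount() {
        return loads;
    }
}

/** Sorts live documents by value, visiting one segment at a time with a docBase offset. */
final class PerSegmentSort {
    static List<Integer> sortLiveDocs(List<SegmentView> segments, CoreKeyedCache cache) {
        List<int[]> hits = new ArrayList<>(); // each entry: {value, top-level doc id}
        int docBase = 0;
        for (SegmentView segment : segments) {
            int[] values = cache.get(segment); // per-segment lookup, never a top-level array
            for (int doc = 0; doc < segment.maxDoc(); doc++) {
                if (!segment.deleted[doc]) {
                    hits.add(new int[] {values[doc], docBase + doc});
                }
            }
            docBase += segment.maxDoc();
        }
        // Ascending by value; ties broken by top-level doc id.
        hits.sort((a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));
        List<Integer> docs = new ArrayList<>();
        for (int[] hit : hits) {
            docs.add(hit[1]);
        }
        return docs;
    }
}
```

Sorting two snapshots, where the second adds a small flushed segment and a delete to an existing one, should show only the new segment paying a load. In this model a merge behaves like that new segment, since it produces a core the cache has never seen.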