Re: 2.9 NRT w.r.t. sorting and field cache

John Wang Tue, 22 Sep 2009 17:35:14 -0700

Thanks Mike for your valuable time!

Sorry to be a pest, I am trying to write a fair perf test and to understand
the feature. If there are other experts on the subject of index reader
warming, please chime in.


I am not seeing the connection between being given an IndexReader and the
FieldCacheImpl API, i.e. how do I warm up the FieldCache for this particular
segment?

Are you suggesting just running an IndexSearcher.search on the given index
for warming up within the IndexReaderWarmer impl? In which case the searcher
would need to know the incoming searches pretty well, I guess.

Thanks

-John
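
For readers following the thread: the per-segment idea discussed below
(cache entries keyed by segment, loaded on demand, with a warmer hook
that runs on a merged segment before it goes live) can be modelled
without Lucene at all. The following is only a minimal sketch under
that assumption. Every type in it (ToySegment, ToyFieldCache,
ToyWarmer, ToyWriter, WarmingSketch) is a hypothetical stand-in, not a
Lucene class, and it does not show Lucene's actual FieldCache calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Toy model only: every type here is a hypothetical stand-in, not a Lucene class.

/** A "segment": immutable once written, one sortable value per document. */
final class ToySegment {
    final String name;
    final int[] values;

    ToySegment(String name, int[] values) {
        this.name = name;
        this.values = values;
    }
}

/** Per-segment cache keyed by segment identity (ToySegment keeps Object.equals). */
final class ToyFieldCache {
    private final Map<ToySegment, int[]> entries = new WeakHashMap<>();
    private int loads = 0;

    /** Returns the cached array, loading it on first use (populated on demand). */
    synchronized int[] get(ToySegment segment) {
        int[] cached = entries.get(segment);
        if (cached == null) {
            loads++;                         // cache miss: the expensive part
            cached = segment.values.clone(); // stands in for reading the index
            entries.put(segment, cached);
        }
        return cached;
    }

    synchronized int loadCount() {
        return loads;
    }
}

/** Callback invoked on a freshly merged segment before searches can see it. */
interface ToyWarmer {
    void warm(ToySegment merged);
}

final class ToyWriter {
    private final List<ToySegment> segments = new ArrayList<>();
    private ToyWarmer warmer;

    void setMergedSegmentWarmer(ToyWarmer warmer) { this.warmer = warmer; }
    void addSegment(ToySegment segment) { segments.add(segment); }
    List<ToySegment> liveSegments() { return new ArrayList<>(segments); }

    /** Merges the two newest segments; the warmer runs before the result is published. */
    void mergeLastTwo() {
        ToySegment b = segments.remove(segments.size() - 1);
        ToySegment a = segments.remove(segments.size() - 1);
        int[] merged = new int[a.values.length + b.values.length];
        System.arraycopy(a.values, 0, merged, 0, a.values.length);
        System.arraycopy(b.values, 0, merged, a.values.length, b.values.length);
        ToySegment result = new ToySegment(a.name + "+" + b.name, merged);
        if (warmer != null) {
            warmer.warm(result);
        }
        segments.add(result);
    }
}

final class WarmingSketch {
    /** Simulates a per-segment search: touches the cache for every live segment. */
    static void search(ToyWriter writer, ToyFieldCache cache) {
        for (ToySegment segment : writer.liveSegments()) {
            cache.get(segment);
        }
    }

    /** Cumulative load counts after: first search, merge, second search. */
    static int[] run() {
        ToyFieldCache cache = new ToyFieldCache();
        ToyWriter writer = new ToyWriter();
        writer.setMergedSegmentWarmer(segment -> { cache.get(segment); });
        writer.addSegment(new ToySegment("A", new int[] {5, 1, 9}));
        writer.addSegment(new ToySegment("B", new int[] {3}));
        writer.addSegment(new ToySegment("C", new int[] {7, 2}));

        int[] loads = new int[3];
        search(writer, cache);
        loads[0] = cache.loadCount();
        writer.mergeLastTwo();
        loads[1] = cache.loadCount();
        search(writer, cache);
        loads[2] = cache.loadCount();
        return loads;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

In this toy, the unchanged segment "A" is loaded once and never again,
and the merged segment is loaded by the warmer rather than by the first
search that touches it. In the real API the body of warm() would
presumably populate the FieldCache for the fields the application sorts
on, which is the part the question above is asking about.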



On Wed, Sep 23, 2009 at 7:57 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Oh - yeah - also - you'll be passed a segment reader if that's what makes
> sense. And sense it does, you will be passed one every time. You can
> warm a multireader the same way though, so no reason to pin it down.
>
> Mark Miller wrote:
> > Come on dude :) Spend a half ounce of effort first. Mike's time is too
> > valuable !
> >
> > Luckily mine is not.
> >
> > There is no default impl - the class is dead simple (and the class has
> > been pointed out like 3 times in this thread - I'm not even fully
> > following and I know where to find it):
> >
> >   public static abstract class IndexReaderWarmer {
> >     public abstract void warm(IndexReader reader) throws IOException;
> >   }
> >
> > Now pass something in that warms the reader. Load a fieldcache - do a
> > search. Do the hokey pokey and turn yourself around ...
> >
> > Investigation time: 5 seconds.
> >
> > John Wang wrote:
> >
> >> Hi Michael:
> >>
> >>      Thanks for the pointer!
> >>
> >>       Pardon my ignorance, but I am still not seeing the connection
> >> between this API and per-segment loading of FieldCache. (the API takes
> >> in an IndexReader instead of maybe SegmentReader[])
> >>
> >>       Can you point me to maybe the default impl of IndexReaderWarmer
> >> to help me understand?
> >>
> >> Thanks
> >>
> >> -John
> >>
> >> On Wed, Sep 23, 2009 at 7:17 AM, Michael McCandless
> >> <luc...@mikemccandless.com> wrote:
> >>
> >>     This is exactly why we added IndexWriter.setMergedSegmentWarmer --
> >>     you can warm the reader w/o blocking ongoing updates.
> >>
> >>     Mike
> >>
> >>     On Tue, Sep 22, 2009 at 7:15 PM, Mark Miller
> >>     <markrmil...@gmail.com> wrote:
> >>     > Right - when a large segment is invalidated, you will have a bigger
> >>     > fieldcache piece to reload - pre 2.9, you'd be reloading the *whole*
> >>     > field cache every time though. Sounds like you are trying to deal with
> >>     > those large segments changing anyway :) They are always an issue when
> >>     > doing RT it seems.
> >>     >
> >>     > I don't believe deletes invalidate a field cache - terms from deleted
> >>     > docs stay in a field cache and segmentreaders use their freqStream as
> >>     > the fieldcache key. Only when the deletes are merged out would they
> >>     > invalidate - but because you're writing a new segment anyway ...
> >>     >
> >>     > - Mark
> >>     >
> >>     > John Wang wrote:
> >>     >> I understand what you are saying. Let me detail what I am trying
> >>     >> to say:
> >>     >>
> >>     >> When "currently processed segments" are flushed down, merge may
> >>     >> happen. When merges happen, some of those "stable segments" will be
> >>     >> invalidated, and so will the fieldcache data keyed by them.
> >>     >>
> >>     >> In a high update environment, such scenarios can happen quite often.
> >>     >>
> >>     >> The way the default mergePolicy works is that small segments get
> >>     >> merged into the larger segments. Eventually, what will be invalidated
> >>     >> would be a large segment, and when that happens, a large chunk of the
> >>     >> field cache would be invalidated.
> >>     >>
> >>     >> Furthermore, in the case where there are high updates, the stable
> >>     >> segments can be invalidated much sooner when there are deletes in
> >>     >> those segments, and I would guess the corresponding FieldCache needs
> >>     >> to be adjusted. Not sure how it is handled right now.
> >>     >>
> >>     >> Just my two cents, and of course when I find the time I will need to
> >>     >> run some tests to see.
> >>     >>
> >>     >> -John
> >>     >>
> >>     >> On Tue, Sep 22, 2009 at 3:59 PM, Uwe Schindler <u...@thetaphi.de>
> >>     >> wrote:
> >>     >>
> >>     >>     The NRT reader coming from IndexWriter.getReader() has changes
> >>     >>     only in the currently processed segments; the other segments
> >>     >>     stay stable (and so do their IndexReader keys used for the
> >>     >>     FieldCache). For the consumer it looks like a normal reader
> >>     >>     (it is in fact a ReadOnlyDirectoryReader) supporting
> >>     >>     getSequentialSubReaders() and so on.
> >>     >>
> >>     >>
> >>     >>
> >>     >>     -----
> >>     >>     Uwe Schindler
> >>     >>     H.-H.-Meier-Allee 63, D-28213 Bremen
> >>     >>     http://www.thetaphi.de
> >>     >>     eMail: u...@thetaphi.de
> >>     >>
> >>     >>
> >>     >>     ------------------------------------------------------------------------
> >>     >>
> >>     >>     *From:* John Wang [mailto:john.w...@gmail.com]
> >>     >>     *Sent:* Tuesday, September 22, 2009 9:32 AM
> >>     >>     *To:* java-dev@lucene.apache.org
> >>     >>     *Subject:* Re: 2.9 NRT w.r.t. sorting and field cache
> >>     >>
> >>     >>
> >>     >>
> >>     >>     Thanks Mark for the pointer!
> >>     >>
> >>     >>     I guess my point is with NRT, and when segment files change
> >>     >>     often, this would be an issue, no?
> >>     >>
> >>     >>     Anyway, I can run some tests.
> >>     >>
> >>     >>     Thanks
> >>     >>
> >>     >>     -John
> >>     >>
> >>     >>     On Tue, Sep 22, 2009 at 3:21 PM, Mark Miller
> >>     >>     <markrmil...@gmail.com> wrote:
> >>     >>
> >>     >>     1483 - indexsearcher pulls out a reader's subreaders
> >>     >>     (segmentreaders) and sends a collector over them one by one,
> >>     >>     rather than using the multireader. So only the fc for seg
> >>     >>     readers that change needs to be reloaded.
> >>     >>
> >>     >>     - Mark
> >>     >>
> >>     >>
> >>     >>
> >>     >>     http://www.lucidimagination.com (mobile)
> >>     >>
> >>     >>
> >>     >>     On Sep 22, 2009, at 1:27 AM, John Wang <john.w...@gmail.com>
> >>     >>     wrote:
> >>     >>
> >>     >>>     Hi Yonik:
> >>     >>>
> >>     >>>          Actually that is what I am looking for. Can you please
> >>     >>>     point me to where/how sorting is done per-segment?
> >>     >>>
> >>     >>>          When heavy indexing introduces or modifies segments,
> >>     >>>     would it cause reloading of FieldCache at query time and thus
> >>     >>>     impact search performance?
> >>     >>>
> >>     >>>     thanks
> >>     >>>
> >>     >>>     -John
> >>     >>>
> >>     >>>     On Tue, Sep 22, 2009 at 1:05 PM, Yonik Seeley
> >>     >>>     <yo...@lucidimagination.com> wrote:
> >>     >>>
> >>     >>>     On Tue, Sep 22, 2009 at 12:56 AM, John Wang
> >>     >>>     <john.w...@gmail.com> wrote:
> >>     >>>     > Looking at the code, seems there is a disconnect between
> >>     >>>     > how/when field cache is loaded when IndexWriter.getReader()
> >>     >>>     > is called.
> >>     >>>
> >>     >>>     I'm not sure what you mean by "disconnect"
> >>     >>>
> >>     >>>     > Is FieldCache updated?
> >>     >>>
> >>     >>>     FieldCache entries are populated on demand, as they always
> >>     >>>     have been.
> >>     >>>
> >>     >>>
> >>     >>>     > Otherwise, are we reloading FieldCache for each
> >>     >>>     > reader instance?
> >>     >>>
> >>     >>>     Searching/sorting is now per-segment, and so is the use of the
> >>     >>>     FieldCache.  Segments that don't change shouldn't have to reload
> >>     >>>     their FieldCache entries.
> >>     >>>
> >>     >>>     -Yonik
> >>     >>>     http://www.lucidimagination.com
> >>     >>>
> >>     >>>
> >>
> ---------------------------------------------------------------------
> >>     >>>     To unsubscribe, e-mail:
> >>     java-dev-unsubscr...@lucene.apache.org
> >>     <mailto:java-dev-unsubscr...@lucene.apache.org>
> >>     >>>     <mailto:java-dev-unsubscr...@lucene.apache.org
> >>     <mailto:java-dev-unsubscr...@lucene.apache.org>>
> >>     >>>     For additional commands, e-mail:
> >>     java-dev-h...@lucene.apache.org
> >>     <mailto:java-dev-h...@lucene.apache.org>
> >>     >>>     <mailto:java-dev-h...@lucene.apache.org
> >>     <mailto:java-dev-h...@lucene.apache.org>>
> >>     >>>
> >>     >>>
> >>     >>>
> >>     >>
> >>     >>
> >>     >>
> >>     >
> >>     >
> >>     > --
> >>     > - Mark
> >>     >
> >>     > http://www.lucidimagination.com
> >>     >
> >>     >
> >>     >
> >>     >
> >>     >
> >>     >
> >>     >
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>
