By default Bobo DOES use a flavor of the field cache data structure, with
some additional information kept for performance (e.g. minDocid, maxDocid,
and freq per term).
Bobo is architected as a platform where clients can write their own
"FacetHandlers": each FacetHandler manages its own view of the in-memory
structure, and thus can be more complicated than the field cache.
At LinkedIn, we have written FacetHandlers for geo lat/lon filtering and
social network faceting.
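[Editor's note: a minimal, self-contained sketch of the idea described above. The class and field names here are illustrative stand-ins, not Bobo's actual API; it only shows the shape of a per-segment facet structure that augments a doc-to-ordinal array with the per-term stats mentioned (minDocid, maxDocid, freq).]

```java
import java.util.HashMap;
import java.util.Map;

public class FacetSketch {
    // Extra per-term bookkeeping mentioned above: minDocid, maxDocid, freq.
    static class TermStats {
        int minDocid = Integer.MAX_VALUE;
        int maxDocid = -1;
        int freq = 0;
    }

    // A toy "FacetHandler": a field-cache-like docID -> term ordinal array,
    // plus the per-term stats computed in one pass at load time.
    static class SimpleFacetHandler {
        final int[] ords;                        // docID -> term ordinal
        final Map<Integer, TermStats> stats = new HashMap<>();

        SimpleFacetHandler(int[] docToOrd) {
            this.ords = docToOrd;
            for (int doc = 0; doc < docToOrd.length; doc++) {
                TermStats s =
                    stats.computeIfAbsent(docToOrd[doc], k -> new TermStats());
                s.minDocid = Math.min(s.minDocid, doc);
                s.maxDocid = Math.max(s.maxDocid, doc);
                s.freq++;
            }
        }
    }

    public static void main(String[] args) {
        // Ordinal 0 occurs at docs 0 and 2.
        SimpleFacetHandler h = new SimpleFacetHandler(new int[] {0, 1, 0, 2, 1});
        TermStats s = h.stats.get(0);
        System.out.println(s.minDocid + " " + s.maxDocid + " " + s.freq);
        // prints "0 2 2"
    }
}
```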

-John

On Fri, Apr 3, 2009 at 3:35 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Thu, Apr 2, 2009 at 5:56 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
> >> I think I need to understand better why delete by Query isn't
> >> viable in your situation...
> >
> > The delete by query is a separate problem which I haven't fully
> > explored yet.
>
> Oh, I had thought we were tugging on this thread in order to explore
> delete-by-docID in the writer.  OK.
>
> > Tracking the segment genealogy is really an
> > interim step for merging field caches before column stride
> > fields gets implemented.
>
> I see -- meaning in Bobo you'd like to manage your own memory resident
> field caches, and merge them whenever IW has merged a segment?  Seems
> like you don't need genealogy for that.
>
> > Actually CSF cannot be used with Bobo's
> > field caches anyways which means we'd need a way to find out
> > about the segment parents.
>
> CSF isn't really designed yet.  How come it can't be used with Bobo's
> field caches?  We can try to accommodate Bobo's field cache needs when
> designing CSF.
>
> >> Does it operate at the segment level? Seems like that'd give
> >> you good enough realtime performance (though merging in RAM will
> >> definitely be faster).
> >
> > We need to see how Bobo integrates with LUCENE-1483.
>
> Lucene's internal field cache usage is now entirely at the segment
> level (ie, Lucene core should never request full field cache array at
> the MultiSegmentReader level).  I think Bobo must have to do the same,
> if it handles near realtime updates, to get adequate performance.
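[Editor's note: an illustrative sketch of why segment-level caches matter for near-realtime, using simplified stand-in types rather than Lucene's actual classes. Each segment holds its own value array; a top-level lookup is composed via docBase offsets, so after a reopen only new segments need (re)loading instead of rebuilding one giant MultiSegmentReader-level array.]

```java
public class SegmentCaches {
    // A simplified per-segment "field cache" array.
    static class Segment {
        final int[] values;
        Segment(int[] v) { values = v; }
    }

    // Resolve a global docID to its segment-local value using docBase
    // offsets; unchanged segments keep reusing their existing arrays.
    static int lookup(Segment[] segments, int[] docBases, int globalDoc) {
        for (int i = segments.length - 1; i >= 0; i--) {
            if (globalDoc >= docBases[i]) {
                return segments[i].values[globalDoc - docBases[i]];
            }
        }
        throw new IllegalArgumentException("doc out of range: " + globalDoc);
    }

    public static void main(String[] args) {
        Segment[] segs = {
            new Segment(new int[] {10, 11}),   // segment 0: global docs 0-1
            new Segment(new int[] {20})        // segment 1: global doc 2
        };
        int[] docBases = {0, 2};
        System.out.println(lookup(segs, docBases, 2));
        // prints "20"
    }
}
```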
>
> Though... since we have LUCENE-831 (rework API Lucene exposes for
> accessing arrays-of-atomic-types-per-segment) and LUCENE-1231 (CSF = a
> more efficient impl (than uninversion) of the API we expose in
> LUCENE-831) on deck, we should try to understand Bobo's needs.
>
> EG how come Bobo made its own field cache impl?  Just because
> uninversion is too slow?
>
> > It seems like we've been talking about CSF for 2 years and there
> > isn't a patch for it? If I had more time I'd take a look. What
> > is the status of it?
>
> I think Michael is looking into it?  I'd really like to get it into
> 2.9.  We should do it in conjunction with 831 since they are so tied.
>
> > I'll write a patch that implements a callback for the segment
> > merging such that the user can decide what information they want
> > to record about the merged SRs (I'm pretty sure there isn't a
> > way to do this with MergePolicy?)
>
> Actually I think you can do this w/ a simple MergeScheduler wrapper or
> by subclassing CMS.  I'll put a comment on the issue.
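[Editor's note: a hedged sketch of the wrapper idea suggested above: intercept merge completion so the user learns which source segments produced which merged segment (e.g. to merge external field caches). The `MergeScheduler` interface here is a simplified stand-in, not Lucene's actual MergeScheduler/ConcurrentMergeScheduler API.]

```java
import java.util.List;
import java.util.function.BiConsumer;

public class MergeCallbackSketch {
    // Simplified stand-in for a merge scheduler.
    interface MergeScheduler {
        void merge(List<String> sourceSegments, String mergedSegment);
    }

    // Wraps any scheduler and invokes a user callback after each merge,
    // reporting the "parent" segments of the newly merged segment.
    static class CallbackScheduler implements MergeScheduler {
        private final MergeScheduler delegate;
        private final BiConsumer<List<String>, String> onMerged;

        CallbackScheduler(MergeScheduler delegate,
                          BiConsumer<List<String>, String> onMerged) {
            this.delegate = delegate;
            this.onMerged = onMerged;
        }

        @Override
        public void merge(List<String> sources, String merged) {
            delegate.merge(sources, merged);   // run the real merge first
            onMerged.accept(sources, merged);  // then notify the user code
        }
    }

    public static void main(String[] args) {
        MergeScheduler real = (src, dst) -> { /* actual merging elided */ };
        CallbackScheduler cs = new CallbackScheduler(real,
            (src, dst) -> System.out.println(dst + " <- " + src));
        cs.merge(List.of("_0", "_1"), "_2");
        // prints "_2 <- [_0, _1]"
    }
}
```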
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
