On Fri, Nov 8, 2013 at 2:22 AM, Ravikumar Govindarajan <[email protected]> wrote:

> Wow, this saving of filters in a custom codec is super cool.
>
> Let me describe the problem I was thinking about.
>
> Assuming we have the RAMDir and disk swap approach, I was just starting to
> deliberate on the read path.
>
> PrimeDocCache looks like a challenge for this approach, as the same row
> will now be present across multiple segments. Each segment will have a
> "PrimeDoc" field per row, but during a merge this info gets duplicated for
> each row.
>
> I was thinking of recording the "start-doc" of each row to a separate file,
> via a custom codec, like you have done for FilterCache.
>
> During warm-up, it can read the entire file containing the "start-docs" and
> populate the PrimeDocCache.
>

I like the idea. I tend to prototype to figure out how hard and how
performant a solution will be. :-) Let's see if we can make it work.

Aaron

>
> --
> Ravi
>
>
> On Fri, Nov 8, 2013 at 5:04 AM, Aaron McCurry <[email protected]> wrote:
>
> > So the filter cache is really just a placeholder for keeping Lucene Filters
> > around between queries. The DefaultFilterCache class does nothing; however,
> > I have implemented one that I make use of regularly.
> >
> > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-core/src/main/java/org/apache/blur/manager/AliasBlurFilterCache.java;h=92491d0ceb3e7ce09902110e3bac5fa485959dab;hb=apache-blur-0.2
> >
> > If you write your own and you want to build a logical bitset cache for the
> > filter (so it's faster), take a look at the
> > "org.apache.blur.filter.FilterCache" class. It wraps an existing filter,
> > loads it into the block cache and writes it to disk (via the Directory).
> > The filters live with the segment, so if the segment gets removed, so will
> > the on-disk "filter" and the in-memory cache of it.
> >
> > On Thu, Nov 7, 2013 at 8:08 AM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > Great. In that case, it will help me to build a "rowid"
> > > filter-cache.
> > >
> > > I saw that Blur has a DefaultFilterCache class. Is this the class that
> > > needs to be customized? Will NRT re-opens [reader-close/open, with
> > > applyAllDeletes] take care of auto-invalidating such a cache?
> > >
> >
> > Filtering is a query operation, so for each new segment (NRT re-opens) the
> > Lucene Filter API handles creating a new filter for that segment. How
> > deletes are handled is up to how you code the Filter. But that's all
> > Lucene code.
> >
> > The DefaultFilterCache just allows you to cache the filter objects
> > themselves, and it provides callbacks when tables/shards are opened and
> > closed.
> >
> > Aaron
> >
> > >
> > > --
> > > Ravi
> > >
> > >
> > > On Thu, Nov 7, 2013 at 5:44 PM, Aaron McCurry <[email protected]>
> > > wrote:
> > >
> > > > Yes. But I believe the "rowId" needs to be "rowid".
> > > >
> > > > Aaron
> > > >
> > > >
> > > > On Thu, Nov 7, 2013 at 5:16 AM, Ravikumar Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > Does Blur permit queries with rowId?
> > > > >
> > > > > Ex:
> > > > > docs.body:hello AND rowId:123
> > > > >
> > > > > Is it possible to optimize such queries with filter-caching etc...?
> > > > >
> > > > > --
> > > > > Ravi
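A minimal pure-Java sketch of the "start-doc" side file idea from the top of the thread: write the sorted start doc of every row in a segment, then read the whole file back at warm-up to rebuild a prime-doc bitset. It assumes each row's documents are contiguous within a segment and that the codec can hand over the row start docs at write time. The file layout and the names PrimeDocSideFile, writeStartDocs, readStartDocs and rowStart are hypothetical, not Blur or Lucene API. A real version would write through the Lucene Directory (probably with variable-length ints) rather than a byte array, and this sketch does not address the merge case.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

// Hypothetical sketch: persist each row's start doc for one segment and rebuild
// a prime-doc bitset from it at warm-up. Not Blur or Lucene API.
public class PrimeDocSideFile {

  // Assumed layout: [count][startDoc_0]...[startDoc_n-1] as 4-byte big-endian ints.
  static byte[] writeStartDocs(int[] startDocs) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 * startDocs.length);
    buf.putInt(startDocs.length);
    int prev = -1;
    for (int doc : startDocs) {
      // Assumes rows are contiguous doc ranges, so start docs are strictly increasing.
      if (doc <= prev) {
        throw new IllegalArgumentException("start docs must be strictly increasing");
      }
      buf.putInt(doc);
      prev = doc;
    }
    return buf.array();
  }

  // Warm-up: read the whole side file and mark every row start in a bitset.
  static BitSet readStartDocs(byte[] file, int maxDoc) {
    ByteBuffer buf = ByteBuffer.wrap(file);
    int count = buf.getInt();
    BitSet primeDocs = new BitSet(maxDoc);
    for (int i = 0; i < count; i++) {
      int doc = buf.getInt();
      if (doc < 0 || doc >= maxDoc) {
        throw new IllegalStateException("start doc out of range: " + doc);
      }
      primeDocs.set(doc);
    }
    return primeDocs;
  }

  // Start doc of the row containing docId: the nearest set bit at or before it.
  static int rowStart(BitSet primeDocs, int docId) {
    return primeDocs.previousSetBit(docId);
  }

  public static void main(String[] args) {
    int[] startDocs = {0, 3, 7, 8}; // four rows in a 10-doc segment
    BitSet cache = readStartDocs(writeStartDocs(startDocs), 10);
    System.out.println(cache);              // {0, 3, 7, 8}
    System.out.println(rowStart(cache, 5)); // 3
    System.out.println(rowStart(cache, 9)); // 8
  }
}
```

Keeping the rebuilt cache as a bitset means that, in this sketch, finding the prime doc for any hit is a single previousSetBit call.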

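The per-segment caching Aaron describes (a filter's bitset lives with its segment and goes away when the segment does) can be sketched without Lucene, using the docs.body:hello AND rowid:123 query from the start of the thread. SegmentFilterCache, getOrCompute, segmentRemoved and intersect are hypothetical names. Blur's real pieces are org.apache.blur.filter.FilterCache and the Lucene Filter API, whose signatures are not reproduced here.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a per-segment filter cache; not Blur's FilterCache API.
public class SegmentFilterCache {

  // segment name -> (filter key such as "rowid:123" -> matching docs in that segment)
  private final Map<String, Map<String, BitSet>> bySegment = new HashMap<>();
  private int computations = 0;

  // Returns the cached bitset, computing it only the first time this segment sees the key.
  public BitSet getOrCompute(String segment, String filterKey, Supplier<BitSet> compute) {
    Map<String, BitSet> filters = bySegment.computeIfAbsent(segment, s -> new HashMap<>());
    return filters.computeIfAbsent(filterKey, k -> {
      computations++;
      return compute.get();
    });
  }

  // Call when a segment is merged away or dropped: its cached filters go with it.
  public void segmentRemoved(String segment) {
    bySegment.remove(segment);
  }

  public int computations() {
    return computations;
  }

  // Query matches AND filter, without mutating either input.
  public static BitSet intersect(BitSet matches, BitSet filter) {
    BitSet result = (BitSet) matches.clone();
    result.and(filter);
    return result;
  }

  public static void main(String[] args) {
    SegmentFilterCache cache = new SegmentFilterCache();
    BitSet hello = new BitSet(); // docs in segment "_0" matching docs.body:hello
    hello.set(1);
    hello.set(3);
    hello.set(4);
    hello.set(6);
    BitSet row123 = cache.getOrCompute("_0", "rowid:123", () -> {
      BitSet b = new BitSet();
      b.set(2, 5); // docs 2..4 belong to row 123
      return b;
    });
    System.out.println(intersect(hello, row123)); // {3, 4}
    cache.getOrCompute("_0", "rowid:123", BitSet::new); // cache hit, not recomputed
    System.out.println(cache.computations());     // 1
  }
}
```

Because the key includes the segment, an NRT re-open that adds a segment only computes bitsets for that new segment in this sketch; older entries stay valid until segmentRemoved is called. Deletes are not handled here, consistent with Aaron's point that they depend on how the Filter is coded.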