On Mon, Apr 7, 2014 at 5:32 PM, Alan Woodward <a...@flax.co.uk> wrote:
> Does FilterDirectoryReader do what you want?
> https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html

Yes, indeed, precisely what the doctor ordered.

>
> Alan Woodward
> www.flax.co.uk
>
> On 7 Apr 2014, at 22:19, Benson Margulies wrote:
>
> Typically, an app gets a directory reader, which is a composite
> reader. To get a filter down there into the leaves of the composite
> reader, does anyone have a suggestion about where to enter the
> modularity?
>
> I sort of want to insert myself at
> org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
> org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
> I could make a sort of filtering composite reader that wraps each of
> the segment readers in a filter.
>
> On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera <ser...@gmail.com> wrote:
> > Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> > I think this might be the case. We would need to test of course. The key
> > point is that this FilterAtomicReader will be able to serve anything as
> > direct, even DV, so it might eliminate DVF too. We need to experiment and
> > benchmark!
> >
> > Shai
> >
> > On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
> > <david.w.smi...@gmail.com> wrote:
> >
> > Aaaah, nice idea to simply use FilterAtomicReader -- of course! So this
> > would ultimately be a new IndexReaderFactory that creates
> > FilterAtomicReaders for a subset of the fields you want to do this on.
> > Cool! With that, I don’t think there would be a need for
> > DirectPostingsFormat as a postings format, would there be?
> >
> > ~ David
> >
> > On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera <ser...@gmail.com> wrote:
> >
> > The only problem is how the Codec makes a dynamic decision on whether to
> > use the wrapped Codec for reading vs pre-load data into in-memory
> > structures, because Codecs are loaded through reflection by the SPI loading
> > mechanism.
> >
> > There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
> > mentioning in case you want to tackle DPF.
> >
> > I think that if we allowed passing something like a CodecLookupService,
> > with an SPILookupService default impl, you could easily pass that to
> > DirectoryReader which will use your runtime logic to load the right PF (e.g.
> > DPF) instead of the one the index was created with.
> >
> > But it sounds like the core problem is that when we load a Codec/PF/DVF
> > for reading, we cannot pass it any arguments, and so we must make an
> > index-time decision about how we're going to read the data later on. If we
> > could somehow support that, I think that will help you to achieve what you
> > want too.
> >
> > E.g. currently it's an all-or-nothing decision, but if we could pass a
> > parameter like "50% available heap", the Codec/PF/DVF could cache the
> > frequently accessed postings instead of loading all of them into memory.
> > But, that can also be achieved at the IndexReader level, through a custom
> > FilterAtomicReader. And if you could reuse DPF's structures (like
> > DirectTermsEnum, DirectFields...), it should be easier to do this. So
> > perhaps we can think about a DirectAtomicReader which does that? I believe
> > it can share some code w/ DPF, as long as we don't make these APIs public,
> > or make them @super.experimental and @super.expert.
> >
> > Just throwing some ideas...
> >
> > Shai
> >
> > On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
> > <david.w.smi...@gmail.com> wrote:
> >
> > Benson, I like your idea.
> >
> > I think your idea can be achieved as a codec, one that wraps another
> > codec that establishes the on-disk format. By default the wrapped codec can
> > be Lucene’s default codec. I think, if implemented, this would be a change
> > to DPF instead of an additional DPF-variant codec.
> >
> > ~ David
> >
> > On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies <bimargul...@gmail.com>
> > wrote:
> >
> > On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir <rcm...@gmail.com> wrote:
> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
> > <bimargul...@gmail.com> wrote:
> >
> > My takeaway from the prior conversation was that various people didn't
> > entirely believe that I'd seen a dramatic improvement in query performance
> > using D-P-F, and so would not smile upon a patch intended to liberate
> > D-P-F from codecs. It could be that the effect I saw has to do with
> > the fact that our system depends on hitting and scoring 50% of the
> > documents in an index with a lot of documents.
> >
> > I don't understand the word "liberate" here. Why is it such a problem
> > that this is a codec?
> >
> > I don't want to have to declare my intentions at the time I create
> > the index. I don't want to have to use D-P-F for all readers all the
> > time. Because I want to be able to decide to open up an index with an
> > arbitrary on-disk format and get the in-memory cache behavior of
> > D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
> > memory' from the choice of the on-disk format.
> >
> > I do not think we should give it any more status than that, it wastes
> > too much RAM.
> >
> > It didn't seem like 'waste' when it solved a big practical problem for us.
> > We had an application that was too slow, and had plenty of RAM available,
> > and we were able to trade space for time by applying D-P-F.
> >
> > Maybe I'm going about this backwards; if I can come up with a small,
> > inconspicuous proposed change that does what I want, there won't be
> > any disagreement.
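
A minimal sketch of the idea the thread converges on: wrap every leaf of a composite reader in a filter that keeps an in-memory copy, and make that choice when the reader is opened rather than when the index is written. The types below (Leaf, DiskLeaf, InMemoryLeaf, WrappingCompositeReader) are hypothetical stand-ins written in plain Java, not Lucene classes. In Lucene 4.7 the corresponding hook is FilterDirectoryReader with its SubReaderWrapper, per the Javadoc linked at the top of the thread. For simplicity the sketch treats doc ids as already global, which real segments do not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a per-segment (leaf) reader; NOT a Lucene type.
interface Leaf {
    List<Integer> postings(String term); // doc ids for a term, empty if absent
}

// Stand-in for a leaf backed by the on-disk format; counts how often it is read.
final class DiskLeaf implements Leaf {
    private final Map<String, List<Integer>> data;
    int reads = 0;

    DiskLeaf(Map<String, List<Integer>> data) {
        this.data = data;
    }

    @Override
    public List<Integer> postings(String term) {
        reads++;
        return data.getOrDefault(term, Collections.<Integer>emptyList());
    }
}

// Filter leaf: copies a term's postings into memory on first access
// and serves every later lookup for that term from the copy.
final class InMemoryLeaf implements Leaf {
    private final Leaf in;
    private final Map<String, List<Integer>> cache = new HashMap<>();

    InMemoryLeaf(Leaf in) {
        this.in = in;
    }

    @Override
    public List<Integer> postings(String term) {
        return cache.computeIfAbsent(term, t -> new ArrayList<>(in.postings(t)));
    }
}

// Composite reader that applies a caller-supplied wrapper to each leaf at open time.
final class WrappingCompositeReader {
    private final List<Leaf> leaves = new ArrayList<>();

    WrappingCompositeReader(List<? extends Leaf> rawLeaves, UnaryOperator<Leaf> wrapper) {
        for (Leaf leaf : rawLeaves) {
            leaves.add(wrapper.apply(leaf));
        }
    }

    // Concatenates the postings of all (wrapped) leaves in leaf order.
    List<Integer> postings(String term) {
        List<Integer> all = new ArrayList<>();
        for (Leaf leaf : leaves) {
            all.addAll(leaf.postings(term));
        }
        return all;
    }
}

final class LeafWrappingDemo {
    public static void main(String[] args) {
        Map<String, List<Integer>> seg0 = new HashMap<>();
        seg0.put("lucene", Arrays.asList(0, 2));
        Map<String, List<Integer>> seg1 = new HashMap<>();
        seg1.put("lucene", Arrays.asList(5));
        DiskLeaf a = new DiskLeaf(seg0);
        DiskLeaf b = new DiskLeaf(seg1);

        // The on-disk leaves are unchanged; the in-memory behavior is chosen here.
        WrappingCompositeReader reader =
            new WrappingCompositeReader(Arrays.asList(a, b), InMemoryLeaf::new);
        reader.postings("lucene");
        System.out.println(reader.postings("lucene") + " disk reads: " + (a.reads + b.reads));
    }
}
```

Passing UnaryOperator.identity() instead of InMemoryLeaf::new opens the same leaves with no caching, which is the separation between "keep a copy in memory" and the on-disk format that Benson asks for above.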
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
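
Shai's suggestion earlier in the thread, a lookup service handed to the reader at open time so that runtime logic picks the postings format, might look roughly like the sketch below. All names here (Format, NamedFormat, InMemoryFormat, FormatLookup, RegistryLookup, WrappingLookup) are hypothetical; the registry merely stands in for SPI-based lookup by name, and nothing in this thread indicates that such a hook exists in Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a postings format; NOT the Lucene class.
interface Format {
    String describe();
}

// A format identified only by the name recorded in the index.
final class NamedFormat implements Format {
    private final String name;

    NamedFormat(String name) {
        this.name = name;
    }

    @Override
    public String describe() {
        return name;
    }
}

// A format that delegates the on-disk layout to another format
// and (in a real implementation) would serve reads from memory.
final class InMemoryFormat implements Format {
    private final Format delegate;

    InMemoryFormat(Format delegate) {
        this.delegate = delegate;
    }

    @Override
    public String describe() {
        return "InMemory(" + delegate.describe() + ")";
    }
}

// The lookup hook proposed in the thread, as a hypothetical interface.
interface FormatLookup {
    Format forName(String name);
}

// Default behavior: resolve strictly by the name stored in the index.
final class RegistryLookup implements FormatLookup {
    private final Map<String, Format> registry = new HashMap<>();

    RegistryLookup register(String name) {
        registry.put(name, new NamedFormat(name));
        return this;
    }

    @Override
    public Format forName(String name) {
        Format format = registry.get(name);
        if (format == null) {
            throw new IllegalArgumentException("unknown format: " + name);
        }
        return format;
    }
}

// Read-time override: resolve through the base lookup, then wrap the result.
final class WrappingLookup implements FormatLookup {
    private final FormatLookup base;
    private final UnaryOperator<Format> wrapper;

    WrappingLookup(FormatLookup base, UnaryOperator<Format> wrapper) {
        this.base = base;
        this.wrapper = wrapper;
    }

    @Override
    public Format forName(String name) {
        return wrapper.apply(base.forName(name));
    }
}

final class LookupDemo {
    public static void main(String[] args) {
        FormatLookup byName = new RegistryLookup().register("Lucene41");
        FormatLookup inMemory = new WrappingLookup(byName, InMemoryFormat::new);
        System.out.println(byName.forName("Lucene41").describe());
        System.out.println(inMemory.forName("Lucene41").describe());
    }
}
```

The index still records only the on-disk format name; which lookup the reader receives decides how that format is served, so no index-time decision is needed.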