Interested / easily amused parties are welcome to observe the proceedings in https://github.com/apache/lucene-solr/pull/44. It's a PR _only_ to offer visibility! So far, I've got a 'delegating codec' that interposes the direct posting idea atop any other codec. Next comes the filtering.
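As a rough illustration of the 'delegating' shape described above, here is a minimal sketch in plain Java. It is not the code in the PR and it uses no Lucene types: PostingsSource, CountingSource and DirectOverlay are hypothetical names made up for this example. It also caches lazily on first lookup, which is an assumption made for brevity; DirectPostingsFormat loads its data into RAM when the reader is opened.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the read side of a codec. NOT a Lucene interface.
interface PostingsSource {
    // Returns the doc IDs for a term, or an empty array if the term is absent.
    int[] postings(String field, String term);
}

// Hypothetical "on-disk" source. It counts lookups so the caching effect is visible.
final class CountingSource implements PostingsSource {
    private final Map<String, int[]> data = new HashMap<>();
    int lookups = 0;

    void put(String field, String term, int... docIds) {
        data.put(field + ":" + term, docIds);
    }

    @Override
    public int[] postings(String field, String term) {
        lookups++;
        int[] docIds = data.get(field + ":" + term);
        return docIds == null ? new int[0] : docIds;
    }
}

// Delegating wrapper: keeps decoded postings in memory for the chosen fields
// and passes every other field straight through to the wrapped source.
final class DirectOverlay implements PostingsSource {
    private final PostingsSource delegate;
    private final Set<String> directFields;
    private final Map<String, int[]> inMemory = new HashMap<>();

    DirectOverlay(PostingsSource delegate, Set<String> directFields) {
        this.delegate = delegate;
        this.directFields = directFields;
    }

    @Override
    public int[] postings(String field, String term) {
        if (!directFields.contains(field)) {
            return delegate.postings(field, term); // untouched on-disk path
        }
        // First lookup pays the delegate's decoding cost; later ones are served from RAM.
        return inMemory.computeIfAbsent(field + ":" + term,
                key -> delegate.postings(field, term));
    }
}

final class DirectOverlayDemo {
    public static void main(String[] args) {
        CountingSource disk = new CountingSource();
        disk.put("body", "lucene", 1, 4, 9);
        PostingsSource reader = new DirectOverlay(disk, Collections.singleton("body"));
        reader.postings("body", "lucene");
        reader.postings("body", "lucene");
        System.out.println("disk lookups: " + disk.lookups);
    }
}
```

The point of the shape is that the on-disk format is chosen by whatever sits behind the delegate, while the decision to keep a copy in memory is made separately, per field, when the wrapper is constructed.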
I'm not sure that I ever concisely reported the situation that got me started on this: a profile in which _time in the codec_ dominated my application. So the RAMDirectory was useless, since it removes no codec CPU time, but the D-P-F did the job, since it does remove that time.

On Mon, Apr 7, 2014 at 5:34 PM, Benson Margulies <bimargul...@gmail.com> wrote:
> On Mon, Apr 7, 2014 at 5:32 PM, Alan Woodward <a...@flax.co.uk> wrote:
>> Does FilterDirectoryReader do what you want?
>> https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html
>
> Yes, indeed, precisely what the doctor ordered.
>
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>> On 7 Apr 2014, at 22:19, Benson Margulies wrote:
>>
>> Typically, an app gets a directory reader, which is a composite
>> reader. To get a filter down there into the leaves of the composite
>> reader, does anyone have a suggestion about where to enter the
>> modularity?
>>
>> I sort of want to insert myself at
>> org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
>> org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
>> I could make a sort of filtering composite reader that wraps each of
>> the segment readers in a filter.
>>
>> On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera <ser...@gmail.com> wrote:
>> >> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
>> >> I think this might be the case. We would need to test, of course. The key
>> >> point is that this FilterAtomicReader will be able to serve anything as
>> >> direct, even DV, so it might eliminate DVF too. We need to experiment and
>> >> benchmark!
>>
>> >> Shai
>>
>> >> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
>> >> <david.w.smi...@gmail.com> wrote:
>>
>> >> Aaaah, nice idea to simply use FilterAtomicReader, of course! So this
>> >> would ultimately be a new IndexReaderFactory that creates
>> >> FilterAtomicReaders for a subset of the fields you want to do this on.
>> >> Cool! With that, I don’t think there would be a need for
>> >> DirectPostingsFormat as a postings format, would there be?
>>
>> >> ~ David
>>
>> >> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera <ser...@gmail.com> wrote:
>>
>> >> The only problem is how the Codec makes a dynamic decision on whether to
>> >> use the wrapped Codec for reading vs pre-load data into in-memory
>> >> structures, because Codecs are loaded through reflection by the SPI loading
>> >> mechanism.
>>
>> >> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
>> >> mentioning in case you want to tackle DPF.
>>
>> >> I think that if we allowed passing something like a CodecLookupService,
>> >> with an SPILookupService default impl, you could easily pass that to
>> >> DirectoryReader which will use your runtime logic to load the right PF (e.g.
>> >> DPF) instead of the one the index was created with.
>>
>> >> But it sounds like the core problem is that when we load a Codec/PF/DVF
>> >> for reading, we cannot pass it any arguments, and so we must make an
>> >> index-time decision about how we're going to read the data later on. If we
>> >> could somehow support that, I think that will help you to achieve what you
>> >> want too.
>>
>> >> E.g. currently it's an all-or-nothing decision, but if we could pass a
>> >> parameter like "50% available heap", the Codec/PF/DVF could cache the
>> >> frequently accessed postings instead of loading all of them into memory.
>> >> But, that can also be achieved at the IndexReader level, through a custom
>> >> FilterAtomicReader. And if you could reuse DPF's structures (like
>> >> DirectTermsEnum, DirectFields...), it should be easier to do this. So
>> >> perhaps we can think about a DirectAtomicReader which does that? I believe
>> >> it can share some code w/ DPF, as long as we don't make these APIs public,
>> >> or make them @super.experimental and @super.expert.
>>
>> >> Just throwing some ideas...
>>
>> >> Shai
>>
>> >> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
>> >> <david.w.smi...@gmail.com> wrote:
>>
>> >> Benson, I like your idea.
>>
>> >> I think your idea can be achieved as a codec, one that wraps another
>> >> codec that establishes the on-disk format. By default the wrapped codec can
>> >> be Lucene’s default codec. I think, if implemented, this would be a change
>> >> to DPF instead of an additional DPF-variant codec.
>>
>> >> ~ David
>>
>> >> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies <bimargul...@gmail.com>
>> >> wrote:
>>
>> >> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir <rcm...@gmail.com> wrote:
>> >> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
>> >> <bimargul...@gmail.com> wrote:
>>
>> >> My takeaway from the prior conversation was that various people didn't
>> >> entirely believe that I'd seen a dramatic improvement in query performance
>> >> using D-P-F, and so would not smile upon a patch intended to liberate
>> >> D-P-F from codecs. It could be that the effect I saw has to do with
>> >> the fact that our system depends on hitting and scoring 50% of the
>> >> documents in an index with a lot of documents.
>>
>> >> I don't understand the word "liberate" here. Why is it such a problem
>> >> that this is a codec?
>>
>> >> I don't want to have to declare my intentions at the time I create
>> >> the index. I don't want to have to use D-P-F for all readers all the
>> >> time. I want to be able to decide to open up an index with an
>> >> arbitrary on-disk format and get the in-memory cache behavior of
>> >> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
>> >> memory' from the choice of the on-disk format.
>>
>> >> I do not think we should give it any more status than that; it wastes
>> >> too much RAM.
>>
>> >> It didn't seem like 'waste' when it solved a big practical problem for us.
>> >> We had an application that was too slow, and had plenty of RAM available,
>> >> and we were able to trade space for time by applying D-P-F.
>>
>> >> Maybe I'm going about this backwards; if I can come up with a small,
>> >> inconspicuous proposed change that does what I want, there won't be
>> >> any disagreement.
>>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org