Eventually, I'll care about how to set this up in Solr. For now, I think I'll see if I can figure out the luceneutil benchmark.

On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera <ser...@gmail.com> wrote:
> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> I think this might be the case. We would need to test, of course. The key
> point is that this FilterAtomicReader will be able to serve anything as
> direct, even DV, so it might eliminate DVF too. We need to experiment and
> benchmark!
>
> Shai
>
> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
> <david.w.smi...@gmail.com> wrote:
>>
>> Aaaah, nice idea to simply use FilterAtomicReader, of course! So this
>> would ultimately be a new IndexReaderFactory that creates
>> FilterAtomicReaders for a subset of the fields you want to do this on.
>> Cool! With that, I don’t think there would be a need for
>> DirectPostingsFormat as a postings format, would there be?
>>
>> ~ David
>>
>>
>> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera <ser...@gmail.com> wrote:
>>>
>>> The only problem is how the Codec makes a dynamic decision on whether to
>>> use the wrapped Codec for reading vs pre-load data into in-memory
>>> structures, because Codecs are loaded through reflection by the SPI loading
>>> mechanism.
>>>
>>> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
>>> mentioning in case you want to tackle DPF.
>>>
>>> I think that if we allowed passing something like a CodecLookupService,
>>> with an SPILookupService default impl, you could easily pass that to
>>> DirectoryReader, which will use your runtime logic to load the right PF
>>> (e.g. DPF) instead of the one the index was created with.
>>>
>>> But it sounds like the core problem is that when we load a Codec/PF/DVF
>>> for reading, we cannot pass it any arguments, and so we must make an
>>> index-time decision about how we're going to read the data later on. If we
>>> could somehow support that, I think that will help you to achieve what you
>>> want too.
>>>
>>> E.g. currently it's an all-or-nothing decision, but if we could pass a
>>> parameter like "50% available heap", the Codec/PF/DVF could cache the
>>> frequently accessed postings instead of loading all of them into memory.
>>> But, that can also be achieved at the IndexReader level, through a custom
>>> FilterAtomicReader. And if you could reuse DPF's structures (like
>>> DirectTermsEnum, DirectFields...), it should be easier to do this. So
>>> perhaps we can think about a DirectAtomicReader which does that? I believe
>>> it can share some code w/ DPF, as long as we don't make these APIs public,
>>> or make them @super.experimental and @super.expert.
>>>
>>> Just throwing some ideas...
>>>
>>> Shai
>>>
>>>
>>> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
>>> <david.w.smi...@gmail.com> wrote:
>>>>
>>>> Benson, I like your idea.
>>>>
>>>> I think your idea can be achieved as a codec, one that wraps another
>>>> codec that establishes the on-disk format. By default the wrapped codec
>>>> can be Lucene’s default codec. I think, if implemented, this would be a
>>>> change to DPF instead of an additional DPF-variant codec.
>>>>
>>>> ~ David
>>>>
>>>>
>>>> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies <bimargul...@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir <rcm...@gmail.com> wrote:
>>>>> > On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
>>>>> > <bimargul...@gmail.com> wrote:
>>>>> >
>>>>> >>
>>>>> >> My takeaway from the prior conversation was that various people
>>>>> >> didn't entirely believe that I'd seen a dramatic improvement in
>>>>> >> query performance using D-P-F, and so would not smile upon a patch
>>>>> >> intended to liberate D-P-F from codecs. It could be that the effect
>>>>> >> I saw has to do with the fact that our system depends on hitting
>>>>> >> and scoring 50% of the documents in an index with a lot of
>>>>> >> documents.
>>>>> >>
>>>>> >
>>>>> > I don't understand the word "liberate" here. Why is it such a problem
>>>>> > that this is a codec?
>>>>>
>>>>> I don't want to have to declare my intentions at the time I create
>>>>> the index. I don't want to have to use D-P-F for all readers all the
>>>>> time. Because I want to be able to decide to open up an index with an
>>>>> arbitrary on-disk format and get the in-memory cache behavior of
>>>>> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
>>>>> memory' from the choice of the on-disk format.
>>>>>
>>>>>
>>>>> >
>>>>> > I do not think we should give it any more status than that; it wastes
>>>>> > too much RAM.
>>>>>
>>>>> It didn't seem like 'waste' when it solved a big practical problem for
>>>>> us. We had an application that was too slow, and had plenty of RAM
>>>>> available, and we were able to trade space for time by applying D-P-F.
>>>>>
>>>>> Maybe I'm going about this backwards; if I can come up with a small,
>>>>> inconspicuous proposed change that does what I want, there won't be
>>>>> any disagreement.
>>>>>
>>>>>
>>>>> >
>>>>> > ---------------------------------------------------------------------
>>>>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>> > For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>> >