On Mon, Apr 7, 2014 at 5:32 PM, Alan Woodward <a...@flax.co.uk> wrote:
> Does FilterDirectoryReader do what you want?
> https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/index/FilterDirectoryReader.html

Yes, indeed, precisely what the doctor ordered.
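
For the archives, here is a rough sketch of the shape I was after: a composite whose leaves are each passed through a wrapper. It uses stand-in types (LeafReaderSketch, CompositeReaderSketch, FilteringCompositeSketch are all made-up names), not Lucene's classes, so treat it as an illustration of the wrap-every-leaf pattern rather than of the FilterDirectoryReader API itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Stand-in for a per-segment (leaf) reader; NOT a Lucene type.
interface LeafReaderSketch {
    String term(int docId);
}

// Stand-in for a composite reader that exposes its leaves; NOT a Lucene type.
class CompositeReaderSketch {
    private final List<LeafReaderSketch> leaves;

    CompositeReaderSketch(List<LeafReaderSketch> leaves) {
        this.leaves = Collections.unmodifiableList(new ArrayList<>(leaves));
    }

    List<LeafReaderSketch> leaves() {
        return leaves;
    }
}

// A filtering composite: same shape as the wrapped reader, but every leaf
// is passed through a caller-supplied wrapper first.
class FilteringCompositeSketch extends CompositeReaderSketch {
    FilteringCompositeSketch(CompositeReaderSketch in,
                             UnaryOperator<LeafReaderSketch> wrapper) {
        super(wrapAll(in.leaves(), wrapper));
    }

    private static List<LeafReaderSketch> wrapAll(List<LeafReaderSketch> leaves,
                                                  UnaryOperator<LeafReaderSketch> wrapper) {
        List<LeafReaderSketch> wrapped = new ArrayList<>();
        for (LeafReaderSketch leaf : leaves) {
            wrapped.add(wrapper.apply(leaf));
        }
        return wrapped;
    }
}

class LeafWrappingDemo {
    public static void main(String[] args) {
        LeafReaderSketch seg0 = docId -> "seg0-doc" + docId;
        LeafReaderSketch seg1 = docId -> "seg1-doc" + docId;
        CompositeReaderSketch plain = new CompositeReaderSketch(Arrays.asList(seg0, seg1));

        // The filter upper-cases whatever the underlying leaf returns.
        CompositeReaderSketch filtered = new FilteringCompositeSketch(
                plain, leaf -> docId -> leaf.term(docId).toUpperCase());

        for (LeafReaderSketch leaf : filtered.leaves()) {
            System.out.println(leaf.term(3));
        }
    }
}
```

The application keeps talking to the composite; only the leaves change behaviour, and the wrapped reader itself is left untouched.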

>
> Alan Woodward
> www.flax.co.uk
>
>
> On 7 Apr 2014, at 22:19, Benson Margulies wrote:
>
> Typically, an app gets a directory reader, which is a composite
> reader. To get a filter down there into the leaves of the composite
> reader, does anyone have a suggestion about where to enter the
> modularity?
>
> I sort of want to insert myself at
> org.apache.lucene.index.StandardDirectoryReader#open(org.apache.lucene.store.Directory,
> org.apache.lucene.index.IndexCommit) wrapping the segment readers, or
> I could make a sort of filtering composite reader that wraps each of
> the segment readers in a filter.
>
>
> On Mon, Apr 7, 2014 at 1:02 PM, Shai Erera <ser...@gmail.com> wrote:
>
> Given that DPF delegates indexing to another PF anyway (currently Lucene41),
> I think this might be the case. We would need to test of course. The key
> point is that this FilterAtomicReader will be able to serve anything as
> direct, even DV, so it might eliminate DVF too. We need to experiment and
> benchmark!
>
> Shai
>
> On Apr 7, 2014 7:32 PM, "david.w.smi...@gmail.com"
> <david.w.smi...@gmail.com> wrote:
>
> Aaaah, nice idea to simply use FilterAtomicReader, of course!  So this
> would ultimately be a new IndexReaderFactory that creates
> FilterAtomicReaders for a subset of the fields you want to do this on.
> Cool!  With that, I don’t think there would be a need for
> DirectPostingsFormat as a postings format, would there be?
>
> ~ David
>
> On Mon, Apr 7, 2014 at 10:58 AM, Shai Erera <ser...@gmail.com> wrote:
>
> The only problem is how the Codec makes a dynamic decision on whether to
> use the wrapped Codec for reading vs pre-load data into in-memory
> structures, because Codecs are loaded through reflection by the SPI loading
> mechanism.
>
> There is also a TODO in DirectPF to allow wrapping arbitrary PFs, just
> mentioning in case you want to tackle DPF.
>
> I think that if we allowed passing something like a CodecLookupService,
> with an SPILookupService default impl, you could easily pass that to
> DirectoryReader which will use your runtime logic to load the right PF (e.g.
> DPF) instead of the one the index was created with.
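
A rough sketch of what such a lookup hook might look like, to make the idea concrete. Everything here is hypothetical: FormatSketch, FormatLookupSketch and the two implementations are made-up stand-ins, no such hook exists in Lucene as far as this thread says, and a plain map takes the place of the SPI-based default lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a postings format; NOT Lucene's PostingsFormat.
interface FormatSketch {
    String describe();
}

// Hypothetical lookup hook: maps the format name recorded in the index
// to the implementation used for reading.
interface FormatLookupSketch {
    FormatSketch lookup(String nameInIndex);
}

// Default behaviour: a fixed name-to-implementation registry, standing in
// for the SPI-based lookup mentioned in the thread.
class RegistryLookupSketch implements FormatLookupSketch {
    private final Map<String, FormatSketch> byName = new HashMap<>();

    void register(String name, FormatSketch format) {
        byName.put(name, format);
    }

    @Override
    public FormatSketch lookup(String nameInIndex) {
        FormatSketch format = byName.get(nameInIndex);
        if (format == null) {
            throw new IllegalArgumentException("No format registered as " + nameInIndex);
        }
        return format;
    }
}

// Read-time override: resolve the on-disk format as usual, then let the
// caller decorate it (for example with an in-memory variant).
class DecoratingLookupSketch implements FormatLookupSketch {
    private final FormatLookupSketch delegate;
    private final UnaryOperator<FormatSketch> decorator;

    DecoratingLookupSketch(FormatLookupSketch delegate,
                           UnaryOperator<FormatSketch> decorator) {
        this.delegate = delegate;
        this.decorator = decorator;
    }

    @Override
    public FormatSketch lookup(String nameInIndex) {
        return decorator.apply(delegate.lookup(nameInIndex));
    }
}

class LookupDemo {
    public static void main(String[] args) {
        RegistryLookupSketch registry = new RegistryLookupSketch();
        registry.register("Lucene41", () -> "on-disk Lucene41");

        FormatLookupSketch inMemory = new DecoratingLookupSketch(
                registry, onDisk -> () -> "in-memory copy of " + onDisk.describe());

        System.out.println(registry.lookup("Lucene41").describe());
        System.out.println(inMemory.lookup("Lucene41").describe());
    }
}
```

The index still records only the on-disk format name; the decision to hold a copy in memory is made by whoever opens the reader.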
>
> But it sounds like the core problem is that when we load a Codec/PF/DVF
> for reading, we cannot pass it any arguments, and so we must make an
> index-time decision about how we're going to read the data later on. If we
> could somehow support that, I think that will help you to achieve what you
> want too.
>
> E.g. currently it's an all-or-nothing decision, but if we could pass a
> parameter like "50% available heap", the Codec/PF/DVF could cache the
> frequently accessed postings instead of loading all of them into memory.
> But, that can also be achieved at the IndexReader level, through a custom
> FilterAtomicReader. And if you could reuse DPF's structures (like
> DirectTermsEnum, DirectFields...), it should be easier to do this. So
> perhaps we can think about a DirectAtomicReader which does that? I believe
> it can share some code w/ DPF, as long as we don't make these APIs public,
> or make them @super.experimental and @super.expert.
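
To illustrate the "cache only the frequently accessed postings" idea at the reader level, here is a minimal sketch of a bounded LRU cache in front of a slower source. BoundedPostingsCacheSketch is a made-up name, postings are modelled as plain int arrays, and the budget is a simple entry count; turning something like "50% available heap" into such a budget is left out and would need real size accounting.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical reader-level cache: keeps the most recently used postings
// in memory up to a fixed entry budget and falls back to the slower
// underlying source for everything else. NOT a Lucene class.
class BoundedPostingsCacheSketch {
    private final Function<String, int[]> slowSource;
    private final Map<String, int[]> cache;
    private int misses = 0;

    BoundedPostingsCacheSketch(Function<String, int[]> slowSource, final int maxEntries) {
        this.slowSource = slowSource;
        // accessOrder=true keeps entries in least-recently-used order; the
        // override evicts the eldest entry once the budget is exceeded.
        this.cache = new LinkedHashMap<String, int[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, int[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    int[] postings(String term) {
        int[] docs = cache.get(term);
        if (docs == null) {
            misses++;
            docs = slowSource.apply(term);
            cache.put(term, docs);
        }
        return docs;
    }

    int misses() {
        return misses;
    }

    int cachedTerms() {
        return cache.size();
    }
}

class CacheDemo {
    public static void main(String[] args) {
        // Pretend on-disk lookup: every term matches documents 0..2.
        BoundedPostingsCacheSketch reader =
                new BoundedPostingsCacheSketch(term -> new int[] {0, 1, 2}, 2);

        reader.postings("apple");   // miss
        reader.postings("apple");   // hit
        reader.postings("banana");  // miss
        reader.postings("cherry");  // miss, evicts "apple"
        reader.postings("apple");   // miss again

        // prints "misses=4 cached=2"
        System.out.println("misses=" + reader.misses() + " cached=" + reader.cachedTerms());
    }
}
```

With a budget of two entries, only the hot terms stay in memory and the rest are re-read from the underlying source on demand.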
>
> Just throwing some ideas...
>
> Shai
>
> On Mon, Apr 7, 2014 at 5:35 PM, david.w.smi...@gmail.com
> <david.w.smi...@gmail.com> wrote:
>
> Benson, I like your idea.
>
> I think your idea can be achieved as a codec, one that wraps another
> codec that establishes the on-disk format.  By default the wrapped codec can
> be Lucene’s default codec.  I think, if implemented, this would be a change
> to DPF instead of an additional DPF-variant codec.
>
> ~ David
>
> On Mon, Apr 7, 2014 at 9:22 AM, Benson Margulies <bimargul...@gmail.com>
> wrote:
>
> On Mon, Apr 7, 2014 at 9:14 AM, Robert Muir <rcm...@gmail.com> wrote:
> On Thu, Apr 3, 2014 at 12:27 PM, Benson Margulies
> <bimargul...@gmail.com> wrote:
>
> My takeaway from the prior conversation was that various people didn't
> entirely believe that I'd seen a dramatic improvement in query performance
> using D-P-F, and so would not smile upon a patch intended to liberate
> D-P-F from codecs. It could be that the effect I saw has to do with
> the fact that our system depends on hitting and scoring 50% of the
> documents in an index with a lot of documents.
>
> I don't understand the word "liberate" here. Why is it such a problem
> that this is a codec?
>
> I don't want to have to declare my intentions at the time I create
> the index. I don't want to have to use D-P-F for all readers all the
> time, because I want to be able to decide to open up an index with an
> arbitrary on-disk format and get the in-memory cache behavior of
> D-P-F. Thus 'liberate' -- split the question of 'keep a copy in
> memory' from the choice of the on-disk format.
>
> I do not think we should give it any more status than that; it wastes
> too much RAM.
>
> It didn't seem like 'waste' when it solved a big practical problem for us.
> We had an application that was too slow, and had plenty of RAM available,
> and we were able to trade space for time by applying D-P-F.
>
> Maybe I'm going about this backwards; if I can come up with a small,
> inconspicuous proposed change that does what I want, there won't be
> any disagreement.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
