> I think this (truly componentizing SegmentReader) makes tons of sense.
> After all, a SegmentReader is just a bunch of separate components
> handling different parts of the index.
>
> This is really orthogonal to LUCENE-831 (the field cache is just one
> component). They can land in either order...
>
> Earwin do you want to take an initial stab (patch) at this?

Okay. The only problem is that it will take me a while; I can't cut back on sleep any more than I already do :)

> I think it'll be interesting how the components API handles near
> real-time search, because we want/expect components to be able to
> merge themselves efficiently "in RAM" when possible. EG if field
> cache already has certain fields loaded, they can be merged in RAM; if
> not, they should be merged on disk. If field cache has pending
> changes (in a future world when CSF makes it possible to suddenly
> change say the price of certain documents), then the components must
> properly implement clone (ideally incremental copy-on-write cloning).

Can we outline some requirements for the plugin API?

Do we want to attach/detach plugins to an IndexReader after it is created, or only during construction?

We probably want them to support (know the difference between) SegmentReader/MultiSegmentReader. What about ParallelReader (does anybody use it at all?), FilterIndexReader, MultiReader?

For a hierarchy of readers, the API should probably support the notion of different plugin instances per subreader.

Do we want plugins that support more than one interface, or is that an unnecessary complication? Like:
  indexReader.bindPlugin(instance).to(Iface1.class, Iface2.class);
And then:
  indexReader.plugin(Iface1.class) == indexReader.plugin(Iface2.class)

> Mike
>
> On Sun, Apr 12, 2009 at 7:34 PM, Earwin Burrfoot <[email protected]> wrote:
>> To support my dream of kicking fieldCache out of the core and to add
>> some extensibility to Lucene, I want to introduce IndexReaderPlugins.
>> Rough pseudocode follows:
>>
>> interface IndexReaderPlugin {
>>   void attach(SegmentReader reader);
>>   void detach(SegmentReader reader);
>>
>>   void attach(MultiSegmentReader reader);
>>   void detach(MultiSegmentReader reader);
>> }
>>
>> IndexReader.java:
>> private Map<Class, IndexReaderPlugin> plugins;
>>
>> on opening/closing toplevel/segment reader we iterate over plugins:
>> for(IndexReaderPlugin plugin : plugins.values())
>>   plugin.attach(reader);
>>
>> the map is passed to toplevel reader initially, and then shared with
>> lowlevel readers, we can also retrieve plugins:
>> public <T> T plugin(Class<T> pluginType);
>>
>> then we can do something like:
>> indexReader.plugin(ValueSource.class).doSomething // lucene code
>> indexReader.plugin(FieldsCache.class).forField(LAST_UPDATE_TIME).doSomething // my code
>> filter.apply(indexReader.plugin(FilterCache.class)) // my code
>>
>> Benefits are numerous. We get rid of alien code like:
>> +++ src/java/org/apache/lucene/index/SegmentReader.java (working copy)
>> @@ -83,6 +86,8 @@
>> +  protected ValueSource valueSource;
>> +
>> @@ -555,6 +560,8 @@
>> +
>> +    valueSource = new CachingValueSource(this, new UninversionValueSource(this));
>>
>> If I don't need ValueSource attached to my readers, I won't have it.
>> If I need my custom caches attached to my readers, I can do it in a
>> natural way instead of hacking around MergeScheduler, or comparing
>> subreader lists.
>> If I want, I can replace Lucene's native ValueSource with my own
>> implementation, and all Lucene classes that use it will happily accept
>> it.
>>
>> On second thought, we shouldn't share plugin map across subreaders.
>> If we allow attach(SegmentReader reader) to return an instance of plugin
>> (plugin decides if it is the same instance always, or per-reader), and
>> populate the map for subreader with results of attach invoked on
>> toplevel reader map, we'll turn this code:
>> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod(segmentReader);
>> into:
>> segmentReader.plugin(SomeClass.class).segmentReaderDependentMethod();
>> which makes more sense
>>
>> Any way the general idea is still the same.
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко ([email protected])
>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>> ICQ: 104465785
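
To make the two open questions above a bit more concrete (one plugin instance bound to several interfaces, and attach() returning the instance a subreader should see), here is a rough, self-contained sketch. Everything in it is an assumption made for illustration and none of it is existing Lucene API: the names ReaderPlugin, PluginRegistry, Binding, forSubReader and SegmentCaches are invented, and a plain String stands in for a segment reader so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical plugin contract; a String stands in for a segment reader. */
interface ReaderPlugin {
    /** Returns the instance a subreader should see: this, or a per-segment one. */
    ReaderPlugin attach(String segmentName);
}

/** Hypothetical registry that a reader could own; not existing Lucene API. */
final class PluginRegistry {
    private final Map<Class<?>, ReaderPlugin> plugins = new HashMap<>();

    /** Starts a binding: registry.bind(instance).to(A.class, B.class). */
    <P extends ReaderPlugin> Binding<P> bind(P instance) {
        return new Binding<>(this, instance);
    }

    /** Looks a plugin up by the interface it was bound to; null if nothing is bound. */
    <T> T plugin(Class<T> type) {
        return type.cast(plugins.get(type));
    }

    /** Builds a subreader's map from the results of attach() on this map. */
    PluginRegistry forSubReader(String segmentName) {
        PluginRegistry sub = new PluginRegistry();
        // Call attach() once per distinct instance so shared bindings stay shared.
        Map<ReaderPlugin, ReaderPlugin> attached = new IdentityHashMap<>();
        for (Map.Entry<Class<?>, ReaderPlugin> e : plugins.entrySet()) {
            ReaderPlugin perReader =
                attached.computeIfAbsent(e.getValue(), p -> p.attach(segmentName));
            sub.plugins.put(e.getKey(), perReader);
        }
        return sub;
    }

    static final class Binding<P extends ReaderPlugin> {
        private final PluginRegistry registry;
        private final P instance;

        Binding(PluginRegistry registry, P instance) {
            this.registry = registry;
            this.instance = instance;
        }

        /** Registers the same instance under every given interface. */
        @SafeVarargs
        final PluginRegistry to(Class<? super P>... types) {
            for (Class<? super P> type : types) {
                registry.plugins.put(type, instance);
            }
            return registry;
        }
    }
}

// Example-only interfaces and plugin; names borrowed from the thread for flavour.
interface FieldsCache { String segment(); }
interface FilterCache { }

final class SegmentCaches implements ReaderPlugin, FieldsCache, FilterCache {
    private final String segment;

    SegmentCaches(String segment) { this.segment = segment; }

    @Override public String segment() { return segment; }

    // This plugin chooses a fresh instance per segment; returning this would share one.
    @Override public ReaderPlugin attach(String segmentName) {
        return new SegmentCaches(segmentName);
    }
}
```

With this shape, binding one SegmentCaches instance to both FieldsCache.class and FilterCache.class makes plugin(FieldsCache.class) and plugin(FilterCache.class) return the same object, and forSubReader("_0") yields a registry whose plugins already know their segment, so callers no longer pass the reader into every method. One caveat the sketch does not handle: if attach() returns an object that does not implement the interfaces it was bound to, the lookup fails with a ClassCastException.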
