Do we have any plans to decouple the indexing process? Lucene was designed for search, but judging by the questions people ask in this thread, the use cases sometimes go beyond search. For example, we might want to customize our scoring function based on payloads. Sometimes I don't need TF/IDF information at all, because we can pre-calculate features and store them in the index, yet I still have to store the extra TF/IDF information. And sometimes we want to load the entire postings into memory to speed things up. In cases like these, we really want to customize how the inverted index works. The main problem is that the implementation is tightly coupled to the index chain, so it is not easy to write a new one. Is there a plan to make the index chain easier to change?
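To make the use case concrete, here is a rough sketch in plain Java of the kind of structure I have in mind: postings held entirely in memory, each carrying one pre-calculated feature, with no term frequency or norm data. This is not Lucene API. `FeaturePostingsSketch`, `Posting`, `add` and `score` are hypothetical names made up for illustration, and summing the features is just an assumed scoring rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these types exist in Lucene.
public class FeaturePostingsSketch {

    /** One posting: a document id plus a feature computed before indexing. */
    static final class Posting {
        final int docId;
        final float feature;

        Posting(int docId, float feature) {
            this.docId = docId;
            this.feature = feature;
        }
    }

    // term -> postings, kept on the heap; no freq or norm data is stored
    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Appends a posting for the given term. */
    void add(String term, int docId, float feature) {
        postings.computeIfAbsent(term, t -> new ArrayList<>())
                .add(new Posting(docId, feature));
    }

    /** Scores each matching document by summing the stored features (assumed rule). */
    Map<Integer, Float> score(String... queryTerms) {
        Map<Integer, Float> scores = new HashMap<>();
        for (String term : queryTerms) {
            List<Posting> list = postings.getOrDefault(term, Collections.emptyList());
            for (Posting p : list) {
                scores.merge(p.docId, p.feature, Float::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        FeaturePostingsSketch index = new FeaturePostingsSketch();
        index.add("lucene", 1, 0.5f);
        index.add("lucene", 2, 0.25f);
        index.add("codec", 1, 1.0f);
        System.out.println(index.score("lucene", "codec"));
    }
}
```

Getting something like this out of Lucene today means working around the index chain, which is the coupling I am asking about.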
In short, what I'm after is flexible index chain logic and flexible codec formats.

Thanks,

On Fri, Nov 30, 2012 at 10:02 AM, Michael McCandless wrote:
> On Fri, Nov 30, 2012 at 12:25 PM, Wu, Stephen T., Ph.D. wrote:
> > Is there any (preliminary) code checked in somewhere that I can look at,
> > that would help me understand the practical issues that would need to be
> > addressed?
> >
> > If I understand you correctly, it's a little different from what's happening
> > in your blog posts:
> > http://blog.mikemccandless.com/2012/07/building-new-lucene-postings-format.html
> > http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html
> > Those posts deal with making your own codec, but not about changing what's
> > stored in the postings? I guess I misunderstood "postings format" before.
>
> I don't know of any examples of adding an entirely new attribute to
> the postings, except via payloads.
>
> All the examples we have are of Codecs/PostingsFormats/etc. storing
> all the usual attributes (term & its stats (docFreq/totalTermFreq),
> doc, freq, position, offsets, payload) in "interesting" ways.
>
> Maybe we can make this more concrete: what new attribute are you
> needing to record in the postings and access at search time?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
