On Wed, Oct 9, 2013 at 7:18 PM, Michael McCandless <[email protected]> wrote:

> On Wed, Oct 9, 2013 at 7:13 PM, Benson Margulies <[email protected]> wrote:
> > On Tue, Oct 8, 2013 at 5:50 PM, Michael McCandless <[email protected]> wrote:
> >
> >> DirectPostingsFormat?
> >>
> >> It stores all terms + postings as simple Java arrays, uncompressed.
> >
> > This definitely sped things up in my benchmark, but I'm greedy for more.
> > I just made a codec that returns it as the postings guy, is that the whole
> > recipe? Does it make sense to extend it any further to any of the other
> > codec pieces?
>
> Yes, that's all you should need to do (you should have seen RAM usage
> go up too, to confirm :) ).
>
> Really this just addressed one "hotspot" (decoding terms/postings from
> the index); the query matching + scoring is also costly, and if you do
> "other stuff" (highlighting, spell correction) that can be costly too
> ... what kind of queries are you running / where are the hotspots in
> profiling?

The profile shows a lot of time in
org.apache.lucene.search.BooleanScorer$BooleanScorerCollector.collect(int).
We know that a typical query inspects about half of the documents in the
index.

> Mike McCandless
>
> http://blog.mikemccandless.com
