[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933888#action_12933888 ]
Michael McCandless commented on LUCENE-2694:
--------------------------------------------

Phew that was fast!

Wow, you nuked the terms dict cache :) Nice! Though it makes me a bit nervous... like there'll always be a risk we've missed some path through Lucene that does two lookups... And, even for external reasons (eg the same query arrives at Lucene again, looking for the next page or something), the cache is useful.

EG, a straight TermQuery (not spawned by MTQ) is now hitting the terms dict twice. Once inside Sim.idfExplain, where it calls searcher.docFreq(term), and then again to pull the scorers per sub reader. Probably, TermQuery should pull the PerReaderTermState, up front, if it wasn't already handed it? And then pass the docFreq to Sim.idfExplain. Should we add a PerReaderTermState.docFreq(), which just sums up across all subs?

Does TermState really need field()? Seems wasteful to have to store that... eg an MTQ will store many TermStates against the same field. I think we should keep TermState lean. Also, I think it shouldn't need that clone method?

I think instead of duplicating docs/docsAndPositions (and soon bulkPostings) on TermsEnum, once for TermState and once without, we should just add a seek(TermState)? And then the single docs/docsAndPositions/etc. method can be used to get the enum for that term. (Likewise for Terms)

Also, we should remove docFreq and ord from TermsEnum since you should get them from TermState?

I think IndexReader can offer the sugar methods (that take either BytesRef term or String field + TermState state).

Also: I tried to run the benchmark on beast but unfortunately there's a bug somewhere (even though Lucene core tests pass) -- I see different results for some fuzzy queries.

Nice work!! Getting to single term lookup for all queries will be awesome!
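
To make the PerReaderTermState.docFreq() suggestion concrete, here is a rough Java sketch of what "just sums up across all subs" could look like. Everything in it is an assumption for illustration: TermStateSketch, PerReaderTermStateSketch, register() and get() are hypothetical stand-ins, not the classes or methods in the attached patch, and the real TermState would hold codec-specific seek data in addition to docFreq.

```java
// Hypothetical stand-in for a lean per-segment term state.
// Note: no field() and no clone(), per the "keep TermState lean" idea.
final class TermStateSketch {
    final int docFreq; // docs containing the term in ONE segment

    TermStateSketch(int docFreq) {
        this.docFreq = docFreq;
    }
}

// Hypothetical holder of one term's state across all sub readers.
final class PerReaderTermStateSketch {
    // One slot per sub reader; null means the term is absent there.
    private final TermStateSketch[] states;

    PerReaderTermStateSketch(int numSubReaders) {
        this.states = new TermStateSketch[numSubReaders];
    }

    // Record the state found during the single terms dict lookup
    // for the sub reader with the given ordinal.
    void register(int subReaderOrd, TermStateSketch state) {
        states[subReaderOrd] = state;
    }

    // Per-segment state, reused later when pulling scorers.
    TermStateSketch get(int subReaderOrd) {
        return states[subReaderOrd];
    }

    // Top-level docFreq: the sum over all segments that have the term.
    int docFreq() {
        int sum = 0;
        for (TermStateSketch s : states) {
            if (s != null) {
                sum += s.docFreq;
            }
        }
        return sum;
    }
}
```

With something shaped like this, TermQuery could resolve the term once per sub reader, hand docFreq() to the idf computation, and then reuse get(ord) when pulling each sub reader's scorer, instead of going back to the terms dict a second time.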

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
> Key: LUCENE-2694
> URL: https://issues.apache.org/jira/browse/LUCENE-2694
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2694.patch, LUCENE-2694.patch
>
>
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
> Once we fix MTQ rewrite to be per-segment, we should take it further and make
> weight/scorer init also run in the same single pass as rewrite.