[ https://issues.apache.org/jira/browse/LUCENE-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737871#action_12737871 ]
Michael McCandless commented on LUCENE-1749:
--------------------------------------------

This was an excellent idea, and it's great that it uncovered some dangerous and very unexpected places where we are passing a top-level reader to the FieldCache (eg that explain() could suddenly populate the FieldCache w/ top-level values is quite shocking!).

ReaderUtil.subSearcher is doing the same thing as DirectoryReader.readerIndex.

I love the RAMUsageEstimator... we have other places that estimate RAM (eg IndexWriter does so for added & deleted docs) that we should eventually cut over to this new API.

I particularly love the new class named Insanity:

{code}
public static Insanity[] checkSanity(FieldCache cache)
{code}

MultiDocIdSet/Iterator makes me a bit nervous, because it's further "propagating" a non-segment-based iterator deeper into Lucene than I think we want to. It's similar to eg using DirectoryReader.MultiTermDocs (what Lucene used to do), instead of stepping through the segments yourself.

Also, shouldn't explain most closely match what was done during searching (ie, run "per segment")? So simply pushing explain down to the sub-reader that has the doc seems appropriate? Ie we want it to share as much of the code path as possible with how searching was in fact done?

EG for ConstantScoreQuery.explain, it seems like we should 1) locate the sub-reader that this doc falls in, 2) get a scorer against that reader, then 3) build up the explanation from that? Likewise for CustomScoreQuery?

In fact.... maybe we should simply fix IndexSearcher.explain to do this for all queries? Ie, get the top-level weight, locate the sub-reader that has the doc, un-base the doc, and then invoke QueryWeight.explain with that sub-reader and un-based doc? Then we don't have to do anything special for each query.

I think QueryWeight.scorer() shouldn't be expected to handle a "top level reader" being passed in.
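
To make the "locate the sub-reader, un-base the doc" step concrete, here is a minimal standalone sketch. It is not Lucene code: the class name, the method names, and the docStarts array (the first top-level doc id of each segment) are all assumptions made up for illustration, and a real fix would go on to call QueryWeight.explain with the chosen sub-reader and the local doc id.

```java
// Illustrative sketch only; SubReaderLocator and docStarts are hypothetical,
// not part of the Lucene API.
final class SubReaderLocator {

    /**
     * Returns the index of the segment that contains the top-level doc id.
     * docStarts[i] is assumed to hold the first top-level doc id of segment i,
     * in ascending order, with docStarts[0] == 0.
     */
    static int subIndex(int doc, int[] docStarts) {
        int lo = 0;
        int hi = docStarts.length - 1;
        // Binary search for the LAST segment whose start is <= doc, so that
        // empty segments (repeated start values) are skipped over.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always progresses
            if (docStarts[mid] <= doc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Converts a top-level doc id into a doc id local to its segment. */
    static int unbase(int doc, int[] docStarts) {
        return doc - docStarts[subIndex(doc, docStarts)];
    }

    public static void main(String[] args) {
        int[] docStarts = {0, 5, 5, 12}; // segment 1 is empty in this example
        int doc = 7;
        System.out.println("segment " + subIndex(doc, docStarts)
                + ", local doc " + unbase(doc, docStarts));
    }
}
```

Doing this lookup once, high up in the explain path, is what would let every query's explain work against a single segment without any per-query special casing.
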
Ie, higher up in Lucene we should do that switch, so that we don't have to do it (this "valuesFromSubReaders" arg) for every scorer.

Hmm: why do we even have explain at both the QueryWeight and Scorer "levels"? It seems like we should pick one level and do it there, consistently. Most queries seem to only implement the QueryWeight one and often simply throw UOE in the Scorer's explain, but eg PhraseQuery implements it in both places.

(BTW: I'll be offline for approx the next 36 hours or so!)

> FieldCache introspection API
> ----------------------------
>
>                 Key: LUCENE-1749
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1749
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: fieldcache-introspection.patch, 
> LUCENE-1749-hossfork.patch, LUCENE-1749.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, 
> LUCENE-1749.patch, LUCENE-1749.patch
>
>
> FieldCache should expose an Expert level API for runtime introspection of the 
> FieldCache to provide info about what is in the FieldCache at any given 
> moment.  We should also provide utility methods for sanity checking that the 
> FieldCache doesn't contain anything "odd"...
>    * entries for the same reader/field with different types/parsers
>    * entries for the same field/type/parser in a reader and its subreader(s)
>    * etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org