[ https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886760#comment-13886760 ]
David Smiley commented on LUCENE-5418:
--------------------------------------

How ironic: I was contemplating this very issue yesterday (shared on IRC #lucene-dev) while working on LUCENE-5408, and now I see you guys were just thinking about it too. Rob's right: the problem isn't just advance(), it's nextDoc() too.

There may be a chance to share code between what Mike is committing here in the facet module and a static utility class I wrote yesterday for LUCENE-5408 (not yet posted). It's a BitsDocIdSet, roughly similar to Mike's SlowBitsDocIdSetIterator:

{code:java}
// Imports needed by the enclosing source file:
import java.io.IOException;

import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

/** Utility class that wraps a {@link Bits} with a {@link DocIdSet}. */
private static class BitsDocIdSet extends DocIdSet {
  final Bits bits; // not null

  public BitsDocIdSet(Bits bits) {
    if (bits == null)
      throw new NullPointerException("bits arg should be non-null");
    this.bits = bits;
  }

  @Override
  public DocIdSetIterator iterator() throws IOException {
    return new DocIdSetIterator() {
      final Bits bits = BitsDocIdSet.this.bits; // copy reference to reduce outer class access
      int docId = -1;

      @Override
      public int docID() {
        return docId;
      }

      @Override
      public int nextDoc() throws IOException {
        return advance(docId + 1);
      }

      @Override
      public int advance(int target) throws IOException {
        // Linear scan: Bits only offers random access, so every doc from
        // target onward is tested until one matches.
        for (docId = target; docId < bits.length(); docId++) {
          if (bits.get(docId))
            return docId;
        }
        // Exhausted: docID() must report NO_MORE_DOCS from now on.
        return docId = NO_MORE_DOCS;
      }

      @Override
      public long cost() {
        return bits.length();
      }
    };
  }

  @Override
  public Bits bits() throws IOException {
    return bits; // won't be null
  }

  // we don't override isCacheable because we want the default of false
} // class BitsDocIdSet
{code}

So Mike, you've got just the DISI portion, and you're also incorporating acceptDocs. I instead chose to pre-incorporate acceptDocs into the Bits I pass in. I'll post my intermediate progress on LUCENE-5408.

So anyway, how about we put something in the "utils" package to share?

> Don't use .advance on costly (e.g. distance range facets) filters
> -----------------------------------------------------------------
>
>                 Key: LUCENE-5418
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5418
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.7
>
>         Attachments: LUCENE-5418.patch
>
>
> If you use a distance filter today (see
> http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html)
> and then drill down on one of those ranges, under the hood Lucene uses
> .advance on the Filter, which is very costly because we end up computing
> distance on (possibly many) hits that don't match the query.
>
> Performance is better if you find the hits matching the Query first and then
> check the filter.
>
> FilteredQuery can already do this today when you use its
> QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as
> Solr's "post filters" (I think?), but with a far simpler, better approach
> that needs less code. E.g., I believe ElasticSearch uses this API when it
> applies costly filters.
>
> Longer term, I think a Query/Filter ought to know itself that it's expensive,
> and in cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g.
> ConstantScoreQuery), or the Filter is a clause in a BooleanFilter, or it's
> passed to IndexSearcher.search, we should also be "smart" and not call
> .advance on such clauses. But that'd be a biggish change ... so for today
> the "workaround" is that the user must carefully construct the FilteredQuery
> themselves.
>
> In the meantime, as another workaround, I want to fix DrillSideways so that
> when you drill down on such filters it doesn't use .advance; this should give
> a good speedup for the "normal path" API usage with a costly filter.
>
> I'm iterating on the Lucene server branch (LUCENE-5376), but once it's
> working I plan to merge this back to trunk / 4.7.
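To make the cost argument in the issue description concrete, here is a small self-contained Java model. It is a sketch, not Lucene's FilteredQuery code: CostlyFilterDemo, CountingFilter, advanceFilter, leapFrog and queryFirst are hypothetical names. It assumes the filter can only be answered one document at a time (like a distance check exposed through Bits), so an advance() on it has to test every doc from the target onward, much as BitsDocIdSet.advance does. The query side is just a sorted array of hits, and only filter evaluations are counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Self-contained model (hypothetical names, not Lucene's API) comparing two
 * ways to intersect a cheap query with a costly, random-access-only filter.
 */
final class CostlyFilterDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Wraps a per-document check (e.g. a distance computation) and counts calls. */
  static final class CountingFilter {
    private final IntPredicate check;
    int evaluations = 0;

    CountingFilter(IntPredicate check) {
      this.check = check;
    }

    boolean get(int doc) {
      evaluations++;
      return check.test(doc);
    }
  }

  /** Bits-backed advance(): tests every doc from target on until one matches. */
  static int advanceFilter(CountingFilter filter, int target, int maxDoc) {
    for (int doc = target; doc < maxDoc; doc++) {
      if (filter.get(doc)) {
        return doc;
      }
    }
    return NO_MORE_DOCS;
  }

  /** Leap-frog: the filter is advanced to each query hit, scanning the gaps. */
  static List<Integer> leapFrog(int[] queryHits, CountingFilter filter, int maxDoc) {
    List<Integer> matches = new ArrayList<>();
    int qi = 0;
    while (qi < queryHits.length) {
      int filterDoc = advanceFilter(filter, queryHits[qi], maxDoc);
      if (filterDoc == NO_MORE_DOCS) {
        break;
      }
      // Advance the (cheap) query to its first hit >= filterDoc.
      while (qi < queryHits.length && queryHits[qi] < filterDoc) {
        qi++;
      }
      if (qi < queryHits.length && queryHits[qi] == filterDoc) {
        matches.add(filterDoc);
        qi++;
      }
    }
    return matches;
  }

  /** Query-first: only the query's own hits are checked against the filter. */
  static List<Integer> queryFirst(int[] queryHits, CountingFilter filter) {
    List<Integer> matches = new ArrayList<>();
    for (int doc : queryHits) {
      if (filter.get(doc)) {
        matches.add(doc);
      }
    }
    return matches;
  }
}
```

With query hits {5, 50, 95}, a filter matching every tenth doc (d % 10 == 0) and maxDoc = 100, both strategies return [50]. However, leapFrog evaluates the filter 12 times (docs 5 through 10, 50, and 95 through 99), while queryFirst evaluates it only 3 times. The gap grows with the distance between query hits, which is why a costly filter is better checked only on hits the query has already produced.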
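David's comment contrasts two ways of handling acceptDocs. Mike's SlowBitsDocIdSetIterator takes them into account itself, while David folds them into the Bits before wrapping them in BitsDocIdSet. Below is a minimal sketch of the second option. DocBits is a hypothetical stand-in for org.apache.lucene.util.Bits (same get/length shape), included so the example compiles without Lucene. AcceptDocsBits and its methods are hypothetical too. The sketch assumes the Lucene 4.x convention that a null acceptDocs means every document is allowed.

```java
/**
 * Hypothetical minimal stand-in for Lucene's org.apache.lucene.util.Bits
 * (random access to per-document booleans); not the real interface.
 */
interface DocBits {
  boolean get(int doc);

  int length();
}

/** Sketch: fold acceptDocs into a filter's bits once, up front. */
final class AcceptDocsBits {
  private AcceptDocsBits() {}

  /**
   * Returns bits that are true only where both {@code bits} and
   * {@code acceptDocs} are true. A null acceptDocs is assumed to mean
   * "all documents are accepted", so bits is returned unchanged.
   */
  static DocBits and(final DocBits bits, final DocBits acceptDocs) {
    if (acceptDocs == null) {
      return bits; // nothing to fold in
    }
    return new DocBits() {
      @Override
      public boolean get(int doc) {
        // Check the cheap acceptDocs first; && short-circuits, so the costly
        // filter bit is only computed for accepted documents.
        return acceptDocs.get(doc) && bits.get(doc);
      }

      @Override
      public int length() {
        return bits.length();
      }
    };
  }

  /** Convenience factory: bits backed by a boolean array. */
  static DocBits of(final boolean[] values) {
    return new DocBits() {
      @Override
      public boolean get(int doc) {
        return values[doc];
      }

      @Override
      public int length() {
        return values.length;
      }
    };
  }
}
```

Because the combined bits already exclude rejected docs, a wrapper like BitsDocIdSet can stay unaware of acceptDocs. The trade-off is that every caller must remember to fold them in first.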
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)