[ 
https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886760#comment-13886760
 ] 

David Smiley commented on LUCENE-5418:
--------------------------------------

How ironic that I was contemplating this very same issue yesterday (shared on 
IRC #lucene-dev) as I work on LUCENE-5408 and now I see you guys were just 
thinking about it.  Rob's right; the problem isn't just advance(), it's next() 
too.  

There may be a place to share some code that Mike is committing here in his 
facet module with a static utility class I coded yesterday in LUCENE-5408 (not 
yet posted).  It's a BitsDocIdSet and it's roughly similar to Mike's 
SlowBitsDocIdSetIterator:
{code:java}

  /** Utility class that wraps a {@link Bits} with a {@link DocIdSet}. */
  private static class BitsDocIdSet extends DocIdSet {
    final Bits bits;//not null

    public BitsDocIdSet(Bits bits) {
      if (bits == null)
        throw new NullPointerException("bits arg should be non-null");
      this.bits = bits;
    }

    @Override
    public DocIdSetIterator iterator() throws IOException {
      return new DocIdSetIterator() {
        // copy the reference to reduce outer-class access
        final Bits bits = BitsDocIdSet.this.bits;
        int docId = -1;

        @Override
        public int docID() {
          return docId;
        }

        @Override
        public int nextDoc() throws IOException {
          return advance(docId + 1);
        }

        @Override
        public int advance(int target) throws IOException {
          for (docId = target; docId < bits.length(); docId++) {
            if (bits.get(docId))
              return docId;
          }
          // record exhaustion so that docID() also reports NO_MORE_DOCS
          return docId = NO_MORE_DOCS;
        }

        @Override
        public long cost() {
          return bits.length();
        }
      };
    }

    @Override
    public Bits bits() throws IOException {
      return bits;//won't be null
    }

    //we don't override isCacheable because we want the default of false
  }//class BitsDocIdSet
{code}

Mike, you've got just the DISI portion, and you're also incorporating 
acceptDocs.  I elected instead to have acceptDocs pre-incorporated into the 
Bits I pass through.  I'll post my intermediate progress on LUCENE-5408.  
Anyway, how about we put something in the "utils" package to share?
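
To illustrate what "pre-incorporated" could look like, here is a minimal sketch rather than the code from either patch.  SimpleBits is a local stand-in that mirrors the two methods of Lucene's Bits interface so the example compiles on its own, and AndBits is a hypothetical helper name.  It assumes a null acceptDocs means "accept every document".

```java
// Local stand-in mirroring org.apache.lucene.util.Bits (assumption: declared
// here only so the sketch is self-contained).
interface SimpleBits {
  boolean get(int index);
  int length();
}

/** Hypothetical helper: a view that is set only where both inputs are set. */
final class AndBits implements SimpleBits {
  private final SimpleBits filterBits; // the (possibly costly) filter, not null
  private final SimpleBits acceptDocs; // may be null, meaning "accept all"

  AndBits(SimpleBits filterBits, SimpleBits acceptDocs) {
    if (filterBits == null)
      throw new NullPointerException("filterBits arg should be non-null");
    this.filterBits = filterBits;
    this.acceptDocs = acceptDocs;
  }

  @Override
  public boolean get(int index) {
    // Test acceptDocs first so the costly filter is skipped for rejected docs.
    return (acceptDocs == null || acceptDocs.get(index)) && filterBits.get(index);
  }

  @Override
  public int length() {
    return filterBits.length();
  }
}
```

With a wrapper like this, the iterator only ever sees one Bits and does not need to know about acceptDocs at all.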

> Don't use .advance on costly (e.g. distance range facets) filters
> -----------------------------------------------------------------
>
>                 Key: LUCENE-5418
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5418
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.7
>
>         Attachments: LUCENE-5418.patch
>
>
> If you use a distance filter today (see 
> http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html
>  ), then drill down on one of those ranges, under the hood Lucene is using 
> .advance on the Filter, which is very costly because we end up computing 
> distance on (possibly many) hits that don't match the query.
> It's better performance to find the hits matching the Query first, and then 
> check the filter.
> FilteredQuery can already do this today, when you use its 
> QUERY_FIRST_FILTER_STRATEGY.  This essentially accomplishes the same thing as 
> Solr's "post filters" (I think?) but with a far simpler/better/less code 
> approach.
> E.g., I believe ElasticSearch uses this API when it applies costly filters.
> Longish term, I think Query/Filter ought to know itself that it's expensive, 
> and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. 
> ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's 
> passed to IndexSearcher.search, we should also be "smart" here and not call 
> .advance on such clauses.  But that'd be a biggish change ... so for today 
> the "workaround" is the user must carefully construct the FilteredQuery 
> themselves.
> In the meantime, as another workaround, I want to fix DrillSideways so that 
> when you drill down on such filters it doesn't use .advance; this should give 
> a good speedup for the "normal path" API usage with a costly filter.
> I'm iterating on the lucene server branch (LUCENE-5376) but once it's working 
> I plan to merge this back to trunk / 4.7.
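
The cost difference described in the issue above can be sketched with plain arrays.  This is an illustration under stated assumptions, not Lucene's FilteredQuery code: RandomAccessBits is a local stand-in for Lucene's Bits, CountingBits and QueryFirstSketch are hypothetical names, and the query hits are assumed to be a sorted array of doc IDs smaller than the filter's length.

```java
// Local stand-in mirroring org.apache.lucene.util.Bits (assumption).
interface RandomAccessBits {
  boolean get(int index);
  int length();
}

/** Hypothetical wrapper that counts how often the costly check runs. */
final class CountingBits implements RandomAccessBits {
  private final RandomAccessBits delegate;
  int calls;

  CountingBits(RandomAccessBits delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean get(int index) {
    calls++;
    return delegate.get(index);
  }

  @Override
  public int length() {
    return delegate.length();
  }
}

final class QueryFirstSketch {
  /** Query first: run the filter check only on docs the query matched. */
  static int countQueryFirst(int[] sortedQueryHits, RandomAccessBits filter) {
    int count = 0;
    for (int doc : sortedQueryHits) {
      if (filter.get(doc)) count++;
    }
    return count;
  }

  /** Advance-based: the filter scans bit by bit to reach each query hit. */
  static int countWithAdvance(int[] sortedQueryHits, RandomAccessBits filter) {
    int count = 0;
    int filterDoc = -1;
    for (int hit : sortedQueryHits) {
      if (filterDoc < hit) {
        // Emulates advance(hit) on a Bits-backed iterator: every doc
        // scanned on the way to the next match pays for a get() call.
        filterDoc = hit;
        while (filterDoc < filter.length() && !filter.get(filterDoc)) filterDoc++;
      }
      if (filterDoc == hit) count++;
      if (filterDoc >= filter.length()) break; // filter is exhausted
    }
    return count;
  }
}
```

Both methods return the same count, but the advance-based one calls get() once for every doc it scans past, while the query-first one calls it once per query hit.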



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
