I think we need to do more testing to determine the overall impact on
performance.  I've just run a micro benchmark to measure performance of
the current master branch with and without the query component.  What
I've done:

  * Created an index of 150,000 documents, each containing a few
    thousand random words of content and between 1 and 1000 "readers"
    (the index was a little over 5GB on disk)

  * 100 virtual users, each with their own set of up to 1000 principals
    (representing their group memberships)

  * A background thread performing a soft commit against Solr once per
    second

  * 10 threads concurrently performing a total of 10,000 very simple
    queries.  Each is just a search on a single dictionary word, with
    the results limited to the ACLs of one of our 100 virtual users.
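
For anyone wanting to try something similar, the setup above could be
sketched roughly as follows.  This is not the script used for the
numbers in this message, just a minimal harness under assumptions:
`run_query` and `soft_commit` are hypothetical callables you would
supply (for Solr, `run_query` would issue the search and return the
QTime value from the response header), and the population standard
deviation is a guess at how the spread was calculated.

```python
import random
import statistics
import threading
from concurrent.futures import ThreadPoolExecutor


def commit_loop(soft_commit, stop, interval):
    """Call the (hypothetical) soft_commit() every `interval` seconds."""
    # Event.wait returns False on timeout and True once stop is set.
    while not stop.wait(interval):
        soft_commit()


def run_benchmark(run_query, words, users, total_queries=10_000,
                  threads=10, soft_commit=None, commit_interval=1.0,
                  seed=0):
    """Run the queries on a thread pool and summarise the timings.

    run_query(word, user) is a hypothetical callable that performs one
    search restricted to that user's ACLs and returns the query time in
    milliseconds.
    """
    rng = random.Random(seed)
    # Choose the (word, user) pairs up front so a run is repeatable.
    jobs = [(rng.choice(words), rng.choice(users))
            for _ in range(total_queries)]

    stop = threading.Event()
    committer = None
    if soft_commit is not None:
        committer = threading.Thread(
            target=commit_loop,
            args=(soft_commit, stop, commit_interval),
            daemon=True,
        )
        committer.start()

    try:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            timings = list(pool.map(lambda job: run_query(*job), jobs))
    finally:
        # Always stop the background committer, even if a query fails.
        stop.set()
        if committer is not None:
            committer.join()

    return {
        "queries": len(timings),
        "mean_ms": statistics.mean(timings),
        "stdev_ms": statistics.pstdev(timings),
    }
```

Passing the query and commit functions in as arguments keeps the
harness independent of any particular Solr client, and makes it easy
to run the same job list with and without the query component.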

Confused?  Me too.  But here's what I see:

Without the query component, over the 10,000 queries:

  Average query time (reported by Solr): 14ms
  Standard deviation: 97ms

With the query component, over the 10,000 queries:

  Average query time (reported by Solr): 1ms
  Standard deviation: 3ms

Seems almost too good to be true (and you can never trust a
microbenchmark :), but at least it's not a total disaster!  Keep in mind
that a 14x improvement is unlikely for real-world queries--the queries
I'm doing here are so simple that the bulk of the query time is taken up
by the overhead of the ACL filters.

I'd expect to see some performance improvement, though, because the
query component is able to cache its ACL filters across commits, whereas
the standard Solr caches will be completely flushed after each commit
(soft or otherwise).  That's why we see such high variability without
the query component--a query that hits a cold cache needs to do a lot
more work than one that can reuse a cached filter, and with a commit
every second the caches are cold quite often.
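
A toy model makes the effect easier to see.  This is only a sketch with
made-up costs (COLD_MS and WARM_MS are assumptions, not measurements)
and a simple round-robin stream of users; it does not model Solr's
actual caches or the query component itself, only the difference
between a filter cache that is flushed on every commit and one that
survives commits.

```python
import statistics

COLD_MS = 100  # assumed cost of building an ACL filter from scratch
WARM_MS = 1    # assumed cost of reusing a cached ACL filter


def simulate(num_queries, num_users, queries_per_commit, flush_on_commit):
    """Return the per-query cost for a round-robin stream of users."""
    cache = set()
    costs = []
    for i in range(num_queries):
        if flush_on_commit and i > 0 and i % queries_per_commit == 0:
            cache.clear()  # a commit throws away every cached filter
        user = i % num_users
        if user in cache:
            costs.append(WARM_MS)
        else:
            costs.append(COLD_MS)
            cache.add(user)
    return costs


def summarise(costs):
    """Mean and population standard deviation of the costs."""
    return statistics.mean(costs), statistics.pstdev(costs)


flushed = simulate(10_000, 100, 250, flush_on_commit=True)
persistent = simulate(10_000, 100, 250, flush_on_commit=False)

print("flushed on commit:", summarise(flushed))
print("kept across commits:", summarise(persistent))
```

In this model the flushed cache pays the cold cost again for every user
after each commit, so both the mean and the spread go up, while the
persistent cache only pays it once per user.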

Cheers,

Mark


"Roma, David" <[email protected]> writes:

> Sounds good..
>
> Does this also provide a performance boost? Would be interesting to
> see some numbers if we are at the point where performance tests
> require minimal effort.
>
> Cheers,
> Dave.
>
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Carl Hall
> Sent: Tuesday, 26 June 2012 1:13 AM
> To: Mark Triggs
> Cc: OAE dev list
> Subject: Re: [oae-dev] Status of Nakamura query component
>
> I'd like to see the new query component come back into action. Without
> it, we easily stand the chance of an ACL list or deleted items cache
> breaking queries.
>
> For those wondering what this component does, it was found that when a
> large number of things are deleted or a very large ACL is encountered,
> we add enough terms to the query to go over the 1024 limit
> (configurable). Mark introduced a query component that avoids adding
> these terms to the query, so we stay under the limit but still achieve
> the same result (ACLs applied, deleted items not shown).

-- 
Mark Triggs
<[email protected]>
_______________________________________________
oae-dev mailing list
[email protected]
http://collab.sakaiproject.org/mailman/listinfo/oae-dev
