[ https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267176#comment-15267176 ]

Jeff Wartes commented on LUCENE-7258:
-------------------------------------

Ok, yeah, that’s a reasonable thing to assume. We usually think of caching in 
terms of CPU work, but filter caches would be an equally good way to mitigate 
allocations. But a cache is really only useful when you’ve got a non-uniform 
query distribution, or enough time-locality at your query rate that your rare 
queries haven’t been evicted from the cache yet. 
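To illustrate why a uniform query distribution defeats a cache, here is a minimal sketch, assuming a plain LRU policy built on `java.util.LinkedHashMap` (not Solr's actual filterCache implementation) and uniformly random keys. The 40,000 keys and 2,000 slots mirror the city numbers used later in this comment; `LruHitRate` and its methods are hypothetical names. Under uniform access, the hit rate settles near capacity / distinct keys (about 5% here), no matter how good the eviction policy is.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration: hit rate of a fixed-size LRU cache under uniform query keys. */
class LruHitRate {

    /** Minimal LRU cache built on LinkedHashMap's access-order mode. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least-recently-used entry
        }
    }

    /** Replays `queries` uniformly random keys from [0, distinctKeys) and returns the hit rate. */
    static double uniformHitRate(int distinctKeys, int capacity, int queries, long seed) {
        LruCache<Integer, Boolean> cache = new LruCache<>(capacity);
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < queries; i++) {
            int key = random.nextInt(distinctKeys);
            if (cache.get(key) != null) {
                hits++;
            } else {
                cache.put(key, Boolean.TRUE); // stand-in for a computed filter (e.g. a bitset)
            }
        }
        return (double) hits / queries;
    }

    public static void main(String[] args) {
        // 40,000 possible city queries, room for 2,000 cached filters (5%).
        System.out.println(uniformHitRate(40_000, 2_000, 200_000, 42L));
    }
}
```

A skewed (non-uniform) key distribution is what lets the same 2,000 slots do better than 5%.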

I’m indexing address-type data. Not uncommon. I think that if my typical 
geospatial search were based on some hyper-local phone location, we’d be done 
talking, since a filter cache would be useless.  

So maybe we should assume I’m not doing that.

Let’s assume I can get away with something coarse. Let’s assume I can convert 
all location based queries to the center point of a city. Let’s further assume 
that I only care about one radius per city. Finally, let’s assume I’m only 
searching in the US. There are some 40,000 cities in the US, so those 
assumptions yield 40,000 possible queries. That’s not too bad. 

With a 100M-doc core, I think that’s about 12.5MB per filter cache entry (one 
bit per doc). It could be less, I think, particularly with the changes in 
SOLR-8922, but since we’re only going with coarse queries, it’s reasonable to 
assume there are going to be a lot of hits. 
I don’t need every city in the cache, of course, so maybe… 5%? That’s 2,000 
entries, or only some 25GB of heap. 
Doable, especially since it reduces allocation volume and you could probably 
trade away some of the eden space (although this would make warmup more of a 
pain). I’d probably have to cross the CompressedOops boundary at 32GB of heap to 
do that too, though, so add another 16GB to get back to baseline.
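The 12.5MB and 25GB figures follow from a dense bitset of one bit per document. The sketch below reproduces that arithmetic, assuming each cached filter is a `FixedBitSet`-style array of 64-bit words and ignoring object headers and cache bookkeeping; `FilterCacheSizing` and its methods are hypothetical helpers, not a Solr API.

```java
/** Back-of-the-envelope sizing for a bitset-based filter cache (hypothetical helper). */
class FilterCacheSizing {

    /** One bit per document, rounded up to whole 64-bit words, as a dense bitset would need. */
    static long bytesPerEntry(long maxDoc) {
        long words = (maxDoc + 63) / 64;
        return words * Long.BYTES;
    }

    /** Total heap for `entries` cached filters, ignoring object headers and cache bookkeeping. */
    static long cacheBytes(long maxDoc, long entries) {
        return bytesPerEntry(maxDoc) * entries;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L;     // 100M-doc core
        long cities = 40_000L;          // possible coarse queries
        long cached = cities * 5 / 100; // cache the top 5% -> 2,000 entries
        System.out.printf("per entry: %.1f MB%n", bytesPerEntry(maxDoc) / 1e6); // 12.5 MB
        System.out.printf("cache:     %.1f GB%n", cacheBytes(maxDoc, cached) / 1e9); // 25.0 GB
    }
}
```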

Fortunately, the top 5% of cities probably maps to more than 5% of queries. 
More populated cities are also more likely search targets in most query 
corpora. So, assuming it’s the biggest 5% that are in the cache, maybe we can 
expect a 15% hit rate? 20%?

Ok, so now I’ve spent something like 41GB of heap, and I’ve reduced allocations 
by maybe 20%. Is this pretty good?
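One way to frame "is this pretty good?" is heap spent per percentage point of filter allocations avoided. This is a minimal sketch, assuming every cache hit skips exactly one filter allocation and taking the 25GB cache plus the 16GB CompressedOops allowance from above; `CacheTradeoff` is a hypothetical name.

```java
/** Hypothetical trade-off arithmetic for the cache-sizing scenario above. */
class CacheTradeoff {

    /** Extra heap: the cache itself plus the allowance for losing compressed oops past 32GB. */
    static double extraHeapGb(double cacheGb, double compressedOopsAllowanceGb) {
        return cacheGb + compressedOopsAllowanceGb;
    }

    /** Heap spent per percentage point of filter allocations avoided (one allocation saved per hit). */
    static double gbPerPercentSaved(double extraHeapGb, double hitRate) {
        return extraHeapGb / (hitRate * 100.0);
    }

    public static void main(String[] args) {
        double heap = extraHeapGb(25.0, 16.0); // 41 GB
        for (double hitRate : new double[] {0.15, 0.20}) {
            System.out.printf("hit rate %.0f%%: %.2f GB of heap per 1%% of allocations avoided%n",
                    hitRate * 100, gbPerPercentSaved(heap, hitRate));
        }
    }
}
```

At a 20% hit rate that works out to roughly 2GB of heap per percentage point of allocations saved.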

I suppose it’s worth noting that this also assumes a perfect cache eviction 
policy (I’m pretty interested in SOLR-8241) and that there’s no other filter 
cache pressure (at the least, I’m using facets; see SOLR-8171).


> Tune DocIdSetBuilder allocation rate
> ------------------------------------
>
>                 Key: LUCENE-7258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: Jeff Wartes
>         Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right is where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar.)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it switches to the FBS, it has allocated around twice as much 
> memory for the buffer as it would have needed for just the FBS.
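The "around twice as much memory" observation can be checked with a simple model. This sketch is hypothetical and does not call Lucene: it appends doc IDs to an `int[]` that grows by a fixed fraction (1/8 to mirror the `ArrayUtil.oversize` growth factor described above, ignoring its alignment rounding and minimum growth), starting at an assumed capacity of 16, and switches to a dense bitset once the buffer would exceed an assumed threshold of `maxDoc >>> 7` docs. It counts every buffer allocated along the way and compares the total to the bitset size.

```java
/**
 * Hypothetical model of a grow-then-switch doc-ID buffer: an int[] grows by a fixed
 * fraction until it would exceed `threshold` entries, at which point a dense bitset
 * of maxDoc bits is allocated instead. Only bytes are counted; nothing is allocated.
 */
class GrowthAllocationModel {

    /** Bytes allocated for all int[] buffers before the switch to a bitset. */
    static long bufferBytesAllocated(int initialCapacity, int threshold, double growthFraction) {
        long capacity = initialCapacity;
        long totalInts = capacity; // the first buffer
        while (true) {
            // Grow by the given fraction, but by at least one slot so the loop always progresses.
            long next = capacity + Math.max(1L, (long) (capacity * growthFraction));
            if (next > threshold) {
                break; // model: the builder switches to a bitset instead of growing again
            }
            totalInts += next; // a new, larger array; the old one becomes garbage
            capacity = next;
        }
        return totalInts * Integer.BYTES;
    }

    /** Size of a dense bitset with one bit per document, in 64-bit words. */
    static long bitsetBytes(long maxDoc) {
        return ((maxDoc + 63) / 64) * Long.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L;
        int threshold = (int) (maxDoc >>> 7); // assumed switch-over point (781,250 docs)
        for (double growth : new double[] {0.125, 1.0}) {
            long buffers = bufferBytesAllocated(16, threshold, growth);
            System.out.printf("growth %.3f: %.1f MB of buffers = %.2fx the bitset%n",
                    growth, buffers / 1e6, (double) buffers / bitsetBytes(maxDoc));
        }
    }
}
```

With growth factor g, the buffers allocated before the switch sum to roughly (1 + g) / g times the final buffer, so 1/8 growth costs about 9x the last buffer while doubling costs about 2x. In this model the 1/8 case lands at roughly 2x to 2.25x the bitset size, and doubling stays well under it.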



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
