[ 
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237853#comment-15237853
 ] 

Jeff Wartes commented on SOLR-8944:
-----------------------------------

Results from applying this patch were quite positive, but for more subtle 
reasons than I'd expected.

To my surprise, the quantity of garbage generated (by size) over my test run 
was mostly unchanged, as was the frequency of collections. However, the garbage 
collector (ParNew) seemed to have a *much* easier time with what was being 
generated. Avg GC pause went down 45%, and max GC pause for the run was cut in 
half. 

I'm not sure I can even speculate on what makes for easier work within ParNew.

From an allocation rate standpoint, I'm guessing that my test run sits near 
the edge of where the DocIdSetBuilder's buffer remains efficient from an 
allocation size perspective. Naively that looks like a hit rate threshold of 
about 25%, but I suspect it's a lot more complicated than that, since 
DocIdSetBuilder grows the buffer in 1/8th increments and throws away the old 
allocations, which generates more garbage. (By contrast, SOLR-8922 uses 1/64 
as the threshold instead of 1/128, but allocates additional space in 2x 
increments, and doesn't throw away what's already been allocated.)
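
To make the difference between the two growth strategies concrete, here is a 
rough sketch that just counts ints allocated under each policy. It is a 
simplified model, not the actual DocIdSetBuilder or SOLR-8922 code: the class 
and method names are made up for illustration, the starting capacity of 32 is 
an assumption, and it ignores the threshold at which either approach switches 
to a bitset.

```java
public class BufferGrowthSketch {

    /**
     * Models "grow by 1/8th and throw away the old array": every step
     * allocates a whole new array, so each intermediate capacity counts
     * toward the total allocated.
     */
    static long copyAndDiscard(int initial, int target) {
        long total = initial;
        int capacity = initial;
        while (capacity < target) {
            capacity += Math.max(1, capacity >> 3); // grow by 1/8th, at least 1
            total += capacity;                      // the old array becomes garbage
        }
        return total;
    }

    /**
     * Models "allocate additional space in 2x increments and keep what is
     * already allocated": each step adds a chunk as large as everything
     * held so far, and nothing is discarded.
     */
    static long doubleAndKeep(int initial, int target) {
        long total = initial;
        while (total < target) {
            total += total; // new chunk doubles the total capacity
        }
        return total;
    }

    public static void main(String[] args) {
        // 56k is the 99.9th percentile hit count quoted in the issue;
        // the initial capacity of 32 is an arbitrary assumption.
        int target = 56_000;
        System.out.println("copy-and-discard ints: " + copyAndDiscard(32, target));
        System.out.println("double-and-keep ints:  " + doubleAndKeep(32, target));
    }
}
```

In this model the 1/8th policy allocates several times the final buffer size 
in total, because every intermediate array is thrown away, while the 2x policy 
never allocates more than twice what it ends up holding. Real allocation 
behavior will differ, so treat the numbers as illustrative only.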

Looking at some before/after memory snapshots, the allocation size attributed 
to long[] in FixedBitSet is indeed down, but mostly replaced by lots of int[] 
allocations attributed to DocIdSetBuilder.growBuffer, as we might expect given 
that overall allocation size didn't change much.

In general, this is a desirable enough patch for my index that I'd be willing 
to move it into a Lucene issue just on its face, but it still feels like there 
is some room for improvement. I suppose I should have made this a Lucene issue 
in the first place, but given that I'm running with and testing with Solr I 
wasn't sure how that fit.



> Improve geospatial garbage generation
> -------------------------------------
>
>                 Key: SOLR-8944
>                 URL: https://issues.apache.org/jira/browse/SOLR-8944
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>              Labels: spatialrecursiveprefixtreefieldtype
>         Attachments: 
> SOLR-8944-Use-DocIdSetBuilder-instead-of-FixedBitSet.patch
>
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal 
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
> backtraces, it appears the biggest source of FixedBitSet creation in my case 
> (by two orders of magnitude) is my use of queries that involve geospatial 
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
