[ https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264309#comment-15264309 ]
Jeff Wartes commented on LUCENE-7258:
-------------------------------------

I'm not sure I understand how the dangers of large FBS allocations would be any different with a pooling mechanism than they are right now. If a query needs several of them, it needs several of them, whether they're freshly allocated or not. The only real difference I see is whether that memory lives in the tenured space, rather than thrashing the eden space every time.

I don't think it would need to be per-thread. I don't mind points of synchronization as long as they're tight and well understood, and the allocation rate by count is generally low here. One thought: https://gist.github.com/randomstatistic/87caefdea8435d6af4ad13a3f92d2698

To anticipate some objections: there are likely lockless data structures you could use, and yes, you might prefer to bound the pool by memory instead of by count. I can think of a dozen improvements per minute spent looking at this, but you get the idea. Anyone anywhere who knows for *sure* they're done with a FBS can offer it up for reuse, and anyone can potentially get some reuse just by changing their "new" to "request". If everybody does this, you end up with a fairly steady pool of FBS instances large enough for most uses. If only some places use it, there's no chance of an unbounded leak; you might get some gain, and in the worst case you haven't lost much. If nobody uses it, you've lost nothing.

Last I checked, something like a full 50% of (my) allocations by size were FixedBitSets, despite a low allocation rate by count, or I wouldn't keep harping on the subject. As a matter of principle, I'd gladly pay heap to reduce GC. The fastest search algorithm in the world doesn't help me if I'm stuck waiting for the collector to finish all the time.
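For the record, a minimal standalone sketch of the offer/request idea (my own illustration, not the gist verbatim; `long[]` stands in for FixedBitSet's backing words, and the count bound of 16 is an arbitrary placeholder):

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the pooling idea: callers that know for sure they're done with a
// bit set "offer" it back; callers that need one "request" it and fall back
// to plain allocation on a miss. Nothing breaks if only some call sites play.
final class BitSetPool {
    private static final int MAX_POOLED = 16; // placeholder bound by count; could be by memory
    private final ConcurrentLinkedQueue<long[]> pool = new ConcurrentLinkedQueue<>();

    /** Replace "new long[...]" (i.e. "new FixedBitSet(numBits)") with request(numBits). */
    long[] request(int numBits) {
        int numWords = (numBits + 63) >>> 6;
        for (long[] candidate; (candidate = pool.poll()) != null; ) {
            if (candidate.length >= numWords) {
                Arrays.fill(candidate, 0L); // a reused set must start cleared
                return candidate;
            }
            // too small for this caller: discard and keep looking
        }
        return new long[numWords]; // pool miss: ordinary allocation, nothing lost
    }

    /** Only call this if you know for *sure* nothing else references the array. */
    void offer(long[] bits) {
        // size() is O(n) on ConcurrentLinkedQueue; a real version would keep a counter.
        if (pool.size() < MAX_POOLED) {
            pool.offer(bits);
        }
    }
}
```

The bound keeps the worst case small: if the pool fills with sizes nobody wants, you've pinned at most MAX_POOLED arrays.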
> Tune DocIdSetBuilder allocation rate
> ------------------------------------
>
>                 Key: LUCENE-7258
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7258
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: Jeff Wartes
>         Attachments: LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, allocation_plot.jpg
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to DocIdSetBuilder.growBuffer, I charted a few different allocation strategies to see if I could tune things more. See here: http://i.imgur.com/7sXLAYv.jpg
> The jump-then-flatline at the right is where DocIdSetBuilder gives up and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index curve/cutoff looked similar.)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is terrible from an allocation standpoint if you're doing a lot of expansions, and is especially terrible when used to build a short-lived data structure like this one.
> By the time it goes with the FBS, it has allocated around twice as much memory for the buffer as it would have needed for just the FBS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
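The allocation cost of a small growth factor that the quoted description complains about can be illustrated with a rough simulation. This is my own sketch, not ArrayUtil.oversize's exact arithmetic or DocIdSetBuilder's real cutover rule; the initial buffer size is an assumption:

```java
// Rough illustration of why a ~1/8 growth factor is allocation-heavy: sum the
// cost of every intermediate int[] buffer, growing until a single buffer would
// cost as much as the final FixedBitSet (one bit per doc). Not Lucene's exact
// ArrayUtil.oversize logic; starting size is an assumed placeholder.
public class GrowthCost {
    /** Total bytes allocated across all intermediate buffers before cutover. */
    static long cumulativeBufferBytes(long maxDoc) {
        long fbsBytes = maxDoc / 8;   // FixedBitSet cost: one bit per doc
        long sizeInts = 1 << 10;      // assumed initial buffer size, in ints
        long total = 0;
        while (sizeInts * 4 < fbsBytes) {
            total += sizeInts * 4;                        // each expansion allocates a fresh int[]
            sizeInts += Math.max(1, sizeInts >>> 3);      // grow by roughly 1/8th
        }
        return total;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L;
        System.out.println("cumulative buffer bytes: " + cumulativeBufferBytes(maxDoc));
        System.out.println("FixedBitSet bytes:       " + (maxDoc / 8));
    }
}
```

A geometric series with ratio r sums to about lastTerm * r / (r - 1), so at r = 9/8 the intermediate copies alone cost several times what allocating the final structure up front would have.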