> On Nov 15, 2018, at 7:50 PM, Benjamin Manes <ben.ma...@gmail.com> wrote:
>
> Jeremy Manson describes the process fairly well in an old blog post <http://jeremymanson.blogspot.com/2010/02/garbage-collection-softreferences.html>.
>
> Because soft reference clearing is based on the amount of memory given to the JVM, the cost increases as more memory is added. This can be surprising <https://bugs.java.com/bugdatabase/view_bug.do;jsessionid=cfd518f51afc7780e5188276b5f9?bug_id=6912889>, e.g. if they are abused to fill up the heap, full collections will happen much more frequently. Resolving that by increasing the instance size only makes the problem worse.
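For concreteness, HotSpot's default clearing policy boils down to roughly the arithmetic below. This is only a sketch; the 1,000 ms/MB figure is the default value of -XX:SoftRefLRUPolicyMSPerMB, and the Runtime numbers are a stand-in for the free-heap figure the collector actually uses.

    public class SoftRefKeepAliveSketch {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // Rough stand-in for "free heap"; the collector uses its own figure,
            // typically the free space measured around the previous collection.
            long freeHeapMb = (rt.maxMemory() - rt.totalMemory() + rt.freeMemory())
                    / (1024 * 1024);
            long msPerMb = 1_000; // -XX:SoftRefLRUPolicyMSPerMB, HotSpot default
            // A softly reachable object becomes eligible for clearing once it has
            // gone unused for roughly this long, so more free heap means soft
            // references linger longer.
            System.out.println("soft ref keep-alive ~ " + (freeHeapMb * msPerMb) + " ms");
        }
    }

That is why the cost scales with the amount of memory given to the JVM, as described above.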
There are two factors taken into consideration when deciding to process a soft reference. The first is the amount of free memory in the heap and the second is the time since last access. So, increasing the heap will result in more soft references being accumulated, but I'm not sure that this is the only cause of longer pause times. With generational collectors, at some point the scan for roots will dominate, and that activity is linear in the size of the heap. In other words, a larger heap yields longer scan-for-roots times. The next inflation would be copy costs (compaction). Finally, processing a SoftReference is generally more expensive than processing any other reference type. That said, IME, it's roughly constant for any application regardless of the size of the heap.

> I believe that this is less of a problem on region-based collectors, like G1, compared to generational ones. The regions are independent and can be evacuated much more aggressively, so the impact on pause times may be lessened. Of course, that also negatively impacts application cache hit rates if used in that manner.

SoftReference, in fact any reference type, can cause the collector to degenerate into a condition where the only way to recover is via a Full GC. It's a bug, and it's a bugger to reproduce and so a bugger to debug, but I've seen it in many, many different environments with G1 running on 7, 8, 9, 10, and 11.

> > Does it even make sense to use weakValues
>
> Yes, weak values can be aggressively collected, so the impact is much smaller. However their use case is often quite different.

I'm not sure what you mean here by aggressively collected. The WeakReference collection policy is less expensive to calculate than Soft's (similarly for Final and Phantom), but I wouldn't say it's aggressively collected. But then, I'd need to understand what you mean by aggressively collected.

> > what is the recommended way to cache objects?
>
> Prefer an explicit size bound on strongly referenced objects, expose a setting, and report the statistics for monitoring.
>
> As soft references are evicted using a global LRU when there is GC pressure, very poor victims may be chosen. For example, a performance-sensitive application cache might have its entries removed in favor of keeping noisy, low-value entries from a cache buried deep within an external dependency. It can be difficult to predict and replicate performance problems, since the cache is reactive to environmental conditions.
>
> An application-designed cache can also leverage algorithmic improvements. Modern eviction policies can significantly outperform <https://github.com/ben-manes/caffeine/wiki/Efficiency> LRU when frequency provides a better indicator. In those cases it may also be more GC hygienic, as low-value items are quickly evicted, whereas LRU can cause unnecessary old-gen promotion due to aging items much more slowly.
>
> Historically the arguments in favor of soft caches were application simplicity (less tuning) and better concurrency (no explicit locking, as the GC maintains it). Neither bore out except in demo code and microbenchmarks, and they tend to result in worse behavior overall. You should strive to remove soft references from your application and generally be very wary of their usage when encountered.

SoftReference puts stress on the allocators, mutators, and the collector; however, all reference types put some level of stress on the allocators and the collector (and sometimes the mutators).
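To make the quoted recommendation concrete, here is a minimal sketch of an explicitly bounded cache using the Guava CacheBuilder API that comes up later in the thread; the maximum size of 10,000 is an illustrative placeholder that would normally come from a setting.

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    public class BoundedCacheSketch {
        public static void main(String[] args) {
            // Explicit size bound on strongly referenced values, instead of softValues().
            Cache<String, byte[]> cache = CacheBuilder.newBuilder()
                    .maximumSize(10_000)   // the bound; expose it as a setting
                    .recordStats()         // hit/miss/eviction counts for monitoring
                    .build();

            cache.put("key", new byte[1024]);
            System.out.println("hit rate: " + cache.stats().hitRate());
        }
    }

Caffeine's builder has essentially the same shape and adds the frequency-based eviction policy referenced in the efficiency link above. Either way, values stay strongly referenced and eviction is driven by the policy rather than by GC pressure.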
That said, reference types allow the garbage collector, which has a global view, to make decisions about object lifecycle that would otherwise be difficult if not impossible to make. There are use cases for them, as has been pointed out here. So, if you need them… you need them.

Kind regards,
Kirk

> On Thursday, November 15, 2018 at 10:22:05 AM UTC-8, Siva Velusamy wrote:
>
> In the thread below, gil@ makes the following statement:
>
> > For most GC algorithms, or at least for the more costly parts of such algorithms, GC efficiency is roughly linear to EmptyHeap/LiveSet. ... This should [hopefully] make it obvious why using SoftReferences is a generally terrible idea.
>
> I'm not following that conclusion. Is that because SoftReferenced objects are still considered to be part of the LiveSet in the calculation above, and that leads to increased GC cost?
>
> As a follow up, what is the recommended way to cache objects? Currently many places in our codebase use Guava's CacheBuilder with softValues <https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/cache/CacheBuilder.html#softValues-->. Does it even make sense to use weakValues, or is the suggestion to just use strong values, but try to restrict the number of entries?
>
> ---------- Forwarded message ---------
> From: Gil Tene <g...@azul.com>
> Date: Sat, Nov 10, 2018 at 8:51 AM
> Subject: Re: Sorting a very large number of objects
> To: mechanical-sympathy <mechanica...@googlegroups.com>
>
> On Friday, November 9, 2018 at 7:08:23 AM UTC-8, Shevek wrote:
>
> Hi,
>
> I'm trying to sort/merge a very large number of objects in Java, and failing more spectacularly than normal. The way I'm doing it is this:
>
> * Read a bunch of objects into an array.
> * Sort the array, then merge neighbouring objects as appropriate.
> * Re-fill the array, re-sort, re-merge until compaction is "not very successful".
> * Dump the array to file, repeat for the next array.
> * Then stream all files through a final merge/combine phase.
>
> This is failing largely because I have no idea how large to make the array. Estimating the ongoing size using something like JAMM is too slow, and my hand-rolled memory estimator is too unreliable.
>
> The thing that seems to be working best is messing around with the array size in order to keep some concept of runtime.maxMemory() - runtime.totalMemory() + runtime.freeMemory() within a useful bound.
>
> But there must be a better solution. I can't quite think of a way around this with SoftReference, because I need to dump the data to disk when the reference gets broken, and that's defeating me right now.
>
> Other alternatives would include keeping all my in-memory data structures in serialized form, and paying the ser/deser cost to compare, but that's expensive - my main overhead right now is gc. Serialization is protobuf, although that's changeable, since it's annoying the hell out of me (please don't say thrift - but protobuf appears to have no way to read from a stream into a reusable object - it has to allocate the world every single time).
>
> In general, whenever I see "my overhead is gc" and "unknown memory size" together, I see it as a sign of someone pushing heap utilization high and getting into the inefficient GC state. Simplistically, you should be able to drop the GC cost to an arbitrary % of overall computation cost by increasing the amount (or relative portion) of empty heap in your setup.
> So GC should never be "a bottleneck" from a throughput point of view unless you have constraints (such as a minimum required live set and a maximum possible heap size) that force you towards a high utilization of the heap (in terms of LiveSet/HeapSize). The answer to such a situation is generally "get some more RAM for this problem" rather than "put in tons of work to fit this in".
>
> For most GC algorithms, or at least for the more costly parts of such algorithms, GC efficiency is roughly linear to EmptyHeap/LiveSet. Stated otherwise, GC cost grows with LiveSet/EmptyHeap, or LiveSet/(HeapSize - LiveSet). As you grow the amount you try to cram into a heap of a given size, you increase the GC cost to the square of your cramming efforts. And for every doubling of the empty heap [for a given live set] you will generally halve the GC cost.
>
> This should [hopefully] make it obvious why using SoftReferences is a generally terrible idea.
>
> Issues:
> * This routine is not the sole tenant of the JVM. Other things use RAM.
>
> You can try to establish what an "efficient enough" heap utilization level is for your use case (a level that keeps overall GC work as a % of CPU spend to e.g. below 10%), and keep your heap use to a related fraction of whatever heap size you get to have on the system you land on.
>
> * This has to be deployed and work on systems whose memory config is unknown to me.
>
> Can anybody please give me pointers?
>
> S.
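To put rough numbers on the quoted EmptyHeap/LiveSet relationship, here is a small sketch; the 4 GB live set and the heap sizes are made-up figures chosen only to show the doubling/halving effect.

    public class GcCostSketch {
        public static void main(String[] args) {
            // GC cost per unit of application work is roughly proportional to
            // LiveSet / (HeapSize - LiveSet), i.e. LiveSet / EmptyHeap.
            double liveSetGb = 4.0;
            for (double heapGb : new double[] {6.0, 8.0, 12.0}) {
                double relativeCost = liveSetGb / (heapGb - liveSetGb);
                // Doubling the empty heap (2 GB -> 4 GB -> 8 GB) halves the cost.
                System.out.printf("heap=%.0f GB, empty=%.0f GB, relative GC cost=%.2f%n",
                        heapGb, heapGb - liveSetGb, relativeCost);
            }
        }
    }

It also shows why letting soft references inflate the live set toward the heap size, as the quoted posts argue, pushes GC cost up sharply.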