Nobody took the bait on this yet? :-)
Certainly there's a lot of semi-myth on this topic, on both sides. Here's my
source of mythology (or urban legend, as one might have it):
http://www.ibm.com/developerworks/java/library/j-jtp09275/index.html
My concern here is not so much about leaking of the StringBuilder held in a
field, as Chris seemed to be responding to. I'd expect the ObjectInputStream to
be GC'd at some point along with any StringBuilders it contains references to.
I'm more concerned about defeating any optimizations that might occur if local
variables are converted to fields. The article referenced above mentions escape
analysis and possible stack allocation of locally-created objects. Offhand I
don't know if stack allocation occurs for any of the builders in
ObjectInputStream, but it certainly cannot occur if references are stored in fields.
It would good if there were some evidence we could discuss instead of myths and
urban legends. :-) Perhaps the original poster (Robert Stupp) can rerun some
benchmarks with and without the conversion from locals to fields. (Earlier in
the thread he posted some significant performance improvements, but my suspicion
is that most of that improvement came from the conversion to use Unsafe.)
I'm mindful that this may require a lot of effort. I think it would take a fair
bit of work to come up with a benchmark that shows any difference between the
two cases. I'm also mindful that one's intuition is often wrong.
s'marks
On 1/31/14 5:53 PM, Vitaly Davidovich wrote:
One would have to measure of course, but intuitively, it seems like a good
change to reuse the stringbuilder. There's also the issue that the local
stringbuilder before was default-sized, and for large content, it would
generate even more garbage as the underlying array is expanded.
The "temporary short lived allocations are cheap" is, unfortunately, a
semi-myth, or at least somewhat misguided. It's true that individually they
may be cheap, but they have macro effects. The higher the allocation rate,
the more young GCs we get. Every young GC (on hotspot collectors, at least)
is a STW pause. Bringing threads to a safepoint has some cost, especially if
there are many of them on large many-core machines. With larger heaps these
days, young gens tend to be larger as well. When GC runs, it trashes the cpu
caches for the mutators, so when they resume, they may get cache misses. At
each young GC, some objects may get promoted prematurely to tenured.
So no, I wouldn't say it's quite inexpensive :). When you have no option but
to allocate, it's nice to have collectors that can handle those necessary
allocations well. Otherwise, if it's a perf sensitive area and avoiding
allocations doesn't obfuscate or make the code significantly less
maintainable, it's usually better to avoid allocations.
Just my $.02
Sent from my phone
On Jan 31, 2014 2:06 PM, "Stuart Marks" <stuart.ma...@oracle.com
<mailto:stuart.ma...@oracle.com>> wrote:
On 1/31/14 10:46 AM, Chris Hegarty wrote:
I think your patch can be split into two logical, independent, parts.
The
first is the use of unsafe to access the String UTF length. The seconds
is
to reuse, where possible, the existing StringBuilder instances,
skipping of
primitive/object reading/writing where applicable, and general cleanup.
Since this is a very sensitive area I would like to consider these
separately. To that end, I have taken the changes that are applicable
to the
latter, removed any subjective stylistic changes, and made some
additional
cleanup improvements.
http://cr.openjdk.java.net/~chegar/serial_stupp.00/webrev/
<http://cr.openjdk.java.net/%7Echegar/serial_stupp.00/webrev/>
I agree with splitting the Unsafe usages so they can be reviewed
separately. New
and changed usage of Unsafe will require exacting scrutiny.
In general, the cleanups and refactorings look fine.
I have a question about the change to reuse StringBuilder instances. This
replaces freshly allocated StringBuilders stored in local variables with
reuse of a StringBuilder stored in a field of BlockDataInputStream, which
in turn is stored in a field of ObjectInputStream. Thus, instead of
creating of lots of temporaries that become gc-eligible very quickly, this
creates a single long-lived object whose size is the high-water mark of
the longest string that's been read. The change does reduce allocation
pressure, but the point of generational GC is to make allocation and
collection of temporaries quite inexpensive. Is this the right tradeoff?
s'marks