On 10/07/2013 04:10 PM, Alan Bateman wrote:
On 06/10/2013 23:56, Peter Levart wrote:
:

It's not so much about helping to achieve better throughput (as I noted, deallocation cannot be effectively parallelized) but about overcoming the latency of waking up the ReferenceHandler thread. Here's my attempt at doing this:

http://cr.openjdk.java.net/~plevart/jdk8-tl/DyrectBufferAlloc/webrev.01/

This is much simplified from my first submission of a similar strategy. I tried to be as undisruptive to the current logic of Reference processing as possible, but of course you decide whether this is still too risky for inclusion in JDK 8. Cleaner is unchanged - it processes its thunk synchronously and the ReferenceHandler thread invokes it directly. The ReferenceHandler logic is also the same - I just factored the body of the loop out into a private method so that it can be called from nio Bits, where the bulk of the change lies.
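The "helping" strategy described above can be sketched roughly as follows. This is a self-contained illustration, not the actual patch: the names reserve, tryReserve and handlePendingReference are stand-ins for Bits.reserveMemory, its CAS-based reservation attempt, and the factored-out ReferenceHandler loop body exposed via a shared secret.

```java
// Sketch of an allocating thread assisting the ReferenceHandler: when a
// reservation fails, the thread processes pending References itself
// instead of only waiting for the ReferenceHandler thread to catch up.
import java.util.concurrent.atomic.AtomicLong;

class ReserveWithHelp {
    static final long MAX = 64 * 1024 * 1024;          // illustrative limit
    static final AtomicLong reserved = new AtomicLong();

    // Stand-in for the ReferenceHandler's per-reference work: returns true
    // if it handled one pending Reference (freeing memory as a side effect).
    static boolean handlePendingReference() {
        return false; // nothing is ever pending in this self-contained sketch
    }

    // READ-CAS-RETRY attempt to account for 'size' bytes under the limit.
    static boolean tryReserve(long size) {
        for (;;) {
            long cur = reserved.get();
            if (cur + size > MAX) return false;
            if (reserved.compareAndSet(cur, cur + size)) return true;
        }
    }

    static void reserve(long size) {
        // Fast path first; on failure, help the reference handler and retry.
        while (!tryReserve(size)) {
            if (!handlePendingReference()) {
                // The real code would trigger GC and back off before this.
                throw new OutOfMemoryError("Direct buffer memory");
            }
        }
    }

    public static void main(String[] args) {
        reserve(1024);
        System.out.println("reserved=" + reserved.get());
    }
}
```

The real retry loop additionally triggers System.gc() and backs off with exponential sleeps before giving up, as discussed below in the thread.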

:

So what do you think? Is this still too risky for JDK8?

I looked at the latest webrev and I think the approach looks good.

I should explain that I looked into this issue about 3-4 years ago, and at the time I experimented with the allocating threads waiting until the reference handler had drained the pending list. I didn't think of the assist at the time, hence I was interested to see the number of allocations at which it helped.

Ask not what reference handler can do for you, ask what you can do for reference handler! ;-)

I first saw the idea in Cliff Click's lock-free hash table, and more recently in Doug Lea's new ConcurrentHashMap. The latter was fresh in my mind and got me thinking...


On the patch then I agree with Aleksey that moving the static initializer makes it less obvious that the only change is registering the shared secret (it's not a big deal of course).

I'll try to make Sdiff not "move" it. (See the answer to Aleksey)


The back-off before retrying looks good, I just wonder if 1ms is too low to start with.

The tests show that in the majority of calls (at least on fast CPUs) the thread does not sleep at all, and when it really must sleep, a single sleep(1) is usually enough. So why sleep longer? We have exponential back-off, so on slower CPUs, where longer sleeps might be needed, a couple more iterations will be enough to reach the right point in time. I'll try the measurement on a Raspberry Pi. I wonder how it behaves there...
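The back-off schedule being discussed can be illustrated as follows. MAX_SLEEPS = 9 and the 1 ms starting value are assumptions for the sketch (mirroring the maxsleeps constant mentioned later in the thread), not necessarily the values in the actual patch.

```java
// Illustration of exponential back-off: start at 1 ms and double on each
// retry, bounded by a small maximum number of sleeps. With 9 doublings the
// worst-case total sleep is 1 + 2 + 4 + ... + 256 = 511 ms.
class BackoffDemo {
    static final int MAX_SLEEPS = 9; // assumed bound

    // Sum of the whole back-off schedule in milliseconds.
    static long totalBackoffMillis() {
        long total = 0;
        long sleepMillis = 1;
        for (int sleeps = 0; sleeps < MAX_SLEEPS; sleeps++) {
            // A real retry loop would call Thread.sleep(sleepMillis) here,
            // between attempts to reserve memory.
            total += sleepMillis;
            sleepMillis <<= 1; // 1, 2, 4, 8, ... ms
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("worst-case total back-off: "
                + totalBackoffMillis() + " ms");
    }
}
```

Starting at 1 ms keeps the common case cheap while still reaching multi-hundred-millisecond waits within a handful of iterations on slow machines.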

Somewhere I read that the resolution of Thread.sleep() is not 1 ms but much coarser, in which case Thread.sleep(1) sleeps much longer. That must have been some time ago or on some other platform, because if I run this on current JDK8/Linux:

        for (long d = 1; d < 100; d++) {
            long t0 = System.nanoTime();
            Thread.sleep(d);
            long t1 = System.nanoTime();
            System.out.println("sleep(" + d + ") takes " + (t1 - t0) + " ns");
        }

I get:

sleep(1) takes 1079078 ns
sleep(2) takes 2058245 ns
sleep(3) takes 3060258 ns
sleep(4) takes 4060121 ns
sleep(5) takes 5061263 ns
sleep(6) takes 6063189 ns
sleep(7) takes 7075132 ns
sleep(8) takes 8071381 ns
sleep(9) takes 9062244 ns
...

...which seems pretty accurate.



On the interrupt then I think it's okay to just set the interrupt status as you are doing.

I think so too.


I see you switched the tracking for the management interface to use AtomicLong. Are you looking to improve the concurrency or is there another reason?

With looping over tryReserveMemory and helping the ReferenceHandler (which calls unreserveMemory from other threads too), the number of monitor acquires/releases per allocation request would increase significantly if we kept the synchronized blocks, and with multiple contending threads the overhead would grow further. So I got rid of the locks, because it could be done: there is a single accumulator used for regulating reserve/unreserve (totalCapacity), and it can be maintained with READ-CAS-RETRY on reserve and an ATOMIC ADD on unreserve. The other values exist only for the management interface, which doesn't require a "snapshot" view, so each of them can be maintained independently with an ATOMIC ADD.
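The lock-free accounting described here can be sketched like this. Field names loosely follow java.nio.Bits, but the limit and method bodies are illustrative assumptions, not the patch itself.

```java
// Sketch of lock-free memory accounting: one limit-enforcing accumulator
// (totalCapacity) updated with READ-CAS-RETRY on reserve and a plain atomic
// add on unreserve; the stats counters for the management interface are
// maintained independently, since no snapshot consistency is required.
import java.util.concurrent.atomic.AtomicLong;

class MemoryAccounting {
    static final long MAX_MEMORY = 128 * 1024 * 1024;  // illustrative limit
    static final AtomicLong totalCapacity = new AtomicLong(); // gates reserve
    static final AtomicLong count = new AtomicLong();          // stats only
    static final AtomicLong reservedMemory = new AtomicLong(); // stats only

    // READ-CAS-RETRY: re-read and retry only when another thread won the CAS.
    static boolean tryReserveMemory(long size, long cap) {
        for (long cur = totalCapacity.get();
             cap <= MAX_MEMORY - cur;
             cur = totalCapacity.get()) {
            if (totalCapacity.compareAndSet(cur, cur + cap)) {
                reservedMemory.addAndGet(size);
                count.incrementAndGet();
                return true;
            }
        }
        return false; // over the limit; caller should help the handler / retry
    }

    // ATOMIC ADD on the unreserve path; a decrement needs no CAS loop.
    static void unreserveMemory(long size, long cap) {
        totalCapacity.addAndGet(-cap);
        reservedMemory.addAndGet(-size);
        count.decrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println("reserved: " + tryReserveMemory(16, 16));
    }
}
```

The design point is that only totalCapacity participates in enforcing the limit, so the contended path is a single CAS rather than a monitor acquire/release pair.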

A minor coding convention point: the line break before "else" and "finally" is inconsistent in these areas. Another consistency point is that maxsleeps is a constant and so should probably be in uppercase.

I should have set up IDEA's Code Style correctly. I will correct this.


A related piece of work is the FileChannel map implementation where there is a gc + retry if mmap fails. This could be changed to have a similar back-off/retry.


I see. Would it make sense to do it in the same patch or separately? This too will need JavaLangRefAccess.tryHandlePendingReference(), I think, since it similarly ties unmap0 to a Cleaner referencing a MappedByteBuffer. The tryMap0 would just be a call to map0 that catches OOME and returns true/false, right?
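The tryMap0 idea floated here might look something like the sketch below. This is entirely hypothetical: map0 is a stand-in returning a fake address, not FileChannelImpl's real native method, and the out-parameter convention is an assumption for the sake of a self-contained example.

```java
// Hypothetical shape of tryMap0: call the (native) map0 and convert an
// OutOfMemoryError into a boolean, so the caller can help the reference
// handler, perhaps trigger GC, and retry, mirroring the Bits retry loop.
class MapHelper {
    static long address; // result of the last successful "mapping"

    // Stand-in for the native mmap wrapper; the real one can throw OOME
    // when the mapping limit is exhausted.
    static long map0(int prot, long position, long length) {
        return 0x1000L; // fake mapped address
    }

    static boolean tryMap0(int prot, long position, long length) {
        try {
            address = map0(prot, position, length);
            return true;  // mapped successfully
        } catch (OutOfMemoryError x) {
            return false; // caller should help the reference handler and retry
        }
    }

    public static void main(String[] args) {
        System.out.println("tryMap0 succeeded: " + tryMap0(1, 0L, 4096L));
    }
}
```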

Do you happen to know what defines the limit on how many bytes or blocks can be mapped at one time? Is this a VM parameter or just a plain OS limit?

On the test then the copyright date is 2001-2007 so I assume this was copied from somewhere :-) I agree with Aleksey on the test duration, especially if you can provoke OOME in less than 10 or 20 seconds on some machines.

Right.


As regards whether this should go into JDK 8: the updated proposal is significantly less risky than the original proposal, which changed the implementation to use weak references.

That said, this is a 13-year-old issue that hasn't come up very often (to my knowledge anyway, perhaps because those making heavy use of direct buffers pool buffers rather than allocating and unreferencing them). In addition, we are close to the end of JDK 8 (ZBB is in 2.5 weeks' time), and technically we have been in ramp down (meaning P1-P3 only) since mid-July.

Ok then, I'll finish this nevertheless and then it can sit and wait for JDK9.

Regards, Peter


-Alan.




