On 10/07/2013 04:10 PM, Alan Bateman wrote:
On 06/10/2013 23:56, Peter Levart wrote:
:

It's not so much about helping to achieve better throughput (as I noted, deallocation cannot be effectively parallelized) but about overcoming the latency of waking up the ReferenceHandler thread. Here's my attempt at doing this:

http://cr.openjdk.java.net/~plevart/jdk8-tl/DyrectBufferAlloc/webrev.01/

This is much simplified from my first submission of a similar strategy. I tried to be as undisruptive to the current logic of Reference processing as possible, but of course you decide whether this is still too risky for inclusion in JDK 8. Cleaner is unchanged - it processes its thunk synchronously and the ReferenceHandler thread invokes it directly. The ReferenceHandler logic is also the same - I just factored the body of the loop out into a private method so that it can be called from nio Bits, where the bulk of the change lies.
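The "helping" strategy described above can be sketched roughly as follows. This is a self-contained illustration, not the actual patch: the names reserve, tryReserve and handlePendingReference are stand-ins for Bits.reserveMemory, its CAS-based reservation attempt, and the factored-out ReferenceHandler loop body exposed via a shared secret.

```java
// Sketch of an allocating thread assisting the ReferenceHandler: when a
// reservation fails, the thread processes pending References itself
// instead of only waiting for the ReferenceHandler thread to catch up.
import java.util.concurrent.atomic.AtomicLong;

class ReserveWithHelp {
    static final long MAX = 64 * 1024 * 1024;          // illustrative limit
    static final AtomicLong reserved = new AtomicLong();

    // Stand-in for the ReferenceHandler's per-reference work: returns true
    // if it handled one pending Reference (freeing memory as a side effect).
    static boolean handlePendingReference() {
        return false; // nothing is ever pending in this self-contained sketch
    }

    // READ-CAS-RETRY attempt to account for 'size' bytes under the limit.
    static boolean tryReserve(long size) {
        for (;;) {
            long cur = reserved.get();
            if (cur + size > MAX) return false;
            if (reserved.compareAndSet(cur, cur + size)) return true;
        }
    }

    static void reserve(long size) {
        // Fast path first; on failure, help the reference handler and retry.
        while (!tryReserve(size)) {
            if (!handlePendingReference()) {
                // The real code would trigger GC and back off before this.
                throw new OutOfMemoryError("Direct buffer memory");
            }
        }
    }

    public static void main(String[] args) {
        reserve(1024);
        System.out.println("reserved=" + reserved.get());
    }
}
```

The real retry loop additionally triggers System.gc() and backs off with exponential sleeps before giving up, as discussed below in the thread.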

:

So what do you think? Is this still too risky for JDK8?

I looked at the latest webrev and I think the approach looks good.

I should explain that I looked into this issue about 3-4 years ago, and at the time I experimented with the allocating threads waiting until the reference handler had drained the pending list. I didn't think of the assist at the time, hence I was interested to see the number of allocations at which it helped.

Ask not what reference handler can do for you, ask what you can do for reference handler! ;-)

I first saw the idea in Cliff Click's lock-free hash table, and more recently in Doug Lea's new ConcurrentHashMap. The latter was fresh in my mind and got me thinking...


On the patch then I agree with Aleksey that moving the static initializer makes it less obvious that the only change is registering the shared secret (it's not a big deal of course).

I'll try to make Sdiff not "move" it. (See the answer to Aleksey)


The back-off before retrying looks good, I just wonder if 1ms is too low to start with.

The tests show that in the majority of calls (at least on fast CPUs) the thread does not sleep at all, and when it really must sleep, a single sleep(1) is usually enough. So why sleep longer? We have exponential back-off, so on slower CPUs, where longer sleeps might be needed, a couple more iterations will be enough to reach the right point in time. I'll try the measurement on a Raspberry Pi. I wonder how it behaves there...
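The back-off schedule being discussed can be illustrated as follows. MAX_SLEEPS = 9 and the 1 ms starting value are assumptions for the sketch (mirroring the maxsleeps constant mentioned later in the thread), not necessarily the values in the actual patch.

```java
// Illustration of exponential back-off: start at 1 ms and double on each
// retry, bounded by a small maximum number of sleeps. With 9 doublings the
// worst-case total sleep is 1 + 2 + 4 + ... + 256 = 511 ms.
class BackoffDemo {
    static final int MAX_SLEEPS = 9; // assumed bound

    // Sum of the whole back-off schedule in milliseconds.
    static long totalBackoffMillis() {
        long total = 0;
        long sleepMillis = 1;
        for (int sleeps = 0; sleeps < MAX_SLEEPS; sleeps++) {
            // A real retry loop would call Thread.sleep(sleepMillis) here,
            // between attempts to reserve memory.
            total += sleepMillis;
            sleepMillis <<= 1; // 1, 2, 4, 8, ... ms
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("worst-case total back-off: "
                + totalBackoffMillis() + " ms");
    }
}
```

Starting at 1 ms keeps the common case cheap while still reaching multi-hundred-millisecond waits within a handful of iterations on slow machines.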

Somewhere I read that the resolution of Thread.sleep() is not 1 ms but much coarser, in which case Thread.sleep(1) sleeps much longer. That must have been some time ago or on some other platform, because if I run this on current JDK8/Linux:

        for (long d = 1; d < 100; d++) {
            long t0 = System.nanoTime();
            Thread.sleep(d);
            long t1 = System.nanoTime();
            System.out.println("sleep(" + d + ") takes " + (t1 - t0) + " ns");
        }

I get:

sleep(1) takes 1079078 ns
sleep(2) takes 2058245 ns
sleep(3) takes 3060258 ns
sleep(4) takes 4060121 ns
sleep(5) takes 5061263 ns
sleep(6) takes 6063189 ns
sleep(7) takes 7075132 ns
sleep(8) takes 8071381 ns
sleep(9) takes 9062244 ns
...

...which seems pretty accurate.



On the interrupt then I think it's okay to just set the interrupt status as you are doing.

I think so too.


I see you switched the tracking for the management interface to use AtomicLong. Are you looking to improve the concurrency or is there another reason?

With looping over tryReserveMemory and helping the ReferenceHandler (which calls unreserveMemory from other threads too), the number of monitor acquires/releases per allocation request would increase significantly if we kept the synchronized blocks, and with multiple contending threads the overhead would grow further. So I got rid of the locks, because it could be done: there is a single accumulator used for regulating reserve/unreserve (totalCapacity), and it can be maintained with READ-CAS-RETRY on reserve and an ATOMIC ADD on unreserve. The other values exist only for the management interface, which doesn't require a "snapshot" view, so each of them can be maintained independently with an ATOMIC ADD.
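The lock-free accounting described here can be sketched like this. Field names loosely follow java.nio.Bits, but the limit and method bodies are illustrative assumptions, not the patch itself.

```java
// Sketch of lock-free memory accounting: one limit-enforcing accumulator
// (totalCapacity) updated with READ-CAS-RETRY on reserve and a plain atomic
// add on unreserve; the stats counters for the management interface are
// maintained independently, since no snapshot consistency is required.
import java.util.concurrent.atomic.AtomicLong;

class MemoryAccounting {
    static final long MAX_MEMORY = 128 * 1024 * 1024;  // illustrative limit
    static final AtomicLong totalCapacity = new AtomicLong(); // gates reserve
    static final AtomicLong count = new AtomicLong();          // stats only
    static final AtomicLong reservedMemory = new AtomicLong(); // stats only

    // READ-CAS-RETRY: re-read and retry only when another thread won the CAS.
    static boolean tryReserveMemory(long size, long cap) {
        for (long cur = totalCapacity.get();
             cap <= MAX_MEMORY - cur;
             cur = totalCapacity.get()) {
            if (totalCapacity.compareAndSet(cur, cur + cap)) {
                reservedMemory.addAndGet(size);
                count.incrementAndGet();
                return true;
            }
        }
        return false; // over the limit; caller should help the handler / retry
    }

    // ATOMIC ADD on the unreserve path; a decrement needs no CAS loop.
    static void unreserveMemory(long size, long cap) {
        totalCapacity.addAndGet(-cap);
        reservedMemory.addAndGet(-size);
        count.decrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println("reserved: " + tryReserveMemory(16, 16));
    }
}
```

The design point is that only totalCapacity participates in enforcing the limit, so the contended path is a single CAS rather than a monitor acquire/release pair.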

A minor coding convention point: the line break before "else" and "finally" is inconsistent in these areas. Another consistency point is that maxsleeps is a constant and so should probably be in uppercase.

I should have set up IDEA's Code Style correctly. I will correct this.


A related piece of work is the FileChannel map implementation where there is a gc + retry if mmap fails. This could be changed to have a similar back-off/retry.


I see. Would it make sense to do it in the same patch or separately? This too will need JavaLangRefAccess.tryHandlePendingReference(), I think, since it similarly ties unmap0 to a Cleaner referencing a MappedByteBuffer. The tryMap0 would just be a call to map0 that catches OOME and returns true/false, right?
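The tryMap0 idea floated here might look something like the sketch below. This is entirely hypothetical: map0 is a stand-in returning a fake address, not FileChannelImpl's real native method, and the out-parameter convention is an assumption for the sake of a self-contained example.

```java
// Hypothetical shape of tryMap0: call the (native) map0 and convert an
// OutOfMemoryError into a boolean, so the caller can help the reference
// handler, perhaps trigger GC, and retry, mirroring the Bits retry loop.
class MapHelper {
    static long address; // result of the last successful "mapping"

    // Stand-in for the native mmap wrapper; the real one can throw OOME
    // when the mapping limit is exhausted.
    static long map0(int prot, long position, long length) {
        return 0x1000L; // fake mapped address
    }

    static boolean tryMap0(int prot, long position, long length) {
        try {
            address = map0(prot, position, length);
            return true;  // mapped successfully
        } catch (OutOfMemoryError x) {
            return false; // caller should help the reference handler and retry
        }
    }

    public static void main(String[] args) {
        System.out.println("tryMap0 succeeded: " + tryMap0(1, 0L, 4096L));
    }
}
```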

Do you happen to know what defines the limit on how many bytes or blocks can be mapped at one time? Is this a VM parameter or just a plain OS limit?

On the test then the copyright date is 2001-2007 so I assume this was copied from somewhere :-) I agree with Aleksey on the test duration, especially if you can provoke OOME in less than 10 or 20 seconds on some machines.

Right.


As regards whether this should go into JDK 8: the updated proposal is significantly less risky than the original proposal, which changed the implementation to use weak references.

That said, this is a 13-year-old issue that hasn't come up very often (to my knowledge anyway, perhaps because those making heavy use of direct buffers pool buffers rather than allocating and unreferencing them). In addition, we are close to the end of JDK 8 (ZBB is in 2.5 weeks' time), and technically we have been in ramp down (meaning P1-P3 only) since mid-July.

Ok then, I'll finish this nevertheless and then it can sit and wait for JDK9.

Regards, Peter


-Alan.




