Hi Kim,

On 06/09/2015 07:44 AM, Peter Levart wrote:
Hi Kim,

Thanks for taking a look at this code.

On 06/09/2015 04:03 AM, Kim Barrett wrote:
On May 31, 2015, at 3:32 PM, Peter Levart <peter.lev...@gmail.com> wrote:
So, for the curious, here's the improved prototype:

     
http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/

And here are the improved benchmarks (with some results inline):

     http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/
While perusing the offered code (webrev.02) (not a careful review
yet), I think I've found a serious bug:

In Reference.java
in unhookPendingChunk
  253             // assert r.next == r;
  254             // move a chunk of pending/discovered references to a
  255             // temporary local r/next chain
...
  262                 rd.next = r;

I think that assertion suggested at line 253 does not hold, and line
262 may damage a queue.

The problem is that the application could explicitly enqueue a
reference while it is sitting in the pending list, waiting to be
processed. So the enqueue places the reference on the associated
queue's list, linked through the reference's next field. Then unhook
comes along and clobbers the reference's next field. Oops!

handlePendingChunk has similar problems with an application enqueue
before the reference has been "handled".

I haven't looked carefully at webrev.03, but at a quick glance it
looks like it has the same problem. (Which is what I expected, given
the description of the differences between webrev.02 and webrev.03.)

I'm not sure why unhook is using the next field at all, rather than
just continuing to use the discovered field. I've not studied this
idea carefully, but I think it might work to leave the chunks linked
through the discovered field until addChunk, with that being
responsible for self-looping the discovered field and transferring
linkage to the next field. That is, treat chunks as extensions of the
pending list until addChunk-time. There might be something that makes using
the discovered field that way a problem, but I'm not thinking of
anything right now.

Of course, this problem doesn't exist for FinalReference because
application code doesn't have access to them to perform the
problematic explicit enqueue.



Oops, you're right. Explicit enqueue-ing slipped my mind, since I first worked on finalizers and then generalized the solution... I'm afraid to use the 'discovered' field, since it is in the domain of the VM and I think the Java side may only reset it to null while holding the lock. I also would not like to add another field to Reference just to support "chunking". Perhaps the chunk could be formed as a re-usable array of references. Let me think about this some more to figure out how to solve it most elegantly...

I'll follow up when I have something.

Regards, Peter


I think I was too afraid to use the 'discovered' field. I now think the 'discovered' field is the right field to use to form "chunks" of pending references. If I read the VM reference handling code correctly, the 'discovered' field is treated as a normal instance field as far as GC is concerned as soon as the Reference is no longer active (reference.next != null). All pending references found by Java code navigating the 'pending' chain have already been "de-activated" by GC when discovered (reference.next == reference), so it is safe to manipulate the 'discovered' field in Java code. Here's the modified prototype that uses the 'discovered' field to form chunks of references:

http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.05/

Note that enqueue-ing a chunk or a single Reference is a two-phase process...

1st, the reference is marked enqueued via ReferenceQueue.markEnqueued(Reference), which is the serialization point. After that point the Reference's .next field is only manipulated by a single thread. This happens in two places:
- ReferenceQueue.enqueue(reference), line 60 - called by public Reference.enqueue()
- Reference.handlePendingChunk(chunk), line 317

2nd, the chain of .next-field-connected References is added to the queue (ReferenceQueue.addChunk(head, tail), line 83).
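
To make the two phases concrete, here is a stripped-down, self-contained toy model of how I read the protocol (markEnqueued / addChunk / enqueue are the prototype's names; the bodies below are only my paraphrase, not the webrev.05 code):

    import java.util.concurrent.atomic.AtomicReference;

    final class ToyRef {
        final AtomicReference<ToyQueue> queue;  // queue this reference reports to; null once claimed
        ToyRef next;                            // queue linkage, written by exactly one thread after phase 1
        ToyRef(ToyQueue q) { queue = new AtomicReference<>(q); }
    }

    final class ToyQueue {
        private ToyRef head;                    // guarded by 'this'

        // Phase 1: the serialization point. The thread that wins the CAS is the
        // only one that will ever write this reference's .next field.
        boolean markEnqueued(ToyRef r) {
            return r.queue.compareAndSet(this, null);
        }

        // Phase 2: splice a chain of already-marked references (chunkHead..chunkTail,
        // linked via .next) onto the queue with a single head update.
        synchronized void addChunk(ToyRef chunkHead, ToyRef chunkTail) {
            chunkTail.next = head;
            head = chunkHead;
            notifyAll();                        // wake consumers waiting on the queue
        }

        // Single-reference enqueue, as done by the public Reference.enqueue():
        boolean enqueue(ToyRef r) {
            if (!markEnqueued(r)) return false; // lost the race: already (being) enqueued
            addChunk(r, r);
            return true;
        }
    }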

With this arrangement, I get even less CPU overhead when running the benchmark, most probably because unhooking a chunk of pending references no longer involves writes to many fields - just two writes per chunk: the 'discovered' field of the last Reference in the chunk is reset to null, and the 'pending' static field is set to point to the 1st of the remaining References.
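
For illustration, here is a similarly self-contained toy sketch of that unhook step (again not the actual webrev.05 code; CHUNK_SIZE and the class names are made up):

    final class PendingRef {
        PendingRef discovered;   // chain built by the VM; safe to touch once the reference
                                 // is inactive (in the real code: reference.next == reference)
        PendingRef next;         // queue linkage, only written after markEnqueued
    }

    final class PendingList {
        private static final Object lock = new Object();
        private static PendingRef pending;       // head of the pending chain
        private static final int CHUNK_SIZE = 256;

        // Detach up to CHUNK_SIZE references, leaving them linked through 'discovered'.
        // Only two fields are written: the last element's 'discovered' and 'pending' itself.
        static PendingRef unhookPendingChunk() {
            synchronized (lock) {
                PendingRef head = pending;
                if (head == null) return null;
                PendingRef tail = head;
                for (int i = 1; i < CHUNK_SIZE && tail.discovered != null; i++) {
                    tail = tail.discovered;
                }
                pending = tail.discovered;       // point 'pending' at the 1st remaining reference
                tail.discovered = null;          // terminate the unhooked chunk
                return head;                     // chunk stays linked through 'discovered'
            }
        }
    }

The benchmark results: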


Finalization throughput, ORIGINAL

  construction threads: 1
  instances per thread: 10000000
  cons. per 1 ms sleep: 100
    ~5s                                    interval         cumulative   active  pool    queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size      tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ----------
    5000     441600          0     441600      88 0      88       0
   10002     883400          0     883400      88 0      88       0
   15005    1301400     244748    1056652      83 48      86      16
   20007    1747100     244748    1502352      89 0      87      12
   25009    2192600     244748    1947852      89 0      87       9
   30011    2625000     488320    2136680      86 48      87      16
   35013    3069900     488320    2581580      88 0      87      13
   40015    3514600     488320    3026280      88 0      87      12
   45017    3956200     488320    3467880      88 0      87      10
   50018    4397600     488320    3909280      88 0      87       9
   55020    4703200    4501263     201937      61 802      85      81
   60022    5152200    4501263     650937      89 0      85      74
   65023    5604600    4501263    1103337      90 0      86      69
   70024    6051400    4501263    1550137      89 0      86      64
   75025    6503100    4501263    2001837      90 0      86      59
   80026    6880800    6831578      49222      75 466      85      85
   85027    7329700    6831578     498122      89 0      86      80
   90027    7775900    6831578     944322      89 0      86      75
   95028    8221800    6831578    1390222      89 0      86      71
  100029    8666400    6831578    1834822      88 0      86      68
  105030    9111000    6831578    2279422      88 0      86      65
  110031    9559200    6831578    2727622      89 0      86      62
  115032   10000000    6831578    3168422      88 0      86      59
  120792   10000000   10000000          0       0 550      82      82

real    2m0.962s
user    0m32.641s
sys     0m1.897s


Finalization throughput, PATCHED

  construction threads: 1
  instances per thread: 10000000
  cons. per 1 ms sleep: 100
    ~5s                                    interval         cumulative   active  pool    queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size      tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ----------
   5000     434800          0     434800      86       0      86       0     0     0          0
  10001     868600          0     868600      86       0      86       0     0     0          0
  15002    1294800     244498    1050302      85      48      86      16     0     0          0
  20005    1727700     244498    1483202      86       0      86      12     0     0          0
  25006    2162200     244498    1917702      86       0      86       9     0     0          0
  30009    2592400     489422    2102978      85      48      86      16     0     0          0
  35010    3026800     489422    2537378      86       0      86      13     0     0          0
  40011    3417000    3349549      67451      78     571      85      83     0     1          0
  45012    3851700    3349549     502151      86       0      85      74     0     0          0
  50013    4286900    3349549     937351      87       0      85      66     0     0          0
  55015    4713500    3611309    1102191      85      52      85      65     0     0          0
  60017    5147400    3611309    1536091      86       0      85      60     0     0          0
  65018    5582500    3611309    1971191      87       0      85      55     0     0          0
  70019    5981300    5656682     324618      79     409      85      80     0     0          0
  75019    6407400    6397329      10071      85     148      85      85     0     1          0
  80020    6844300    6397329     446971      87       0      85      79     0     0          0
  85021    7281100    6397329     883771      87       0      85      75     0     0          0
  90022    7702600    7407984     294616      84     202      85      82     0     0          0
  95022    8136700    7407984     728716      86       0      85      77     0     0          0
 100024    8560400    8412857     147543      84     200      85      84     0     1          0
 105024    8996200    8412857     583343      87       0      85      80     0     0          0
 110025    9430300    8412857    1017443      86       0      85      76     0     0          0
 115025    9853700    9480645     373055      84     213      85      82     0     0          0
 120026   10000000    9480645     519355      29       0      83      78     0     0          0
 125220   10000000   10000000          0       0      99      79      79     0     0          0

real    2m5.323s
user    0m13.868s
sys     0m0.971s

-----

WeakReference processing throughput, ORIGINAL

  construction threads: 1
  instances per thread: 10000000
  cons. per 1 ms sleep: 100
    ~5s                                    interval         cumulative   active  pool    queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size      tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ----------
    5000     450800          0     450800      90 0      90       0
   10004     902500          0     902500      90 0      90       0
   15006    1352000     204006    1147994      89 40      90      13
   20009    1807000     204006    1602994      90 0      90      10
   25011    2254600     409643    1844957      89 41      90      16
   30013    2708900     409643    2299257      90 0      90      13
   35015    3132900    2934351     198549      84 504      89      83
   40017    3588100    2934351     653749      91 0      89      73
   45018    4036100    3152591     883509      89 43      89      70
   50020    4491100    3152591    1338509      90 0      89      63
   55022    4946200    3152591    1793609      90 0      89      57
   60023    5376000    4974927     401073      85 364      89      82
   65025    5824300    5555450     268850      89 116      89      85
   70027    6279100    5555450     723650      90 0      89      79
   75029    6726400    6299987     426413      89 148      89      83
   80030    7171800    6988921     182879      89 137      89      87
   85032    7627000    6988921     638079      91 0      89      82
   90033    8073200    7561200     512000      89 114      89      83
   95034    8527900    7561200     966700      90 0      89      79
  100035    8971400    8502933     468467      88 188      89      84
  105036    9426300    8502933     923367      90 0      89      80
  110037    9880500    8502933    1377567      90 0      89      77
  115038   10000000    8502933    1497067      23 0      86      73
  120390   10000000   10000000          0       0 279      83      83

real    2m0.557s
user    0m17.642s
sys     0m1.998s


WeakReference processing throughput, PATCHED

  construction threads: 1
  instances per thread: 10000000
  cons. per 1 ms sleep: 100
    ~5s                                    interval         cumulative   active  pool    queued
  t[ms] construct.  destruct.  in-flight con./ms des./ms con./ms des./ms  thr.  size      tasks
------- ---------- ---------- ---------- ------- ------- ------- ------- ----- ----- ----------
   5000     437800          0     437800      87       0      87       0     0     0          0
  10001     873900          0     873900      87       0      87       0     0     0          0
  15003    1308021     203402    1104619      86      40      87      13     0     0          0
  20004    1746100     203402    1542698      87       0      87      10     0     0          0
  25005    2180100     407360    1772740      86      40      87      16     0     0          0
  30007    2616800     407360    2209440      87       0      87      13     0     0          0
  35008    3026600    2902826     123774      81     499      86      82     0     1          0
  40009    3463600    2902826     560774      87       0      86      72     0     0          0
  45010    3900200    2902826     997374      87       0      86      64     0     0          0
  50011    4331300    3121066    1210234      86      43      86      62     0     0          0
  55012    4768314    3121066    1647248      87       0      86      56     0     0          0
  60013    5187400    4923274     264126      83     360      86      82     0     0          0
  65013    5619500    5579423      40077      86     131      86      85     0     1          0
  70014    6061800    5579423     482377      88       0      86      79     0     0          0
  75015    6497100    5688790     808310      87      21      86      75     1     1          0
  80015    6937500    6462159     475341      88     154      86      80     0     0          0
  85019    7378600    6462159     916441      88       0      86      76     0     0          0
  90019    7813800    7083305     730495      87     124      86      78     0     0          0
  95020    8250500    7083305    1167195      87       0      86      74     0     0          0
 100021    8683100    7630681    1052419      86     109      86      76     0     0          0
 105021    9122100    7630681    1491419      87       0      86      72     0     0          0
 110022    9554091    8565311     988780      86     186      86      77     0     0          0
 115023    9993400    8565311    1428089      87       0      86      74     0     0          0
 120023   10000000    8565311    1434689       1       0      83      71     0     0          0
 125372   10000000   10000000          0       0     268      79      79     0     0          0

real    2m5.518s
user    0m12.869s
sys     0m1.093s


The prototype contains other things as well - not all of them may be suitable for inclusion in JDK 9. They could be selectively removed or tailored, but I think they deserve to be mentioned and considered:

Using an FJPool as the means to execute Finalizer(s) and other j.l.r.Cleaner(s), which enables easy scaling to more than one worker thread with less overhead than ReferenceQueue-based processing.

Public j.l.r.Cleaner interface. A Soft, Weak or Phantom Reference implementing this interface is automatically "cleaned" by a "finalization" thread. j.l.r.PhantomCleaner is an example implementation of such a PhantomReference which also maintains a doubly-linked list of instances to prevent their reclamation before they are executed. Its API is similar to sun.misc.Cleaner, but it can be used by user code since it is executed by the same thread(s) as Finalizer(s). Not all applications need their Cleaner Reference(s) to maintain a doubly-linked list of instances, hence the public j.l.r.Cleaner interface, which offers the "raw" functionality. For example, one can imagine WeakReference-based keys of a ConcurrentHashMap that get expunged automatically by the finalization thread(s) after their referents become weakly reachable. As the CHM already holds the WeakReference keys, no separate maintenance of a doubly-linked list of keys is needed.
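
A hedged sketch of that ConcurrentHashMap example (the clean() method below stands in for whatever callback the prototype's j.l.r.Cleaner interface actually declares; everything else is plain collection code):

    import java.lang.ref.WeakReference;
    import java.util.concurrent.ConcurrentHashMap;

    final class WeakKeyCache<K, V> {

        // A WeakReference key that knows how to expunge itself. In the prototype it
        // would implement j.l.r.Cleaner, so the finalization thread(s) call clean()
        // once the referent becomes weakly reachable - no queue-draining thread and
        // no doubly-linked list of keys is needed, since the CHM holds the keys.
        static final class WeakKey<K> extends WeakReference<K> /* implements Cleaner */ {
            private final int hash;
            private final ConcurrentHashMap<WeakKey<K>, ?> map;

            WeakKey(K key, ConcurrentHashMap<WeakKey<K>, ?> map) {
                super(key);
                this.hash = System.identityHashCode(key);
                this.map = map;
            }

            public void clean() { map.remove(this); }   // assumed Cleaner callback

            @Override public int hashCode() { return hash; }
            @Override public boolean equals(Object o) {
                if (this == o) return true;
                if (!(o instanceof WeakKey)) return false;
                Object k = get();
                return k != null && k == ((WeakKey<?>) o).get();
            }
        }

        private final ConcurrentHashMap<WeakKey<K>, V> map = new ConcurrentHashMap<>();

        V get(K key)          { return map.get(new WeakKey<>(key, map)); }
        V put(K key, V value) { return map.put(new WeakKey<>(key, map), value); }
    }

Once a WeakKey's referent becomes weakly reachable, the finalization thread(s) would invoke clean() and the stale entry disappears, with no extra bookkeeping on the application side.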

Public j.l.r.Finalizator is an example of a FinalReference-based j.l.r.Cleaner. While a [Phantom|Weak|Soft]Cleaner is safer since it doesn't allow access to its referent when fired, some JDK code (File[Input|Output]Stream for example) still uses finalization, and converting it to, for example, a PhantomCleaner would require more effort. Finalizator may allow an easier transition of such code, as it keeps the finalization logic's access to the finalizee's state. So perhaps it could be made package-private and exposed through SharedSecrets for internal use only?

Are any of the features of this prototype interesting to you and worth pursuing further?

Regards, Peter
