Oh, and note that if the current version of the proposal (459f825) were
implemented, it would not be usable for jsdom, because of the authority
constraints. jsdom supports running inside a browser (e.g. so that you
can parse HTML within a Worker, which does not provide access to the
native DOM). As I read it, weak references would not be available to
scripts within a normal web page. So in jsdom we would have to support
the lowest common denominator, which means no weak references.

I am somewhat worried about the authority concept. One of my major use
cases for ECMAScript is that I can share modules between the server
(node.js) and the browser. I would rather not have an unnecessary
divide of ES features between them. If all sorts of (public) modules
start using weak references for convenience, it would add a lot of
headaches for me, because I could no longer use them easily in the
browser, even though they are not using any node.js APIs that only
make sense on the server (reading files, opening sockets, etc.).

While admittedly I do not have much experience with the internals of
ECMAScript engines, surely an implementation that resists side-channel
attacks is possible? For example, would it help to add random variance
to the timing of condemning objects once they become unreachable?
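To make the side-channel concern concrete, here is a minimal sketch of how a weak reference turns garbage collection into an observable event. It assumes a `WeakRef`-style constructor with a `deref()` method, roughly the shape the proposal describes; the names here are illustrative, not normative.

```javascript
// Assumption: a WeakRef-style API (new WeakRef(obj), ref.deref()).

function gcHasRun(ref) {
  // Once this returns true, the script has learned that the engine
  // collected the target -- information it could never observe before.
  return ref.deref() === undefined;
}

let target = { secret: true };
const ref = new WeakRef(target);

// While any strong reference exists, deref() must return the object.
console.log(ref.deref() === target); // true

// Drop the strong reference; a script could now poll gcHasRun(ref)
// (e.g. on a timer) and learn the exact moment of condemnation.
target = null;
console.log(gcHasRun(ref)); // false until a GC actually runs
```

It is exactly this observability of collection timing that the authority restriction tries to contain.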

Gr. Joris

On Wed, Feb 17, 2016 at 7:45 PM, Jonas Sicking <jo...@sicking.cc> wrote:
> Yeah, you are right. NodeIterators, and presumably Ranges, suffer from
> the observer problem. I.e. they want to be notified about mutations to
> the DOM, but only for as long as the NodeIterator/Range stays alive.
>
> My understanding is that this is one of the more common scenarios
> where the need for weak references comes up: you want to register
> something as an observer, but don't want the notification mechanism to
> hold a strong reference to the observer.
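A minimal sketch of that weakly-held observer pattern, again assuming a `WeakRef`-style API (the `WeakNotifier` class and its method names are illustrative, not part of any proposal): the notifier holds observers only weakly and prunes slots whose observer has already been collected.

```javascript
// Assumption: a WeakRef-style API (new WeakRef(obj), ref.deref()).
// WeakNotifier is an illustrative name, not part of any proposal.

class WeakNotifier {
  constructor() {
    this.refs = new Set(); // weak references to observers
  }
  observe(observer) {
    this.refs.add(new WeakRef(observer));
  }
  notify(change) {
    for (const ref of this.refs) {
      const observer = ref.deref();
      if (observer === undefined) {
        this.refs.delete(ref); // observer was collected: drop the slot
      } else {
        observer.onChange(change);
      }
    }
  }
}

const notifier = new WeakNotifier();
const iterator = { seen: [], onChange(c) { this.seen.push(c); } };
notifier.observe(iterator);
notifier.notify("removeChild");
console.log(iterator.seen); // [ 'removeChild' ]
```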
>
> Fortunately though, neither NodeIterators nor Ranges expose this in
> their public API. I.e. there is no way to use them to detect when GC
> happens.
>
> / Jonas
>
> On Wed, Feb 17, 2016 at 5:23 AM, Joris van der Wel
> <jo...@jorisvanderwel.com> wrote:
>> Here is an example of using a NodeIterator:
>>
>>
>> ```
>> const jsdom = require("jsdom");
>> const document = jsdom.jsdom(`<a></a><b></b><c></c>`);
>>
>> let it = document.createNodeIterator(document.body);
>> console.log(it.nextNode().nodeName); // BODY
>> console.log(it.nextNode().nodeName); // A
>> console.log(it.nextNode().nodeName); // B
>> console.log(it.nextNode().nodeName); // C
>> console.log(it.nextNode()); // null
>>
>> it = document.createNodeIterator(document.body);
>> console.log(it.nextNode().nodeName); // BODY
>> // This remove operation updates the internal state of the NodeIterator:
>> document.body.removeChild(document.body.firstChild);
>> console.log(it.nextNode().nodeName); // B
>> console.log(it.nextNode().nodeName); // C
>> console.log(it.nextNode()); // null
>> it = null;
>> ```
>>
>> In the case of NodeIterator, there are currently (read: in ES6) two
>> possible implementations that comply with the WHATWG DOM spec:
>>
>> 1. Keep a history of all changes a Document has gone through, forever.
>> 2. Keep a list of all NodeIterators which have been created for a
>> Document, forever.
>>
>> jsdom uses solution #2. This not only leaks memory, but remove
>> operations also become slower as more and more NodeIterators are
>> created. (However, as Domenic described earlier, we limit this list
>> to 10 entries by default.)
>>
>> The conflict between the DOM spec and ES6 is that we cannot detect
>> whether a NodeIterator is still in use by code outside of jsdom:
>>
>> ```
>> it = document.createNodeIterator(document.body);
>> console.log(it.nextNode().nodeName); // BODY
>> // ... wait an hour ...
>> console.log(it.nextNode().nodeName); // A
>> it = null; // and only now we can stop updating the NodeIterator state
>> ```
>>
>> (There used to be an it.detach() method for this purpose, but it has
>> been removed from the spec.)
>>
>> Keeping the list of NodeIterators weakly would be the only solution
>> if we want to avoid leaking resources.
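A sketch of what such a weakly-held list could look like, assuming a `WeakRef`-style API; `addIterator` and `runPreRemovingSteps` are illustrative names, not jsdom's actual internals. Instead of capping a strong array, dead entries are pruned whenever the pre-removing steps run.

```javascript
// Assumption: a WeakRef-style API. addIterator/runPreRemovingSteps
// are illustrative names, not jsdom's actual internals.

const workingIterators = []; // weak references to live iterator states

function addIterator(iteratorState) {
  workingIterators.push(new WeakRef(iteratorState));
}

function runPreRemovingSteps(removedNode) {
  // Walk backwards so collected entries can be spliced out in place.
  for (let i = workingIterators.length - 1; i >= 0; i--) {
    const state = workingIterators[i].deref();
    if (state === undefined) {
      workingIterators.splice(i, 1); // iterator was GC'd: forget it
    } else {
      state.preRemove(removedNode); // the spec's pre-removing steps
    }
  }
}

const state = { removed: [], preRemove(n) { this.removed.push(n); } };
addIterator(state);
runPreRemovingSteps("A");
console.log(state.removed); // [ 'A' ]
```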
>>
>> Weak references might also be required for MutationObserver, although
>> I've not yet looked at this feature extensively, so I could be wrong.
>> Other features that you could implement using a weak reference (like
>> the live collections) could be implemented using an ES6 Proxy instead.
>>
>> XMLHttpRequest, fetch, WebSocket, etc. would even require something
>> similar to a phantom reference (as in Java), so that we can close the
>> connection when the object is no longer strongly or weakly referenced.
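A sketch of that phantom-reference-like cleanup, assuming a `FinalizationRegistry`-shaped post-mortem API; `RawSocket` and `makeWebSocketLike` are hypothetical stand-ins for the real transport and constructor.

```javascript
// Assumption: a FinalizationRegistry-style post-mortem cleanup API.
// RawSocket and makeWebSocketLike are hypothetical stand-ins.

class RawSocket {
  constructor() { this.open = true; }
  close() { this.open = false; }
}

// The callback runs some (unspecified) time after a registered
// wrapper object has been collected.
const registry = new FinalizationRegistry((socket) => socket.close());

function makeWebSocketLike() {
  const socket = new RawSocket();
  const wrapper = {
    send(data) { if (socket.open) { /* write data to the socket */ } },
  };
  // The registry holds the socket (not the wrapper), so collecting
  // the wrapper is what eventually triggers the close.
  registry.register(wrapper, socket);
  return wrapper;
}

const ws = makeWebSocketLike();
ws.send("hello"); // works while the wrapper is alive
// Once `ws` becomes unreachable, the registry closes the socket.
```

Note that, unlike a plain weak reference, this gives no way to resurrect the object; it only schedules cleanup after the fact, which is exactly what the connection case needs.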
>>
>> I would also really like to use weak references beyond jsdom; there
>> are some use cases where they can simplify my code.
>>
>> Gr. Joris
>>
>>
>> On Wed, Feb 17, 2016 at 9:41 AM, Jonas Sicking <jo...@sicking.cc> wrote:
>>>
>>> On Tue, Feb 16, 2016 at 11:02 PM, Domenic Denicola <d...@domenic.me> wrote:
>>> >> For each NodeIterator object iterator whose root’s node document is 
>>> >> node’s node document, run the NodeIterator pre-removing steps given node 
>>> >> and iterator.
>>> >
>>> > Rephrased: every time you remove a Node from a document, you must go 
>>> > through all of the document's NodeIterators and run some cleanup steps 
>>> > (which have the effect of changing observable properties and behavior of 
>>> > the NodeIterator).
>>>
>>> Could you implement all of this using MutationObservers? I.e. have the
>>> NodeIterators observe the relevant nodes using MutationObservers?
>>>
>>> The only case that I can think of where the DOM could use weak
>>> references is for the getElementsByTagName(x) function. This function
>>> will either return a new NodeList object, or an existing one. The
>>> reason it sometimes returns an existing one is for performance
>>> reasons. We saw a lot of code doing:
>>>
>>> var i;
>>> for (i = 0; i < document.getElementsByTagName("div").length; i++) {
>>>   var elem = document.getElementsByTagName("div")[i];
>>>   doStuffWith(elem);
>>> }
>>>
>>> This generated a ton of NodeList objects, which are expensive to
>>> allocate. Hence browsers started caching these objects and returned an
>>> existing object "sometimes".
>>>
>>> The gecko implementation of "sometimes" uses a hash map keyed on
>>> tagname containing weak references to the returned NodeList. This is
>>> observable by for example doing:
>>>
>>> document.getElementsByTagName("div").foopy = "foopy";
>>> if (document.getElementsByTagName("div").foopy != "foopy") {
>>>   // GC ran between the getElementsByTagName calls.
>>> }
>>>
>>> However, this exact behavior is not defined by the spec. But I
>>> believe that all major browsers do something similar for performance
>>> reasons. (This API is as old as it is crummy, so it is no surprise
>>> that it is poorly used.)
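The caching scheme described here could be sketched as follows, assuming a `WeakRef`-style API; the `NodeListCache` class name and the plain-object stand-ins for NodeLists are illustrative. The map holds its values weakly, so the cache never keeps a NodeList alive on its own.

```javascript
// Assumption: a WeakRef-style API. NodeListCache is an illustrative
// name; plain objects stand in for real NodeLists.

class NodeListCache {
  constructor() {
    this.byTagName = new Map(); // tag name -> WeakRef to cached list
  }
  get(tagName, createList) {
    const ref = this.byTagName.get(tagName);
    const cached = ref && ref.deref();
    if (cached !== undefined) return cached; // the "sometimes" case
    const list = createList();
    this.byTagName.set(tagName, new WeakRef(list));
    return list;
  }
}

const cache = new NodeListCache();
const a = cache.get("div", () => ({ tag: "div" }));
const b = cache.get("div", () => ({ tag: "div" }));
console.log(a === b); // true: same object while it stays alive
```

After a GC collects the cached list, `deref()` returns undefined and a fresh list is created, which is precisely the observable behavior the `foopy` example demonstrates.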
>>>
>>> But it likely would be possible to write an implementation of
>>> "sometimes" which doesn't use weak references, at the cost of higher
>>> memory usage.
>>>
>>> / Jonas
>>
>>
>>
>>
>> --
>> github.com/Joris-van-der-Wel



-- 
github.com/Joris-van-der-Wel
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
