Re: The non-deterministic iteration order of Immutable Collections
Hi Stuart, First, thanks for you detailed reply. On 25/03/2023 00:16, Stuart Marks wrote: ... Yes, this has come up before, but it's been mostly theoretical. That is, people worry about this when they hear of the idea of randomized iteration order, but I've never heard any followup. This is in fact the first time I've heard of an actual case where this is a real problem. So thanks for bringing it up. Here is a link to the GH issue has some more details, and links to further issues and PR's. https://github.com/elastic/elasticsearch/issues/94946 (And unfortunately, it's notoriously difficult to make code truly iteration-order dependent. We've had our own history of problems with this in the JDK. I'd be interested in hearing from you at some point about the exact pathology of how this occurred.) There's currently no debugging or experimental interface that will let one print or set the random seed that's in use. The obvious approach of using an agent to hack away at runtime doesn't work, because the unmodifiable collections implementations are used very early in startup and they're usually loaded before the agent runs. There are limitations on what an agent can do when redefining an already-loaded class, for example, removing the 'final' modifier from fields isn't supported. (Well I suppose one can always use Unsafe.) Yes, unfortunately that is one of the options that we're considering :-( Here's another approach that might work for you. 1. Write a little program that extracts the class bytes of ImmutableCollection.class from the runtime image, and use something like ASM or ClassFile to make the class public and to make the REVERSE and SALT32L fields public and non-final. 2. Launch a VM using the --patch-module option to use this class instead of the built-in one. 3. Write an agent, or some debugging library that's called at the right time, to use simple reflective access to get or set those fields as desired. This is a bit fiddly but it might be easier than rebuilding and deploying a custom JDK. Yes, certainly a bit fiddly, and maybe a little fragile - we currently support compiling on JDK 17, with a strong preference of running on the latest JDK ( currently JDK 20 ). It could be that we'd need to maintain a version of this for 17 through 20 ( and 21, as we test with EA builds of 21 ). Setting (formerly) final fields after the VM is initialized is often somewhat risky. In this case these particular fields are used only during iteration, and they don't actually change the layout of any data structures. So setting them to some desired value should apply to all subsequent iterations, even of existing data structures. I'll think about better ways to do this in the product. The best approach isn't obvious. The typical way of doing things like this using system properties is tricky, as it depends on order of class initialization at startup (and you know how fragile that is). Yeah, the set of possible solutions is somewhat curtailed, but "the simpler the better"! ;-) Something like a system property would be "good enough", if the initialization order could be enforced. Thanks, -Chris.
Re: The non-deterministic iteration order of Immutable Collections
If you need order using List instead of Set/Map is perfectly fine decision of your students. It’s good if they learn that early. :) -- http://bernd.eckenfels.net Von: core-libs-dev im Auftrag von Remi Forax Gesendet: Sonntag, März 26, 2023 1:42 PMAn: Jens Lideström Cc: core-libs-dev Betreff: Re: The non-deterministic iteration order of Immutable Collections - Original Message - > From: "Jens Lideström" > To: "core-libs-dev" > Sent: Sunday, March 26, 2023 11:38:07 AM > Subject: Re: The non-deterministic iteration order of Immutable Collections > I think Map#of and friends would be more useful and less error prone if they > where to return collections that have a fixed iteration order, where the order > is defined by the insertion order when the map is created. I agree. They are several use cases for Set.of()/Map.of(), for testing you want them to not have an order but for a defensive copy (Set.copyOf()/Map.copyOf()) you want them to keep the order. Currently, the implementation rotates toward the former instead of the later. A lot of my students struggle with the semantics of Set.of()/Map.of() because this choice makes the unmodifiable set/map not beginners friendly. To the point where a student will prefer to use a List instead of a Set, because List.copyOf() seems to work "correctly" compared to Set.copyOf(). Python 3.7 has changed the implementation of its set and dictionaries to keep the insertion order for this exact reason. Java could do the same. regards, Rémi
Re: The non-deterministic iteration order of Immutable Collections
On 26/03/2023 10:38, Jens Lideström wrote: : I encountered this as a problem when I tried to generate a module dependency graph using jdeps. I had my classpath wrong and got an error message. The error message contained a list of modules or jar-files (I don't remember the details, unfortunately). If you do remember, then please report a bug as this sounds like an issue with jdeps rather than an issue with the design choice that the iteration order of unmodifable maps be unspecified. -Alan.
Re: The non-deterministic iteration order of Immutable Collections
> The problem is that everyone will have to pay for the extra storage cost for maintaining insertion order. Even if people don't need it. I understand that. The main point of my message is that I think it is worth the cost, that a fixed iteration order should be the default, and that it should be considered to be a niche use case to prioritize the memory savings of a random iteration order. This use case should be catered to by third-party libraries or alternative API:s. These are a summary of the arguments (which I failed to provide in my original message): * Many, probably most, uses of maps in Java benefit from a fixed iteration order. * It is easy to get bugs or confusing behaviour by mistake when random iteration order is the default. * The memory cost of an insertion-based iteration order is not very high. * The Guava developers, and, as Remi notes, the Python developers, have also drawn this conclusion. (Argument from authority.) Best regard, Jens Lideström On 2023-03-26 12:48, Kasper Nielsen wrote: On Sun, 26 Mar 2023 at 10:38, Jens Lideström wrote: I think Map#of and friends would be more useful and less error prone if they where to return collections that have a fixed iteration order, where the order is defined by the insertion order when the map is created. The problem is that everyone will have to pay for the extra storage cost for maintaining insertion order. Even if people don't need it. Hence, why I suggested adding factory/copyOf methods for SequencedMap/SequencedSet for those people that need to rely on order. (Rereading Chris's email I know it wouldn't help his use-case). /Kasper
Re: The non-deterministic iteration order of Immutable Collections
- Original Message - > From: "Jens Lideström" > To: "core-libs-dev" > Sent: Sunday, March 26, 2023 11:38:07 AM > Subject: Re: The non-deterministic iteration order of Immutable Collections > I think Map#of and friends would be more useful and less error prone if they > where to return collections that have a fixed iteration order, where the order > is defined by the insertion order when the map is created. I agree. They are several use cases for Set.of()/Map.of(), for testing you want them to not have an order but for a defensive copy (Set.copyOf()/Map.copyOf()) you want them to keep the order. Currently, the implementation rotates toward the former instead of the later. A lot of my students struggle with the semantics of Set.of()/Map.of() because this choice makes the unmodifiable set/map not beginners friendly. To the point where a student will prefer to use a List instead of a Set, because List.copyOf() seems to work "correctly" compared to Set.copyOf(). Python 3.7 has changed the implementation of its set and dictionaries to keep the insertion order for this exact reason. Java could do the same. regards, Rémi
Re: The non-deterministic iteration order of Immutable Collections
On Sun, 26 Mar 2023 at 10:38, Jens Lideström wrote: > > I think Map#of and friends would be more useful and less error prone if they > where to return collections that have a fixed iteration order, where the > order is defined by the insertion order when the map is created. The problem is that everyone will have to pay for the extra storage cost for maintaining insertion order. Even if people don't need it. Hence, why I suggested adding factory/copyOf methods for SequencedMap/SequencedSet for those people that need to rely on order. (Rereading Chris's email I know it wouldn't help his use-case). /Kasper
Re: The non-deterministic iteration order of Immutable Collections
I think Map#of and friends would be more useful and less error prone if they where to return collections that have a fixed iteration order, where the order is defined by the insertion order when the map is created. ### My experience with jdeps I encountered this as a problem when I tried to generate a module dependency graph using jdeps. I had my classpath wrong and got an error message. The error message contained a list of modules or jar-files (I don't remember the details, unfortunately). The relevant part of this story for collection iteration order is this: Each time I re-run the jdeps command I got a different list of modules in the error message. It took me a long time and a lot of confusion before I realised that it was actually not a different list of modules, but instead the same list, in different order. I never came around to verify it, but a likely explanation for this confusing behaviour is that jdeps used Map#ofEntries internally, and printed the entries in the error message, which gave a new random order for each program execution. Map#of probably works fine for the main functionality of jdeps, which probably is to use the map for lookups, but for error message printouts the result was confusing. ### Map encounter order in general I think a large part of all maps that are used in Java applications end up being iterated over in some situations: Either they are displayed in a GUI, or serialised and sent over a network connection, or, as in my example, being displayed in an error message. In neither of these cases is it a strait-out bug to use a random iteration order, but in all of them it is likely bad and confusing to do so. Because of this I think that general purpose maps should by default preserve the insertion order of the entries. Not preserving insertion order should be considered to be a niche use case, used only to optimize memory footprint when the users opt in to it because they decide that it is acceptable. ### Other collection frameworks The Guava immutable map factories create implementations that preserve the iteration order: https://github.com/google/guava/wiki/ImmutableCollectionsExplained#how https://guava.dev/releases/23.0/api/docs/com/google/common/collect/ImmutableMap.Builder.html Because of the iteration order issue of Map#of I have continued to used these. ### Memory cost to store insertion order The Guava RegularImmutableMap uses a separate array of references to map entries to define the iteration order. This implementation gives an extra memory overhead per entry of one reference. The size of a map using the standard library immutable map implementation, MapN, is approximately the following. (MapN uses a memory efficient implementation which avoids the need to allocate entry objects.) Size per entry, measured in the number of references that are used: 1 - Reference to key 1 - Reference to value 1 - Hash table extra space at 66 % load 1 - Key object header 1 - Approximate size contents of a small key object 1 - Value object header 1 - Approximate size contents of a small value object Total: 7 The addition of one extra reference for tracking the insertion order gives a memory increase of 1/7 = 14 % in this scenario. To me this sounds like a good trade-off for a map with better behaviour. Best regards, Jens Lideström Java and Mumps programming enthusiast On 2023-03-25 01:16, Stuart Marks wrote: On 3/24/23 10:18 AM, Chris Hegarty wrote: I know that this has come up a number of times before, but I cannot seem to find anything directly relevant to my use-case, or recent. The iteration order of immutable Sets and Maps (created by the `of` factories) is clearly unspecified - good. Code should not depend upon this iteration order, and if so that code is broken. I completely agree and am aligned with this principle. The Elasticsearch code base heavily uses these collection implementations, and their iterators. On occasion we run into issues, where our code (because of a bug or a race), inadvertently has a dependency on the iteration order. To be clear, this is a bug in our code, and we want to reproduce and fix it. The problem we are encountering is that the iteration order is not determinable or reproducible, which makes it extremely challenging to reproduce the bug even when we have reproducible testcase to hand. ( whereas, for example, it is common practice in other parts of the code to log a seed used for randomization ) We very much like using the immutable collections, and we definitely don't want to pay the price of sorting things in the code, since we don't want to, and should not, depend upon the iteration order. The only concern is with reproducibility when we run into (our own) bugs. I don't (yet) want to be prescriptive in any potential solution. And I know that this has been discussed before. I mostly just want to start a conversation and see how much traction it gets. Hi
Re: The non-deterministic iteration order of Immutable Collections
On 3/24/23 10:53 AM, Kasper Nielsen wrote: Would java.util.SequencedMap.of(...) java.util.SequencedMap.copyOf(SequencedMap map) java.util.SequencedSet.of(...) java.util.SequencedSet.copyOf(SequencedSet set) solve your problem? I would love to see them included in JEP 431. Should be fairly simply to implement just have a side array that maintains the elements in order. These are fine ideas, but they won't be part of JEP 431. They might be reasonable as a future enhancement, though. s'marks
Re: The non-deterministic iteration order of Immutable Collections
On 3/24/23 10:18 AM, Chris Hegarty wrote: I know that this has come up a number of times before, but I cannot seem to find anything directly relevant to my use-case, or recent. The iteration order of immutable Sets and Maps (created by the `of` factories) is clearly unspecified - good. Code should not depend upon this iteration order, and if so that code is broken. I completely agree and am aligned with this principle. The Elasticsearch code base heavily uses these collection implementations, and their iterators. On occasion we run into issues, where our code (because of a bug or a race), inadvertently has a dependency on the iteration order. To be clear, this is a bug in our code, and we want to reproduce and fix it. The problem we are encountering is that the iteration order is not determinable or reproducible, which makes it extremely challenging to reproduce the bug even when we have reproducible testcase to hand. ( whereas, for example, it is common practice in other parts of the code to log a seed used for randomization ) We very much like using the immutable collections, and we definitely don't want to pay the price of sorting things in the code, since we don't want to, and should not, depend upon the iteration order. The only concern is with reproducibility when we run into (our own) bugs. I don't (yet) want to be prescriptive in any potential solution. And I know that this has been discussed before. I mostly just want to start a conversation and see how much traction it gets. Hi Chris, Yes, this has come up before, but it's been mostly theoretical. That is, people worry about this when they hear of the idea of randomized iteration order, but I've never heard any followup. This is in fact the first time I've heard of an actual case where this is a real problem. So thanks for bringing it up. (And unfortunately, it's notoriously difficult to make code truly iteration-order dependent. We've had our own history of problems with this in the JDK. I'd be interested in hearing from you at some point about the exact pathology of how this occurred.) There's currently no debugging or experimental interface that will let one print or set the random seed that's in use. The obvious approach of using an agent to hack away at runtime doesn't work, because the unmodifiable collections implementations are used very early in startup and they're usually loaded before the agent runs. There are limitations on what an agent can do when redefining an already-loaded class, for example, removing the 'final' modifier from fields isn't supported. (Well I suppose one can always use Unsafe.) Here's another approach that might work for you. 1. Write a little program that extracts the class bytes of ImmutableCollection.class from the runtime image, and use something like ASM or ClassFile to make the class public and to make the REVERSE and SALT32L fields public and non-final. 2. Launch a VM using the --patch-module option to use this class instead of the built-in one. 3. Write an agent, or some debugging library that's called at the right time, to use simple reflective access to get or set those fields as desired. This is a bit fiddly but it might be easier than rebuilding and deploying a custom JDK. Setting (formerly) final fields after the VM is initialized is often somewhat risky. In this case these particular fields are used only during iteration, and they don't actually change the layout of any data structures. So setting them to some desired value should apply to all subsequent iterations, even of existing data structures. I'll think about better ways to do this in the product. The best approach isn't obvious. The typical way of doing things like this using system properties is tricky, as it depends on order of class initialization at startup (and you know how fragile that is). s'marks
Re: The non-deterministic iteration order of Immutable Collections
> >> > >> I don't (yet) want to be prescriptive in any potential solution. And I > >> know that this has been discussed before. I mostly just want to start a > >> conversation and see how much traction it gets. > >> > > Would > > java.util.SequencedMap.of(...) > > java.util.SequencedMap.copyOf(SequencedMap map) > > java.util.SequencedSet.of(...) > > java.util.SequencedSet.copyOf(SequencedSet set) > > solve your problem? > > > > I would love to see them included in JEP 431. > > Should be fairly simply to implement just have a side > > array that maintains the elements in order. > > I do not think SequenceMap or SequencedSet should be use for copyOf(), Map > and Collection should be used instead because the iteration order is enough. Agreed, I realized it after having sent the mail. /Kasper
Re: The non-deterministic iteration order of Immutable Collections
maybe you can use some java agent to hack in the of function, and change it to some strange order, to make you see where goes wrong easier? Remi Forax 于2023年3月25日周六 06:13写道: > - Original Message - > > From: "Kasper Nielsen" > > To: "Chris Hegarty" > > Cc: "core-libs-dev" > > Sent: Friday, March 24, 2023 6:53:51 PM > > Subject: Re: The non-deterministic iteration order of Immutable > Collections > > >> > >> I don't (yet) want to be prescriptive in any potential solution. And I > >> know that this has been discussed before. I mostly just want to start a > >> conversation and see how much traction it gets. > >> > > Would > > java.util.SequencedMap.of(...) > > java.util.SequencedMap.copyOf(SequencedMap map) > > java.util.SequencedSet.of(...) > > java.util.SequencedSet.copyOf(SequencedSet set) > > solve your problem? > > > > I would love to see them included in JEP 431. > > Should be fairly simply to implement just have a side > > array that maintains the elements in order. > > I do not think SequenceMap or SequencedSet should be use for copyOf(), Map > and Collection should be used instead because the iteration order is enough. > > This is sadly not something acknowledged by JEP 431, but SequencedSet or > SequencedMap are not very useful as interfaces for typing the parameters of > public methods. Like NavigableSet/NavigableMap, they can be handy for > typing a specific implementation for a field or a local variable but for > methods, they are not general enough. > > > > > /Kasper > > regards, > Rémi >
Re: The non-deterministic iteration order of Immutable Collections
- Original Message - > From: "Kasper Nielsen" > To: "Chris Hegarty" > Cc: "core-libs-dev" > Sent: Friday, March 24, 2023 6:53:51 PM > Subject: Re: The non-deterministic iteration order of Immutable Collections >> >> I don't (yet) want to be prescriptive in any potential solution. And I >> know that this has been discussed before. I mostly just want to start a >> conversation and see how much traction it gets. >> > Would > java.util.SequencedMap.of(...) > java.util.SequencedMap.copyOf(SequencedMap map) > java.util.SequencedSet.of(...) > java.util.SequencedSet.copyOf(SequencedSet set) > solve your problem? > > I would love to see them included in JEP 431. > Should be fairly simply to implement just have a side > array that maintains the elements in order. I do not think SequenceMap or SequencedSet should be use for copyOf(), Map and Collection should be used instead because the iteration order is enough. This is sadly not something acknowledged by JEP 431, but SequencedSet or SequencedMap are not very useful as interfaces for typing the parameters of public methods. Like NavigableSet/NavigableMap, they can be handy for typing a specific implementation for a field or a local variable but for methods, they are not general enough. > > /Kasper regards, Rémi
Re: The non-deterministic iteration order of Immutable Collections
Check these: * https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Map.html#unmodifiable: "The iteration order of mappings is unspecified and is subject to change." * https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/Set.html#unmodifiable: "The iteration order of set elements is unspecified and is subject to change." Unlike HashSet, the implementation will have a different iteration order for the same content if you restart the JVM, as stated here: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/ImmutableCollections.java#L54 In other words, it not only discourages callers to not rely on iteration order, it punishes them; any code that relies on it will break after the JVM is restarted. The problem the Elasticsearch team has is *finding* these errors. On 24/03/2023 18:53, Holo The Sage Wolf wrote: It is highly dependent on the exact implementation, but the situation you are describing us weird. In most cases the order *is* deterministic for a read-only view (which is the case in, for example, HashMap (and hence for HashSet)). In a deterministic implementation the only cases where the iteration order will change is if you modify the collection, change the implementation (e.g. downcast, or change version), or, in rare cases, when you copy the collection. If you are using non-determinsitic implementation (which I do not know if such implementation exists in the std) then the means of reproduction would probably comes down to set a seed correctly On Fri, Mar 24, 2023, 20:19 Chris Hegarty wrote: I know that this has come up a number of times before, but I cannot seem to find anything directly relevant to my use-case, or recent. The iteration order of immutable Sets and Maps (created by the `of` factories) is clearly unspecified - good. Code should not depend upon this iteration order, and if so that code is broken. I completely agree and am aligned with this principle. The Elasticsearch code base heavily uses these collection implementations, and their iterators. On occasion we run into issues, where our code (because of a bug or a race), inadvertently has a dependency on the iteration order. To be clear, this is a bug in our code, and we want to reproduce and fix it. The problem we are encountering is that the iteration order is not determinable or reproducible, which makes it extremely challenging to reproduce the bug even when we have reproducible testcase to hand. ( whereas, for example, it is common practice in other parts of the code to log a seed used for randomization ) We very much like using the immutable collections, and we definitely don't want to pay the price of sorting things in the code, since we don't want to, and should not, depend upon the iteration order. The only concern is with reproducibility when we run into (our own) bugs. I don't (yet) want to be prescriptive in any potential solution. And I know that this has been discussed before. I mostly just want to start a conversation and see how much traction it gets. Thanks, -Chris.
Re: The non-deterministic iteration order of Immutable Collections
https://github.com/openjdk/jdk/blob/97649489d078a3aa34a73e7f686e507f34155788/src/java.base/share/classes/java/util/ImmutableCollections.java#L76-L79 Looking at these, it seems you can add `-Xshare:dump` to VM arguments to obtain a deterministic iteration order for the same JDK builds. This can help you reproduce the problem reliably. On Fri, Mar 24, 2023 at 1:02 PM Kasper Nielsen wrote: > > > > > I don't (yet) want to be prescriptive in any potential solution. And I > > know that this has been discussed before. I mostly just want to start a > > conversation and see how much traction it gets. > > > Would > java.util.SequencedMap.of(...) > java.util.SequencedMap.copyOf(SequencedMap map) > java.util.SequencedSet.of(...) > java.util.SequencedSet.copyOf(SequencedSet set) > solve your problem? > > I would love to see them included in JEP 431. > Should be fairly simply to implement just have a side > array that maintains the elements in order. > > /Kasper
Re: The non-deterministic iteration order of Immutable Collections
> > I don't (yet) want to be prescriptive in any potential solution. And I > know that this has been discussed before. I mostly just want to start a > conversation and see how much traction it gets. > Would java.util.SequencedMap.of(...) java.util.SequencedMap.copyOf(SequencedMap map) java.util.SequencedSet.of(...) java.util.SequencedSet.copyOf(SequencedSet set) solve your problem? I would love to see them included in JEP 431. Should be fairly simply to implement just have a side array that maintains the elements in order. /Kasper
Re: The non-deterministic iteration order of Immutable Collections
It is highly dependent on the exact implementation, but the situation you are describing us weird. In most cases the order *is* deterministic for a read-only view (which is the case in, for example, HashMap (and hence for HashSet)). In a deterministic implementation the only cases where the iteration order will change is if you modify the collection, change the implementation (e.g. downcast, or change version), or, in rare cases, when you copy the collection. If you are using non-determinsitic implementation (which I do not know if such implementation exists in the std) then the means of reproduction would probably comes down to set a seed correctly On Fri, Mar 24, 2023, 20:19 Chris Hegarty wrote: > I know that this has come up a number of times before, but I cannot seem > to find anything directly relevant to my use-case, or recent. > > The iteration order of immutable Sets and Maps (created by the `of` > factories) is clearly unspecified - good. Code should not depend upon > this iteration order, and if so that code is broken. I completely agree > and am aligned with this principle. > > The Elasticsearch code base heavily uses these collection > implementations, and their iterators. On occasion we run into issues, > where our code (because of a bug or a race), inadvertently has a > dependency on the iteration order. To be clear, this is a bug in our > code, and we want to reproduce and fix it. The problem we are > encountering is that the iteration order is not determinable or > reproducible, which makes it extremely challenging to reproduce the bug > even when we have reproducible testcase to hand. ( whereas, for example, > it is common practice in other parts of the code to log a seed used for > randomization ) > > We very much like using the immutable collections, and we definitely > don't want to pay the price of sorting things in the code, since we > don't want to, and should not, depend upon the iteration order. The only > concern is with reproducibility when we run into (our own) bugs. > > I don't (yet) want to be prescriptive in any potential solution. And I > know that this has been discussed before. I mostly just want to start a > conversation and see how much traction it gets. > > Thanks, > -Chris. >