Re: The non-deterministic iteration order of Immutable Collections

Jens Lideström Sun, 26 Mar 2023 02:39:08 -0700

I think Map#of and friends would be more useful and less error prone if they 
where to return collections that have a fixed iteration order, where the order 
is defined by the insertion order when the map is created.

### My experience with jdeps

I encountered this as a problem when I tried to generate a module dependency
graph using jdeps. I had my classpath wrong and got an error message. The error
message contained a list of modules or jar-files (I don't remember the details,
unfortunately).

The relevant part of this story for collection iteration order is this: Each
time I re-run the jdeps command I got a different list of modules in the error
message.

It took me a long time and a lot of confusion before I realised that it was
actually not a different list of modules, but instead the same list, in
different order.

I never came around to verify it, but a likely explanation for this confusing
behaviour is that jdeps used Map#ofEntries internally, and printed the entries
in the error message, which gave a new random order for each program execution.

Map#of probably works fine for the main functionality of jdeps, which probably
is to use the map for lookups, but for error message printouts the result was
confusing.

### Map encounter order in general

I think a large part of all maps that are used in Java applications end up
being iterated over in some situations: Either they are displayed in a GUI, or
serialised and sent over a network connection, or, as in my example, being
displayed in an error message.

In neither of these cases is it a strait-out bug to use a random iteration
order, but in all of them it is likely bad and confusing to do so.

Because of this I think that general purpose maps should by default preserve
the insertion order of the entries.

Not preserving insertion order should be considered to be a niche use case,
used only to optimize memory footprint when the users opt in to it because they
decide that it is acceptable.

### Other collection frameworks

The Guava immutable map factories create implementations that preserve the
iteration order:

https://github.com/google/guava/wiki/ImmutableCollectionsExplained#how

https://guava.dev/releases/23.0/api/docs/com/google/common/collect/ImmutableMap.Builder.html

Because of the iteration order issue of Map#of I have continued to used these.

### Memory cost to store insertion order

The Guava RegularImmutableMap uses a separate array of references to map
entries to define the iteration order. This implementation gives an extra
memory overhead per entry of one reference.

The size of a map using the standard library immutable map implementation,
MapN, is approximately the following. (MapN uses a memory efficient
implementation which avoids the need to allocate entry objects.)

Size per entry, measured in the number of references that are used:

1 - Reference to key
1 - Reference to value
1 - Hash table extra space at 66 % load

1 - Key object header
1 - Approximate size contents of a small key object
1 - Value object header
1 - Approximate size contents of a small value object

Total: 7

The addition of one extra reference for tracking the insertion order gives a
memory increase of 1/7 = 14 % in this scenario. To me this sounds like a good
trade-off for a map with better behaviour.

Best regards,
Jens Lideström
Java and Mumps programming enthusiast

On 2023-03-25 01:16, Stuart Marks wrote:



On 3/24/23 10:18 AM, Chris Hegarty wrote:

I know that this has come up a number of times before, but I cannot seem to 
find anything directly relevant to my use-case, or recent.

The iteration order of immutable Sets and Maps (created by the `of` factories) 
is clearly unspecified - good. Code should not depend upon this iteration 
order, and if so that code is broken. I completely agree and am aligned with 
this principle.

The Elasticsearch code base heavily uses these collection implementations, and 
their iterators. On occasion we run into issues, where our code (because of a 
bug or a race), inadvertently has a dependency on the iteration order. To be 
clear, this is a bug in our code, and we want to reproduce and fix it.  The 
problem we are encountering is that the iteration order is not determinable or 
reproducible, which makes it extremely challenging to reproduce the bug even 
when we have reproducible testcase to hand. ( whereas, for example, it is 
common practice in other parts of the code to log a seed used for randomization 
)

We very much like using the immutable collections, and we definitely don't want 
to pay the price of sorting things in the code, since we don't want to, and 
should not, depend upon the iteration order. The only concern is with 
reproducibility when we run into (our own) bugs.

I don't (yet) want to be prescriptive in any potential solution. And I know 
that this has been discussed before. I mostly just want to start a conversation 
and see how much traction it gets.


Hi Chris,

Yes, this has come up before, but it's been mostly theoretical. That is, people 
worry about this when they hear of the idea of randomized iteration order, but 
I've never heard any followup. This is in fact the first time I've heard of an 
actual case where this is a real problem. So thanks for bringing it up.

(And unfortunately, it's notoriously difficult to make code truly 
iteration-order dependent. We've had our own history of problems with this in 
the JDK. I'd be interested in hearing from you at some point about the exact 
pathology of how this occurred.)

There's currently no debugging or experimental interface that will let one 
print or set the random seed that's in use. The obvious approach of using an 
agent to hack away at runtime doesn't work, because the unmodifiable 
collections implementations are used very early in startup and they're usually 
loaded before the agent runs. There are limitations on what an agent can do 
when redefining an already-loaded class, for example, removing the 'final' 
modifier from fields isn't supported. (Well I suppose one can always use 
Unsafe.)

Here's another approach that might work for you.

1. Write a little program that extracts the class bytes of 
ImmutableCollection.class from the runtime image, and use something like ASM or 
ClassFile to make the class public and to make the REVERSE and SALT32L fields 
public and non-final.

2. Launch a VM using the --patch-module option to use this class instead of the 
built-in one.

3. Write an agent, or some debugging library that's called at the right time, 
to use simple reflective access to get or set those fields as desired.

This is a bit fiddly but it might be easier than rebuilding and deploying a 
custom JDK.

Setting (formerly) final fields after the VM is initialized is often somewhat 
risky. In this case these particular fields are used only during iteration, and 
they don't actually change the layout of any data structures. So setting them 
to some desired value should apply to all subsequent iterations, even of 
existing data structures.

I'll think about better ways to do this in the product. The best approach isn't 
obvious. The typical way of doing things like this using system properties is 
tricky, as it depends on order of class initialization at startup (and you know 
how fragile that is).

s'marks

Re: The non-deterministic iteration order of Immutable Collections

Reply via email to