Re: Proposal: optimization of Map.copyOf and Collectors.toUnmodifiableMap

Peter Levart Tue, 19 Jun 2018 10:02:57 -0700

Hi Stuart,

On 06/19/2018 04:09 AM, Stuart Marks wrote:

On 6/17/18 4:23 PM, Peter Levart wrote:
My attempt to optimize the Map.copyOf() was also using just publicAPIs (with the addition of an internal class).
Well, it does speed-up Map.copyOf() by 30%. But I agree there areother approaches that could go even further. In particular when allintermediary copies could be avoided.
Here's one approach (which still uses just public APIs) and avoidsintermediary copying:
You mentioned "using just public APIs" a couple times. Are you viewingthat as a constraint? I don't think it is.

More a desire than a constraint. If there really is a solution that isbetter but needs SharedSecrets, then I'm all for it. OTOH, it surely isless maintenance when only public APIs are used, isn't it?

I'd like to use Map.forEach() which is already optimized in most JDKinternal implementations while the default implementation is not worsethan alternatives.

http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/webrev.02/
Why choosing Map.forEach for dumping the content of the map insteadof iteration over entrySet? Because in some Map implementations thisis the most direct iteration without producing garbage (for examplein IdentityHashMap), but in the worst case (default Map method forexample) it is delegated to iteration over entrySet.
Sure, entrySet().toArray() does a lot of allocation and copying. UsingforEach() can avoid this, but it doesn't necessarily preserve theright invariants. In particular, if the source is a concurrent map,the number of times forEach's BiConsumer is called might not matchsize(), if the map has changed size. So, the forEach approach has todeal with the possibility of it not being called exactly size() times,whereas we know that the array from toArray() can't change size (andthat its contents are consistent, for some definition of consistent).

Right, and my last approach detected that and has thrownConcurrentModificationException. Which might not be desired in ascenario where a live CHM is copied into an unmodifiable Map for example.

This change also publishes a reference to the Map under constructionbefore its constructor has returned. I'm not sure of all the memorymodel implications of this.

It publishes a Builder object that has a controlled access to theconstructing map. You can't get a reference to the constructing map fromit and it becomes useless before the constructor of the map returns.

This change also hands out an object that has access to the new Mapinstance's internals.


Yes, this is the Builder object.

You're aware of this, of course, because you snapshot the currentthread and null it out later, saying
    // prevent naughty Map implementations from modifying MapN after
    // it is fully constructed
So there are potential security vulnerabilities here, which requiressome serious thought. What you have *seems* reasonable, but myexperience is that things that are subtle but that seem reasonableactually end up having security holes.

This is not my invention. The same technique is used inObjectInputStream / ObjectOutputStream.

I'm trying to think of some alternatives.
The issue with the forEach() approach on an arbitrary map is that youhave to hand out a BiConsumer, and malicious code can call it evenafter forEach() has returned. What's necessary is to have a way toshut off the BiConsumer before reading out what it's accumulated. Idon't know of a simple, fast, and correct way to do this. (Choose anytwo!) The "enabled" state (whether a Thread instance or just aboolean) will probably have to be a volatile that must be checkedevery time. What are the performance implications? Anyway, while itseems promising, the forEach() approach seems like it leads off intothe weeds.

The technique used in my last webrev and taken from ObjectInputStream /ObjectOutputStream is simple, simple to prove correctness and fast as itneeds zero synchronization. It could be called "a single-threadedobject" idiom:


class SingleThreadedObject {
    private Thread thread;

    SingleThreadedObject() {
        thread = Thread.currentThread();
    }

    void use() {

if (Thread.currentThread() != thread) throw newIllegalStateException();


        ... anything ...
    }

    void close() {

if (Thread.currentThread() != thread) throw newIllegalStateException();

        thread = null;
    }
}

Claim: The thread that constructs the object is the only thread that canuse() or close() the object and only until close() is called by theconstructing thread. To prove the correctness of this claim, we haveconsider two scenarios:

- object is use()d and finally close()d in the same thread thatconstructed it - no need to prove anything here as it is obvious thatthis works by inspecting the code.

- object is attempted to be use()d or close()d in some other thread fromthe thread that constructed it.

To prove that such usage is always prevented, we have to ask ourselveswhat are the possible values of 'thread' field that can be seen by someother thread. There are two possibilities:- null value (uninitialized or already close()d) -> the usage isprevented as null != Thread.currentThread()- non-null value (initialized) -> the usage is prevented as the non-nullvalue is a different thread than current thread which is trying to use()or close() the object

This technique doesn't use any synchronization. Thread.currentThread()is fast (I think I heard that current thread reference or something notfar from it is kept in a CPU register all the time).

Unless you can verify the trustworthiness of the source. Suppose youcheck the source map to see if its getClass() == HashMap.class. Thenyou can be reasonably assured that forEach() won't misuse theBiConsumer. You can probably also rely on size() being stable.(Probably also include LinkedHashMap.)
The toUnmodifiableMap collectors can benefit from this if they'rechanged to use Map::copyOf, as in your current webrev.
You could also provide a similar fast path for ConcurrentHashMap. It'dhave to be resilient against size changes, but you can trust that itwon't misuse the BiConsumer.
Finally, other map implementations will just have to suffer throughthe multi-copy slow path.

Not necessarily. Here's another attempt that uses the same technique butrelaxes the possible concurrent source map scenario (where number ofpairs emitted by Map.forEach() doesn't agree with Map.size()):


http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/webrev.03/

The construction is moved entirely into a separate MapBuilder object.The builder is used for Map.of(...) methods too.

Here are the benchmark results that show improvements in every respecttouched by the patch:


http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/UnmodifiableMapBench_results.txt

Here are the JMH benchmarks used to produce these results:

http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/UnmodifiableMapCollectorBench.java
http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/UnmodifiableMapCopyOfBench.java
http://cr.openjdk.java.net/~plevart/jdk-dev/UnmodifiableMap_copyOf/UnmodifiableMapOfBench.java


So what do you think of this one?

Regards, Peter

Re: Proposal: optimization of Map.copyOf and Collectors.toUnmodifiableMap

Reply via email to