On 12/7/17 5:03 PM, Martin Buchholz wrote:
(I'm still trying to love this new API)

The changes to the jsr166 tck are fine.

I'm not convinced that the new implementation for ArrayList is progress.  The current implementation of toArray(T[]) does

             // Make a new array of a's runtime type, but my contents:
             return (T[]) Arrays.copyOf(elementData, size, a.getClass());

which does not have to pay the cost of zeroing the array, so is potentially faster.  (depends on whether the VM provides cheap pre-zeroed memory?! what does jmh say?)

If we're not getting type safety and not getting significantly better performance, all we have is a more natural API.  But toArray(IntFunction) also seems not very natural to me, and we'll have to live with the toArray(new String[0]) wart forever anyways.  (But is it really so bad?)  (Maybe toArray(Class<componentType>) is actually more natural?)

A lot of ideas flying around here. (Thanks John! :-)) But for this message let me focus on performance.

I think there's an implicit concern about the new API: whether it might suffer inherent performance disadvantages compared to the existing APIs. It certainly doesn't seem to provide any inherent advantages. I spent some time benchmarking this more rigorously, and I was able to get results similar to those from Aleksey Shipilëv's article:

https://shipilev.net/blog/2016/arrays-wisdom-ancients/

That is, I was able to discern the differences among the copy-to-Object[] case, the copy-to-presized-array case, and the copy-to-zero-sized-array case. I also added a potential new API, toArray(Class<T>), for comparison.
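
For concreteness, the existing-API cases were shaped roughly like this (a simplified JMH sketch; the Foo element type and setup details are illustrative, and the lambda*/clazz* cases call the corresponding prototype methods in the patched JDK, so they're omitted here):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    public class ToArrayBench {
        public static class Foo { }

        @Param({"1000"})
        int size;

        @Param({"arraylist"})
        String type;            // which collection to test; only ArrayList shown here

        List<Foo> list;

        @Setup
        public void setup() {
            list = new ArrayList<>();       // "arraylist"; other collection types omitted
            for (int i = 0; i < size; i++) {
                list.add(new Foo());
            }
        }

        @Benchmark
        public Object[] simple() {          // existing no-arg toArray()
            return list.toArray();
        }

        @Benchmark
        public Foo[] sized() {              // existing toArray(T[]), presized array
            return list.toArray(new Foo[list.size()]);
        }

        @Benchmark
        public Foo[] zero() {               // existing toArray(T[]), zero-length array
            return list.toArray(new Foo[0]);
        }
    }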

Here are the results. For brevity, I ran only the tests with an ArrayList of size 1000:

Benchmark              (size)     (type)  Mode  Cnt     Score    Error  Units
ToArrayBench.clazz       1000  arraylist  avgt   30  1423.924 ± 11.437  ns/op
ToArrayBench.clazz0      1000  arraylist  avgt   30  1173.442 ± 11.614  ns/op
ToArrayBench.clazzDf     1000  arraylist  avgt   30  1179.648 ± 17.662  ns/op
ToArrayBench.lambda      1000  arraylist  avgt   30  1421.910 ± 21.320  ns/op
ToArrayBench.lambda0     1000  arraylist  avgt   30  1168.863 ± 11.109  ns/op
ToArrayBench.lambdaDf    1000  arraylist  avgt   30  1168.372 ±  9.512  ns/op
ToArrayBench.simple      1000  arraylist  avgt   30   462.371 ±  8.693  ns/op
ToArrayBench.sized       1000  arraylist  avgt   30  1417.483 ± 11.451  ns/op
ToArrayBench.zero        1000  arraylist  avgt   30  1182.932 ± 27.773  ns/op

Legend:

clazz - ArrayList override of toArray(Class<T>)

    T[] a = (T[])java.lang.reflect.Array.newInstance(clazz, size);
    System.arraycopy(elementData, 0, a, 0, size);
    return a;

clazz0 - ArrayList override of toArray(Class<T>)

    T[] a = (T[])java.lang.reflect.Array.newInstance(clazz, 0);
    return (T[]) Arrays.copyOf(elementData, size, a.getClass());

clazzDf - Collection default method toArray(Class<T>)

    toArray((T[])java.lang.reflect.Array.newInstance(clazz, 0))

lambda - ArrayList override of toArray(IntFunction)

    T[] a = generator.apply(size);
    System.arraycopy(elementData, 0, a, 0, size);
    return a;

lambda0 - ArrayList override of toArray(IntFunction)

    (T[]) Arrays.copyOf(elementData, size, generator.apply(0).getClass())

lambdaDf - Collection default method toArray(IntFunction) (written out in full just after this legend)

    toArray(generator.apply(0))

simple - existing no-arg toArray() method

sized - existing toArray(T[]) called with presized array

    coll.toArray(new Foo[coll.size()])

zero - existing toArray(T[]) called with zero-sized array

    coll.toArray(new Foo[0])
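
For reference, here's the "lambdaDf" shape written out as a default method on a Collection subinterface (just a sketch; the actual declaration proposed for Collection may differ in details):

    import java.util.Collection;
    import java.util.function.IntFunction;

    interface CollectionWithGenerator<E> extends Collection<E> {
        // Ask the generator for a zero-length array of the desired component
        // type, then delegate to the existing toArray(T[]) overload.
        default <T> T[] toArray(IntFunction<T[]> generator) {
            return toArray(generator.apply(0));
        }
    }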

----------

A couple of observations from this. First, "lambda", the ArrayList override of toArray(IntFunction), is slower than "lambda0" or "lambdaDf", both of which create zero-length arrays and pass them elsewhere. So Martin, you're right: the override I put into ArrayList isn't buying anything. I'll take it out. Oddly enough, just inheriting the default might be sufficient.

(And the same probably applies to Arrays$ArrayList as well.)

Second, setting aside the "simple" case (which is fast because it doesn't have to do any array store checking), we see performance falling into two buckets: one around 1400ns and one around 1170ns. It turns out that the faster cases all end up calling Arrays.copyOf(). This method is an intrinsic that can avoid the work of zeroing the array, because it allocates the array immediately before filling it with data from the source.

I would have thought that allocating the array immediately prior to filling it with System.arraycopy would have done the same, but apparently not.
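
To make the two buckets concrete, the faster variants all reduce to the first shape below and the slower ones to the second (a sketch; Foo and the elementData/size arguments are stand-ins, as in the legend snippets above):

    import java.util.Arrays;

    class CopyShapes {
        static class Foo { }

        // Faster bucket (~1170ns above): Arrays.copyOf is intrinsified, so the
        // allocation and the copy are fused and the VM can skip zeroing the
        // freshly allocated array.
        static Foo[] viaCopyOf(Object[] elementData, int size) {
            return Arrays.copyOf(elementData, size, Foo[].class);
        }

        // Slower bucket (~1400ns): the array is allocated (and, judging by the
        // results, zeroed) first, then overwritten by System.arraycopy.
        static Foo[] viaNewInstanceAndArraycopy(Object[] elementData, int size) {
            Foo[] a = (Foo[]) java.lang.reflect.Array.newInstance(Foo.class, size);
            System.arraycopy(elementData, 0, a, 0, size);
            return a;
        }
    }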

We can see the effect by disabling the Arrays.copyOf intrinsic with the -XX:+UnlockDiagnosticVMOptions -XX:DisableIntrinsic=_copyOf options (a JMH runner sketch for this configuration appears after these results):

Benchmark              (size)     (type)  Mode  Cnt     Score    Error  Units
ToArrayBench.clazz       1000  arraylist  avgt   30  1413.922 ±  9.570  ns/op
ToArrayBench.clazz0      1000  arraylist  avgt   30  1427.857 ± 13.669  ns/op
ToArrayBench.clazzDf     1000  arraylist  avgt   30  1448.274 ± 15.734  ns/op
ToArrayBench.lambda      1000  arraylist  avgt   30  1423.556 ± 17.312  ns/op
ToArrayBench.lambda0     1000  arraylist  avgt   30  1422.177 ± 20.650  ns/op
ToArrayBench.lambdaDf    1000  arraylist  avgt   30  1464.320 ± 26.199  ns/op
ToArrayBench.simple      1000  arraylist  avgt   30   399.856 ±  4.893  ns/op
ToArrayBench.sized       1000  arraylist  avgt   30  1417.668 ±  8.979  ns/op
ToArrayBench.zero        1000  arraylist  avgt   30  1438.527 ± 11.521  ns/op

Here, we can see that everything (except "simple") has moved into the 1400ns range. (Not sure why "simple" appears to have gotten faster.)
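
For reference, here's roughly how that intrinsic-disabled run can be set up through the JMH API (a sketch; the benchmark class and parameter names are as in the sketch above):

    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    public class RunWithoutCopyOfIntrinsic {
        public static void main(String[] args) throws Exception {
            Options opts = new OptionsBuilder()
                .include("ToArrayBench")            // the benchmark class above
                .param("size", "1000")
                .param("type", "arraylist")
                .jvmArgsAppend("-XX:+UnlockDiagnosticVMOptions",
                               "-XX:DisableIntrinsic=_copyOf")
                .build();
            new Runner(opts).run();
        }
    }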

My conclusion from this is that the choice of one API over another doesn't offer any inherent performance advantage. You can make any of the APIs go fast if you can arrange for it to call an intrinsic, even if this ends up allocating an "extra" zero-length array.

So the issue is really about which API we want to expose. I'll reserve that discussion for another message.

s'marks
