Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/5/16 2:22 PM, Eddie Aftandilian wrote: FWIW, at Google we have a patch against our JDK that randomizes hash iteration order. For test execution we randomize with a unique seed per JVM invocation. For production we force a specific seed for all executions. This approach catches most issues during automated testing, but reduces the likelihood of an issue in production. Yes, I've heard of such a thing. From what I understand, this randomizes the iteration order of the *existing* collections like HashMap. It makes sense for this to be an opt-in feature for existing collections, since it seems pretty unwise to expose a behavioral change to large quantities of existing code. Unfortunately, from time to time the JDK does change the iteration order of the existing collections, and unsurprisingly, lots of things break. It would be interesting to see such a patch proposed for OpenJDK. It'd be good to enable it during test runs, for a start. s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/5/16 5:54 AM, Stephen Colebourne wrote: To be clear, I believe that the spec of Set.of() and Map.of() should require iteration order matching insertion order (because the order is very clearly defined in this case and anything else will be surprising). OK. This would be a spec change. The specification of the Set static factory methods contains the clause The iteration order of set elements is unspecified and is subject to change. There is the corresponding clause for the Map static factory methods. [1] You can certainly propose or discuss spec changes, but if you do, I'd like it to be decoupled from this code review thread. Thanks, s'marks [1] http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/e6c3d2856593
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/5/16 5:34 AM, Alan Bateman wrote: I understand the goal, just wondering if there is something less devious that would make sense here. One idea is to use some portion of the Version, say the build number, so that it at least changes each week or build. That would at keeping the ordering consistent from run to run but it might change when someone updates. Frankly I think we need to be more devious. We could change the iteration order on Wednesdays only, or perhaps only on full moons. :-) But really, if we're going to have randomized iteration order at all, we're better off doing it more frequently. That would flush out inadvertent order dependencies more quickly. If iteration order were to change, say, once per build, then order dependency bugs would masquerade as regressions. ("It works on b116 but fails on b117; what changed in b117?") I think that would be really confusing. s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/5/16 6:42 AM, Chris Hegarty wrote: I have some sympathy for this. Since we are specifying that these Collections are serializable, it would be nice for them to deserialize on an older release, but I don’t think we should compromise the serial form to achieve that. In fact, I agree with Stephen, I think we should go the same way as that of the serial form of 310 classes. So is there anything we can do for serial interoperability? What if, we agree the serial form and the private supporting implementation types in 9 ( say, some time this month ). Backport these to, at least, 8u, where they do not impact spec and are benign. JDK 9 is not scheduled to ship until March 2017, with the regular Critical Patch Updates that most vendors provide, I’d bet that a large number of 8u’s in production at that time would contain the required types to deserialize these collections. There’s no guarantee of course, but in practical terms at least we made a reasonable effort. Interesting approach. One could backport just the serial proxy and then have its readResolve() instantiate its existing collections wrapped with unmodifiable wrappers. Or, one could just backport the immutable implementations. There's not much code to backport. I guess the question is whether this is worth it. Is this something we know will happen, or would it be done "just in case" ? s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/5/16 4:34 AM, fo...@univ-mlv.fr wrote: The problem is that AbstractList is a public class, removing it from the hierarchy is not a backward compatible change, so it's not something that can be changed as an after through. (and AbstractSet and AbstractMap) I disagree that removing those as superclasses would be backward incompatible, because it's not specified anywhere that they're superclasses. The usual problem comes from serialization, which exposes the superclass chain as well as apparently private classes; the serialization proxy stuff should avoid these problems. If somebody were to write if (Set.of(...) instanceof AbstractSet) the behavior of their code would change, but I'm not worried about that. - in the constructor, you should use a local variable here, @SafeVarargs ListN(E... input) { // copy and check manually to avoid TOCTOU @SuppressWarnings("unchecked") E[] elements = (E[])new Object[input.length]; // implicit nullcheck of input for (int i = 0; i < input.length; i++) { elements[i] = Objects.requireNonNull(input[i]); } this.elements = elements; } Nice. Will fix. It turns out that this works well for ListN, but not SetN or MapN. 307 SetN(E... input) { 308 Objects.requireNonNull(input); 309 310 size = input.length; 311 elements = (E[])new Object[(int)Math.ceil(EXPAND_FACTOR * input.length)]; 312 for (int i = 0; i < input.length; i++) { 313 E e = Objects.requireNonNull(input[i]); 314 int idx = probe(e); 315 if (idx >= 0) { 316 throw new IllegalArgumentException("duplicate element: " + e); 317 } else { 318 elements[-(idx + 1)] = e; 319 } 320 } 321 } The problem is that, during construction, probe() needs to see the elements array and the actual elements as they're inserted. Hoisting the array into a local variable and assigning elements only at the end causes probe() to NPE. I've set aside this change for SetN and MapN. - the iterator should not be defined as inner class (with a reference to Set2) but be defined as a static class that contains a copy of e1 and e2, this will save an indirection and avoid to keep a link on the Set if it is transient. I started down this path. It was kind-of OK for Set2 and SetN. For MapN, it started to get hairy. Some of the Map's state needs to be passed to its entry set, and then it needs to be passed along to the entry set iterator. I started to hoist these into static nested classes, but the required new fields and constructors to initialize them started to clutter things up. We can revisit this if you really think this is an optimization, but for now I've set this aside. We can also revisit the restructuring of the iterator to advance the index in the constructor and in next(). s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Where is the current commitment to unspecified iteration order? Is that in Java 9? Generally speaking, I see no problem with going from unspecified to specified iteration order; if code had to be able to deal with *any* order previously they can certainly deal with an order that happens to be the obvious sensible order. Furthermore, the code that is easiest to use should be the simplest, most desirable use case, and I'd definitely say that means using the predictable order the elements were provided in. Providing that as another, more awkwardly named method seems deeply undesirable: Set.of and Map.of are the short, easy to use method names, and they should have the simplest, easiest-to-understand behavior: insertion order iteration. On Thu, May 5, 2016 at 9:27 AM Peter Levart wrote: > Hi, > > > On 05/05/2016 02:54 PM, Stephen Colebourne wrote: > > On 5 May 2016 at 02:30, Stuart Marks wrote: > >>> I disagree with altering the iteration order. Guava's ImmutableSet and > >>> ImmutableMap have reliable iteration specified to match creation > >>> order. This aspect of the design is very useful. > >> > >> I think this is a reasonable thing to want, but it would be a different > API. > >> The current Set.of() and Map.of() APIs have unspecified iteration > order, and > >> I want to "enforce" that via randomizing the iteration order. Having > >> unspecified iteration order, but with the implementation giving a stable > >> order, is the worst of both worlds. It lets clients bake in inadvertent > >> order dependencies, and then when we do change it, it breaks them. (This > >> seems to happen every couple years with HashMap.) Randomizing iteration > >> order helps preserve implementation freedom. > >> > >> That said, it's a fine thing to have a different API that gives a Set > or Map > >> with a stable iteration order. Several people have asked for this. I've > >> filed JDK-8156070 to track this. > > To be clear, I believe that the spec of Set.of() and Map.of() should > > require iteration order matching insertion order (because the order is > > very clearly defined in this case and anything else will be > > surprising). > > > > Stephen > > So what if the API was specified to have iteration order matching > construction order. Are you afraid that would impact implementations so > that they would be less compact or less performant? At least for small > number of elements, there is a possible implementation that keeps > construction order and is even more compact and still has O(1) lookup > performance. For example an implementation for up to 255 elements: > > public class Set0xFF extends AbstractSet implements Serializable { > > /** > * A "salt" value used for randomizing iteration order. This is > initialized once > * and stays constant for the lifetime of the JVM. It need not be > truly random, but > * it needs to vary sufficiently from one run to the next so that > iteration order > * will vary between JVM runs. > */ > static final int SALT; > > static { > SALT = new Random().nextInt(); > } > > /** > * The reciprocal of load factor. Given a number of elements > * to store, multiply by this factor to get the table size. > */ > static final double EXPAND_FACTOR = 2.0; > > final E[] elements; > final byte[] index; > > @SafeVarargs > @SuppressWarnings("unchecked") > Set0xFF(E... input) { > if (input.length > 255) { > throw new IllegalArgumentException("To many elements for > this implementation"); > } > elements = input.clone(); > index = new byte[(int) Math.ceil(EXPAND_FACTOR * > elements.length)]; > for (int i = 0; i < elements.length; i++) { > E e = Objects.requireNonNull(elements[i]); > int idx = probe(e); > if (idx >= 0) { > throw new IllegalArgumentException("duplicate element: > " + e); > } else { > index[-(idx + 1)] = (byte) (i + 1); > } > } > } > > @Override > public int size() { > return elements.length; > } > > @Override > @SuppressWarnings("unchecked") > public boolean contains(Object o) { > Objects.requireNonNull(o); > return probe((E) o) >= 0; > } > > @Override > public Iterator iterator() { > return new Iterator() { > int idx = 0; > > @Override > public boolean hasNext() { > return idx < elements.length; > } > > @Override > public E next() { > int i = idx; > if (i >= elements.length) { > throw new NoSuchElementException(); > } > idx = i + 1; > return elements[i]; > } > }; > } > > // returns index into index arra
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi, On 05/05/2016 02:54 PM, Stephen Colebourne wrote: On 5 May 2016 at 02:30, Stuart Marks wrote: I disagree with altering the iteration order. Guava's ImmutableSet and ImmutableMap have reliable iteration specified to match creation order. This aspect of the design is very useful. I think this is a reasonable thing to want, but it would be a different API. The current Set.of() and Map.of() APIs have unspecified iteration order, and I want to "enforce" that via randomizing the iteration order. Having unspecified iteration order, but with the implementation giving a stable order, is the worst of both worlds. It lets clients bake in inadvertent order dependencies, and then when we do change it, it breaks them. (This seems to happen every couple years with HashMap.) Randomizing iteration order helps preserve implementation freedom. That said, it's a fine thing to have a different API that gives a Set or Map with a stable iteration order. Several people have asked for this. I've filed JDK-8156070 to track this. To be clear, I believe that the spec of Set.of() and Map.of() should require iteration order matching insertion order (because the order is very clearly defined in this case and anything else will be surprising). Stephen So what if the API was specified to have iteration order matching construction order. Are you afraid that would impact implementations so that they would be less compact or less performant? At least for small number of elements, there is a possible implementation that keeps construction order and is even more compact and still has O(1) lookup performance. For example an implementation for up to 255 elements: public class Set0xFF extends AbstractSet implements Serializable { /** * A "salt" value used for randomizing iteration order. This is initialized once * and stays constant for the lifetime of the JVM. It need not be truly random, but * it needs to vary sufficiently from one run to the next so that iteration order * will vary between JVM runs. */ static final int SALT; static { SALT = new Random().nextInt(); } /** * The reciprocal of load factor. Given a number of elements * to store, multiply by this factor to get the table size. */ static final double EXPAND_FACTOR = 2.0; final E[] elements; final byte[] index; @SafeVarargs @SuppressWarnings("unchecked") Set0xFF(E... input) { if (input.length > 255) { throw new IllegalArgumentException("To many elements for this implementation"); } elements = input.clone(); index = new byte[(int) Math.ceil(EXPAND_FACTOR * elements.length)]; for (int i = 0; i < elements.length; i++) { E e = Objects.requireNonNull(elements[i]); int idx = probe(e); if (idx >= 0) { throw new IllegalArgumentException("duplicate element: " + e); } else { index[-(idx + 1)] = (byte) (i + 1); } } } @Override public int size() { return elements.length; } @Override @SuppressWarnings("unchecked") public boolean contains(Object o) { Objects.requireNonNull(o); return probe((E) o) >= 0; } @Override public Iterator iterator() { return new Iterator() { int idx = 0; @Override public boolean hasNext() { return idx < elements.length; } @Override public E next() { int i = idx; if (i >= elements.length) { throw new NoSuchElementException(); } idx = i + 1; return elements[i]; } }; } // returns index into index array at which element index + 1 is present; or if absent, // (-i - 1) where i is location where element index should be inserted private int probe(E pe) { int idx = Math.floorMod(pe.hashCode() ^ SALT, index.length); while (true) { int i = (int) index[idx] & 0xFF; if (i == 0) { return -idx - 1; } else if (elements[i - 1].equals(pe)) { return idx; } else if (++idx == index.length) { idx = 0; } } } } Your SetN occupies (without considering the elements themselves) the following number of bytes: 8 * size() (on 32 bit/compressed OOPS) or 16 * size() (on 64bit OOPS) (+ 2 object headers and alignments) Above Set0xFF occupies: 6 * size() (on 32 bit/compressed OOPS) or 10 * size() (on 64bit OOPS) (+ 3 object headers and alignments) An analogue Set0x for up to 65535 elements would occupy: 8 * size() (on 32 bit/compressed OOPS) or 12 * size() (on 64bit OOPS) (+ 3 object headers and alignments) An analogue Set0x for up to 2^31 - 1 elements that still
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5 May 2016, at 01:21, Stuart Marks wrote: > Hi Daniel, > > On 5/4/16 3:08 AM, Daniel Fuchs wrote: >> I wonder about serial interoperability with earlier >> versions of the JDK. For instance it will not be possible to >> send a Set created with Set.of(...) in JDK 9 to a JDK 8 VM. >> I wonder if there is any good solution to that. >> You could of course use writeReplace() to return a plain >> HashSet - but then you would only get a plain HashSet on >> the other end. That might be fine if that other end >> is JDK 8, but maybe not so good if it's actually JDK 9. > > Right, in general, you can't serialize a "new" class in a JDK release and > deserialize it on an older JDK. If backward serialization compatibility were > a goal, one could serialize them as a HashSet within an unmodifiable wrapper. > But yes, deserializing that would increase the space consumption considerably. I have some sympathy for this. Since we are specifying that these Collections are serializable, it would be nice for them to deserialize on an older release, but I don’t think we should compromise the serial form to achieve that. In fact, I agree with Stephen, I think we should go the same way as that of the serial form of 310 classes. So is there anything we can do for serial interoperability? What if, we agree the serial form and the private supporting implementation types in 9 ( say, some time this month ). Backport these to, at least, 8u, where they do not impact spec and are benign. JDK 9 is not scheduled to ship until March 2017, with the regular Critical Patch Updates that most vendors provide, I’d bet that a large number of 8u’s in production at that time would contain the required types to deserialize these collections. There’s no guarantee of course, but in practical terms at least we made a reasonable effort. -Chris.
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5 May 2016, at 01:21, Stuart Marks wrote: > Hi Daniel, > > On 5/4/16 3:08 AM, Daniel Fuchs wrote: >> I wonder about serial interoperability with earlier >> versions of the JDK. For instance it will not be possible to >> send a Set created with Set.of(...) in JDK 9 to a JDK 8 VM. >> I wonder if there is any good solution to that. >> You could of course use writeReplace() to return a plain >> HashSet - but then you would only get a plain HashSet on >> the other end. That might be fine if that other end >> is JDK 8, but maybe not so good if it's actually JDK 9. > > Right, in general, you can't serialize a "new" class in a JDK release and > deserialize it on an older JDK. If backward serialization compatibility were > a goal, one could serialize them as a HashSet within an unmodifiable wrapper. > But yes, deserializing that would increase the space consumption considerably. I have some sympathy for this. Since we are specifying that these Collections are serializable, it would be nice for them to deserialize on an older release, but I don’t think we should compromise the serial form to achieve that. In fact, I agree with Stephen, I think we should go the same way as that of the serial form of 310 classes. So is there anything we can do for serial interoperability? What if, we agree the serial form and the private supporting implementation types in 9 ( say, some time this month ). Backport these to, at least, 8u, where they do not impact spec and are benign. JDK 9 is not scheduled to ship until March 2017, with the regular Critical Patch Updates that most vendors provide, I’d bet that a large number of 8u’s in production at that time would contain the required types to deserialize these collections. There’s no guarantee of course, but in practical terms at least we made a reasonable effort. -Chris.
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5 May 2016 at 02:30, Stuart Marks wrote: >> I disagree with altering the iteration order. Guava's ImmutableSet and >> ImmutableMap have reliable iteration specified to match creation >> order. This aspect of the design is very useful. > > > I think this is a reasonable thing to want, but it would be a different API. > The current Set.of() and Map.of() APIs have unspecified iteration order, and > I want to "enforce" that via randomizing the iteration order. Having > unspecified iteration order, but with the implementation giving a stable > order, is the worst of both worlds. It lets clients bake in inadvertent > order dependencies, and then when we do change it, it breaks them. (This > seems to happen every couple years with HashMap.) Randomizing iteration > order helps preserve implementation freedom. > > That said, it's a fine thing to have a different API that gives a Set or Map > with a stable iteration order. Several people have asked for this. I've > filed JDK-8156070 to track this. To be clear, I believe that the spec of Set.of() and Map.of() should require iteration order matching insertion order (because the order is very clearly defined in this case and anything else will be surprising). Stephen
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 05/05/2016 01:28, Stuart Marks wrote: Hi Alan, Yes, the unpredictability does introduce some risk of intermittent and hard-to-diagnose failures. On the other hand, the unspecified-but-mostly-stable iteration order we have with things like HashMap lets implicit order dependencies creep into code, which makes such failures even more rare and harder to diagnose. Plus, as hard as we try to make iteration order stable, there the times we make changes that do change the iteration order. Then, everything breaks. The goal here is to expose code that uses these collections to more frequent order changes in order to "toughen" it up by flushing out inadvertent order dependencies. That way, we can change the implementation at will without worrying about changes to iteration order. We'll see if this works. I can add an option to change the iteration order on every iteration, if you think that'll help. :-) I understand the goal, just wondering if there is something less devious that would make sense here. One idea is to use some portion of the Version, say the build number, so that it at least changes each week or build. That would at keeping the ordering consistent from run to run but it might change when someone updates. -Alan
Re: RFR(m): 8139233 add initial compact immutable collection implementations
- Mail original - > De: "Stuart Marks" > À: "Remi Forax" > Cc: "core-libs-dev" > Envoyé: Jeudi 5 Mai 2016 08:32:08 > Objet: Re: RFR(m): 8139233 add initial compact immutable collection > implementations > > On 5/4/16 1:38 AM, Remi Forax wrote: > > nice work ! > > Spoken like a true university professor: "nice work" followed by 100 lines of > criticism. :-) professional bias :) > > > first, all classes should be final otherwise, they are not truly immutable > > and all arrays should be marked as @Stable. > > OK, I'll make the classes final. > > On @Stable, I had a chat with Paul Sandoz about this, and he suggested > holding > back from adding this. Its semantics are quite particular and are subject to > change, and widespread use could easily turn into misuse. the third item of an immutable list stored in a static final should be a constant, no ? > > > List0, List1, > > why not using Collections.emptyList(), Collections.singletonList() here ? > > Serialization compatibility with future versions. I want all of these to > serialize to the same serial proxy in JDK 9, so that in JDK 9+ they can be > deserialized to whatever equivalent implementation classes exist in that > version > of the JDK. For example, there might be a single field-based implementation > that > covers a small range of sizes, or an implementation using value types. ok ! > > > ListN: > > - i don't think that extending AbstractList is a good idea here, > > AbstractList provide the failfast/modCount mecanism not needed here. > > Moreover, i think you should have some overriden methods on par with > > Arrays.asList() > > (i.e. a custom isEmpty, toArray, indexOf, contains, iterator, > > spliterator and forEach) > > otherwise people will use Arrays.asList just because it's faster :( > > The implementations all extend AbstractList/Set/Map mainly for implementation > convenience. This way it makes it easy to add and remove implementations > tailored for specific sizes, since so much code is shared with the abstract > superclasses. But they do come with some baggage, as you point out. As the > set > of implementation classes settles down, it would make more sense to override > more methods, and refactor to handle sharing, at which point maybe the > dependence on the abstract superclasses can be severed. The problem is that AbstractList is a public class, removing it from the hierarchy is not a backward compatible change, so it's not something that can be changed as an after through. > > > - in the constructor, you should use a local variable here, > > @SafeVarargs > > ListN(E... input) { > > // copy and check manually to avoid TOCTOU > > @SuppressWarnings("unchecked") > > E[] elements = (E[])new Object[input.length]; // implicit > > nullcheck of input > > for (int i = 0; i < input.length; i++) { > > elements[i] = Objects.requireNonNull(input[i]); > > } > > this.elements = elements; > > } > > Nice. Will fix. > > > List2: > > - same as for ListN, should not inherits from AbstractList. > > - i wonder why iterator() is not overriden like with Set2. > > Bringing up a List implementation with AbstractList requires only overriding > get() and size(), but not iterator(). On the other hand, bringing up a Set > implementation using AbstractSet requires overriding only iterator() and > size(). I think you will have to override iterator anyway (for the same reason the result of Arrays.asList() override the iterator). > > > Set2: > > - again, here, i don't think that inheriting from AbstractSet is a good > > idea. > > - in the iterator, pos (position ?) should be private and next() can be > > written without the '+=', > > public E next() { > > if (pos == 0) { > > pos = 1; > > return e0; > > } else if (pos == 1) { > > pos = 2; > > return e1; > > } else { > > throw new NoSuchElementException(); > > } > > } > > - the iterator should not be defined as inner class (with a reference to > > Set2) but > > be defined as a static class that contains a copy of e1 and e2, > > this will save an indirection and avoid to
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/4/16 1:38 AM, Remi Forax wrote: nice work ! Spoken like a true university professor: "nice work" followed by 100 lines of criticism. :-) first, all classes should be final otherwise, they are not truly immutable and all arrays should be marked as @Stable. OK, I'll make the classes final. On @Stable, I had a chat with Paul Sandoz about this, and he suggested holding back from adding this. Its semantics are quite particular and are subject to change, and widespread use could easily turn into misuse. List0, List1, why not using Collections.emptyList(), Collections.singletonList() here ? Serialization compatibility with future versions. I want all of these to serialize to the same serial proxy in JDK 9, so that in JDK 9+ they can be deserialized to whatever equivalent implementation classes exist in that version of the JDK. For example, there might be a single field-based implementation that covers a small range of sizes, or an implementation using value types. ListN: - i don't think that extending AbstractList is a good idea here, AbstractList provide the failfast/modCount mecanism not needed here. Moreover, i think you should have some overriden methods on par with Arrays.asList() (i.e. a custom isEmpty, toArray, indexOf, contains, iterator, spliterator and forEach) otherwise people will use Arrays.asList just because it's faster :( The implementations all extend AbstractList/Set/Map mainly for implementation convenience. This way it makes it easy to add and remove implementations tailored for specific sizes, since so much code is shared with the abstract superclasses. But they do come with some baggage, as you point out. As the set of implementation classes settles down, it would make more sense to override more methods, and refactor to handle sharing, at which point maybe the dependence on the abstract superclasses can be severed. - in the constructor, you should use a local variable here, @SafeVarargs ListN(E... input) { // copy and check manually to avoid TOCTOU @SuppressWarnings("unchecked") E[] elements = (E[])new Object[input.length]; // implicit nullcheck of input for (int i = 0; i < input.length; i++) { elements[i] = Objects.requireNonNull(input[i]); } this.elements = elements; } Nice. Will fix. List2: - same as for ListN, should not inherits from AbstractList. - i wonder why iterator() is not overriden like with Set2. Bringing up a List implementation with AbstractList requires only overriding get() and size(), but not iterator(). On the other hand, bringing up a Set implementation using AbstractSet requires overriding only iterator() and size(). Set2: - again, here, i don't think that inheriting from AbstractSet is a good idea. - in the iterator, pos (position ?) should be private and next() can be written without the '+=', public E next() { if (pos == 0) { pos = 1; return e0; } else if (pos == 1) { pos = 2; return e1; } else { throw new NoSuchElementException(); } } - the iterator should not be defined as inner class (with a reference to Set2) but be defined as a static class that contains a copy of e1 and e2, this will save an indirection and avoid to keep a link on the Set if it is transient. Good fixes, thanks. Some of this is stuff left over from when there were field-based implementations with larger sizes. SetN: - in the constructor, use a local variable like for ListN - the @SuppressWarnings for contains is wrong, probe should take an Object, not an E. Ah, nice cleanup. - the iterator should not be an inner class like Set2 and take the array as parameter. idx should be private. Instead of doing the loop in hasNext and next, the loop should be done once in the constructor and in next. So the code of hasNext will be simple and the JIT will be able to fold a call to hasNext() followed by a call to next(). OK, I'll look at this. - i don't understand how the serialization can work given that the SALT may be different between two VMs The salt is XORed with the element's hash to determine the position in the collection's array. Serializing it writes all the objects in the array from left to right, but we don't actually care what the order is. Upon deserialization, the objects' hash values are XORed with the deserializing JVM's salt and (probably) end up in different positions in the array of the newly deserialized collection. But again we don't care where they end up. Same goes for MapN's keys and values. MapN: - see SetN :) :-) SerialProxy: - I believe the class SerialProxy should be public, no ? No,
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi Stephen, On 5/4/16 8:25 AM, Stephen Colebourne wrote: A new instance is being created each time for size-zero lists/sets/maps. This should return a singleton. Yes, this is a fairly obvious optimization. It's been on my personal to-do list, but I've moved this to a sub-task in JIRA: JDK-8156079. On the other hand, the tradeoff isn't so obvious. By analogy, we cache the small, integral boxed values. This saves memory but reduces locality. Still, this ought to be considered. The serialization proxy is reminiscent of those in JSR-310 but less space efficient. Once per stream, the proxy will write "java.util.ImmutableCollections.SerialProxy", whereas the JSR-310 one is "java.time.Ser". Thats about 29 bytes more than it needs too. ie. a package-scoped java.util.Ser would be more efficient. Good point. I'll refactor this to a top-level, private class with a shorter name. There are no messages for IndexOutOfBoundsException. Will fix to use Objects.checkIndex(). Like Peter, I'm not convinced that extending AbstractList is the way to go, although I can see arguments both ways. (I think this was Rémi.) Extending AbstractList makes it convenient to have a bunch of different variations, but it does come with some baggage. More discussion in my (forthcoming) reply to Rémi. equals() and hashCode() could be optimized on the size 0, 1 and 2 classes, as probably can some other methods. Should spliterator() should be overridden to add the IMMUTABLE flag? Yes, there are a bunch of methods that can and should be overridden in order to improve things. As you point out, spliterator() is one of them. This is already tracked by JDK-8133978, still future work. For maps, I implemented this years ago which may be of interest: http://commons.apache.org/proper/commons-collections/xref/org/apache/commons/collections4/map/Flat3Map.html#L72 Interesting. Thanks for the link. I disagree with altering the iteration order. Guava's ImmutableSet and ImmutableMap have reliable iteration specified to match creation order. This aspect of the design is very useful. I think this is a reasonable thing to want, but it would be a different API. The current Set.of() and Map.of() APIs have unspecified iteration order, and I want to "enforce" that via randomizing the iteration order. Having unspecified iteration order, but with the implementation giving a stable order, is the worst of both worlds. It lets clients bake in inadvertent order dependencies, and then when we do change it, it breaks them. (This seems to happen every couple years with HashMap.) Randomizing iteration order helps preserve implementation freedom. That said, it's a fine thing to have a different API that gives a Set or Map with a stable iteration order. Several people have asked for this. I've filed JDK-8156070 to track this. s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 5/4/16 6:22 AM, Alan Bateman wrote: This looks very good, but I just wonder about the iteration order varying from run to run in the Set/Map implementations. I recall the section added to the javadoc where it made it very clear that the iteration order is unspecified and subject to change and no issue with that. I'm just thinking about troubleshooting scenarios where it could make some issue harder to diagnose. Hi Alan, Yes, the unpredictability does introduce some risk of intermittent and hard-to-diagnose failures. On the other hand, the unspecified-but-mostly-stable iteration order we have with things like HashMap lets implicit order dependencies creep into code, which makes such failures even more rare and harder to diagnose. Plus, as hard as we try to make iteration order stable, there the times we make changes that do change the iteration order. Then, everything breaks. The goal here is to expose code that uses these collections to more frequent order changes in order to "toughen" it up by flushing out inadvertent order dependencies. That way, we can change the implementation at will without worrying about changes to iteration order. We'll see if this works. I can add an option to change the iteration order on every iteration, if you think that'll help. :-) s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi Daniel, On 5/4/16 3:08 AM, Daniel Fuchs wrote: I wonder about serial interoperability with earlier versions of the JDK. For instance it will not be possible to send a Set created with Set.of(...) in JDK 9 to a JDK 8 VM. I wonder if there is any good solution to that. You could of course use writeReplace() to return a plain HashSet - but then you would only get a plain HashSet on the other end. That might be fine if that other end is JDK 8, but maybe not so good if it's actually JDK 9. Right, in general, you can't serialize a "new" class in a JDK release and deserialize it on an older JDK. If backward serialization compatibility were a goal, one could serialize them as a HashSet within an unmodifiable wrapper. But yes, deserializing that would increase the space consumption considerably. What worries me is that it will be very easy to forget about the serial incompatibility, and start using these new factory methods in e.g. java.lang.management, and then discover later on that it has introduced an interoperability issue. Is there any good way to prevent that - except thorough code reviews? Well I think you'd need to have some kind of interoperability tests. The most straightforward way to do this would be to keep old versions of the system around and test them against each other. This is tedious to set up and maintain and automate though. Maybe you could have some kind of serialization scanner that takes some serialized data and scans the serialized classes to make sure they're all in the expected set. But ultimately, if you're sending data that old versions need to process, you need to keep a close eye on everything that's serialized. The convenience of being able to serialize an entire object graph automatically works against you here. s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi Peter, Great comments and questions. Responses below On 5/4/16 12:20 AM, Peter Levart wrote: - What's the point of 11 overloaded List.of() methods for 0 ... 10-arity if you then invoke the var-args ListN(E...input) constructor that does the array cloning with 8 of them? The same for Set.of()... I suggest that you don't clone the array in ListN(E... input) constructor but only in List.of(E ... input) method. Yes, it is silly to have fixed-arg overloads to provide a fast path, and then to turn around and use them in a varargs context that creates an array and then copies them, after which the constructor clones the array. The main point, though, is that the API allows for a fast path. Even if the current implementation doesn't take advantage of it, it can be updated later without changing the API. There's tension here between avoiding copying and central enforcement of all the invariants. I'll probably need to do something like adding non-cloning constructors and possibly fixed-arg constructor overloads to avoid varargs array creation. This is a bit more than I want to do at the moment, but I agree it's important. I've filed JDK-8156071 to track this. - {Set0, Set1, Set2}.contains(null) return false, but SetN.contains(null) throws NPE. All List{0,1,2,N}.contains(null) return false. What should be the correct behavior? ConcurrentHashMap.keySet().contains(null) throws NPE for example. Good catch. I did some investigation, and the results are interesting. All of the List implementations allow nulls. But the all the Queue and Deque implementations all disallow nulls, and they all return false from contains(null): - ArrayBlockingQueue - ArrayDeque - ConcurrentLinkedDeque - ConcurrentLinkedQueue - LinkedBlockingDeque - LinkedBlockingQueue - LinkedTransferQueue - PriorityBlockingQueue - PriorityQueue The following Set implementations disallow nulls, and they throw NPE from contains(null): - ConcurrentHashMap$KeySetView - ConcurrentSkipListSet The following Map implementations disallow null keys, and they throw NPE from containsKey(null) and containsValue(null): - ConcurrentHashMap - ConcurrentSkipListMap - Hashtable Things seem pretty consistent across the null-unfriendly collections: on Deques and Queues, contains(null) is allowed, but on Sets and Maps, contains(null) throws NPE. (I have no idea why things are this way.) Although there are no existing Lists that disallow nulls, based on the Queue and Deque behavior, the current behavior of contains(null) on the new List implementations seems reasonable. I will update/override the various contains() methods on the new Set and Map implementations to adjust them to throw NPE. This area could use more tests, too. I've filed JDK-8156074 to track this. - SetN.probe(E pe) and MapN.probe(K pk) invoke ee.equals(pe) and ek.equals(pk) respectively. Although it should be logically equivalent, it has been a practice in collections to invoke the .equals() method on the passed-in argument rather than on individual elements of the collection. Huh, I didn't know that. Good catch. Will fix. - Should unchecked exceptions from constructors be wrapped with InvalidObjectException in SerialProxy.readResolve() or does the Serialization infrastructure already perform the equivalent? Another good question! The serialization mechanism does deal with any exception that's thrown here, since readResolve is invoked reflectively. But basically it just rethrows runtime exceptions. Callers are presumably prepared to catch ObjectStreamExceptions, so wrapping exceptions thrown from the constructors in a suitable exception instance (probably InvalidObjectException) seems like a sensible thing to do. Thanks, s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
- Mail original - > De: "Peter Levart" > À: "Stephen Colebourne" , "core-libs-dev" > > Envoyé: Mercredi 4 Mai 2016 23:58:13 > Objet: Re: RFR(m): 8139233 add initial compact immutable collection > implementations > > > > On 05/04/2016 05:25 PM, Stephen Colebourne wrote: > > I disagree with altering the iteration order. Guava's ImmutableSet and > > ImmutableMap have reliable iteration specified to match creation > > order. This aspect of the design is very useful. > > > > Stephen > > Are they also O(1) on lookup at the same time? yes, the elements are stored twice (in two different arrays, one to keep the ordering (minus duplicates) the other to guarantee fast access. so it's 0(1) but it's not a 'compact' implementation. > > Altering iteration order from VM run to VM run is a double-edged sword. > I can understand that by doing so, programs written to depend on > iteration order are practically not possible. In the absence of such > "randomization", a simple change of hashCode algorithm of some element > of the collection (could be a user class) could break the program. But > as Alan pointed out it is also harder to reproduce problems that way. > OTOH one might argue that randomization actually prevents such problems > to appear in the first place. It forces the programs to be written in a > way so that they don't depend on any particular iteration order as it > uncovers any potential problems dependent on iteration order early in > the development phase. > > Regards, Peter > > cheers, Rémi
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 05/04/2016 05:25 PM, Stephen Colebourne wrote: I disagree with altering the iteration order. Guava's ImmutableSet and ImmutableMap have reliable iteration specified to match creation order. This aspect of the design is very useful. Stephen Are they also O(1) on lookup at the same time? Altering iteration order from VM run to VM run is a double-edged sword. I can understand that by doing so, programs written to depend on iteration order are practically not possible. In the absence of such "randomization", a simple change of hashCode algorithm of some element of the collection (could be a user class) could break the program. But as Alan pointed out it is also harder to reproduce problems that way. OTOH one might argue that randomization actually prevents such problems to appear in the first place. It forces the programs to be written in a way so that they don't depend on any particular iteration order as it uncovers any potential problems dependent on iteration order early in the development phase. Regards, Peter
Re: RFR(m): 8139233 add initial compact immutable collection implementations
A new instance is being created each time for size-zero lists/sets/maps. This should return a singleton. The serialization proxy is reminiscent of those in JSR-310 but less space efficient. Once per stream, the proxy will write "java.util.ImmutableCollections.SerialProxy", whereas the JSR-310 one is "java.time.Ser". Thats about 29 bytes more than it needs too. ie. a package-scoped java.util.Ser would be more efficient. There are no messages for IndexOutOfBoundsException. Like Peter, I'm not convinced that extending AbstractList is the way to go, although I can see arguments both ways. equals() and hashCode() could be optimized on the size 0, 1 and 2 classes, as probably can some other methods. Should spliterator() should be overridden to add the IMMUTABLE flag? For maps, I implemented this years ago which may be of interest: http://commons.apache.org/proper/commons-collections/xref/org/apache/commons/collections4/map/Flat3Map.html#L72 I disagree with altering the iteration order. Guava's ImmutableSet and ImmutableMap have reliable iteration specified to match creation order. This aspect of the design is very useful. Stephen On 4 May 2016 at 05:55, Stuart Marks wrote: > Hi all, > > This is a reimplementation of collections created by the JEP 269 convenience > factory methods. These implementations are overall quite a bit smaller than > their conventional collections counterparts, particularly at small sizes. > Lookup performance for the hash-based structures (Set and Map) is not > particularly fast, though in most cases it's comparable to TreeSet/TreeMap. > Further improvements are likely possible. > > There are no API changes in this changeset. > > Please review: > > http://cr.openjdk.java.net/~smarks/reviews/8139233/webrev.0/ > > JEP link: > > http://openjdk.java.net/jeps/269 > > Thanks, > > s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
On 04/05/2016 05:55, Stuart Marks wrote: Hi all, This is a reimplementation of collections created by the JEP 269 convenience factory methods. These implementations are overall quite a bit smaller than their conventional collections counterparts, particularly at small sizes. Lookup performance for the hash-based structures (Set and Map) is not particularly fast, though in most cases it's comparable to TreeSet/TreeMap. Further improvements are likely possible. There are no API changes in this changeset. Please review: http://cr.openjdk.java.net/~smarks/reviews/8139233/webrev.0/ This looks very good, but I just wonder about the iteration order varying from run to run in the Set/Map implementations. I recall the section added to the javadoc where it made it very clear that the iteration order is unspecified and subject to change and no issue with that. I'm just thinking about troubleshooting scenarios where it could make some issue harder to diagnose. -Alan
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi Stuart, It will be good to have really immutable collections! But I have one comment: I wonder about serial interoperability with earlier versions of the JDK. For instance it will not be possible to send a Set created with Set.of(...) in JDK 9 to a JDK 8 VM. I wonder if there is any good solution to that. You could of course use writeReplace() to return a plain HashSet - but then you would only get a plain HashSet on the other end. That might be fine if that other end is JDK 8, but maybe not so good if it's actually JDK 9. What worries me is that it will be very easy to forget about the serial incompatibility, and start using these new factory methods in e.g. java.lang.management, and then discover later on that it has introduced an interoperability issue. Is there any good way to prevent that - except thorough code reviews? best regards, -- daniel On 04/05/16 06:55, Stuart Marks wrote: Hi all, This is a reimplementation of collections created by the JEP 269 convenience factory methods. These implementations are overall quite a bit smaller than their conventional collections counterparts, particularly at small sizes. Lookup performance for the hash-based structures (Set and Map) is not particularly fast, though in most cases it's comparable to TreeSet/TreeMap. Further improvements are likely possible. There are no API changes in this changeset. Please review: http://cr.openjdk.java.net/~smarks/reviews/8139233/webrev.0/ JEP link: http://openjdk.java.net/jeps/269 Thanks, s'marks
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi Stuart, nice work ! first, all classes should be final otherwise, they are not truly immutable and all arrays should be marked as @Stable. List0, List1, why not using Collections.emptyList(), Collections.singletonList() here ? ListN: - i don't think that extending AbstractList is a good idea here, AbstractList provide the failfast/modCount mecanism not needed here. Moreover, i think you should have some overriden methods on par with Arrays.asList() (i.e. a custom isEmpty, toArray, indexOf, contains, iterator, spliterator and forEach) otherwise people will use Arrays.asList just because it's faster :( - in the constructor, you should use a local variable here, @SafeVarargs ListN(E... input) { // copy and check manually to avoid TOCTOU @SuppressWarnings("unchecked") E[] elements = (E[])new Object[input.length]; // implicit nullcheck of input for (int i = 0; i < input.length; i++) { elements[i] = Objects.requireNonNull(input[i]); } this.elements = elements; } - List2: - same as for ListN, should not inherits from AbstractList. - i wonder why iterator() is not overriden like with Set2. Set2: - again, here, i don't think that inheriting from AbstractSet is a good idea. - in the iterator, pos (position ?) should be private and next() can be written without the '+=', public E next() { if (pos == 0) { pos = 1; return e0; } else if (pos == 1) { pos = 2; return e1; } else { throw new NoSuchElementException(); } } - the iterator should not be defined as inner class (with a reference to Set2) but be defined as a static class that contains a copy of e1 and e2, this will save an indirection and avoid to keep a link on the Set if it is transient. SetN: - in the constructor, use a local variable like for ListN - the @SuppressWarnings for contains is wrong, probe should take an Object, not an E. - the iterator should not be an inner class like Set2 and take the array as parameter. idx should be private. Instead of doing the loop in hasNext and next, the loop should be done once in the constructor and in next. So the code of hasNext will be simple and the JIT will be able to fold a call to hasNext() followed by a call to next(). final static class SetNIterator implements Iterator { private final E[] elements; private int index = nextIndex(0); SetNIterator(E[] elements) { this.elements = elements; } private static int nextIndex(int index) { for(; index < elements.length; index++) { if (elements[index] != null) { return index; } } return -1; } @Override public boolean hasNext() { return index != -1; } @Override public E next() { if (index == -1) { throw new NoSuchElementException(); } E element = elements[index]; index = nextIndex(index + 1); return element; } } - i don't understand how the serialization can work given that the SALT may be different between two VMs MapN: - see SetN :) SerialProxy: - I believe the class SerialProxy should be public, no ? - The constructor should take a Object... to simplify all calls in the different writeReplace. - fields flags and array should be private regards, Rémi - Mail original - > De: "Stuart Marks" > À: "core-libs-dev" > Envoyé: Mercredi 4 Mai 2016 06:55:35 > Objet: RFR(m): 8139233 add initial compact immutable collection > implementations > > Hi all, > > This is a reimplementation of collections created by the JEP 269 convenience > factory methods. These implementations are overall quite a bit smaller than > their conventional collections counterparts, particularly at small sizes. > Lookup > performance for the hash-based structures (Set and Map) is not particularly > fast, though in most cases it's comparable to TreeSet/TreeMap. Further > improvements are likely possible. > > There are no API changes in this changeset. > > Please review: > > http://cr.openjdk.java.net/~smarks/reviews/8139233/webrev.0/ > > JEP link: > > http://openjdk.java.net/jeps/269 > > Thanks, > > s'marks >
Re: RFR(m): 8139233 add initial compact immutable collection implementations
Hi Stuart, Good to see these implementations finally appear. I have a few comments: - What's the point of 11 overloaded List.of() methods for 0 ... 10-arity if you then invoke the var-args ListN(E...input) constructor that does the array cloning with 8 of them? The same for Set.of()... I suggest that you don't clone the array in ListN(E... input) constructor but only in List.of(E ... input) method. - {Set0, Set1, Set2}.contains(null) return false, but SetN.contains(null) throws NPE. All List{0,1,2,N}.contains(null) return false. What should be the correct behavior? ConcurrentHashMap.keySet().contains(null) throws NPE for example. - SetN.probe(E pe) and MapN.probe(K pk) invoke ee.equals(pe) and ek.equals(pk) respectively. Although it should be logically equivalent, it has been a practice in collections to invoke the .equals() method on the passed-in argument rather than on individual elements of the collection. - Should unchecked exceptions from constructors be wrapped with InvalidObjectException in SerialProxy.readResolve() or does the Serialization infrastructure already perform the equivalent? Regards, Peter On 05/04/2016 06:55 AM, Stuart Marks wrote: Hi all, This is a reimplementation of collections created by the JEP 269 convenience factory methods. These implementations are overall quite a bit smaller than their conventional collections counterparts, particularly at small sizes. Lookup performance for the hash-based structures (Set and Map) is not particularly fast, though in most cases it's comparable to TreeSet/TreeMap. Further improvements are likely possible. There are no API changes in this changeset. Please review: http://cr.openjdk.java.net/~smarks/reviews/8139233/webrev.0/ JEP link: http://openjdk.java.net/jeps/269 Thanks, s'marks