I have been reading previous threads, the original bug request, and exploring 
the javadoc and implementation of toList() on Stream in JDK 16. I don’t want to 
waste time rehashing previous discussions, but I want to understand and 
prioritize the motivation of this change, and propose what I believe is a safer 
alternative name for this method based on the current implementation: 
Stream.toUnmodifiableList().

Big +1 to "let's not rehash previous discussions, but help us understand the motivation."  Stewarding the core libraries is a complex task, and there are rarely hard-and-fast rules for doing the Right Thing.

Your question seems to have two main aspects:

 - Why this method, why not others, and why now
 - Why take such a strong anti-mutability position with this method

The desire for a Stream::toList method has a long history; when we first did streams, it was one of the first convenience methods to be "requested".  We resisted then, for good reasons, but we knew this saga was not over.

"Convenience" methods are a constant challenge in the JDK.  On the one hand, they are, well, convenient, and we want Java to be easy and pleasant to program in.  On the other, the number of potentially-useful imaginable convenience methods is infinite, and the widespread perception is that they are so easy to add that all that is needed is for someone to propose the idea.  (The (admittedly soft) criteria we use for judging whether a convenience method meets the bar make for an interesting discussion, which we can have separately.)

There are basically two stable points with respect to convenience methods in API design; zero tolerance, and "don't worry, be happy". In the former, the methods of an API are like a basis (ideally, an orthonormal one) of a vector space; the minimum number of API points from which you can derive all possible usages.  At the other extreme, every reasonable combination of methods gets its own special form of expression.  Of course, both are extremes (Stream::count and IntStream::sum are conveniences for reduce, and even Haskell's Monad has multiple ways to represent bind), but APIs tend to align themselves in one direction or another.  And, as the JDK APIs go, Streams treats sparsity and orthogonality as virtues to be striven for.
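To make the "conveniences for reduce" point concrete, here is a minimal sketch of how count and sum could be derived from reduce alone. The class and method names (ReduceBasis, countViaReduce, sumViaReduce) are made up for illustration, and this is not claimed to be how the JDK implements them.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ReduceBasis {
    // Hypothetical helper: count expressed as map + reduce.
    // Each element is mapped to 1L and the ones are summed.
    static <T> long countViaReduce(Stream<T> stream) {
        return stream.map(e -> 1L).reduce(0L, Long::sum);
    }

    // Hypothetical helper: sum expressed directly as a reduce with identity 0.
    static int sumViaReduce(IntStream stream) {
        return stream.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(countViaReduce(Stream.of("a", "b", "c")));   // prints 3
        System.out.println(sumViaReduce(IntStream.rangeClosed(1, 4)));  // prints 10
    }
}
```

Both helpers give the same answers as Stream::count and IntStream::sum for these inputs, which is the sense in which the built-in methods are conveniences rather than new capabilities.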

Eclipse Collections chooses a different (and also valid!) philosophy: completeness, and it walks the walk.  (Having 81 (template-generated) implementations of HashMap is proof.) Similarly, Tagir's StreamEx is an example of an extension to Stream that takes the other approach.  And both are great!  But also, they are not how the JDK rolls.  Which is fine; it's a big ecosystem, and there's room for multiple philosophies, and each can find its fans and detractors.

The calls for a convenience for Stream::toList have come pretty much continuously since we first resisted it (but, we knew even then that if we had a lifetime budget for just one convenience method, it would end up being toList.)  We knew then that there would be questions to ask about what the ideal dial settings would be for toList, and were not yet ready to confront the question, nor did we want to add fuel to the demands for more convenience methods ("No toSet?  Inconsistent!")

When an API is new, and all things are possible, we tend to be in "imagine everything we could put into it" mode, and streams was no different.  It is wise to resist this temptation -- and maybe even over-rotate in the other direction -- to allow for some time for the spirit of what you've built to make itself clear; even creators are not always immediately clear on the nuances of their creation.  So we tried hard to resist the calls for unnecessary methods, knowing that they could always be added, but not taken away, and also, allowing for the true gaps to emerge from usage.  (The first method to be added, takeWhile(), was the very opposite of a convenience; it represented a reasonable use case that the original design didn't support.)

So, why toList now?  Well, a number of reasons.  Collecting to a list is one of the most common terminal operations, so any small irritant (like a clumsy locution) adds up.  And, as has been pointed out, it can be more efficient if it is brought into the stream core rather than held at arm's length through Collector.  So if we're going to compromise our principles in one place, after thinking about it for a long time, this seemed a worthy candidate.  (And still, we hesitated, because we knew it would be firing the starting gun for the "But where's toSet?" arguments.)

So yes, there are lots of good reasons to continue to Just Say No to conveniences, but, there are also reasonable times to make exceptions -- especially when it is not purely about convenience. And, data suggests that toList is 5-10x more popular than the next most popular collector, so there's a clear argument to say that toList is pretty special, and we can stop there.

List is a mutable interface.

This is true to an extent (though even the specification of List makes it clear the mutative methods are strictly optional), but even if it were absolutely true, I am still not sure how relevant it is to what streams should do.  When I wrote Collectors::toList, ArrayList was indeed the obvious default implementation choice -- but it was also obviously not a very good choice.  We didn't have an efficient unmodifiable collection at the time, and wrapping with unmodifiableList seemed like taxing a lot of well-behaved users for the would-be sins of the few.  But if we had efficient unmodifiable collections then, I would absolutely, positively have made that choice.

Streams is an API that takes functional principles to heart, sometimes even in ways that are uncomfortable to Java developers. (For example, it imposes constraints on the lambdas we pass to its methods, which are the Java analogues of purity and side-effect freedom -- which are not necessarily familiar constraints.)  Data structures are about managing and organizing data in memory, but streams are about capturing and composing behavior, not data. (Obviously, streams consume and produce data at their extreme points, but they try to make the fewest possible assumptions about the form that data takes.)  Where Stream meets List, Stream is allowed to have an opinion about what kinds of lists it likes better, and an unmodifiable list seems far more in the spirit of Streams.  And of course, collect(toCollection(f)) lets you collect to whatever sort of collection you like.
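A small sketch of the two options side by side, assuming JDK 16 or later (where Stream::toList exists). The class CollectChoices and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectChoices {
    // Stream::toList (JDK 16+): the caller accepts the list the library chooses,
    // which is specified to be unmodifiable.
    static List<String> defaultList() {
        return Stream.of("b", "a", "c").toList();
    }

    // collect(toCollection(f)): the caller picks the collection, here a TreeSet.
    static TreeSet<String> sortedSet() {
        return Stream.of("b", "a", "c").collect(Collectors.toCollection(TreeSet::new));
    }

    // Hypothetical probe: returns true if adding to the list is rejected.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd(defaultList())); // prints true
        System.out.println(sortedSet().first());       // prints a
    }
}
```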

A convention was established in 2014 with Collectors.toList() returning a 
mutable List (ArrayList).

I am having a hard time expressing just how much I disagree with the sentiment behind this claim.  I knew, when I was writing Collectors::toList, that I would someday be having this discussion; my best efforts to head this discussion off were memorialized in the specification for Collectors::toList:

There are no guarantees on the type, mutability, serializability, or thread-safety of the List returned; if more control over the returned List is required, use toCollection(Supplier) <https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toCollection-java.util.function.Supplier->.

I'd hope this would be interpreted as: "Dear developer who assumes that, just because this returns ArrayList today, that somehow it is reasonable to assume toList will always return an ArrayList: you are wrong, and I hope you have the good sense to never make this argument out loud."

The reason that this "reasonable-seeming" assumption -- that what the first implementation does is reasonable to take as normative, even when the spec says otherwise -- is so toxic, is that it cripples the ability of the platform to evolve.  There's a reason we write specifications for APIs; because implementations are intrinsically accidental and contextual, and context changes out from under us.  Even when writing it, I was aware of the degree to which programmers would be overwhelmingly tempted (despite what I hoped was their better judgment) to count on the mutability of the returned list if that is what they wanted.  Saying `toCollection(ArrayList::new)`, which guarantees exactly the characteristics such users would want, is Just Not That Hard.  Sure, saying toList() is easier, but the tradeoff there is you accept whatever (compliant) List the library wants to serve up, and the library gets some say in what that is, which might even vary from Tuesday to Wednesday.   A toList() method should try to balance the competing concerns for what is the most reasonable default, and when the JDK improves in a way that shifts that balance, or the context shifts, the JDK should be able to improve with it.
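For readers who do need a mutable result, a minimal sketch of asking for it explicitly rather than relying on what Collectors::toList happens to return. The class ExplicitMutable and the method evens are invented for this example.

```java
import java.util.ArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ExplicitMutable {
    // Requests an ArrayList by name, so mutability is guaranteed by the
    // caller's own choice, not by an unspecified implementation detail.
    static ArrayList<Integer> evens(int limit) {
        return IntStream.rangeClosed(1, limit)
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        ArrayList<Integer> list = evens(6);
        list.add(0, 0);            // safe: we asked for an ArrayList
        System.out.println(list);  // prints [0, 2, 4, 6]
    }
}
```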

So, this "establish a convention" claim is dangerous because it pushes us towards the assumption that everything the JDK does, even the things it *clearly specifies as implementation details that might change*, can never change.  Which means we would have to be *even more deliberate* about anything we do, which means the rate at which we can move forward is *even slower*.

But, you are making an even stronger claim than that!   We're not trying to change the implementation of Collectors::toList (which the spec makes clear might happen).  We're adding _another_ method with that name, somewhere else.  Which makes the above argument even more dangerous -- essentially, it says "don't use a word in any API ever, unless you are prepared to interpret it exactly the same way in all future contexts."  Surely, you see how this doesn't lead to a world we want to live in.

So, what should `Stream::toList` mean?  It should mean: return whatever kind of list that Streams thinks is the best all-around default implementation to use, based on the best understanding of what typical users want.  This involves balancing a lot of things, and that balance can move over time.

We could call this toUnmodifiableList, and there's surely a certain logic to that.  But, this is likely to have unintended consequences.  First, the fact that the name is fussier makes it even less attractive as a convenience, which is an argument to not do it at all.  Users who mostly count characters (which is sadly common) would be more likely to continue to use collect(toList()), even if the new method is better in multiple ways.  If we have Stream::toUnmodifiableList, it is *even more likely* to generate demands for other toXxxList conveniences.  Worse, it would likely generate arguments for a toList that works the same as collect(toList()) -- which takes an existing "accidental mutability" problem and guarantees that problem into the infinite future.  It's bad enough that collect(toList()) yields a mutable list -- it would be even worse for Stream::toList to do the same.  Most users don't need mutability, and are better off not getting it if they don't need it; they should ask for it if they need it.

[1] Example usages of Eclipse Collections toList:


// toList result is mutable for all of these usages with Eclipse Collections
List list1 = mutableSet.toList();
List list2 = mutableSet.asLazy().toList();
List list3 = mutableSet.asParallel(Executors.newWorkStealingPool(), 10).toList();
List list4 = mutableSet.stream().collect(Collectors.toList());
List list5 = mutableSet.stream().collect(Collectors2.toList());

These are nice, but there's a subtle difference here that is salient.  Eclipse Collections attempts to integrate data management and behavioral composition into a single library.  This is a fine goal, but it does mean that the behavioral methods have more responsibility to fit with the data-management side of the story.

Streams took an almost opposite interpretation -- one reason NOT to do a Stream::toList method was that it overly coupled Streams to Collections.  Laundering stream-to-List via a specific collector (which is clearly more of a "plug in" than core functionality) seemed preferable.  We chose more of an arms-length relationship between Stream and Collections.  Again, different philosophies. (Adding Stream::toList goes back on that a bit, after thinking about it for a bunch of years, and deciding it was OK in this case.)


The primary cost here is a seeming "inconsistency", because people have been able to convince themselves that `toList()` means "to ArrayList", and now, there will be cases where that is not true. Given the choice between catering to explicitly wrong assumptions (the spec even says "don't make this assumption"!), and improving the platform over time, I choose the latter.  Consistency is a good baseline goal, but consistencies can be taken to foolish extremes.
