Re: RFR: 8180352: Add Stream.toList() method

Brian Goetz Fri, 05 Feb 2021 07:28:35 -0800

I have been reading previous threads, the original bug request, and exploring 
the javadoc and implementation of toList() on Stream in JDK 16. I don’t want to 
waste time rehashing previous discussions, but I want to understand and 
prioritize the motivation of this change, and propose what I believe is a safer 
alternative name for this method based on the current implementation: 
Stream.toUnmodifiableList().

Big +1 to "let's not rehash previous discussions, but help us understand the motivation." Stewarding the core libraries is a complex task, and there are rarely hard-and-fast rules for doing the Right Thing.


Your question seems to have two main aspects:

 - Why this method, why not others, and why now
 - Why take such a strong anti-mutability position with this method

The desire for a Stream::toList method has a long history; when we first did streams, it was one of the first convenience methods to be "requested". We resisted then, for good reasons, but we knew this saga was not over.

"Convenience" methods are a constant challenge in the JDK. On the one hand, they are, well, convenient, and we want Java to be easy and pleasant to program in. On the other, the number of potentially-useful imaginable convenience methods is infinite, and the widespread perception is that they are so easy that all that is needed is for someone to propose the idea. (The (admittedly soft) criteria we use for judging whether a convenience method meets the bar are an interesting topic, which we can discuss separately.)

There are basically two stable points with respect to convenience methods in API design: zero tolerance, and "don't worry, be happy". In the former, the methods of an API are like a basis (ideally, an orthonormal one) of a vector space; the minimum number of API points from which you can derive all possible usages. At the other extreme, every reasonable combination of methods gets its own special form of expression. Of course, both are extremes (Stream::count and IntStream::sum are conveniences for reduce, and even Haskell's Monad has multiple ways to represent bind), but APIs tend to align themselves in one direction or another. And, as the JDK APIs go, Streams treats sparsity and orthogonality as virtues to be striven for.
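To make the "conveniences for reduce" remark concrete, here is a minimal sketch showing count and sum rebuilt from reduce. The class and method names (ReduceConveniences, countViaReduce, sumViaReduce) are made up for illustration and are not part of the JDK.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ReduceConveniences {

    // Stream::count expressed as "map every element to 1, then reduce by addition".
    static long countViaReduce(Stream<?> stream) {
        return stream.mapToLong(e -> 1L).reduce(0L, Long::sum);
    }

    // IntStream::sum expressed directly as a reduction with identity 0.
    static int sumViaReduce(IntStream stream) {
        return stream.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(countViaReduce(Stream.of("a", "b", "c"))); // prints 3
        System.out.println(sumViaReduce(IntStream.rangeClosed(1, 4))); // prints 10
    }
}
```

Neither helper adds expressive power over reduce; they only shorten a common phrase, which is the sense of "convenience" used here.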

Eclipse Collections chooses a different (and also valid!) philosophy: completeness, and it walks the walk. (Having 81 (template-generated) implementations of HashMap is proof.) Similarly, Tagir's StreamEx is an example of an extension to Stream that takes the other approach. And both are great! But also, they are not how the JDK rolls. Which is fine; it's a big ecosystem, and there's room for multiple philosophies, and each can find its fans and detractors.

The calls for a convenience for Stream::toList have come pretty much continuously since we first resisted it (but, we knew even then that if we had a lifetime budget for just one convenience method, it would end up being toList.) We knew then that there would be questions to ask about what the ideal dial settings would be for toList, and were not yet ready to confront the question, nor did we want to add fuel to the demands for more convenience methods ("No toSet? Inconsistent!").

When an API is new, and all things are possible, we tend to be in "imagine everything we could put into it" mode, and streams was no different. It is wise to resist this temptation -- and maybe even over-rotate in the other direction -- to allow for some time for the spirit of what you've built to make itself clear; even creators are not always immediately clear on the nuances of their creation. So we tried hard to resist the calls for unnecessary methods, knowing that they could always be added, but not taken away, and also, allowing for the true gaps to emerge from usage. (The first method to be added, takeWhile(), was the very opposite of a convenience; it represented a reasonable use case that the original design didn't support.)

So, why toList now? Well, a number of reasons. Collecting to a list is one of the most common terminal operations, so any small irritant (like a clumsy locution) adds up. And, as has been pointed out, it can be more efficient if it is brought into the stream core rather than held at arm's length through Collector. So if we're going to compromise our principles in one place, after thinking about it for a long time, this seemed a worthy candidate. (And still, we hesitated, because we knew it would be firing the starting gun for the "But where's toSet?" arguments.)

So yes, there are lots of good reasons to continue to Just Say No to conveniences, but, there are also reasonable times to make exceptions -- especially when it is not purely about convenience. And, data suggests that toList is 5-10x more popular than the next most popular collector, so there's a clear argument to say that toList is pretty special, and we can stop there.

List is a mutable interface.

This is true to an extent (though even the specification of List makes it clear the mutative methods are strictly optional), but even if it were absolutely true, I am still not sure how relevant it is to what streams should do. When I wrote Collectors::toList, ArrayList was indeed the obvious default implementation choice -- but it was also obviously not a very good choice. We didn't have an efficient unmodifiable collection at the time, and wrapping with unmodifiableList seemed like taxing a lot of well-behaved users for the would-be sins of the few. But if we had efficient unmodifiable collections then, I would absolutely, positively have made that choice.

Streams is an API that takes functional principles to heart, sometimes even in ways that are uncomfortable to Java developers. (For example, it imposes constraints on the lambdas we pass to its methods, which are the Java analogues of purity and side-effect freedom -- which are not necessarily familiar constraints.) Data structures are about managing and organizing data in memory, but streams are about capturing and composing behavior, not data. (Obviously, streams consume and produce data at their extreme points, but the API tries to make the fewest possible assumptions about the form that data takes.) Where Stream meets List, Stream is allowed to have an opinion about what kinds of lists it likes better, and an unmodifiable list seems far more in the spirit of Streams. And of course, collect(toCollection(f)) lets you collect to whatever sort of collection you like.
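As a small illustration of the collect(toCollection(f)) escape hatch, the sketch below collects the same kind of stream into two different collection types chosen by the caller. The class and method names (ToCollectionDemo, toSortedSet, toLinkedList) are hypothetical; only Collectors.toCollection is the JDK API.

```java
import java.util.LinkedList;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToCollectionDemo {

    // The caller supplies the factory, so the caller decides the result type:
    // here a sorted, de-duplicating TreeSet.
    static TreeSet<String> toSortedSet(Stream<String> stream) {
        return stream.collect(Collectors.toCollection(TreeSet::new));
    }

    // Same stream pipeline, different factory: an encounter-ordered LinkedList.
    static LinkedList<String> toLinkedList(Stream<String> stream) {
        return stream.collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        System.out.println(toSortedSet(Stream.of("b", "a", "b")));  // prints [a, b]
        System.out.println(toLinkedList(Stream.of("b", "a", "b"))); // prints [b, a, b]
    }
}
```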

A convention was established in 2014 with Collectors.toList() returning a 
mutable List (ArrayList).

I am having a hard time expressing just how much I disagree with the sentiment behind this claim. I knew, when I was writing Collectors::toList, that I would someday be having this discussion; my best efforts to head this discussion off were memorialized in the specification for Collectors::toList:

There are no guarantees on the type, mutability, serializability, or thread-safety of the |List| returned; if more control over the returned |List| is required, use |toCollection(Supplier)| <https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toCollection-java.util.function.Supplier->.

I'd hope this would be interpreted as: "Dear developer who assumes that, just because this returns ArrayList today, that somehow it is reasonable to assume toList will always return an ArrayList: you are wrong, and I hope you have the good sense to never make this argument out loud."

The reason that this "reasonable-seeming" assumption -- that what the first implementation does is reasonable to take as normative, even when the spec says otherwise -- is so toxic, is that it cripples the ability of the platform to evolve. There's a reason we write specifications for APIs; because implementations are intrinsically accidental and contextual, and context changes out from under us. Even when writing it, I was aware of the degree to which programmers would be overwhelmingly tempted (despite what I hoped was their better judgment) to count on the mutability of the returned list if that is what they wanted. Saying `toCollection(ArrayList::new)`, which guarantees exactly the characteristics such users would want, is Just Not That Hard. Sure, saying toList() is easier, but the tradeoff there is you accept whatever (compliant) List the library wants to serve up, and the library gets some say in what that is, which might even vary from Tuesday to Wednesday. A toList() method should try to balance the competing concerns for what is the most reasonable default, and when the JDK improves in a way that shifts that balance, or the context shifts, the JDK should be able to improve with it.
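A minimal sketch of the "ask for what you need" approach described above: the result type is pinned by the supplier passed to toCollection, so later mutation relies on a guarantee rather than on an implementation accident. The class and method names (MutableResultDemo, collectMutable) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MutableResultDemo {

    // The declared return type is ArrayList, not merely List: the caller
    // asked for a mutable, random-access list and is guaranteed to get one.
    static ArrayList<Integer> collectMutable(Stream<Integer> stream) {
        return stream.collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        ArrayList<Integer> numbers = collectMutable(Stream.of(1, 2, 3));
        numbers.add(4); // safe: mutability was requested explicitly
        System.out.println(numbers); // prints [1, 2, 3, 4]
    }
}
```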

So, this "establish a convention" claim is dangerous because it pushes us towards the assumption that everything the JDK does, even the things it *clearly specifies as implementation details that might change*, can never change. Which means we would have to be *even more deliberate* about anything we do, which means the rate at which we can move forward is *even slower*.

But, you are making an even stronger claim than that! We're not trying to change the implementation of Collectors::toList (which the spec makes clear might happen.) We're adding _another_ method with that name, somewhere else. Which makes the above argument even more dangerous -- essentially, it says "don't use a word in any API ever, unless you are prepared to interpret it exactly the same way in all future contexts." Surely, you see how this doesn't lead to a world we want to live in.

So, what should `Stream::toList` mean? It should mean: return whatever kind of list that Streams thinks is the best all-around default implementation to use, based on the best understanding of what typical users want. This involves balancing a lot of things, and that balance can move over time.

We could call this toUnmodifiableList, and there's surely a certain logic to that. But, this is likely to have unintended consequences. First, the fact that the name is fussier makes it even less attractive as a convenience, which is an argument to not do it at all. Users who mostly count characters (which is sadly common) would be more likely to continue to use collect(toList()), even if the new method is better in multiple ways. If we have Stream::toUnmodifiableList, it is *even more likely* to generate demands for other toXxxList conveniences. Worse, it would likely generate arguments for a toList that works the same as collect(toList()) -- which takes an existing "accidental mutability" problem and guarantees that problem into the infinite future. It's bad enough that collect(toList()) yields a mutable list -- it would be even worse for Stream::toList to do the same. Most users don't need mutability, and are better off not getting it if they don't need it; they should ask for it if they need it.
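For readers who have not yet tried the method under discussion, this sketch shows the behavior at stake, assuming JDK 16 or later, where Stream.toList() is specified to return an unmodifiable list. The helper rejectsAdd and the class name StreamToListDemo are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class StreamToListDemo {

    // Returns true if the list refuses add() with UnsupportedOperationException.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stream.toList() (JDK 16+): the returned list is unmodifiable.
        List<String> fixed = Stream.of("a", "b").toList();
        System.out.println(rejectsAdd(fixed)); // prints true

        // Callers who need mutability ask for it, for example by copying.
        List<String> copy = new ArrayList<>(fixed);
        System.out.println(rejectsAdd(copy)); // prints false
    }
}
```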

[1] Example usages of Eclipse Collections toList:


// toList result is mutable for all of these usages with Eclipse Collections
List list1 = mutableSet.toList();
List list2 = mutableSet.asLazy().toList();
List list3 = mutableSet.asParallel(Executors.newWorkStealingPool(), 10).toList();
List list4 = mutableSet.stream().collect(Collectors.toList());
List list5 = mutableSet.stream().collect(Collectors2.toList());

These are nice, but there's a subtle difference here that is salient. Eclipse Collections attempts to integrate data management and behavioral composition into a single library. This is a fine goal, but it does mean that the behavioral methods have more responsibility to fit with the data-management side of the story.

Streams took an almost opposite interpretation -- one reason NOT to do a Stream::toList method was that it overly coupled Streams to Collections. Laundering stream-to-List via a specific collector (which is clearly more of a "plug in" than core functionality) seemed preferable. We chose more of an arms-length relationship between Stream and Collections. Again, different philosophies. (Adding Stream::toList goes back on that a bit, after thinking about it for a bunch of years, and deciding it was OK in this case.)

The primary cost here is a seeming "inconsistency", because people have been able to convince themselves that `toList()` means "to ArrayList", and now, there will be cases where that is not true. Given the choice between catering to explicitly wrong assumptions (the spec even says "don't make this assumption"!), and improving the platform over time, I choose the latter. Consistency is a good baseline goal, but consistencies can be taken to foolish extremes.
