On Mar 3, 2021, at 8:09 AM, Brian Goetz <brian.go...@oracle.com> wrote:
>
>> the whole story about initializing a local variable with an array is weird,
>>   int data[] = {1, 2, 3};
>> compiles but
>>   int data[];
>>   data = {1, 2, 3};
>> does not.
>
> True, and I hate it. This is related to the C-style array decl; for the
> first year it was really important that Java seem friendly to C developers,
> and this was one of the compromises. For the rest of time, it is just an
> annoying irregularity.
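To make the C-style irregularity concrete, here is a small sketch (class and method names are illustrative only). It shows that the C-style declarator is accepted and gives the same type as the Java style. It also shows that mixing the two styles in one declaration quietly gives the variables different types.

```java
import java.util.Arrays;

public class CStyleDeclarators {
    // C-style declarator: legal, and the variable's type is still int[].
    static int[] cStyle() {
        int data[] = {1, 2, 3};
        return data;
    }

    // Java-style declarator: the same type, int[].
    static int[] javaStyle() {
        int[] data = {1, 2, 3};
        return data;
    }

    // Mixing styles in one declaration: a is int[], but b is int[][].
    static String mixedTypes() {
        int[] a = {1}, b[] = {{2}};
        return a.getClass().getSimpleName() + " " + b.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(cStyle(), javaStyle())); // true
        System.out.println(mixedTypes());                         // int[] int[][]
    }
}
```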
Here are some thoughts on array handling and patterns. Basically, looking at pattern assignment, in the case of arrays, prompts me to think that the more rewarding goal is pattern declarations, not pattern assignments. So I’ll try to compare and contrast those, for arrays and other kinds of patterns.

Backing up… First, I agree with Brian that “int data[] = …” is horrible; we should have only allowed “int[] data = …” and taken the hit early. But the nested-braces notation is good even if you never learned C.

As everyone knows, the array declaration syntax uses the declaration type as context for decoding the nested initializer expressions. This allows the programmer to provide the array contents without re-asserting the declared array type when the array instance is created. The “stuff in braces” array initializer notation is also uniformly available in expressions of the form “new T[]{ … }”, with the “new T[]” header providing exactly the same context to the “{…}”, as in the array declaration case.

(Bit of context: Bill Joy and I added the “new T[]{ … }” expression syntax in 1.1, so I’m partial to the “stuff in braces” notation, and I also think it can be extended and rationalized even further.)

If the language required explicitly typed array initializers, we’d have to write stuff like this:

  int[][] as = new int[][]{ new int[]{ 1, 2 }, new int[]{ 3 } };

We would surely tire of it and add some project-coin-style sugar like today’s syntax:

  int[][] as = { { 1, 2 }, { 3 } };

The “var” feature similarly allows users to state a controlling type (or factory) in just one place:

  var as = new int[][] { { 1, 2 }, { 3 } };

(Here’s a factory example:

  var as = List.of(List.of(1, 2), List.of(3));

It’s an intriguing problem to cross-generalize “stuff in braces” syntax to factories and to non-array types. But I digress. We can revisit when/if we do construction expressions.)
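The forms above state the controlling type in different places but build the same data. Here is a sketch, assuming Java 10 or later for var and List.of (class and method names are illustrative). It checks that the explicit, compact, and var forms produce deep-equal arrays, with the factory form as the List analogue.

```java
import java.util.Arrays;
import java.util.List;

public class InitializerContexts {
    // Fully explicit: every nested array restates its type.
    static int[][] explicit() {
        return new int[][]{ new int[]{ 1, 2 }, new int[]{ 3 } };
    }

    // Compact: the declared type int[][] is the context for the nested braces.
    static int[][] compact() {
        int[][] as = { { 1, 2 }, { 3 } };
        return as;
    }

    // var: the creation expression new int[][] states the type once.
    static int[][] inferred() {
        var as = new int[][] { { 1, 2 }, { 3 } };
        return as;
    }

    // Factory analogue: the type of as is inferred from List.of.
    static List<List<Integer>> factory() {
        var as = List.of(List.of(1, 2), List.of(3));
        return as;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepEquals(explicit(), compact()));  // true
        System.out.println(Arrays.deepEquals(compact(), inferred()));  // true
        System.out.println(factory());                                 // [[1, 2], [3]]
    }
}
```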
OK, so as Remi says, we don’t allow:

  int[][] as;
  as = { { 1, 2 }, { 3 } };

Nor, in fact, do we allow:

  var as;
  as = new int[][] { { 1, 2 }, { 3 } };

I think the two cases are parallel. Because there is contextual type information in the two halves (declaration and assignment) that communicates from one half to the other (from int[][] as to the nested exprs, or from the new int[][] expr to the var as), you can’t break up the declaration without breaking up the communication.

So, in the case of int[][] as = {…}, the communication is a sort of very strong, explicit target typing, where the quasi-expression {…} is evaluated in the context of, not only an inferred type (as in a hypothetical “f({…})”) but an explicitly declared type (“int[][]” before “as”). Such an explicit header type deserves to be recycled as context to the rest of the declaration, if it is useful there, and indeed it is.

(In “var as = new int[][]{…}” the communication is, well, source typing, or whatever is the opposite of target typing. The source type here is very strong and explicit, but it could also be “var as = List.of(…)” in which case the source type is implicit, inferred as a result of type checking.)

Do arrays scale upward toward user-defined literals or templated construction expressions? Actually, I think we are pretty close. We have spoken at various times about an explicitly typed head followed by an implicitly typed tail for such things, and I think “new int[][] {…}” is a good “bellwether” example. And if that’s true, then the duality between “var as = new int[][]{ … }” and the stylistically different but semantically equivalent “int[][] as = { … }” is also a reusable concept: Allowing the type-rich head to be either a declaration type in a declaration or else an expression-prefix in an expression seems almost a forced move.
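In today's Java, the way to split such a declaration is to move the type-rich head into the expression. A sketch (names are illustrative; var requires Java 10 or later) follows. The rejected forms appear only as comments. The method parameter shows that a target type alone does not license a bare {…}, which matches the hypothetical “f({…})” above.

```java
public class SplitDeclarations {
    // A parameter type is a target type, yet sum({1, 2, 3}) does not compile;
    // the caller must write sum(new int[] {1, 2, 3}).
    static int sum(int[] xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }

    static int[][] splitWithCreation() {
        int[][] as;
        // as = { { 1, 2 }, { 3 } };          // does not compile
        as = new int[][] { { 1, 2 }, { 3 } };  // the head new int[][] restores the context
        return as;
    }

    static int[][] unsplitVar() {
        // var as;                            // does not compile: var needs an initializer
        var as = new int[][] { { 1, 2 }, { 3 } };
        return as;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] { 1, 2, 3 }));  // 6
        System.out.println(splitWithCreation()[1][0]);   // 3
        System.out.println(unsplitVar().length);         // 2
    }
}
```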
(Precedents for the type-rich expression head would be a cast or a wrapping function call as with lambdas, or ad hoc syntax as with array or object creation expressions. Newer ideas will surely follow.)

When I say this, I’m *not* painting a bikeshed for the templated expressions; the type-rich head doesn’t have to be “new T” or “(T)” per se, nor does the “tail stuff” have to use curly braces. We have spent lots of whiteboard time over the years (almost a decade) talking about specific bikeshed colors for this, or perhaps a whole rainbow of user-selectable colors. But there’s no point in reviewing all that now.

I do think that our experiences with target typing in pattern matching will help us do something similar with construction expressions (if we go there). The reason is that a deconstruction pattern is pretty much “merely” an arrow-reversed construction expression. (Specifically, the dynamic data flow is reversed. The static flow is probably still left to right.) So all the same information is there, just flowing around in a somewhat different direction.

Going back to the thread topic of pattern assignments, I think you get the clearest notations when you use pattern assignment within the context of a declaration, exactly because you have the most possible “communication” between the head type of the declaration and whatever type information is in the tail. So I’m not surprised that breaking a matching declaration into a data-free head and a separate assignment of data to a bare name doesn’t always work. In fact, I’m surprised it works as well as it does.

One more thought: Deconstruction is the same as construction, except for data flow direction. In deconstruction, data flows from a pre-existing target object to its extracted components. In construction, data flows from (injected?) components to a (newly created) target object. It is very desirable, IMO, for the “two directions” to look and feel somehow similar, in their notations.
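Records already show this symmetry in Java. The sketch below assumes Java 21 or later, where record patterns are standard, and uses a hypothetical record Box. The nested pattern Box(Box(Integer x)) mirrors the nested construction new Box(new Box(x)), with the data flowing the other way. Arrays have no such pattern yet, which is the gap the rest of this message explores.

```java
public class ConstructDeconstruct {
    // Hypothetical record, used only for illustration.
    record Box(Object contents) {}

    // Construction: data flows from the component into new objects.
    static Box wrapTwice(int x) {
        return new Box(new Box(x));  // x is autoboxed to Integer
    }

    // Deconstruction: the nested pattern mirrors the nested construction,
    // and data flows from the object out to the fresh binding x.
    static int unwrapTwice(Object o) {
        if (o instanceof Box(Box(Integer x))) {
            return x;
        }
        throw new IllegalArgumentException("not a doubly boxed Integer: " + o);
    }

    public static void main(String[] args) {
        Box b = wrapTwice(7);
        System.out.println(unwrapTwice(b));  // 7
    }
}
```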
For arrays, this suggests that while construction looks like this:

  int[][] output = { inputs… };
  var output = new int[][]{ inputs… };

a deconstruction should look something like this:

  int[][] {outputs…} = input;
  //maybe: var {outputs…} = (int[][]) input;

I’m saying “something like” NOT “exactly like”! For example, the construction “new Box(x)” is only “something” like the pattern “Box(var x)”. And yet their similarity suggests what is true, that they do similar jobs. (One reverses the other.)

For a standalone assignment that deconstructs an array, a turned-around array creation expression could make sense, but it’s really ugly:

  new int[][] {outputs…} = input;  //yuck

(Compared with deconstructing declarations, the standalone assignment syntax feels like a bridge too far, frankly.)

And for factories, maybe:

  var output = List.of(inputs…);
  //<= reverses =>
  List of(outputs…) = input;
  //maybe: var of(outputs…) = (List) input;

And for partial extraction methods, maybe:

  var outputMap = m.with(k, inputVal);
  //<= reverses =>
  Map<> with(k)(outputVal) = inputMap;  //ignore m

Their ugly cousins, the standalone assignments, seem to want to take this ground:

  List.of(outputs…) = input;
  Map<>.with(k)(outputVal) = inputMap;

But they shouldn’t, I think. There should be a token somewhere that says, “yes, I do want to assign to stuff, not declare stuff”. Straw man:

  assign int[][] {outputs…} = input;
  assign List.of(outputs…) = input;
  assign Map<>.with(k)(outputVal) = inputMap;

The extra syntax is needed if we privilege the convention to declare binding names as needed, rather than to rummage around and assign to them as needed.

A better syntax (IMO) would be to mark each pattern binding variable in such a way that if unmarked, it is a newly bound variable, and if marked, it is assigned to a pre-existing variable (which must be in scope).
  int[][] {assign output, assign output2} = input;
  assign List.of(assign output, assign output2) = input;
  assign Map<>.with(k)(assign outputVal) = inputMap;

This has two benefits:

1. It’s clear which variables are getting assigned to (and therefore require attention to non-local declarations, from surrounding code).
2. You can mix assignments and bindings in the same pattern.

— John

P.S. Another point, only slightly related: If we add syntax support for combined declarations of *frozen* arrays we will run into the limits of the compact array notation, and I think there will be some pressure to make it more flexible. To explain, this works OK:

  var as = new int[] { 1, 2, 3 }.freeze();
  var as = Arrays.freeze(new int[] { 1, 2, 3 });

This doesn’t:

  var as = new int[][] { { 1, 2 }, { 3 } }.freeze();

because (subtly) the sub-arrays are mutable. Nor does this, although I have seen it used informally:

  int[] as = { 1, 2, 3 }.freeze();

Even if that is rationalized somehow, this has the same ambiguity problem with mutable subarrays:

  int[][] as = { { 1, 2 }, { 3 } }.freeze();

Happily, we can start playing with the frozen arrays themselves before we start cooking up sugar for them. After trying out use cases we’ll have a better feel for what sugar we want to add. In the early days it might sometimes look as bad as this:

  var as = new int[][]{ new int[]{ 1, 2 }.freeze(), new int[]{ 3 }.freeze() }.freeze();

(Or worse, if we are scrupulous about avoiding double copies and use an ArrayBuilder helper. But, one step at a time…)

The connection between patterns per se and frozen arrays is quite simple: An array deconstruction pattern works exactly the same on a mutable and on a frozen array. The above is really about the limitations of compact array initializers. Compact notations are inherently difficult to adjust in meaning, at least while preserving compactness.
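Java arrays have no freeze() method; the freeze() and Arrays.freeze calls above are proposals. Unmodifiable lists are only a rough stand-in for frozen arrays, but they show the same shallow-versus-deep pitfall. In the sketch below, freezing only the outer level leaves the sub-arrays writable. The hypothetical helper deepCopy converts every level, which plays the role of calling freeze() on each sub-array before the outer one.

```java
import java.util.ArrayList;
import java.util.List;

public class ShallowVsDeep {
    // Shallow: the outer list is unmodifiable, but its elements are
    // ordinary, still-mutable int[] arrays.
    static List<int[]> shallow() {
        return List.of(new int[] { 1, 2 }, new int[] { 3 });
    }

    // Deep: every level becomes unmodifiable (hypothetical helper).
    static List<List<Integer>> deepCopy(int[][] rows) {
        List<List<Integer>> out = new ArrayList<>();
        for (int[] row : rows) {
            List<Integer> r = new ArrayList<>();
            for (int v : row) r.add(v);
            out.add(List.copyOf(r));
        }
        return List.copyOf(out);
    }

    public static void main(String[] args) {
        List<int[]> s = shallow();
        s.get(0)[0] = 99;                 // allowed: the sub-array is still mutable
        System.out.println(s.get(0)[0]);  // 99

        List<List<Integer>> d = deepCopy(new int[][] { { 1, 2 }, { 3 } });
        try {
            d.get(0).set(0, 99);          // rejected: the inner list is unmodifiable too
        } catch (UnsupportedOperationException e) {
            System.out.println("inner list is unmodifiable");
        }
    }
}
```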