Hello ES4 fans,

I have now read the recently posted whitepaper. I marked up my printed  
copy with many comments in the margins, and I am sharing them with the  
list now.

Please note that this does not constitute an official Apple position,  
just some personal off-the-cuff opinions. I have discussed the  
proposal with some of my colleagues, including Geoff Garen who  
attended the recent f2f, but we have not figured out a consensus  
overall position or anything. With the disclaimers out of the way,  
here are my review comments:

Section I.

Goals: I strongly agree with the stated goals of compatibility and  
enabling large software development. I wonder if perhaps performance  
should be added as a goal. At the very least we want it to be possible  
to achieve performance on par with ES3 engines, and ideally we want to  
enable better performance.


Section II.

Programming in the small: "... make the writing and reading of  
fragments of code simpler and more effortless." That is somewhat
dubious grammatically; I suggest (with additional style fixes) "make
the reading and writing of code fragments easier."


Portability: This section first says that the full language must be
supported - subset profiles are not desirable. Then it says that, to
allow ES4 to be practically implementable on small devices and in
hosted environments, certain features, like extensive compile-time
analysis and stack marks, cannot be part of the language. Then it says
those features are part of the language, but optional.

I hope the problems here are clear: first, the section plainly  
contradicts itself. It argues against subsets and certain classes of  
features, and then says the spec includes such features as optional,  
thus defining a subset. So that needs to be fixed in the whitepaper.  
More significantly, I think this may be an indication that the  
language has failed to meet its design goals. My suggestion would be  
to remove all optional features (though I could be convinced that  
strict mode is a special case).


Section III.

Syntax: The new non-contextual keywords, and the resulting need to  
specify dialect out of band, are a problem. I'll have more to say  
about compatibility under separate cover.

Behavior:
- This section says that "variation among ES3 implementations
entails a license to specify behavior more precisely for ES4".
However, the example given is a case where behavior in two
implementations was already the same, due to compatibility
considerations. I actually think that both convergence on a single
behavior where variation is allowed, and variation that leads to
practical compatibility issues, are license to spec more precisely.

- The RegExp change - is this really a bug fix? It's likely that this
is not a big compatibility issue (Safari's ES3 implementation had
things the proposed ES4 way for some time), but I think ES3's approach
may be more performant, and generating a new object every time does
not seem especially helpful.

Impact: This section talks a lot about incompatibilities between ES4
and ES3; however, I think incompatibilities with ES3 as specced are in
themselves almost irrelevant. What matters is incompatibilities with  
existing implementations and the content that depends on them. This  
section also appears to talk disparagingly about some implementations  
prioritizing compatibility over ES3 compliance, implies that any  
deviations may be due to "inadequate engineering practices", and  
implies that only "some" implementations are not compatible with ES3.  
Is there any significant implementation that anyone would claim is  
100% free of ECMAScript 3 compliance bugs? I doubt it, and so I think  
we should make this section less judgmental in tone.

The web: Here especially, the actual concern is real-world  
compatibility, not compatibility with the ES4 spec. Furthermore, it  
completely ignores forward compatibility (the ability to serve ES4 to  
older browsers that do not support it). It implies that this is just  
an issue of aligning the timing of implementations. Ignoring for the  
moment how impractical it is to expect multiple implementations to  
roll out major new features in tandem, I note that there were similar  
theories behind XHTML, XSL, XHTML 2, and many other technologies that  
have largely failed to replace their predecessors. Again, I'll say  
more about compatibility (and in particular how the WHATWG approach to  
compatibility can be applied to ES4) under separate cover.



Section IV.

Classes: If any of the new type system is worthwhile, surely this is.  
The impedance mismatch between the class model used by most OO  
languages and by specifications like the DOM, and ES3's prototype  
model, is needlessly confusing to authors. So I approve of adding  
classes in a reasonable and tasteful way.

Dynamic properties: the fact that the "dynamic" behavior is not
inherited makes class inheritance violate the Liskov Substitution
Principle. I think this is a problem. Subclassing should be subtyping
in the LSP sense. I am not sure offhand how to fix this.
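
To illustrate (a sketch using the whitepaper's class syntax as I
understand it; the names are mine):

dynamic class Base {}
class Derived extends Base {}   // "dynamic" does not carry over

var b: Base = new Base;
b.cache = {};                   // fine: Base instances accept expandos

var d: Base = new Derived;
d.cache = {};                   // error: Derived is not dynamic, so code
                                // written against Base breaks when handed
                                // a Derived - the LSP violation I mean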

Virtual Properties: I wish the keyword for catchall getters and  
setters was something other than "meta", which is a vague word that  
doesn't mean much. Why not "catchall" or "fallback" or something along  
similarly concrete lines? (I realize now upon re-reading my margin  
comments that this is supposed to match meta invoke, but there too I  
am not sure the relationship is worth the vagueness.)
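
For reference, the sort of thing I mean (assuming I have the proposed
catchall syntax right; the class and helper names are made up):

class Proxy {
  meta function get(name) {        // catchall getter: called for
    return lookup(name);           // properties not otherwise defined
  }
  meta function set(name, value) { // catchall setter
    store(name, value);
  }
}

The word "meta" says nothing about what these do; "catchall function
get" would, at the cost of a longer keyword.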

Wrappers: The whitepaper implies that providing catchall getters and  
setters for primitive types and skipping boxing isn't a compatibility  
issue. However, it is possible in ES3 to capture an implicit wrapper:

var x;
// Installing a method on String.prototype lets the implicit wrapper
// object created for the primitive leak out via "this":
String.prototype.myFunc = function() { this.foo = "foo"; x = this; };
"bar".myFunc();

Prototype hacking allows you to observe identity of the temporary  
wrappers, save them for later, and store properties. Perhaps there is  
evidence that practices relying on techniques like this are
exceedingly uncommon (I'd certainly believe it); if so, it should be
cited.
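
Continuing the example above, the saved wrapper is observably a live
object distinct from any future wrapper:

x.foo;          // "foo" - the saved wrapper kept the expando property
"bar".foo;      // undefined in ES3: each implicit conversion makes a
                // fresh wrapper, which is exactly what prototype hacks
                // like this can detect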

Literals:
- I am surprised to see a decimal type (a type that is not directly  
supported in current mainstream hardware) even though generally  
popular types like single-precision IEEE floating point and 64-bit
integers are not present.
- Since ints/uints overflow to doubles, then either all int math must  
be performed in double space (requiring constant conversions when  
working with int variables), or every operation must check for  
overflow and possibly fall back to double space. Even when the final  
result cannot overflow, certainly in many expressions the difference  
between int and double intermediates can be observed. It seems likely,  
then, that math on variables declared int will be slower than math on  
variables declared double, which will surely be confusing to  
developers. This seems pretty bogus. Is there any case where int math  
using the normal operators can actually be efficient? Would it be  
plausible to make ints *not* overflow to double unless there is an
actual double operand involved (in which case int constants would
always need a special suffix, or would perhaps have to be determined
contextually)?
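
A concrete example of the overflow issue above (using the proposed
annotation syntax; assume int is 32 bits as in the whitepaper):

var a: int = 2000000000;
var b: int = 2000000000;
var sum = a + b;          // 4000000000 does not fit in int, so either
                          // every int "+" carries an overflow check and
                          // falls back to double, or a and b must be
                          // converted to double up front
var c: int = sum - b;     // the final result fits in int again, yet the
                          // double intermediate was observable (and was
                          // still paid for)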

Section V.

Record and array types: Structural types are confusingly similar to  
yet different from classes. Mostly they offer a subset of class  
functionality (though reading ahead I did see a few features limited  
to them). Also, since we already have prototype-based objects and
class-based objects, it seems excessive to add yet a third way. I
recommend removing them and adding any features that are sorely
missed as a result to classes.
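
For comparison, the two spellings as I read the whitepaper (Point is
my own example name):

type Point = { x: double, y: double };     // structural record type

class PointC {                             // a class covering the same
  var x: double;                           // ground, plus constructors,
  var y: double;                           // methods, nominal identity, etc.
}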

"Any": The spec explains vaguely that the "any" type is not identical  
to the union (null, undefined, Object). How is it different? Is the  
difference observable to ES4 programs or is it purely a matter  
internal to the spec (in which case the difference is not relevant)?

Type definitions: Seeing the example of a type definition for a record  
makes this feature seem even more redundant with classes.

Data Types: If structural types cannot be recursive, then one of the  
canonical applications of record-like types, the linked list, cannot  
be implemented this way. I assume it can be with classes. Yet another  
reason to fold any interesting record features into classes.
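
If I read the restriction correctly, the canonical linked list comes
out like this (names are mine):

// Not allowed as a structural type, since it refers to itself:
// type Node = { value: *, next: Node };

class Node {             // but a class can refer to itself freely
  var value;
  var next: Node;        // null-terminated list
}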

Nullability: Are non-nullable types really worth it? I am not sure.
Does any other explicit type system for a dynamic OO language have
such a concept? The whitepaper says that "the ability to store null is
occasionally the source of run-time errors", but won't dynamic
checking result in runtime errors anyway when assigning null to a
non-nullable variable (except in strict mode)?
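
That is, if I understand the proposal (and I may well have the
annotation syntax wrong; the class is made up):

class Part {}
var p: Part! = new Part;     // "!" marking the type as non-nullable
p = null;                    // strict mode: rejected by the verifier;
                             // standard mode: a runtime error anyway, so
                             // the annotation mostly moves where the
                             // failure gets reported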

"wrap": Seems like a version of this feature and/or "like" founded on  
classes would work just as well.

Conversions: "In addition, any value in the language converts to a  
member of AnyBoolean", but the conversions specified are all to the  
more specific "boolean" type, so perhaps it should be expressed that  
way to avoid confusion.


Section VI.

Predefined namespaces: ES4 predefines and automatically opens the  
__ES4__ namespace. What will happen in ES5 (or ES4.1 or whatever)?  
Will they still name the primary namespace __ES4__? Will it have  
__ES5__ instead? Will it have both? I don't care that much about the  
specifics as long as this has been thought through.

Bindings: The sheer number of constructs that bind names is a little  
scary. I count 16 in the list. I don't think anyone has raised the  
paucity of binding constructs as a critical flaw in ES3. Are all these  
different constructs really necessary?

Binding objects and scopes: It seems like introducing lexical block
scopes makes things more challenging for online implementations.
Creating a dynamic scope object per block scope is clearly
unacceptable, but more work may be needed to build a per-function
symbol table that can properly accommodate block scope. Is block scope
worth it? Yes, "var" is a little weird, but having both "var" and
"let" increases conceptual footprint and may overall lead to more
author confusion.

package: Now that I have learned more about them, I think that  
exposing packages and namespaces as separate user-level concepts is  
confusing. Let's get this down to a single concept that developers  
have to learn. Namespaces can just have a paired internal namespace  
implicitly, I do not think it is helpful to give the public/internal  
pair a special different name.

let, let const: Are expression let and block let really that useful,  
other than to make old-school Lisp/Scheme hackers smile? To  
programmers mainly used to imperative paradigms I think these will  
come off as syntactic salt. See also my previous comments about  
whether lexical block scope is worth adding to the language at all.
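
For reference, the kind of thing being added (JS1.7-style, as I
understand the proposal):

var x = 1;
{
  let x = 2;          // block-scoped: this x exists only inside the braces
}
// out here x is still 1; a "var x = 2" above would have rebound it

let (y = 3) y * y;    // expression let: 9, with y visible only in that
                      // one expression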

Program units:
- Is there any need for the concept of "unit" to be exposed in the  
syntax? Why not just allow "use unit" at top level, and implicitly  
make each file (or in the browser context each inline script) a unit?
- I think the difference between using units and importing packages is  
going to be confusing to authors. Seriously, can anyone explain in one  
sentence of 12 words or less how Joe Random Developer will decide
whether to use a namespace, import a package, or use a unit? Can we  
get this down to only one kind of thing that needs to be mentioned in  
the syntax? This would be a big win in reducing conceptual footprint.


Section VII.

Versioning: I am suspicious of versioning mechanisms, especially big  
giant switch versioning. Is there any use of __ECMASCRIPT_VERSION__  
that is not better handled by feature testing? (Maybe there is and I  
am not thinking of it.)

Type annotations and type checking: This section implies that type  
annotations are not at all being added for performance reasons and may  
indeed be harmful to performance. Wow! Seriously? I think runtime
assertions are interesting when debugging, but I would not want them
happening for every assignment statement in a release build of my C++
code. I am not sure why ECMAScript programmers would want that. Later
this section says "it is plausible" that typed programs will run
faster and not slower with enough analysis, but this issue seems far
too crucial to take such a blasé attitude. Unless we can show that
type annotations won't cause a performance hit in practice, and in  
particular give a convincing argument that the relevant analysis can  
be done with reasonable speed and without introducing an ahead-of-time  
compile phase, then it is irresponsible to include type annotations as  
currently designed. I am willing to believe that this is the case, but  
I cannot sign on to an attitude that we don't care if typed programs  
get faster or slower. Nor am I willing to take experience based on  
ahead-of-time compilers as definitive.

Pragmas: The "use decimal" pragma highlights how much complexity there  
is to the decimal type. Seriously, is it worth it? Are the problems it
solves really that common?
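
For the record, the problem it targets (my understanding; the pragma
details may be off):

0.1 + 0.2 == 0.3;      // false today: binary floating point
{
  use decimal;
  0.1 + 0.2 == 0.3;    // true, if the pragma makes literals and
                       // arithmetic decimal within its scope
}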

"for each" statement: This seems like a convenient piece of syntactic  
sugar.
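
i.e., roughly:

var total = 0;
for each (var v in {a: 1, b: 2})
  total += v;          // iterates the values 1 and 2 rather than the
                       // property names "a" and "b"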

Generators: Do ordinary programmers really understand coroutine  
control flow? Is this really a significantly better paradigm than
passing a visitor function? I am not really convinced on this one yet.
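
To make the comparison concrete (generators as in Python/JS1.7, which
is my reading of the proposal; the function names are mine):

// The visitor style most web developers already write:
function forEachValue(array, visit) {
  for (var i = 0; i < array.length; i++)
    visit(array[i]);
}

// The generator style: the callee suspends at each yield
function values(array) {
  for (var i = 0; i < array.length; i++)
    yield array[i];
}

var g = values([1, 2, 3]);
g.next();   // 1
g.next();   // 2 - control re-enters values() where it left off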

Operator overloading through global multimethods: Overloading? Yikes.  
Seems complicated. Aren't we worried that this could make the common  
case of existing untyped code slower than it is already?

Tail calls:
- The whitepaper doesn't define very precisely what "accumulate
control stack" means (see the sketch after these bullets). Are
recursive calls allowed to accumulate other kinds of space (in which
case the usefulness of the requirement is dubious)? Do functions that
may be implemented in native code count (so, for instance, if you eval
an expression that calls your function in tail position repeatedly,
does the requirement apply)?
- "The use of procedural abstraction for iteration requires the use of  
un-abstract control structures to consumption of control stack space,  
among other things." This sentence seems to be buggy and has triggered  
a parse error in my brain.
- It seems odd to mention goto here, since it is not a feature of the  
language.
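
For concreteness, the kind of case the requirement presumably covers
(plain ES3 syntax):

function countDown(n) {
  if (n == 0) return "done";
  return countDown(n - 1);     // a proper tail call: under the proposed
                               // rule, countDown(1000000) must not blow
                               // the control stack
}

function sum(n) {
  if (n == 0) return 0;
  return n + sum(n - 1);       // not a tail call (the addition happens
                               // after the call returns), so stack space
                               // may still grow without bound
}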

"this": The most common reason that I know of for trying to copy this  
into a variable is for lexically nested functions that are set as  
event listeners or similar, and not called immediately by name. So I  
don't think the this-passing feature actually addresses the common  
likely use-case for such a thing, and so may be more confusing than  
helpful.
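
The pattern I have in mind (ordinary ES3; the names are hypothetical):

Widget.prototype.attach = function(button) {
  var self = this;                  // the usual workaround today
  button.onclick = function() {
    self.handleClick();             // runs later, from the event loop,
  };                                // so a this-passing call form at the
};                                  // call site does not help here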

"eval" operator and the "eval" function: This seems like a good  
approach to sanitizing eval. Perhaps it should be highlighted that
splitting the eval function and eval operator is a potential
performance benefit, since it opens up significant new optimization
opportunities.
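
A sketch of why the split matters for optimization (my reading of the
proposal; the exact rules may differ):

var x = "global";
function direct() {
  var x = "local";
  return eval("x");     // eval operator: sees direct()'s locals, so the
                        // engine must keep them materialized
}
var ev = eval;          // the eval *function*, reached indirectly
function indirect() {
  var x = "local";
  return ev("x");       // sees only the global scope ("global"), so
                        // indirect()'s locals stay private to the engine
}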

arguments: It seems strange to both deprecate a feature and improve it  
at the same time.

"typeof" operator: I think it's been decided to back out the typeof  
"null" change so this may as well be dropped from the whitepaper.


Section VIII.

Strict:
- I would strongly prefer if strict mode did not alter behavior of  
programs at all, except to reject those that do not pass the checks.  
Otherwise, since strict mode is optional, this risks interop issues.  
So I'm curious what the eval detail is. Perhaps strict mode could  
remove the eval operator and allow only the eval function, with some  
suitably named version made available ahead of time, if the difference  
is just removing local eval behavior.
- I am somewhat concerned about having strict mode at all. It seems  
like it could create the same kinds of problems we see today with  
content that is served as application/xhtml+xml to some browsers and  
text/html to others. It's not infrequent to see such content break  
only in the browsers that really support XML, due to sloppy testing of  
changes and the fact that the 78% browser doesn't support XHTML.

Verification:
- Does strict mode actually allow for any optimizations that couldn't  
be done to the exact same program in standard mode?


Section IX.

"switch type" statement: I guess this beats switching on typeof, but  
is it really significantly better than a series of "if" statements  
using the "is" operator?

Expression closures: I actually find the examples hard to follow given  
my expectation of ES3-like syntax. I think this may actually be  
syntactic salt.
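
For reference, the two spellings:

var twice = function (x) x * 2;              // proposed expression closure
var twice2 = function (x) { return x * 2; }; // the ES3 way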

Array comprehensions: This seems pretty sugary to me but this kind of  
syntax has proven useful for typical developers using Python.
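
e.g. something along these lines (if I have the syntax right):

var squares = [x * x for each (x in [1, 2, 3, 4])];   // [1, 4, 9, 16]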

Destructuring assignment and binding: I grudgingly accept that this  
sort of construct has been proven in the context of Python and Perl.
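
i.e. (as I understand the proposed syntax):

var [first, second] = ["a", "b"];         // first == "a", second == "b"
var { x: px, y: py } = { x: 1, y: 2 };    // pulls fields out by name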

"type": Are runtime meta-objects representing types ruly necessary?  
What are they good for?

Slicing: This one I mildly object to. Array/String slicing is not, to  
my knowledge, particularly common in ECMAScript code of today. I am  
dubious that it merits its own operator syntax.
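
i.e. (if I have the proposed syntax right):

var a = [10, 20, 30, 40];
a[1:3];          // proposed slice syntax: [20, 30]
a.slice(1, 3);   // what we already write today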

Semicolon insertion: I'd like more detail on the compatibility of the  
return change. The do-while change adopts de facto reality and so is  
good.
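
The current behavior whose compatibility impact I'd like spelled out:

function f() {
  return
    { answer: 42 };    // ES3 semicolon insertion turns this into
}                      // "return;", so f() yields undefined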

Trailing commas: Good to standardize this de facto extension.


Section X.

Map: Long overdue to have a real hashtable type.

Early binding, static type checking, and predictable behavior with  
"intrinsic": Perhaps it should be highlighted more that this is a  
potential significant performance improvement.

Reflection: This feature seems like it could be complex to implement  
and potentially unnecessary for small implementations. I note that  
J2ME omits reflection, which we can perhaps take as a sign that it is  
not suitable for small implementations.

ControlInspector: I think an interface that's meant for debuggers and
similar tools, and not implementable in all interesting contexts, does
not need to be standardized. Better that than having yet another
optional feature.

JSON: Sounds good.

DontEnum: Overloading a getter to sometimes also be a setter seems to
be in poor taste. (1) It's confusing. (2) It makes it impossible to
separately feature-test for the existence of the setter. Why this
design choice? I suggest adding setPropertyIsEnumerable instead. Also:
can built-in properties that are naturally DontEnum be made
enumerable? That seems like annoying additional complexity.
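
Spelling out the feature-testing problem (assuming I've read the
proposed overload correctly):

obj.propertyIsEnumerable("foo");           // ES3: the getter, true/false
obj.propertyIsEnumerable("foo", false);    // proposed: now also a setter
// An ES3 engine silently ignores the extra argument, so probing for the
// setter means actually trying to change something and reading it back.
obj.setPropertyIsEnumerable("foo", false); // a separate (hypothetical)
                                           // method would be trivially
                                           // feature-testable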

Math: I'm surprised to learn that the numeric value 4 is distinct in  
int and double types, and yet int math still must (effectively) be  
done in double space. This seems bad for performance all around. If  
ints are to be a distinct type, then integer math should always be  
done in int space.

uint-specific operations: This is syntactically ugly. Why can't  
integer math just always work this way? Also, why only uint versions?  
Surely it is desirable to do efficient math on signed integers as  
well. Also, bitops already happen in integer math space, so
type-specific versions should not be necessary: no floating point
conversion will need to occur if both operands of ^ or & are
statically typed as int or uint.
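
What I mean about the bitwise operators (annotation syntax per the
whitepaper, as I understand it):

var a: uint = 0xFF00;
var b: uint = 0x00FF;
var c: uint = a | b;    // both operands are statically uint, so an engine
                        // can do this as a single integer op with no
                        // double conversion - no special operator needed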


Things I didn't see:

What about standardizing the de facto <!-- comment syntax that is  
necessary for web compatibility?
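
That is, the pattern every browser engine already accepts inside
script:

<!-- legacy HTML comment hiding: engines treat this line as a comment
var x = 1;      // execution continues normally on the next line
// -->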
