Re: Case transformations in strings
On Tue, 24 Mar 2009 00:50:24 +0100, David-Sarah Hopwood wrote:

> If converting one character to many would cause a problem with the
> reference to toUpperCase in the regular expression algorithm, then
> presumably Safari and Chrome would hit that problem. Do they, or
> do they use different uppercase conversions for regexps vs
> toUpperCase?

The Regular Expression specification in ES3 doesn't use toUpperCase directly, but rather the Canonicalize helper function (15.10.2.8). It states:

  2. Let u be ch converted to upper case as if by calling
     String.prototype.toUpperCase on the one-character string ch.
  3. If u does not consist of a single character, return ch.

I.e., it uses a different algorithm for regexps than for strings. (It also prevents non-ASCII characters from canonicalizing to ASCII characters.)

> If the latter, then we should allow that, and probably require it.

It's allowed, and required, already, so that's an easy fix :)

/Lasse
___
Es-discuss mailing list
Es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
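For illustration, the Canonicalize steps quoted in the message above can be sketched in JavaScript roughly as follows. This is a minimal sketch, not spec text: the function name canonicalize and the ignoreCase parameter are names assumed here, the steps not quoted in the message are paraphrased from ES3 section 15.10.2.8, and the result for any given character depends on which case mappings the host engine's toUpperCase implements.

```javascript
// Sketch of the ES3 Canonicalize helper (15.10.2.8).
// "canonicalize" and "ignoreCase" are illustrative names, not spec identifiers.
function canonicalize(ch, ignoreCase) {
  // Step 1: without the ignore-case flag, characters are compared as-is.
  if (!ignoreCase) {
    return ch;
  }
  // Step 2: upper-case the one-character string.
  var u = ch.toUpperCase();
  // Step 3: a one-to-many mapping is ignored; keep the original character.
  if (u.length !== 1) {
    return ch;
  }
  // Step 5: a non-ASCII character never canonicalizes to an ASCII one.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) {
    return ch;
  }
  // Step 6: otherwise use the upper-cased character.
  return u;
}

// Usage: in an engine whose toUpperCase maps U+00DF to "SS", the first call
// returns the input unchanged, while the plain string conversion does not.
console.log(canonicalize("\u00DF", true), "\u00DF".toUpperCase());
console.log(canonicalize("a", true));
```

The early returns are what make the regexp behaviour differ from plain toUpperCase on strings: a character with a multi-character or ASCII-bound upper-case form simply canonicalizes to itself.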
Targeting the ecmascript AST
Following on from the recent discussions on ecmascript as a compiler target and Brendan's point re targeting the AST, I have coded a user-friendly pseudocode-ish syntax which targets the V8 ecmascript AST: 'zedscript' aka 'zed is ecmascript for dummies'. (And given the focus of the list I won't be posting follow-ups, but I hope it may be of interest to some readers.)

The code can be found at: http://code.google.com/p/zedscript/

Zedscript runs _in_ the V8 ecmascript engine. That is, there is no phase that parses zedscript source into javascript source. It targets the AST directly. The V8 scanning/parsing source code has been altered to allow the V8 engine to compile and run both ecmascript and zedscript. It reuses the V8 tokens/scanning/parsing/runtime/error-handling machinery. This reduces abstraction leaks and aids debugging. zedscript can call ecmascript and vice versa. (Calling jsfunction.toString() can be surprising, however!) A zedscript script can be run via the V8 shell: ./shell

So, zedscript provides a thin layer of syntax sugar over the core ecmascript semantics which will (hopefully):

* show the emerging rich 3.1 and 4 (Harmony) semantics in the best possible light.
* minimize quirks and gotchas.
* emphasize simplicity, security, safety and speed. For instance ecmascript 3.1 'strict mode' could be enabled by default.
* not be a port of existing languages, e.g. python, ruby. (IMHO ecmascript does not need 1001 IronXXX ports. It needs one good alternate syntax, which can take inspiration from the sugar/syntax of other pragmatic languages, but can be considered a dialect of the core ecmascript semantics rather than a new language or port.)
* hit the sweet spot between succinctness and pseudocode readability.

The goal is to track ecmascript 3.1 and 4/Harmony and deliver zedscript 3.1 and 4 on the V8 engine. (And maybe on tracemonkey and sfx/nitro.)
It could be considered a synthesis of Brendan's goal of sugar for es4 and Douglas Crockford's idea re a 'new' language. I feel that some users - especially those without a comp. sci background - will never get on with the curlies syntax. An alternate syntax in addition to the javascript syntax - especially for esHarmony - could really help promote ecmascript as a general purpose scripting language. Even if something like this never gets into the browser it would be useful for server and desktop development.

Here is an example, 'ztest/sample.js'. Note how the script begins with //zed to signal to V8 that it's a zedscript file. (This is a temporary solution.)

    //zed
    // - currently has a dylan/moo -ish syntax
    // - the parens around the expression could change to
    //   a more ruby/lua -ish syntax, with optional do/then.

    print("*** start\n")

    var x = 10
    var y = 20

    // if has elif clauses
    // and/or are aliases for &&/||
    if (x == -99)
      print("FAIL")
    elif (x == 10 and y > 15)
      print("OK")
    elif (x == -99 or y > -99)
      print("FAIL")
    else
      print("FAIL")
    end

    // not as alias for !
    var b = false;
    print(!b)
    print(not b)

    // fn as alias for function
    fn times2(i)
      return i * 2
    end

    var z = 1
    while (z <= 5)
      print(z + " : " + times2(z++))
    end

    print("\n*** end ");
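For comparison, here is a rough sketch of what the sample above would look like in plain ECMAScript, assuming the aliases map exactly as the sample's comments describe (and/or/not to &&/||/!, elif to else if, fn to function, end to a closing brace). This is a hand translation for illustration only, not output from zedscript. The print function and the out array are stand-ins added here so the sketch also runs outside the V8 shell.

```javascript
// Stand-in for the V8 shell's print(); also records each line in "out".
var out = [];
function print(s) {
  out.push(String(s));
  console.log(s);
}

print("*** start\n");

var x = 10;
var y = 20;

// elif becomes else if; and/or become &&/||
if (x == -99) {
  print("FAIL");
} else if (x == 10 && y > 15) {
  print("OK");
} else if (x == -99 || y > -99) {
  print("FAIL");
} else {
  print("FAIL");
}

// not becomes !
var b = false;
print(!b);
print(!b);

// fn becomes function
function times2(i) {
  return i * 2;
}

var z = 1;
while (z <= 5) {
  // z is read for the label before z++ takes effect
  print(z + " : " + times2(z++));
}

print("\n*** end ");
```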
RE: Exactly where is a RegularExpressionLiteral allowed?
OK, let's try to wrap up this issue.

In addition to adding RegularExpressionLiteral to Literal, do we also agree to delete the third paragraph of section 7, which says:

  Note that contexts exist in the syntactic grammar where both a division and a RegularExpressionLiteral are permitted by the syntactic grammar; however, since the lexical grammar uses the InputElementDiv goal symbol in such cases, the opening slash is not recognised as starting a regular expression literal in such a context. As a workaround, one may enclose the regular expression literal in parentheses.

The second paragraph says: "The InputElementDiv symbol is used in those syntactic grammar contexts where a division (/) or division-assignment (/=) operator is permitted." Should we insert the word "initial" (or "leading") immediately in front of "division" to clarify where such contexts occur?

Allen
Re: Case transformations in strings
Christian Plesner Hansen wrote:
> David-Sarah Hopwood wrote:
>> If converting one character to many would cause a problem with the
>> reference to toUpperCase in the regular expression algorithm, then
>> presumably Safari and Chrome would hit that problem. Do they, or
>> do they use different uppercase conversions for regexps vs
>> toUpperCase?
>
> Chrome uses context (but not locale) sensitive special casing for
> ordinary toUpperCase. For regexps it uses the same mapping but
> doesn't convert chars that map to more than one char and non-ascii
> chars that would have converted to ascii chars. We would have liked
> to use the full multi-character mapping without the exception for
> ascii but couldn't for compatibility reasons.

Can you expand on what the compatibility problem was for non-ASCII -> ASCII mappings in regexps?

--
David-Sarah Hopwood ⚥
Re: Case transformations in strings
David-Sarah Hopwood wrote:
> Christian Plesner Hansen wrote:
>> David-Sarah Hopwood wrote:
>>> If converting one character to many would cause a problem with the
>>> reference to toUpperCase in the regular expression algorithm, then
>>> presumably Safari and Chrome would hit that problem. Do they, or
>>> do they use different uppercase conversions for regexps vs
>>> toUpperCase?
>> Chrome uses context (but not locale) sensitive special casing for
>> ordinary toUpperCase. For regexps it uses the same mapping but
>> doesn't convert chars that map to more than one char and non-ascii
>> chars that would have converted to ascii chars. We would have liked
>> to use the full multi-character mapping without the exception for
>> ascii but couldn't for compatibility reasons.
>
> Can you expand on what the compatibility problem was for
> non-ASCII -> ASCII mappings in regexps?

Oh, never mind -- this is required by step 5 of Canonicalize in section 15.10.2.8.

So, there would be no regexp-related problems with requiring toUpperCase to perform multi-code-unit and/or context-sensitive mappings in ES3.1.

--
David-Sarah Hopwood ⚥
Re: Exactly where is a RegularExpressionLiteral allowed?
Waldemar Horwat wrote:
> David-Sarah Hopwood wrote:
>> I'll repeat my argument here for convenience:
>>
>> A DivisionPunctuator must be preceded by an expression.
>> A RegularExpressionLiteral is itself an expression.
>>
>> (This assumes that the omission of RegularExpressionLiteral from
>> Literal is a bug.)
>>
>> Therefore, for there to exist syntactic contexts in which either
>> a DivisionPunctuator or a RegularExpressionLiteral could occur,
>> it would have to be possible for an expression to immediately
>> follow [*] another expression with no intervening operator.
>> The only case in which that can occur is where a semicolon is
>> automatically inserted between the two expressions.
>> Assume that case: then the second expression cannot begin
>> with [*] a token whose first character is '/', because that
>> would have been interpreted as a DivisionPunctuator, and so
>> no semicolon insertion would have occurred (because semicolon
>> insertion only occurs where there would otherwise have been a
>> syntax error); contradiction.
>
> Yes, I verified when we were writing ES3 that this was the only case
> where the syntactic grammar permitted a / to serve as both a division
> (or division-assignment) and a regexp literal. The interaction of
> lexing and semicolon insertion would have been unclear (how can you say
> that the next token is invalid if you don't know how to lex it?), so we
> wrote the spec to explicitly resolve those in favor of division.

If that is what the note is intended to clarify, I think its current wording is more confusing than helpful. It certainly confused me. Anyway, there is no case in which a regexp needs to be parenthesized to avoid lexical ambiguity.

How about replacing the current wording by something that specifically discusses the semicolon insertion issue, with an example:

  There are two goal symbols for the lexical grammar. The InputElementDiv symbol is used in those syntactic grammar contexts where a leading division (/) or division-assignment (/=) operator is permitted. The InputElementRegExp symbol is used in other syntactic grammar contexts.

  NOTE There are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (section 7.9); in examples such as the following:

    a = b
    /hi/g.exec(c).map(d);

  where the first non-whitespace, non-comment character after a LineTerminator is '/' and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator. That is, this example is interpreted in the same way as:

    a = b / hi / g.exec(c).map(d);

--
David-Sarah Hopwood ⚥
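The interpretation described in the proposed note can be checked directly in an engine. The following is a small sketch with made-up values: b, hi, g, c and d are hypothetical stand-ins chosen so the expression evaluates to a number, and they are not part of the proposed spec text.

```javascript
// Hypothetical stand-ins so that the example from the note can be executed.
var b = 100;
var hi = 5;
var g = {
  exec: function (c) {
    return { map: function (d) { return 2; } };
  }
};
var c = "input";
var d = function (x) { return x; };
var a;

// The example from the note: the second line starts with '/', and no
// semicolon is inserted after "b", so the slashes are division operators.
a = b
/hi/g.exec(c).map(d);

// The explicit single-line form given in the note.
var expected = b / hi / g.exec(c).map(d);

console.log(a === expected);
```

If a semicolon had been inserted after the first line, a would simply hold the value of b and the second line would be a regular expression literal applied to c. The comparison shows that the division reading is the one taken.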
RE: Exactly where is a RegularExpressionLiteral allowed?
I agree, I think David-Sarah's proposed note is better than just deleting the third paragraph.