Re: why is regexp /\-/u a syntax-error?

2019-09-20 Thread Mathias Bynens
Think of the `u` flag as a strict mode for regular expressions.

`/\a/u` throws, because there is no reason to escape `a` as `\a` --
therefore, if such an escape sequence is present, it's likely a user error.
The same goes for `/\-/u`. `-` only has special meaning within character
classes, not outside of them.
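A quick way to check the rule is to compile a few patterns with and without the flag. The sketch below uses a hypothetical helper, `compiles`, that only reports whether `new RegExp(source, flags)` throws a `SyntaxError`. In `u` mode, an identity escape outside a character class is allowed only for syntax characters and `/`; inside a class, `\-` is additionally allowed.

```javascript
// Hypothetical helper: true if the pattern compiles with the given flags,
// false if the engine rejects it with a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(compiles('\\-', ''));    // true: identity escapes are lenient without `u`
console.log(compiles('\\-', 'u'));   // false: `-` is not a syntax character
console.log(compiles('-', 'u'));     // true: outside a class, `-` is just a literal
console.log(compiles('[\\-]', 'u')); // true: `\-` is allowed inside a class
console.log(compiles('\\a', 'u'));   // false: same rule as `\-`
console.log(compiles('\\.', 'u'));   // true: `.` is a syntax character
```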

On Fri, Sep 20, 2019 at 11:22 AM kai zhu  wrote:

> jslint previously warned against unescaped literal "-" in regexp.
>
> however, escaping "-" together with unicode flag "u", causes syntax error
> in chrome/firefox/edge (and jslint has since removed warning):
>
> ```javascript
> let rgx = /\-/u
> VM21:1 Uncaught SyntaxError: Invalid regular expression: /\-/: Invalid
> escape
> at :1:10
> ```
>
> just, curious on reason why above edge-case is a syntax-error?
> ___
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>


Re: Proposal: `String.prototype.codePointCount`

2019-08-08 Thread Mathias Bynens
Prior discussion from 7 years ago:
https://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string

`[...string].length` does what you want. But it's definitely not always what
you need.
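To make the distinction concrete: the string iterator yields one element per code point, so spreading or iterating a string counts code points rather than UTF-16 code units. `codePointCount` below is a hypothetical user-land helper that borrows the proposal's name; it is not a built-in. The last lines illustrate the "not always what you need" part: a letter followed by a combining mark displays as one character but counts as two code points.

```javascript
// Hypothetical user-land helper (not a built-in): counts code points by
// iterating the string, so each surrogate pair is counted once.
function codePointCount(string) {
  let count = 0;
  for (const _ of string) count++;
  return count;
}

// U+20BB7 lies outside the BMP, so each copy takes two UTF-16 code units.
const astral = '\u{20BB7}'.repeat(4);
console.log(astral.length);          // 8 (code units)
console.log(codePointCount(astral)); // 4 (code points)
console.log([...astral].length);     // 4, the same count via spread

// "e" + U+0301 COMBINING ACUTE ACCENT renders as one "é" but is two code points.
const combining = 'e\u0301';
console.log(codePointCount(combining)); // 2
```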

On Thu, Aug 8, 2019 at 4:37 AM fanerge  wrote:

> I expect to be able to add an attribute to String.prototype that returns
> the number of codePoints of the string to reflect the actual number of
> characters instead of the code unit.
>
>
> Definition of String.prototype.length
>
> This property returns the number of code units in the string. UTF-16,
> the string format used by
> JavaScript, uses a single 16-bit code unit to represent the most common
> characters, but needs to use two code units for less commonly-used
> characters, so it's possible for the value returned by length to not
> match the actual number of characters in the string.
>
> We refer to the String class in Java
>
> The String class in the Java JVM uses UTF-16 encoding.
> String.length(): The method returns the number of characters in char in
> the string;
> String.codePointCount(): The method returns the number of code points in
> the string.
>
>
> *I want the ECMA organization to be able to add a property or method to
> String.prototype that returns the value of the codePoint of the string. For
> example: String.prototype.codePointCount can return the actual number of
> codePoints instead of code unit.*
>
> *```*
>
> const str1 = ‘’;
>
> str1.length; // 4
>
> str1.codePointCount; // 4
>
> // ‘1’.codePointAt(0) // 49
>
>
> const str2 = '’;
>
> str2.length; // 8
>
> str2.codePointCount; // 4
>
> // '𠮷'.codePointAt(0); // 134071
>
>
> const str3 = ‘’;
>
> str3.length; // 8
>
> str3.codePointCount; // 4
>
> // '😯'.codePointAt(0); // 128559
>
> *```*
>
> *I believe that most developers need such a method and property to get the
> number of codePoints in a string. I sincerely hope that you can accept my
> proposal*,* thanks.*


Re: Overload str.replace to take a Map?

2018-05-20 Thread Mathias Bynens
 function (match0) {
> getValue(match0.slice(2, -2));
> if (value === undefined) {
> return match0;
> }
> argList.slice(1).forEach(function (arg0, ii, list) {
> switch (arg0) {
> case 'alphanumeric':
> value = value.replace((/\W/g), '_');
> break;
> case 'decodeURIComponent':
> value = decodeURIComponent(value);
> break;
> case 'encodeURIComponent':
> value = encodeURIComponent(value);
> break;
> case 'jsonStringify':
> value = JSON.stringify(value);
> break;
> case 'jsonStringify4':
> value = JSON.stringify(value, null, 4);
> break;
> case 'notHtmlSafe':
> notHtmlSafe = true;
> break;
> case 'truncate':
> skip = ii + 1;
> if (value.length > list[skip]) {
> value = value.slice(0, list[skip] - 3).trimRight() +
> '...';
> }
> break;
> // default to String.prototype[arg0]()
> default:
> if (ii === skip) {
> break;
> }
> value = value[arg0]();
> break;
> }
> });
> value = String(value);
> // default to htmlSafe
> if (!notHtmlSafe) {
> value = value
> .replace((/"/g), '&quot;')
> .replace((/&/g), '&amp;')
> .replace((/'/g), '&apos;')
> .replace((/</g), '&lt;')
> .replace((/>/g), '&gt;')
> .replace((/&amp;(amp;|apos;|gt;|lt;|quot;)/ig), '&$1');
> }
> return value;
> });
> };
>
> console.log(templateRender(process.argv[2], JSON.parse(process.argv[3])));
> ```
>
>
>
> kai zhu
> kaizhu...@gmail.com
>
>
>
> On 20 May 2018, at 6:32 PM, Isiah Meadows  wrote:
>
> @Mathias
>
> My particular `escapeHTML` example *could* be written like that (and it
> *is* somewhat in the prose). But you're right that in the prose, I did
> bring up the potential for things like `str.replace({cheese: "cake", ham:
> "eggs"})`.
>
> @Kai
>
> Have you ever tried writing an HTML template system on the front end? This
> *will* almost inevitably come up, and most of my use cases for this are on
> the front end itself handling various scenarios.
>
> @Cyril
>
> And every single one of those patterns is going to need to be compiled and
> executed, and compiling and interpreting regular expressions is definitely
> not quick, especially when you can nest Kleene stars. (See:
> https://en.wikipedia.org/wiki/Regular_expression#Implementations_and_running_times)
> That's why I'm against it - we don't need to complicate this proposal with
> that mess.
>
> -
>
> Isiah Meadows
> m...@isiahmeadows.com
> www.isiahmeadows.com
>

Re: Overload str.replace to take a Map?

2018-05-19 Thread Mathias Bynens
Hey Kai, you’re oversimplifying. Your solution works for a single Unicode
symbol (corresponding to a single code point) but falls apart as soon as
you need to match multiple symbols of possibly varying length, like in the
`escapeHtml` example.
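A minimal sketch of the multi-symbol case, assuming the replacements come as a `Map` from plain strings to plain strings: escape each key, put longer keys first (alternation tries alternatives left to right, so a key that is a prefix of a longer key would otherwise win), and combine everything into a single alternation. `replaceFromMap` is a hypothetical name, not a proposed API.

```javascript
// Hypothetical helper (not a proposed API): replaces every occurrence of any
// key of `map` with its value, in a single left-to-right pass.
function replaceFromMap(string, map) {
  // Empty keys would match at every position, so drop them.
  const keys = [...map.keys()].filter((key) => key !== '');
  if (keys.length === 0) return string;
  // Longest keys first: alternation tries alternatives left to right, so a
  // key that is a prefix of a longer key must come after it.
  keys.sort((a, b) => b.length - a.length);
  // Escape regex syntax characters so each key matches literally.
  const escapeKey = (key) => key.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
  const pattern = new RegExp(keys.map(escapeKey).join('|'), 'gu');
  // A replacement function avoids `$&`-style patterns in the values.
  return string.replace(pattern, (match) => map.get(match));
}

const entities = new Map([
  ['&', '&amp;'],
  ['<', '&lt;'],
  ['>', '&gt;'],
  ['"', '&quot;'],
  ["'", '&#39;'],
]);
console.log(replaceFromMap('<a href="x">Tom & Jerry</a>', entities));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

const words = new Map([['cat', 'dog'], ['category', 'group']]);
console.log(replaceFromMap('category of cat', words)); // group of dog
```

Because everything happens in one pass, the `&` characters introduced by the replacement values are never re-escaped.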

On Sat, May 19, 2018 at 8:43 AM, kai zhu  wrote:

> again, you backend-engineers are making something more complicated than
> needs be, when simple, throwaway glue-code will suffice.  agree with
> jordan, this feature is a needless cross-cut of String.prototype.replace.
>
> ```
> /*jslint
> node: true
> */
> 'use strict';
> var dict;
> dict = {
> '$': '^',
> '1': '2',
> '<': '&lt;',
> '🍌': '🍑',
> '-': '_',
> ']': '@'
> };
> // output: "test🍐🍑_^^[22@ &lt;foo>"
> console.log('test🍐🍌-$$[11] <foo>'.replace((/[\S\s]/gu), function
> (character) {
> return dict.hasOwnProperty(character)
> ? dict[character]
> : character;
> }));
> ```
>
> kai zhu
> kaizhu...@gmail.com
>
>
>
> On 19 May 2018, at 4:08 PM, Cyril Auburtin 
> wrote:
>
> You can also have a
>
> ```js
> var replacer = replacements => {
>   const re = new RegExp(replacements.map(([k,_,escaped=k]) =>
> escaped).join('|'), 'gu');
>   const replaceMap = new Map(replacements);
>   return s => s.replace(re, w => replaceMap.get(w));
> }
> var replace = replacer([['$', '^', String.raw`\$`], ['1', '2'], ['<',
> '&lt;'], ['🍌', '🍑'], ['-', '_'], [']', '@', String.raw`\]`]]);
> replace('test🍐🍌-$$[11] <foo>') // "test🍐🍑_^^[22@ &lt;foo>"
> ```
> but it's quickly messy to work with escaping
>
> Le sam. 19 mai 2018 à 08:17, Isiah Meadows  a
> écrit :
>
>> Here's what I'd prefer instead: overload `String.prototype.replace` to
>> take non-callable objects, as sugar for this:
>>
>> ```js
>> const old = Function.call.bind(Function.call, String.prototype.replace)
>> String.prototype.replace = function (regexp, object) {
>> if (object == null && regexp != null && typeof regexp === "object") {
>> const re = new RegExp(
>> Object.keys(regexp)
>> .map(key => `${old(key, /[\\^$*+?.()|[\]{}]/g, '\\$&')}`)
>> .join("|"),
>> "g"
>> )
>> return old(this, re, m => regexp[m])
>> } else {
>> return old(this, regexp, object)
>> }
>> }
>> ```
>>
>> This would cover about 99% of my use for something like this, with
>> less runtime overhead (that of not needing to check for and
>> potentially match multiple regular expressions at runtime) and better
>> static analyzability (you only need to check it's an object literal or
>> constant frozen object, not that its argument is the result of the
>> built-in `Map` call). It's exceptionally difficult to optimize for
>> this unless you know everything's a string, but most cases where I had
>> to pass a callback that wasn't super complex looked a lot like this:
>>
>> ```js
>> // What I use:
>> function escapeHTML(str) {
>> return str.replace(/["'&<>]/g, m => {
>> switch (m) {
>> case '"': return "&quot;"
>> case "'": return "&#39;"
>> case "&": return "&amp;"
>> case "<": return "&lt;"
>> case ">": return "&gt;"
>> default: throw new TypeError("unreachable")
>> }
>> })
>> }
>>
>> // What it could be
>> function escapeHTML(str) {
>> return str.replace({
>> '"': "&quot;",
>> "'": "&#39;",
>> "&": "&amp;",
>> "<": "&lt;",
>> ">": "&gt;",
>> })
>> }
>> ```
>>
>> And yes, this enables optimizations engines couldn't easily produce
>> otherwise. In this instance, an engine could find that the object is
>> static with only single-character entries, and it could replace the
>> call to a fast-path one that relies on a cheap lookup table instead
>> (Unicode replacement would be similar, except you'd need an extra
>> layer of indirection with astrals to avoid blowing up memory when
>> generating these tables):
>>
>> ```js
>> // Original
>> function escapeHTML(str) {
>> return str.replace({
>> '"': "&quot;",
>> "'": "&#39;",
>> "&": "&amp;",
>> "<": "&lt;",
>> ">": "&gt;",
>> })
>> }
>>
>> // Not real JS, but think of it as how an engine might implement this. The
>> // implementation of the runtime function `ReplaceWithLookupTable` is omitted
>> // for brevity, but you could imagine how it could be implemented, given the
>> // pseudo-TS signature:
>> //
>> // ```ts
>> // declare function %ReplaceWithLookupTable(
>> // str: string,
>> // table: string[]
>> // ): string
>> // ```
>> function escapeHTML(str) {
>> static {
>> // A zero-initialized array with 2^16 entries (U+0000-U+FFFF), except
>> // for the object's members. This takes up to about 70K per instance,
>> // but these are *far* more often called than created.
>> const _lookup_escapeHTML = %calloc(65536)
>>
>> _lookup_escapeHTML[34] = "&quot;"
>> _lookup_escapeHTML[38] = "&amp;"
>> _lookup_escapeHTML[39] = "&#39;"
>> _lookup_escapeHTML[60] = "&lt;"
>> _lookup_escapeHTML[62] = "<

Re: add reverse() method to strings

2018-03-18 Thread Mathias Bynens
For arrays, indexing is unambiguous: `array[42]` is whatever value you put
there. As a result, it’s clear what it means to “reverse” an array.

This is not the case for strings, where indexing is inherently ambiguous.
Should `string[42]` index by UCS-2/UTF-16 code unit? By Unicode code point?
By grapheme cluster?
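To illustrate the ambiguity, here are three plausible reversals of the same string, one per indexing unit. The grapheme-based one uses `Intl.Segmenter`, an ECMA-402 API that postdates this thread and may be missing in older engines; the function names are illustrative only.

```javascript
// 1. By UTF-16 code unit: what `split('')` + `reverse()` does.
function reverseCodeUnits(string) {
  return string.split('').reverse().join('');
}

// 2. By code point: the string iterator keeps surrogate pairs together.
function reverseCodePoints(string) {
  return [...string].reverse().join('');
}

// 3. By grapheme cluster, using Intl.Segmenter (not available in older engines).
function reverseGraphemes(string) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  return Array.from(segmenter.segment(string), ({ segment }) => segment)
    .reverse()
    .join('');
}

// "a", "b", U+1F600 (a surrogate pair), then "e" + U+0301 COMBINING ACUTE ACCENT.
const input = 'ab\u{1F600}e\u0301';
console.log(reverseCodeUnits(input) === '\u0301e\uDE00\uD83Dba'); // true: pair split, accent orphaned
console.log(reverseCodePoints(input) === '\u0301e\u{1F600}ba');   // true: pair kept, accent orphaned
console.log(reverseGraphemes(input) === 'e\u0301\u{1F600}ba');    // true: everything intact
```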



On Mon, Mar 19, 2018 at 6:28 AM, Felipe Nascimento de Moura <
felipenmo...@gmail.com> wrote:

> I have had to use that one, parsing texts and I remember I had to reverse
> strings that represented tokens...but that was very specific.
>
> What I would like to see in strings would be something like "firstCase"
> for transforming "felipe" into "Felipe" for example.
> I always have to use something like `str[0].toUpperCase() + str.slice(1)`.
>
> The only reason I would defend the "reverse" method in strings is because
> it makes sense.
> I think JavaScript is very intuitive, and, as Arrays do have the "reverse"
> method, that simply makes sense to have it in strings as well.
>
> Cheers.
>
>
> [ ]s
>
> *--*
>
> *Felipe N. Moura*
> Web Developer, Google Developer Expert
> <https://developers.google.com/experts/people/felipe-moura>, Founder of
> BrazilJS <https://braziljs.org/> and Nasc <http://nasc.io/>.
>
> Website:  http://felipenmoura.com / http://nasc.io/
> Twitter:@felipenmoura <http://twitter.com/felipenmoura>
> Facebook: http://fb.com/felipenmoura
> LinkedIn: http://goo.gl/qGmq
> -
> *Changing  the  world*  is the least I expect from  myself!
>
> On Sun, Mar 18, 2018 at 12:00 PM, Mark Davis ☕️ 
> wrote:
>
>> .reverse would only be reasonable for a subset of characters supported by
>> Unicode. Its primary cited use case is for a particular educational
>> example, when there are probably thousands of similar examples of educational
>> snippets that would be rarely used in a production environment. Given
>> that, it would be far better for those people who really need it to just
>> provide that to their students as a provided function for the sake of that
>> example.
>>
>> Mark
>>
>> On Sun, Mar 18, 2018 at 8:56 AM, Grigory Hatsevich wrote:
>>
>>> "This would remove the challenge and actively worsen their learning
>>> process" -- this is not true. You can see it e.g. by looking at the
>>> specific task I was talking about:
>>>
>>> "Given a string, find the shortest possible string which can be achieved
>>> by adding characters to the end of initial string to make it a palindrome."
>>>
>>> This is my code for this task:
>>>
>>> function buildPalindrome(s){
>>> String.prototype.reverse=function(){
>>> return this.split('').reverse().join('')
>>> }
>>>
>>> function isPalindrome(s){
>>> return s===s.reverse()
>>> }
>>> for (let i=0;i<s.length;i++){
>>> let first=s.slice(0,i);
>>> let rest=s.slice(i);
>>> if(isPalindrome(rest)){
>>> return s+first.reverse()
>>> }
>>> }
>>> }
>>>
>>>
>>> As you see, the essence of this challenge is not in the process of
>>> reversing a string. Having a reverse() method just makes the code more
>>> readable -- comparing to alternative when one would have to write
>>> .split('').reverse().join('') each time instead of just .reverse()
>>>
>>> On Sun, Mar 18, 2018 at 2:38 PM, Frederick Stark 
>>> wrote:
>>>
>>>> The point of a coding task for a beginner is to practice their problem
>>>> solving skills to solve the task. This would remove the challenge and
>>>> actively worsen their learning process
>>>>
>>>>
>>>> On Mar 18 2018, at 6:26 pm, Grigory Hatsevich 
>>>> wrote:
>>>>
>>>>
>>>> My use case is solving coding tasks about palindromes on codefights.com.
>>>> Not sure if that counts as "real-world", but probably a lot of beginning
>>>> developers encounter such tasks at least once.
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, 18 Mar 2018 06:41:46 +0700, Mathias Bynens 
>>>> wrote:
>>>>
>>>> So far no one has provided a real-world use case.
>>>>
>>>> On Mar 18, 2018 10:15, "Mike Samuel" >>> <https://link.getmailspring.com/link/1521358598.local-593d9031-9a3d-v1.1.5-5834c...@g

Re: add reverse() method to strings

2018-03-17 Thread Mathias Bynens
So far no one has provided a real-world use case.

On Mar 18, 2018 10:15, "Mike Samuel"  wrote:

> Previous discussion: https://esdiscuss.org/topic/wiki-updates-for-string-number-and-math-libraries#content-1
>
> """
> String.prototype.reverse(), as proposed, corrupts supplementary
> characters. Clause 6 of Ecma-262 redefines the word "character" as "a
> 16-bit unsigned value used to represent a single 16-bit unit of text", that
> is, a UTF-16 code unit. In contrast, the phrase "Unicode character" is used
> for Unicode code points. For reverse(), this means that the proposed spec
> will reverse the sequence of the two UTF-16 code units representing a
> supplementary character, resulting in corruption. If this function is
> really needed (is it? for what?), it should preserve the order of surrogate
> pairs, as does java.lang.StringBuilder.reverse:
> download.oracle.com/javase/7/docs/api/java/lang/StringBuilder.html#reverse()
> """
>
> On Sat, Mar 17, 2018 at 1:41 PM, Grigory Hatsevich 
> wrote:
>
>> Hi! I would propose to add reverse() method to strings. Something
>> equivalent to the following:
>>
>> String.prototype.reverse = function(){
>>   return this.split('').reverse().join('')
>> }
>>
>> It seems natural to have such method. Why not?
>>


Re: JSON.canonicalize()

2018-03-16 Thread Mathias Bynens
On Fri, Mar 16, 2018 at 9:04 PM, Mike Samuel  wrote:

>
> The output of JSON.canonicalize would also not be in the subset of JSON
> that is also a subset of JavaScript's PrimaryExpression.
>
>JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
>

Soon U+2028 and U+2029 will no longer be edge cases. A Stage 3 proposal
(currently shipping in Chrome) makes them valid in ECMAScript string
literals, making JSON a strict subset of ECMAScript:
https://github.com/tc39/proposal-json-superset
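A small check of what the proposal changes, assuming an engine that implements it (ES2019 or later). `JSON.stringify` leaves U+2028 and U+2029 unescaped, so its output can contain them raw; evaluating that output as a script string literal used to be a `SyntaxError`.

```javascript
// The JSON text contains the raw U+2028/U+2029 characters inside the quotes.
const json = JSON.stringify('\u2028\u2029');
console.log(json.length); // 4: two quotes plus the two raw characters

// JSON.parse has always accepted it.
console.log(JSON.parse(json) === '\u2028\u2029'); // true

// With the JSON superset change, the same text is also a valid ECMAScript
// string literal; older engines throw a SyntaxError here.
const evaluated = new Function(`return ${json};`)();
console.log(evaluated === '\u2028\u2029'); // true
```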


Re: Ranges

2016-11-04 Thread Mathias Bynens
On Fri, Nov 4, 2016 at 6:24 PM, Jordan Harband  wrote:
> Here you go:
>
> 1) `function* range(start, end) { for (const i = +start; i < end; ++i) {
> yield i; } }`

For future reference: `++i` throws when `i` is a `const` binding. The
intended example uses `let` instead.
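For completeness, the corrected generator and the failure mode of the `const` version. `brokenRange` is a hypothetical name for the quoted variant; the error is a runtime `TypeError` raised on the first increment, not an early error, so the first value is still yielded.

```javascript
function* range(start, end) {
  for (let i = +start; i < end; ++i) {
    yield i;
  }
}
console.log([...range(0, 5)]); // logs the array [0, 1, 2, 3, 4]

// The quoted variant: `++i` assigns to a `const` binding.
function* brokenRange(start, end) {
  for (const i = +start; i < end; ++i) {
    yield i;
  }
}
try {
  [...brokenRange(0, 5)];
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```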


Re: Adding DOTALL modifier to ECMAScript regex standards

2016-08-15 Thread Mathias Bynens

> On 10 Aug 2016, at 16:02, Jake Reynolds  wrote:
> 
> I brought up the topic of adding the DOTALL modifier to the Chrome V8 Engine 
> here and was directed to es-discuss.  I was curious about the practicality 
> and the want for adding a DOTALL modifier to the ECMAScript standards in the 
> future?
> 
> For those who don't know, the DOTALL modifier is a regex modifier that 
> allows the '.' symbol to match newlines as well.
> 
> Example regex: /he[.*]?llo/
> Example search string 1: hello
> Example search string 2: he
> llo
> 
> The above regex will match the 1st search string but will not match the 2nd.
> 
> In ECMAScript the only current way to make a match like that work is to use 
> [\d\D] which will match everything including newlines, given below.  
> 
> Current workaround regex: /he[\d\D]?llo/
> 
> The s modifier is the standard in most major languages except Javascript and 
> Ruby.  This will allow newline matching for the . symbol.  The proposed regex 
> is below:
> 
> Proposed new regex: /he[.*]?llo/s
> Example search string: he
> llo

Formal proposal (incl. proposed spec changes) for this feature: 
https://github.com/mathiasbynens/es-regexp-singleline-flag
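For reference, this feature eventually shipped in ES2018 as the `s` (`dotAll`) flag. The comparison below reuses the input from the quoted message. In the quoted patterns `[.*]` is a character class matching a literal `.` or `*`, so the sketch uses `.?` for "one optional character of any kind"; the `[^]` idiom from the follow-up discussion is included too.

```javascript
const input = 'he\nllo';

console.log(/he.?llo/.test(input));      // false: `.` does not match "\n"
console.log(/he[\d\D]?llo/.test(input)); // true: the workaround from the quoted message
console.log(/he[^]?llo/.test(input));    // true: the shorter `[^]` workaround
console.log(/he.?llo/s.test(input));     // true: `s` makes `.` match line terminators
console.log(/he[.*]?llo/s.test(input));  // false: inside [...], `.` is a literal dot
```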


Re: Adding DOTALL modifier to ECMAScript regex standards

2016-08-10 Thread Mathias Bynens
On Wed, Aug 10, 2016 at 4:40 PM, Bob Myers  wrote:
> If it's any consolation there is the more compact hack of `[^]`, which I
> **think** is supposed to work everywhere.

That doesn’t work in IE < 9, but that shouldn’t matter in 2016.


Re: Could we add the missing Regexp features from perl?

2016-06-24 Thread Mathias Bynens

> On 18 Jun 2016, at 00:01, Sebastian Zartner  
> wrote:
> 
> There are already a few regexp features in the pipeline, see 
> https://github.com/goyakin/es-regexp (listed in the Stage 0 proposals at 
> https://github.com/tc39/proposals/blob/master/stage-0-proposals.md).

Another one would be the proposal to add Unicode property escapes of the form 
`\p{…}` and `\P{…}` to ECMAScript regular expressions: 
https://github.com/mathiasbynens/es-regexp-unicode-property-escapes
This is scheduled to be presented at the next TC39 meeting.
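For reference, property escapes later shipped in ES2018 and, as proposed, require the `u` flag. A few examples; the test strings use `\u` escapes so they do not depend on the file's normalization form.

```javascript
console.log(/\p{Script=Greek}/u.test('\u03C0')); // true: "π" is in the Greek script
console.log(/\P{Script=Greek}/u.test('\u03C0')); // false: \P{…} is the complement
console.log(/^\p{Lu}+$/u.test('\u00C9COLE'));    // true: "ÉCOLE", all uppercase letters
console.log(/^\p{Lu}+$/u.test('\u00C9cole'));    // false: "École" has lowercase letters

// Without `u`, engines implementing Annex B read \p as a plain "p" and the
// braces as literal characters, which is one reason the flag is required.
console.log(/\p{Lu}/.test('p{Lu}')); // true
```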


Re: Observing whether a function is strict

2016-05-26 Thread Mathias Bynens
On Thu, May 26, 2016 at 9:48 AM, Claude Pache  wrote:
> I was wondering whether there is a way to observe whether a given random 
> function is strict (or sloppy, or neither).
> […] Are there other ways? (If not, I find it somewhat unfortunate that only 
> such nonstandard features leak this information.)

I smell a proposal for `Reflect.isStrict` in the making…


Re: Object.getOwnPropertyDescriptors still at stage 0

2016-01-18 Thread Mathias Bynens
On Mon, Jan 18, 2016 at 8:27 PM, Andrea Giammarchi
 wrote:
> Do you (or anyone else) know if that should be filed as a PR to tc39/ecma262
> or if it should just be a repository eventually posted in here?

It should be a repository that can eventually move to the tc39
organization if all goes well. Good luck!


Re: Object.getOwnPropertyDescriptors still at stage 0

2016-01-18 Thread Mathias Bynens
On Mon, Jan 18, 2016 at 6:50 PM, Andrea Giammarchi
 wrote:
> According to this ecma262 stage 0 summary
> https://github.com/tc39/ecma262/blob/master/stage0.md the (quite long time
> ago) discussed `Object.getOwnPropertyDescriptors`
> https://gist.github.com/WebReflection/9353781 hasn't moved a bit from there.
>
> However, there are already use cases
> https://gist.github.com/WebReflection/9353781#gistcomment-1672863 and it's
> already available with Babel.
>
> On top of that, the npm package
> https://www.npmjs.com/package/object.getownpropertydescriptors has some
> downloads, actually surpassing es7-shim repository
> https://github.com/es-shims/es7-shim#shims
>
> What should be done in order to move this little improvement to a stage 1
> situation?

Consider turning your excellent gist into a repository that fulfills
the criteria described here: https://tc39.github.io/process-document/
See https://github.com/tc39/Array.prototype.includes for a good
example.
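Background for readers: `Object.getOwnPropertyDescriptors` was standardized in ES2017. Its typical use case is copying properties without flattening accessors, which `Object.assign` does. A minimal sketch:

```javascript
const source = {
  get now() { return Date.now(); },
};

// Object.assign reads the getter once and copies the resulting value.
const assigned = Object.assign({}, source);
// Copying descriptors preserves the accessor itself.
const copied = Object.defineProperties({}, Object.getOwnPropertyDescriptors(source));

console.log(typeof Object.getOwnPropertyDescriptor(assigned, 'now').value); // "number"
console.log(typeof Object.getOwnPropertyDescriptor(copied, 'now').get);     // "function"
```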


Re: RegExp.escape()

2015-06-30 Thread Mathias Bynens
On Mon, Jun 29, 2015 at 9:04 PM, Benjamin Gruenbaum
 wrote:
> Why? What advantage would it offer?

See Scott’s previous email:

On Mon, Jun 29, 2015 at 8:42 PM, C. Scott Ananian  wrote:
> Imagine trying to ensure that any characters over \u007f were
> escaped.  You don't want an iterable over ~64k characters.
>
> In addition, a RegExp would allow you to concisely specify "hex digits, but
> only at the start of the string" and some of the other oddities we've
> considered.


Re: Parser for ES6?

2015-05-07 Thread Mathias Bynens
On Thu, May 7, 2015 at 5:37 PM, Park, Daejun  wrote:
> Is there any parser for ES6?

https://github.com/shapesecurity/shift-parser-js supports ES6/ES2015
RC2. You can read more about it here:
http://engineering.shapesecurity.com/2015/04/two-phase-parsing.html


Re: Declaration binding instationationing

2015-04-29 Thread Mathias Bynens
On Wed, Apr 29, 2015 at 7:29 AM, Garrett Smith  wrote:
> There is an English problem here:
>
> Let existingProp be the resulting of calling the [[GetProperty]]
> internal method of go with argument fn.

s/resulting/result/ indeed.

> Can the spec be made easier to read?

FYI, the best way to go about this is to file a spec bug on
https://bugs.ecmascript.org/ under the “editorial issue” component.


Re: Should "const" be favored over "let"?

2015-04-17 Thread Mathias Bynens
On Fri, Apr 17, 2015 at 7:53 AM, Glen Huang  wrote:
> I've completely replaced "var" with "let" in my es 2015 code, but I noticed 
> most variables I introduced never change.

Note that `const` has nothing to do with the value of a variable
changing or not. It can still change:

const foo = {};
foo.bar = 42; // does not throw

`const` indicates the *binding* is constant, i.e. there will be no
reassignments etc.

In my post-ES6 code, I use `const` by default, falling back to `let`
if I explicitly need rebinding. `var` is for legacy code.
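A short sketch of the distinction: `const` forbids reassigning the binding, while `Object.freeze` (shallowly) prevents changing the object. Writing to a frozen property throws a `TypeError` in strict mode code and is silently ignored in sloppy mode code; either way the value stays put.

```javascript
const foo = {};
foo.bar = 42; // allowed: mutates the object, the binding is untouched

try {
  foo = {}; // reassigning the binding is what `const` forbids
} catch (error) {
  console.log(error instanceof TypeError); // true
}

const frozen = Object.freeze({ bar: 42 });
try {
  frozen.bar = 43; // throws in strict mode code, ignored in sloppy mode code
} catch (error) {
  // Only reached in strict mode code.
}
console.log(frozen.bar); // 42
```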


Re: Number.prototype not being an instance breaks the web, too

2015-04-13 Thread Mathias Bynens
CCing Piotr.

On Mon, Apr 13, 2015 at 5:37 PM, Mark S. Miller  wrote:
> Hold on. I may have reacted too quickly. If it is only jsfiddle, since this
> is an online service rather than a widely copied library, they could just
> fix it. OTOH, if it really is a mootools issue, then yes, we really do need
> to change the spec. (History: Facebook fixed JSON incompatibility. ES5 fixed
> Object.prototype.toString.call(null) incompat with jQuery.)
>
> Could someone please reply-all to this thread cc'ing Piotr Zalewa and Oskar
> Krawczyk? Thanks.
>
>
>
> On Mon, Apr 13, 2015 at 8:26 AM, Mark S. Miller  wrote:
>>
>> I agree. With Number.prototype joining Array.prototype and
>> Function.prototype on the dark side, we should ask which others should too.
>> When it was only Function.prototype and Array.prototype, principle of least
>> surprise (POLS) had us keep the list as small as possible -- until we had
>> precisely this kind of evidence of incompatibility. From a security pov, the
>> important ones not to revert are those carrying mutable state not locked
>> down by Object.freeze. In ES5 this was only Date.prototype. Of the ES5
>> builtins in ES6, this now includes RegExp.prototype because of
>> RegExp.prototype.compile. (Because of de facto stack magic, this might
>> include Error.prototype as well.) Fortunately, there is still no evidence
>> that we need to corrupt these as well.
>>
>> OTOH, POLS still says that almost everything should not go to the dark
>> side, for consistency with ES6 classes. So the precise line becomes a matter
>> of taste. I propose that the co-corrupted list be
>>
>> Function.prototype
>> Array.prototype
>> Number.prototype
>> Boolean.prototype   // No incompat data. Only POLS
>> String.prototype   // No incompat data. Only POLS
>>
>> since Number, Boolean, and String are the ordinary ES5 wrappers of
>> primitive data values.
>>
>> For builtins that are new with ES6, clearly there's no compat issue. And
>> both security and consistency with ES6 classes argue in general for not
>> corrupting new things. But POLS should put very little weight on the ES5 vs
>> ES6 difference since post ES6 programmers will just see all of this as JS.
>>
>> Given that, I could argue Symbol.prototype either way, since Symbol is
>> kinda another wrapper of a primitive type. But I prefer not. I think we
>> should keep the list to those 5.
>>
>>
>> Allen, process-wise, is this too late for ES6? If there's any way this fix
>> can go in ES6, it should. Otherwise, it should become the first member of
>> ES6 errata.
>>
>>
>> All that said, I do find corrupting only Number.prototype to be plausible.
>> I would not mind if we decided not to spread the corruption even to
>> Boolean.prototype and String.prototype. If we have to do a last minute
>> as-small-as-possible change to the spec, to get it into ES6, this might be
>> best.
>>
>>
>>
>> On Mon, Apr 13, 2015 at 7:47 AM, Andreas Rossberg 
>> wrote:
>>>
>>> V8 just rolled a change into Canary yesterday that implements the new ES6
>>> semantics for Number.prototype (and Boolean.prototype) being ordinary
>>> objects. Unfortunately, that seems to break the web. In particular
>>> http://jsfiddle.net/#run fails to load now.
>>>
>>> What I see happening on that page is a TypeError
>>> "Number.prototype.valueOf is not generic" being thrown in this function
>>> (probably part of MooTools):
>>>
>>> Number.prototype.$family = function(){
>>> return isFinite(this) ? 'number' : 'null';
>>> }.hide();
>>>
>>> after being invoked on Number.prototype.
>>>
>>> AFAICS, that leaves only one option: backing out of this spec change.
>>>
>>> (See https://code.google.com/p/chromium/issues/detail?id=476437 for the
>>> bug.)
>>>
>>> /Andreas
>>>
>
>
> --
> Cheers,
> --MarkM
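For reference, in the current specification and in shipping engines the five prototypes on Mark's list are all instances of their own kind (`Number.prototype`, `Boolean.prototype`, and `String.prototype` carry the primitive values `0`, `false`, and `""`), while e.g. `Date.prototype` is an ordinary object. The last lines re-create the `isFinite(this)` check from the quoted MooTools snippet, without the MooTools-specific `.hide()`.

```javascript
console.log(Object.prototype.toString.call(Number.prototype)); // "[object Number]"
console.log(Number.prototype.valueOf());     // 0
console.log(Boolean.prototype.valueOf());    // false
console.log(String.prototype.valueOf());     // "" (an empty line)
console.log(Array.isArray(Array.prototype)); // true
console.log(typeof Function.prototype);      // "function"
console.log(Object.prototype.toString.call(Date.prototype)); // "[object Object]"

// The MooTools-style check from the report no longer throws.
function family() {
  return isFinite(this) ? 'number' : 'null';
}
console.log(family.call(Number.prototype)); // "number"
```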


Re: Unicode normalization problem

2015-04-02 Thread Mathias Bynens
On Thu, Apr 2, 2015 at 1:39 AM, Andrea Giammarchi
 wrote:
> Jordan the purpose of `Array.from` is to iterate over the string, and the 
> point of iteration instead of splitting is to have automagically codepoints. 
> This, unless I've misunderstood Mathias presentation (might be)
>
> So, here there is a different problem: there are code-points that do not 
> represent real visual representation ...

Those are called grapheme clusters or just “graphemes”, as Boris
mentioned. And here’s how to deal with them:
https://mathiasbynens.be/notes/javascript-unicode#other-grapheme-clusters

“Unicode Standard Annex #29 describes [an algorithm for determining
grapheme cluster
boundaries](http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries).
For a _completely_ accurate solution that works for all Unicode
scripts, implement this algorithm in JavaScript, and then count each
grapheme cluster as a single symbol.”

> or maybe, the real problem, is about broken `Array.from` polyfill?

`Array.from` just uses `String.prototype[Symbol.iterator]` internally,
and that is defined to deal with code points, not grapheme clusters.
Either choice would have confused some developers. IIRC, Perl 6 has
built-in capabilities to deal with grapheme clusters, but until ES
does, this use case must be addressed in user-land.
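A sketch of counting grapheme clusters with `Intl.Segmenter`, an ECMA-402 API added well after this thread (older engines lack it); `countGraphemes` is a hypothetical helper name. For this particular sequence NFC normalization also produces a single code point, but that does not hold for grapheme clusters in general, since many have no precomposed form.

```javascript
// Hypothetical helper: counts grapheme clusters via Intl.Segmenter.
function countGraphemes(string) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  let count = 0;
  for (const _ of segmenter.segment(string)) count++;
  return count;
}

// U+0418 CYRILLIC CAPITAL LETTER I + U+0306 COMBINING BREVE, displayed as "Й".
const decomposed = '\u0418\u0306';
console.log(decomposed.length);             // 2 code units
console.log(Array.from(decomposed).length); // 2 code points
console.log(countGraphemes(decomposed));    // 1 grapheme cluster

// Here NFC happens to compose the pair into U+0419.
console.log(decomposed.normalize('NFC') === '\u0419'); // true
```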


Re: Unicode normalization problem

2015-04-01 Thread Mathias Bynens
On Wed, Apr 1, 2015 at 10:30 PM, monolithed  wrote:
>> What you’re seeing there is not normalization, but rather the string
>> iterator that automatically accounts for surrogate pairs (treating them as a
>> single unit).
>
> ```js
> var foo = '𝐀';
> var bar = 'Й';
> foo.length; // 2
> Array.from(foo).length // 1
>
> bar.length; // 2
> Array.from(bar).length // 2
> ```
>
> I think this is strange.
> How to safely work with strings?

It depends on your use case. FWIW, I’ve outlined some examples here:
https://mathiasbynens.be/notes/javascript-unicode


Re: Unicode normalization problem

2015-04-01 Thread Mathias Bynens
On Wed, Apr 1, 2015 at 9:17 PM, Alexander Guinness  wrote:
> My reasoning is based on the following example:
>
> ```js
> var text = '𝐀';
>
> text.length; // 2
>
> Array.from(text).length // 1
> ```

What you’re seeing there is not normalization, but rather the string
iterator that automatically accounts for surrogate pairs (treating
them as a single unit).
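The difference can be observed directly, since normalization and iteration are separate operations. A short sketch using the same U+1D400 character as the quoted example:

```javascript
const text = '\u{1D400}'; // MATHEMATICAL BOLD CAPITAL A, stored as a surrogate pair

console.log(text.length);             // 2: UTF-16 code units
console.log(Array.from(text).length); // 1: the iterator yields whole code points

// Canonical normalization leaves this code point alone...
console.log(text.normalize('NFC') === text); // true
// ...while compatibility normalization folds it to a plain "A".
console.log(text.normalize('NFKC'));         // "A"
```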


Re: Should `use strict` be a valid strict pragma?

2015-02-05 Thread Mathias Bynens

> On 5 Feb 2015, at 11:04, Leon Arnott  wrote:
> 
> Well, that isn't quite the full story - if it were just a case of pragmas 
> having to use something, anything, that could pass ES3 engines, then there 
> wouldn't necessarily be two otherwise-redundant forms of the syntax - `"use 
> strict"` and `'use strict'`. The reason those exist is to save the author 
> remembering which string delimiter to use - it mirrors the string literal 
> syntax exactly.

If that were the case, then e.g. `'\x75\x73\x65\x20\x73\x74\x72\x69\x63\x74'` 
would trigger strict mode. (It doesn’t, and that’s a good thing.)
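This can be checked directly. The sketch below uses a hypothetical `bodyIsStrict` helper and relies on two facts. First, function bodies created with the `Function` constructor are non-strict unless they contain their own directive. Second, a strict function called without a receiver sees `this === undefined`:

```js
// Probe whether a directive string makes a Function-constructor body strict.
function bodyIsStrict(directiveSource) {
  const probe = new Function(directiveSource + '; return this === undefined;');
  // Called without a receiver: strict code sees undefined,
  // sloppy code sees the global object instead.
  return probe();
}

console.log(bodyIsStrict("'use strict'"));              // true
console.log(bodyIsStrict('"use strict"'));              // true
console.log(bodyIsStrict(String.raw`'use\x20strict'`)); // false (contains an escape)
console.log(bodyIsStrict(String.raw`'\x75\x73\x65\x20\x73\x74\x72\x69\x63\x74'`)); // false
```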


Re: Q: Lonely surrogates and unicode regexps

2015-01-28 Thread Mathias Bynens

> On 28 Jan 2015, at 11:36, Marja Hölttä  wrote:
> 
> TL;DR: /foo.bar/u.test("foo\uD83Dbar") == ?
> 
> The ES6 unicode regexp spec is not very clear regarding what should happen if 
> the regexp or the matched string contains lonely surrogates (a lead surrogate 
> without a trail, or a trail without a lead). For example, for the . operator, 
> the relevant parts of the spec speak about characters:
> 
> https://people.mozilla.org/~jorendorff/es6-draft.html#sec-atom
> https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
> https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-canonicalize-abstract-operation
> 
> E.g.,
> “Let A be the set of all *characters* except LineTerminator.”
> “Let ch be the *character* Input[e].”
> 
> But is a lonely surrogate a character? According to the Unicode standard, 
> it’s not. If it's not, what will ch be if the input string contains a lonely 
> surrogate in the relevant position?
> 
> Q1: Are lonely surrogates allowed in /u regexps?
> 
> E.g., /foo\uD83D/u; (note lonely lead surrogate), should this be allowed? 
> Will it match a lead surrogate inside a surrogate pair?
> 
> Suggestion: we shouldn't allow lonely surrogates in /u regexps.
> 
> If users actually want to match lonely surrogates (e.g., to check for them or 
> remove them) then they can use non-/u regexps.

You’re proposing to define “characters” in terms of Unicode scalar values in 
the case `/u` is used. I could get behind that — it reinforces the idea that 
`/u` is like a strict mode for regular expressions.

Playing devil’s advocate, the problem is that regular expressions and strings 
go hand in hand, and there is no guarantee that JavaScript strings only consist 
of valid code points. Making `.` not match lone surrogates breaks the developer 
expectation that `(.)` matches every “part” of the string. Having to avoid `/u` 
to prevent this seems like a potentially bad thing.

> The regexp syntax treats a lonely surrogate as a normal unicode escape, and 
> the rules say e.g., "The production RegExpUnicodeEscapeSequence :: u 
> Hex4Digits evaluates as follows: Return the character whose code is the SV of 
> Hex4Digits." - it's also unclear what this means if no valid character has 
> this code.
> 
> Q2: If the string contains a lonely surrogate, what should it match? Should 
> it match .? Should it match [^a] ? (Or is it undefined behavior?)
> 
> Test cases:
> /foo.bar/u.test("foo\uD83Dbar") == ?
> /foo.bar/u.test("foo\uDC00bar") == ?
> /foo[^a]bar/u.test("foo\uD83Dbar") == ?
> /foo[^a]bar/u.test("foo\uDC00bar") == ?
> /foo/u.test("bar\uD83Dbarfoo") == ?
> /foo/u.test("bar\uDC00barfoo") == ?
> /foo(.*)bar\1/u.test("foo\uD834bar\uD834\uDC00") == ? // Should the 
> backreference be allowed to match the lead surrogate of a surrogate pair?
> /^(.+)\1$/u.test("\uDC00foobar\uD83D\uDC00foobar\uD83D") == ?? // Should we 
> allow splitting the surrogate pair like this?
> 
> Suggestion: a lonely surrogate should not be a character and it should not 
> match . or [^a] etc. However, a lonely surrogate in the input string 
> shouldn't prevent some other part of the string from matching.
> 
> If a lonely surrogate is treated as a character, the matching rule for . gets 
> complicated and difficult / slow to implement: . should not match individual 
> surrogates inside a surrogate pair, but if it has to match a lonely 
> surrogate, we'll end up needing lookahead and lookbehind logic to implement 
> that behavior.
> 
> For example, the current version of Mathias’s ES6 Unicode regular expression 
> transpiler ( https://mothereff.in/regexpu ) converts /a.b/u into 
> /a(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])b/
>  and afaics it’s not yet fully consistent wrt lonely surrogates, so, a 
> consistent implementation is going to be more complex than this.

This is indeed an incomplete solution. The lack of lookbehind support in ES 
makes this hard to transpile correctly. Ideas welcome!
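For reference, ES2015 ultimately resolved these questions by treating a lone surrogate as a code point in its own right under `/u`. A lone surrogate is matched by `.` and `[^a]`, but a lone-surrogate pattern never matches half of a well-formed pair. Here is a small sketch of that behavior, assuming an ES2015+ engine:

```js
// With /u the input is read as code points; a lone surrogate is its own code point.
console.log(/foo.bar/u.test('foo\uD83Dbar'));    // true (`.` matches a lone lead surrogate)
console.log(/foo[^a]bar/u.test('foo\uDC00bar')); // true (so does a negated class)
// Under /u, a lone-surrogate pattern does not match half of a surrogate pair...
console.log(/\uD83D/u.test('\uD83D\uDE00'));     // false
// ...whereas without /u the pattern works on code units and does match.
console.log(/\uD83D/.test('\uD83D\uDE00'));      // true
```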



Re: escaping - in /u RegExp

2015-01-14 Thread Mathias Bynens

> On 13 Jan 2015, at 22:23, Allen Wirfs-Brock  wrote:
> 
> Would those of you who consider yourselves RegExp experts take a look at 
> https://bugs.ecmascript.org/show_bug.cgi?id=3519  Is this a bug? If so, what 
> is the fix?
> 
> This construction for Identity Escape goes back to Norbert's original 
> proposal 
> http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
>  
> 
> Perhaps we need to add a:
>   ClassAtom[U] :: [+U]  \-
> 
> production or some such to the pattern grammar.

I think it’s a bug — see 
https://codereview.chromium.org/788043005/diff/220001/src/parser.cc#newcode4354 
for the discussion that led to this report.

Your change would allow developers to use an escaped `-` in a character class 
with the `u` flag, e.g. `/[a-f\-A-Z]/u`, just as they already can without it, 
rather than having to move it to the beginning (`/[-a-fA-Z]/u`) or end 
(`/[a-fA-Z-]/u`) of the character class. That is a good thing IMHO.
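The spec did end up adding a `[+U] -` alternative to `ClassEscape`. As a result, an escaped hyphen is accepted inside a character class under `/u`, while `\-` outside a class is still rejected. This is a quick check, assuming a current engine:

```js
// Inside a character class, `\-` is allowed with the u flag.
const hexOrUpper = new RegExp('[a-f\\-A-Z]', 'u');
console.log(hexOrUpper.test('-')); // true
console.log(hexOrUpper.test('g')); // false

// Outside a class, `\-` is not a valid identity escape under /u.
try {
  new RegExp('\\-', 'u');
} catch (error) {
  console.log(error instanceof SyntaxError); // true
}
```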


Re: Sept 23 2014 Meeting Notes

2014-10-03 Thread Mathias Bynens
Thanks for once again putting this together, Rick!

On Fri, Oct 3, 2014 at 3:22 PM, Rick Waldron  wrote:
> ## 4.4 Number('0b0101'). NaN or not?
> (Erik Arvidsson)
>
> EA: Previous discussion:
> https://github.com/rwaldron/tc39-notes/blob/c61f48cea5f2339a1ec65ca89827c8cff170779b/es6/2014-04/apr-9.md#46-updates-to-parseint
>
> Should `Number` be able to parse the string "0b0" or "0o1"
>
> (Discussion of people (ab)using Number for converting user input and whether
> this should affect things.)
>
> Yes.
>
>  Conclusion/Resolution
>
> - Use spec-internal `ToNumber` via userland `Number` called as a function
> will convert (i.e. `Number('0b101') === 5`).
> - Upholding previous consensus on `parseInt`

I don’t understand this reasoning. Making `parseInt` understand the
new syntax was considered a security hazard, but for `Number` we can
somehow get away with it? Any more information on the disconnect?
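The resulting split is easy to observe. This is a small sketch, assuming an ES2015+ engine. `Number` (via `ToNumber`) understands the new prefixes, while `parseInt` only special-cases `0x`/`0X` and stops at the first character that is not a valid digit:

```js
console.log(Number('0b101'));   // 5
console.log(Number('0o17'));    // 15
console.log(Number('0x1F'));    // 31

// parseInt reads "0" and then stops at "b" or "o".
console.log(parseInt('0b101')); // 0
console.log(parseInt('0o17'));  // 0
console.log(parseInt('0x1F'));  // 31
```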


Re: RegExps that don't modify global state?

2014-09-17 Thread Mathias Bynens
On Tue, Sep 16, 2014 at 8:16 PM, Domenic Denicola
 wrote:
> I also noticed today that the static `RegExp` properties are not specced, 
> which seems at odds with our new mandate to at least Annex B-ify the 
> required-for-web-compat stuff.

As a general note to people looking to spec some Annex B stuff,
https://javascript.spec.whatwg.org/ is a good place to start. Many
such things are listed there, but still lack a proper spec definition.
Case in point: https://javascript.spec.whatwg.org/#regexp.$n
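This sketch illustrates the global state in question. It assumes a V8-based runtime such as Node.js, because these legacy static properties are not specified in the main ECMA-262 text. In engines that implement them, every successful match updates `RegExp.$1` and related properties, even when the match comes from an unrelated regular expression:

```js
/(\d+)/.exec('order 42');
console.log(RegExp.$1); // '42'

// Any later successful match, anywhere in the program, overwrites it.
/([a-z]+)/.exec('hello');
console.log(RegExp.$1); // 'hello'
```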


Re: "use strict" VS setTimeout

2014-09-07 Thread Mathias Bynens
On Sun, Sep 7, 2014 at 7:29 PM, Andrea Giammarchi
 wrote:
> This looks like a potential problem when possible passed methods are not
> bound + it looks inconsistent with *"use strict"* expectations.

It’s not just `setTimeout` – other DOM timer methods have the same
behavior. The spec is here, FWIW:
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#dom-windowtimers-settimeout
Pretty sure this cannot be changed without breaking the Web.


Re: Questions regarding ES6 Unicode regular expressions

2014-08-26 Thread Mathias Bynens
On 26 Aug 2014, at 19:01, Allen Wirfs-Brock  wrote:

> I've thought about this a bit. I was initially inclined to agree with the 
> idea of extending the existing character classes similar to what Mathias' 
> proposes.  But I now think that is probably not a very good idea and that 
> what is currently spec'ed (essentially that the /u flag doesn't change the 
> meaning of \w, \d, etc.) is the better path. […] It seems to me, that we want 
> programmers to start migrating to full Unicode regular expressions without 
> having to do major logic rewrite of their code.  For example, ideally the 
> above expression could simply be replaced by 
> `parseInt(/\s*(\d+)/u.exec(input)[1])` and everything in the application 
> could continue to work unchanged.

I see your point, but I disagree with the notion that we must absolutely 
maintain backwards compatibility in this case. The fact that the new flag is 
opt-in gives us an opportunity to improve behavior without obsessing about 
back-compat, similar to how the strict mode opt-in is used to make all sorts of 
things better. When [evangelizing 
`/u`](https://mathiasbynens.be/notes/es6-unicode-regex), we can educate 
developers and tell them to not blindly/needlessly add `/u` to their existing 
regular expressions.

> Instead, we should leave the definitions of \d, \w and \s unchanged and plan 
> to adopt the already established convention that `\p{}` is 
> the notation for matching Unicode categories. See 
> http://www.regular-expressions.info/unicode.html 

We could do both: improve `\d` and `\w` now, and add `\p{property}` and 
`\P{property}` later. Anyhow, I’ve filed 
https://bugs.ecmascript.org/show_bug.cgi?id=3157 for reserving `\p{…}`/`\P{…}`.

> I think digesting all the \p{} possibilities is too much to do for ES6, so I 
> suggest that for ES6 that we simply reserve the \p{} and 
> \P{} syntax within /u patterns.  A \p proposal can then be 
> developed for ES7.

Sounds good to me.
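The reservation was later used. `\p{…}` property escapes were standardized in ES2018 for `/u` patterns, while `\d` kept its ASCII meaning. This is a small sketch, assuming an ES2018+ engine:

```js
const arabicIndicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE, General_Category=Nd

console.log(/\d/u.test(arabicIndicThree));     // false (\d is still [0-9])
console.log(/\p{Nd}/u.test(arabicIndicThree)); // true (any Unicode decimal digit)
console.log(/\p{Nd}/u.test('7'));              // true
```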

> I see one remaining issue:
> In ES5 (and ES6): `/[a-z]/i` does not match U+017F (ſ) or U+212A (K) because 
> the ES canonicalization algorithm excludes mapping code points > 127 that 
> toUpperCase to code points <128.
> However, as currently spec'ed, the ES6 canonicalization algorithm for /u 
> RegExps does not include that >127/<128 exclusion.  It maps U+017F to "S" 
> which matches. 
> This is probably a minor variation, from the ES5 behavior, but we should 
> probably be sure it is a desirable and tolerable change as we presumably 
> could also apply the >127/<128 filter to /u canonicalization.

This is a useful feature, and the explicit opt-in makes the small back-compat 
break acceptable IMHO.
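The case-folding example above can be checked in a current engine, assuming ES2015+ semantics. The `/u` canonicalization shipped without the >127/<128 filter. As a result, U+017F and U+212A fold into `[a-z]` only when `/u` is present:

```js
const longS = '\u017F';  // ſ LATIN SMALL LETTER LONG S (folds to 's')
const kelvin = '\u212A'; // K KELVIN SIGN (folds to 'k')

console.log(/[a-z]/i.test(longS));   // false
console.log(/[a-z]/iu.test(longS));  // true
console.log(/[a-z]/i.test(kelvin));  // false
console.log(/[a-z]/iu.test(kelvin)); // true
```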



Re: Questions regarding ES6 Unicode regular expressions

2014-08-26 Thread Mathias Bynens
On 26 Aug 2014, at 02:16, Norbert Lindenberg 
 wrote:

> […]

Thanks for confirming. Sounds like my “ES6 Unicode regular expressions to ES5” 
transpiler is working correctly, then: https://github.com/mathiasbynens/regexpu 
Demo: http://mothereff.in/regexpu (Bug reports welcome.)

> On Aug 25, 2014, at 1:59 , Mathias Bynens  wrote:
> 
>> Norbert’s original proposal for the `u` flag 
>> (http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/#RegExp)
>>  mentioned the following:
>> 
>>> Possibly the definition of the character classes `\d\D\w\W\b\B` is extended 
>>> to their Unicode extensions, such as all characters in the Unicode category 
>>> “Number, decimal” for `\d`, as proposed by Steven Levithan. Whether this 
>>> can be done under the same flag or requires a different one still needs 
>>> discussion.
>> 
>> Has this been discussed any further? (I couldn’t find any mention of it in 
>> the meeting notes repository.) Should I file a bug?
> 
> The “needs discussion” part actually came from the March 2012 TC39 meeting:
> https://mail.mozilla.org/pipermail/es-discuss/2012-March/021919.html
> We subsequently had some discussions about how to go about such a discussion, 
> which petered out because no regular expression expert was available to work 
> with.
> 
> I suspect this issue needs a proposal rather than a bug.

https://github.com/mathiasbynens/es6-unicode-character-class-escape-sets#readme 
I’m fairly confident in the proposals for `\d` and `\w`, but `\b` needs work.

@Steven Levithan, would you mind lending your expertise on this? This is your 
chance to make `/na\b/u.test('naïve')` return `false` :)
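For context, `\b` still uses the ASCII word-character set under `/u` (without `i`), so `/na\b/u.test('naïve')` returns `true`. One way to approximate a Unicode-aware boundary is to combine lookarounds with property escapes. This sketch assumes ES2018+ support for lookbehind and `\p{…}`. Its definition of a word character (letter, mark, number, or underscore) is a hypothetical choice, not a standardized one:

```js
// Hypothetical "word character": a letter, combining mark, number, or underscore.
const WORD = String.raw`[\p{L}\p{M}\p{N}_]`;
// Emulated boundary: a word character on exactly one side of the position.
const BOUNDARY = `(?:(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD}))`;

const asciiBoundary = /na\b/u;
const unicodeBoundary = new RegExp(`na${BOUNDARY}`, 'u');

console.log(asciiBoundary.test('na\u00EFve'));   // true (ï is not in \w)
console.log(unicodeBoundary.test('na\u00EFve')); // false (ï counts as a letter)
console.log(unicodeBoundary.test('na ive'));     // true
```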


Questions regarding ES6 Unicode regular expressions

2014-08-25 Thread Mathias Bynens
Norbert’s original proposal for the `u` flag 
(http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/#RegExp)
 mentioned the following:

> Possibly the definition of the character classes `\d\D\w\W\b\B` is extended 
> to their Unicode extensions, such as all characters in the Unicode category 
> “Number, decimal” for `\d`, as proposed by Steven Levithan. Whether this can 
> be done under the same flag or requires a different one still needs 
> discussion.

Has this been discussed any further? (I couldn’t find any mention of it in the 
meeting notes repository.) Should I file a bug?

Norbert also suggested replacing ‘characters’ with ‘code points’ in sections 
like 
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-characterclassescape 
and 
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-runtime-semantics-charactersetmatcher-abstract-operation
when the `u` flag is set. It seems the intent was for e.g. `/\d/u` to match the 
same as `/[0-9]/u`, and for `/\D/u` to match any Unicode code point except 
`[0-9]`. This is different from `/\D/`, which only matches BMP code points (one 
code unit at a time).

It seems like this change has not propagated to the spec draft, though. Is this 
correct, and if so, what’s the reason for that?

The same goes for `/[^a]/u` – should this match all Unicode code points except 
`a` or should it only match BMP code points?
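In the semantics that eventually shipped, `/u` does make negated classes and `\D` operate on whole code points. This is a quick sketch, assuming an ES2015+ engine. It uses U+1F4A9, a non-BMP symbol, as input:

```js
const pile = '\u{1F4A9}'; // one code point, two UTF-16 code units

console.log(/^[^a]$/.test(pile));  // false (without /u, [^a] consumes one code unit)
console.log(/^[^a]$/u.test(pile)); // true (with /u, it consumes the whole code point)
console.log(/^\D$/.test(pile));    // false
console.log(/^\D$/u.test(pile));   // true
```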



Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-08 Thread Mathias Bynens
Claude Pache proposed the following spec patch: 
https://bugs.ecmascript.org/show_bug.cgi?id=2792#c11


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-06 Thread Mathias Bynens
On 7 Aug 2014, at 02:46, Bill Frantz  wrote:

> On Tue, Aug 5, 2014 at 7:56 AM, Mathias Bynens  wrote:
> 
> ...
>> In section 11.8.3 (Numeric Literals), the definition for
>> `DecimalIntegerLiteral` should somehow be tweaked to match that of
>> `DecimalDigits`, with the exception that if the first digit is `0` and all
>> other digits are octal digits (0-7) it must be treated as a legacy octal
>> literal.
> 
> So this horrible footgun, changing the value of a constant changes its radix, 
> is only lurking in sloppy mode.

It affects strict mode code too in existing implementations: there you go from 
not throwing on e.g. `0123456789` (which is not an octal literal because of the 
`8` and `9`) to suddenly throwing a syntax error when the value changes to `0` 
followed by only octal digits (as then it is an octal literal). See my previous 
posts in this thread.
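The footgun is easy to reproduce in sloppy-mode code. This sketch evaluates each literal through the `Function` constructor, whose bodies are non-strict unless they opt in. It assumes an engine that implements the legacy literal forms the way browsers do:

```js
// Evaluate a numeric literal in sloppy-mode code.
const sloppy = (literal) => new Function(`return ${literal};`)();

console.log(sloppy('01234567'));   // 342391 (all octal digits, so read as octal)
console.log(sloppy('01234568'));   // 1234568 (an 8 appears, so read as decimal)
console.log(sloppy('0123456789')); // 123456789
```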


Re: July 29 2014 TC39 Meeting Notes

2014-08-06 Thread Mathias Bynens
On 5 Aug 2014, at 18:30, Rick Waldron  wrote:

> - Spread now works on strings `var codeUnits = [..."this is a string"]`

The code example implies it results in an array of strings, one item for each 
UCS-2/UTF-16 code unit. Shouldn’t this be symbols matching whole Unicode code 
points (matching `StringIterator`) instead, i.e. no separate items for each 
surrogate half?
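As it shipped, string spread goes through `String.prototype[Symbol.iterator]`, so it yields code points, unlike `split('')`. A minimal sketch:

```js
const str = 'a\u{1F4A9}';

console.log(str.split('').length); // 3 ('a' plus two surrogate halves)
console.log([...str].length);      // 2 ('a' plus one astral symbol)
```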


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 17:19, Mark S. Miller  wrote:

> On Tue, Aug 5, 2014 at 8:17 AM, Mathias Bynens  wrote:
> 
>> The literals under discussion (e.g. `08` and `09`) are not octal literals.
> 
> Strict mode should reject these even more vehemently! (Allen, can we have an 
> early vehement error?)

Now I’m confused again. That contradicts what Allen said earlier in this thread:

On 5 Aug 2014, at 16:20, Allen Wirfs-Brock  wrote:

> Regarding, leading 0 constants in strict mode. The long term plan is to 
> eventually make them legal decimal constants.

I stand by my earlier suggestion:

1. Accept decimal integer literals with leading `0`, even in strict mode.
2. Interpret the value of such literals as octal in case they consist of octal 
digits only. (Note: this is already in Annex B – see 
`LegacyOctalIntegerLiteral`.)

Strict mode would accept `08` as it’s a zero-prefixed decimal literal but not 
`07` since that’s an octal literal.

This matches what all browsers already do (except Firefox), and fulfills the 
long-term plan Allen was talking about.


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 17:05, Mark S. Miller  wrote:

> Strict mode should not accept octal literals.

The literals under discussion (e.g. `08` and `09`) are not octal literals.


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 17:05, Mark S. Miller  wrote:

> Because of compatibility constraints, JS history can generally proceed only 
> in an additive manner, which means a steady degradation of quality along the 
> "simplicity" dimension. An opt-in mode switch is the only way to escape that 
> dynamic. Strict mode is the only one we've got, and the only one we're likely 
> to have in the foreseeable future. Strict mode should not accept octal 
> literals. Regarding sloppy mode, it continues to exist only for the sake of 
> legacy compat, so adding more crap to it for better web compat is the right 
> tradeoff -- as long as the crap stays quarantined within sloppy mode.

My point was that the crap under discussion is already available in strict mode 
in existing implementations (except for the one in Firefox/SpiderMonkey). It 
just hasn’t been demonstrated yet whether the Web depends on this functionality 
in strict mode too. (The fact that it doesn’t work in Firefox is an indication 
that it may not, sure.)


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 5 Aug 2014, at 16:56, Alex Kocharin  wrote:

> What about allowing one-digit numbers with leading zeroes? "07" equals to 7 
> no matter whether it parsed as an octal or as a decimal. Thus, no harm there.

That wouldn’t solve the problem. Consider e.g. `01234567` (i.e. `342391`) vs. 
`01234568` (which must equal `1234568` for compatibility with existing code).


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens

On 5 Aug 2014, at 16:20, Allen Wirfs-Brock  wrote:

> We're only talking about Annex B, non-strict.  Right?

All engines are going to implement this anyway, so why make it Annex B only? I 
wouldn’t restrict it to non-strict mode either, as this decision seems to be 
purely based on the Firefox/SpiderMonkey bug that was discussed earlier.

> It would be great if somebody wanted to propose the actual Annex B language 
> that is needed to correctly describe the web reality semantics.

In section 11.8.3 (Numeric Literals), the definition for 
`DecimalIntegerLiteral` should somehow be tweaked to match that of 
`DecimalDigits`, with the exception that if the first digit is `0` and all 
other digits are octal digits (0-7) it must be treated as a legacy octal 
literal.

> Regarding, leading 0 constants in strict mode. The long term plan is to 
> eventually make them legal decimal constants. The only reason not to do that 
> now is because it might screw up people who are migrating non-strict web 
> reality code containing octal constants into strict mode.

Firefox is the only browser that throws on `(function() { 'use strict'; return 
08; }())` and the only reason it does that is because of a bug (see my earlier 
email). In general, strict mode does not matter here.
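For reference, the rule that was eventually standardized differs from the suggestion in this thread. Current editions of ECMA-262 make both `LegacyOctalIntegerLiteral` (e.g. `07`) and `NonOctalDecimalIntegerLiteral` (e.g. `08`) early errors in strict mode code, and current engines such as V8 reject both. This sketch assumes a current engine, and `throwsSyntaxError` is a hypothetical helper:

```js
// Returns true if creating a function with this body throws a SyntaxError.
function throwsSyntaxError(body) {
  try {
    new Function(body);
    return false;
  } catch (error) {
    return error instanceof SyntaxError;
  }
}

console.log(throwsSyntaxError("'use strict'; return 08;")); // true
console.log(throwsSyntaxError("'use strict'; return 07;")); // true
console.log(throwsSyntaxError('return 08;'));               // false (sloppy mode allows it)
```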



Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-05 Thread Mathias Bynens
On 4 Aug 2014, at 18:55, Jason Orendorff  wrote:

> We're talking about something different here, legacy *decimal* integer
> literals starting with 0 and containing 8 or 9. As far as I know, no
> version of ES has ever permitted this kind of nonsense, but supporting
> it is apparently required for Web compatibility. (One more great
> reason to write all your code under "use strict".)

I don’t understand this comment. What does strict mode have to do with this? 
Note that `08` and `09` are not octal literals, since `8` and `9` are not 
`OctalDigit`s.

In non-strict mode, 
https://people.mozilla.org/~jorendorff/es6-draft.html#sec-additional-syntax-numeric-literals
 applies, but even then `08` and `09` should throw (as per the current spec) 
for the same reason.

Strict mode doesn’t make a difference as per the current spec when parsing this 
program:

```js
08
```

It does in Firefox/SpiderMonkey, but that seems like a bug. Test this in the 
most recent nightly:

```js
(function() { 'use strict'; return 08; }())
```

This currently throws:

> SyntaxError: octal literals and octal escape sequences are deprecated

…which is a misleading message. It should instead say something like:

> SyntaxError: numbers starting with 0 followed by a digit are octals and can't 
> contain 8



Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-04 Thread Mathias Bynens

On 5 Aug 2014, at 08:40, Mathias Bynens  wrote:

> On 5 Aug 2014, at 02:41, Allen Wirfs-Brock  wrote:
> 
>> there is already a bug open on this 
>> https://bugs.ecmascript.org/show_bug.cgi?id=2792 
> 
> Older bug report: https://bugs.ecmascript.org/show_bug.cgi?id=1553
> 
> We previously discussed this at the April TC39 meeting: 
> https://github.com/rwaldron/tc39-notes/blob/master/es6/2014-04/apr-9.md#change-escapesequence-0-lookahead--decimaldigit-to-match-reality

Never mind – I was confused. This topic is about numeric literals rather than 
string literals (although the underlying issue is more or less the same). Carry 
on!


Re: Early error on '0' followed by '8' or '9' in numeric literals does not seem to be web-compatible

2014-08-04 Thread Mathias Bynens
On 5 Aug 2014, at 02:41, Allen Wirfs-Brock  wrote:

> there is already a bug open on this 
> https://bugs.ecmascript.org/show_bug.cgi?id=2792 

Older bug report: https://bugs.ecmascript.org/show_bug.cgi?id=1553

We previously discussed this at the April TC39 meeting: 
https://github.com/rwaldron/tc39-notes/blob/master/es6/2014-04/apr-9.md#change-escapesequence-0-lookahead--decimaldigit-to-match-reality


Re: [Strawman proposal] StrictMath variant of Math

2014-08-01 Thread Mathias Bynens
On 1 Aug 2014, at 09:25, Carl Shapiro  wrote:

> Thanks for the suggestion.
> 
> As Ray pointed out, the Math package in Java still has its accuracy 
> requirements specified and so it is not analogous to the current status of 
> Math package in ES6.  Also, the StrictMath package and the strictfp class 
> qualifier came about in Java back when the x87 was the predominant FPU.  
> Because of the idiosyncrasies of the x87 one could not compute bit-identical 
> floating point results without additional overhead.  Nevertheless, the 
> accuracy requirements and conformance was still achieved with satisfactory 
> performance.  Much of the history is still available on-line
> 
> http://math.nist.gov/javanumerics/reports/jgfnwg-minutes-6-00.html
> http://math.nist.gov/javanumerics/reports/jgfnwg-02.html
> 
> It is unclear how much of these "strict" modes is still relevant given that 
> SSE2 is now the predominant FPU.  The strict modes were always effectively a 
> non-issue on other architectures.
> 
> Much of the pressure to relax the accuracy of the special functions seems to 
> be coming from their use in various benchmark suites and the tireless efforts 
> of the compiler engineers to squeeze out additional performance gains.  
> Requiring bounds on the accuracy of the special functions has the additional 
> benefit of putting all the browsers on equal ground so nobody has to have 
> their product suffer the indignity of a benchmark loss because they try to do 
> the right thing in the name of numerical accuracy.

+1

Introducing a new global `Math` variant wouldn’t solve the interoperability 
issues. IMHO, the accuracy of the existing `Math` methods and properties should 
be codified in the spec instead.


Re: 5 June 2014 TC39 Meeting Notes

2014-06-14 Thread Mathias Bynens
On 13 Jun 2014, at 18:15, Domenic Denicola  wrote:

> IMO it would be a good universe where `` had the following things 
> `

Re: Idea for ECMAScript 7: Number.compare(a, b)

2014-06-06 Thread Mathias Bynens
On 6 Jun 2014, at 01:15, Axel Rauschmayer  wrote:

> It’d be nice to have a built-in way for comparing numbers, e.g. when sorting 
> arrays.
> 
> ```js
> // Compact ECMAScript 6 solution
> // Risk: number overflow
> [1, 5, 3, 12, 2].sort((a,b) => a-b)
> 
> // Proposed new function:
> [1, 5, 3, 12, 2].sort(Number.compare)
> ```

That sorts in ascending order. What if you need to sort in descending order? 
Would there need to be a built-in function for that too?
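One answer to the descending-order question is that any two-argument comparator can be reversed by swapping its arguments, so a single built-in would suffice. This sketch uses hypothetical `compareNumbers` and `descending` helpers, which are not built-ins. Comparing with `<` and `>` instead of subtracting also avoids getting `NaN` from `Infinity - Infinity`:

```js
// Hypothetical helper: ascending numeric comparison without subtraction.
function compareNumbers(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Any comparator can be reversed by swapping its arguments.
const descending = (compare) => (a, b) => compare(b, a);

console.log([1, 5, 3, 12, 2].sort(compareNumbers).join(', '));             // 1, 2, 3, 5, 12
console.log([1, 5, 3, 12, 2].sort(descending(compareNumbers)).join(', ')); // 12, 5, 3, 2, 1
console.log([Infinity, 1, -Infinity].sort(compareNumbers).join(', '));     // -Infinity, 1, Infinity
```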


Re: Array.prototype.last()

2014-05-13 Thread Mathias Bynens
Previous discussion on this topic: 
http://esdiscuss.org/topic/array-prototype-last

We should look at how existing utility libraries handle this behavior and base 
any proposals on that IMHO. Underscore and Lo-Dash have 
[`_.first`](http://lodash.com/docs#first) and 
[`_.last`](http://lodash.com/docs#last), which both take an optional `callback` 
parameter, in which case all the first/last `n` elements for which `callback` 
returns a truthy value are returned. This seems like a sensible thing to add to 
the proposal.
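For what it's worth, later editions added `Array.prototype.at`, so `array.at(-1)` returns the last element. The callback form described above returns trailing elements for as long as the callback stays truthy, and it could be sketched as follows. `lastWhile` is a hypothetical name, not a proposed API:

```js
// Hypothetical helper: collect trailing elements while `predicate` stays truthy.
function lastWhile(array, predicate) {
  let start = array.length;
  while (start > 0 && predicate(array[start - 1], start - 1, array)) {
    start--;
  }
  return array.slice(start);
}

const values = [1, 2, 3, 40, 50];
console.log(values.at(-1));                               // 50
console.log(lastWhile(values, (n) => n > 10).join(', ')); // 40, 50
console.log(lastWhile(values, (n) => n > 100).length);    // 0
```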


Re: ToPropertyDescriptor, [[HasProperty]], & [[HasOwnProperty]]

2014-05-08 Thread Mathias Bynens
On Fri, May 9, 2014 at 1:44 AM, John-David Dalton
 wrote:
> Should I create a spec bug for tracking this?

Please do.


Re: Native base64 utility methods

2014-05-08 Thread Mathias Bynens
On 5 May 2014, at 20:22, Andrea Giammarchi  wrote:

> @mathias didn't mean to change atob and btoa rather add two extra methods 
> such encode/decode for strings (could land without problems in the 
> String.prototype, IMO) with "less silly names" whatever definition of silly 
> we have ^_^

Agreed. Moving `TextEncoder`/`TextDecoder` to ES would be nice (but it requires 
`ArrayBuffer` / `Uint8Array`). http://encoding.spec.whatwg.org/#api


Re: Native base64 utility methods

2014-05-05 Thread Mathias Bynens
On 5 May 2014, at 10:48, Claude Pache  wrote:

> In my view, if `atob` and `btoa` were to enter in ES, it should be in 
> Appendix B (the deprecated legacy features of web browsers), where it would 
> be in good company with the other utility that does an implicit confusion 
> between binary and ISO-8859-1-encoded strings, namely `escape/unescape`.

How do `atob` and `btoa` do any sort of implicit conversion between binary and 
any other encoding? Their behavior is well-defined, and they’re explicitly 
limited to extended ASCII.
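The limit is easy to observe. This sketch assumes a runtime that exposes `btoa` globally, such as browsers or recent Node.js. Code points up to U+00FF are treated as single bytes, and anything above that throws:

```js
console.log(btoa('\u00E9')); // 6Q== (U+00E9 is encoded as the single byte 0xE9)

try {
  btoa('\u20AC'); // € is outside the Latin-1 range
} catch (error) {
  console.log(error.name); // InvalidCharacterError
}
```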

I don’t think this is Annex B material regardless — this is not a legacy 
feature.

> We should be able to define a better designed function (and with a less silly 
> name, while we're at it).

That would kind of defeat the purpose IMHO. We’re stuck with `atob`/`btoa` 
anyway in browsers — adding yet another name for the same thing does not really 
help.


Re: Native base64 utility methods

2014-05-05 Thread Mathias Bynens
On 5 May 2014, at 00:00, Andrea Giammarchi  wrote:

> as generic global utility it would be also nice to make it compatible with 
> all strings.

For backwards compatibility reasons, `atob`/`btoa` should probably continue to 
work in exactly the same way they work now (i.e. as per 
http://whatwg.org/html/webappapis.html#atob). Otherwise, any existing code that 
uses `atob`/`btoa` before UTF-8-decoding or after UTF-8-encoding, including 
your snippet, would suddenly break.

Like you demonstrated, it’s easy enough to encode or decode the input using 
UTF-8 or any other character encoding before passing to `atob`/`btoa`. (E.g. 
http://mothereff.in/base64)
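This is a minimal sketch of that approach. It assumes `TextEncoder`, `TextDecoder`, `btoa` and `atob` are all available, as in browsers and recent Node.js. `encodeUtf8Base64` and `decodeUtf8Base64` are hypothetical names:

```js
// Hypothetical helpers: base64 for arbitrary strings via an explicit UTF-8 step.
function encodeUtf8Base64(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte); // one char per byte, all <= 0xFF
  }
  return btoa(binary);
}

function decodeUtf8Base64(base64) {
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (char) => char.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const encoded = encodeUtf8Base64('I \u2665 JS');
console.log(decodeUtf8Base64(encoded) === 'I \u2665 JS'); // true
```

Because the UTF-8 step happens outside `btoa`/`atob`, their existing Latin-1 behavior stays untouched.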



Native base64 utility methods

2014-05-04 Thread Mathias Bynens
To convert from base64 to ASCII and vice versa, browsers have had global `atob` 
and `btoa` functions for a while now. At the moment, these are defined in the 
HTML standard: http://whatwg.org/html/webappapis.html#atob

However, such utility methods are not only useful in browsers. How about adding 
these as global functions to ECMAScript so that they’re natively available in 
all JavaScript engines, not just in browser environments?


Re: RegExp.escape

2014-03-21 Thread Mathias Bynens
On 21 Mar 2014, at 16:38, C. Scott Ananian  wrote:

> ```js
> function replaceTitle(title, str) {
>  return str.replace(new RegExp(title), "...");
> }
> ```
> 
> There ought to be a standard simple way of writing this correctly.

I’ve used something like this in the past:

RegExp.escape = function(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
};

It escapes some characters that do not strictly need escaping to avoid bugs in 
ancient JavaScript engines. A standardized version could be even simpler, and 
would indeed be very welcome IMHO.
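For illustration, the same escaping applied to the `replaceTitle` example. The name `escapeRegExp` is hypothetical (chosen so it does not touch `RegExp` itself), and its output is only meant for patterns _without_ the `u` flag: escapes such as `\-`, `\,` and `\#` are identity escapes that the `u` flag rejects as syntax errors.

```js
// Same character class as the snippet above, as a standalone function.
function escapeRegExp(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

// C. Scott Ananian's example, now treating `title` as literal text.
function replaceTitle(title, str) {
  return str.replace(new RegExp(escapeRegExp(title)), '...');
}

console.log(replaceTitle('(draft)', 'My (draft) notes')); // My ... notes
```

Without the escaping, `new RegExp('(draft)')` is a capturing group that matches just `draft`, so the same call would return `'My (...) notes'`.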


Re: Array.prototype.contains

2014-03-05 Thread Mathias Bynens
On 5 Mar 2014, at 17:19, Domenic Denicola  wrote:

> Personally I think the more useful model to follow than 
> `String.prototype.contains` is `Set.prototype.has`.

But then DOM4 `DOMStringList` would still have its own `contains` _and_ the 
`has` it inherits from `Array.prototype`. That seems confusing, no?


Re: Another switch

2014-02-20 Thread Mathias Bynens
On 20 Feb 2014, at 21:20, Eric Elliott  wrote:

> Object literals are already a great alternative to switch in JS:
> 
> var cases = {
>   val1:  function () {},
>   val2: function () {}
> };
> 
> cases[val]();

In that case, you’d need a `hasOwnProperty` check to make sure you’re not 
trying to call `__proto__` or `toString`, etc. See 
 for a more complete example.
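A sketch of the lookup with that check in place. The helper name `dispatch` and the `fallback` parameter are illustrative; `Object.prototype.hasOwnProperty.call` is used rather than `cases.hasOwnProperty(...)` so that a case named `hasOwnProperty` cannot shadow it:

```js
// Only own properties of `cases` are treated as handlers; anything else
// (including inherited names like 'toString' or '__proto__') hits `fallback`.
function dispatch(cases, val, fallback) {
  if (Object.prototype.hasOwnProperty.call(cases, val)) {
    return cases[val]();
  }
  return fallback();
}

const cases = {
  val1: function () { return 'one'; },
  val2: function () { return 'two'; }
};

console.log(dispatch(cases, 'val1', () => 'default'));     // one
console.log(dispatch(cases, 'toString', () => 'default')); // default
```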


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-15 Thread Mathias Bynens
On 14 Feb 2014, at 19:59, Allen Wirfs-Brock  wrote:

> It's a really high bar to get over that closed gate.  Unless the exclusion of 
> a feature was a mistake […] I don't think we should be talking about adding 
> it to ES6.

It does feel like a mistake to me to introduce `String.prototype.codePointAt`, 
but no similar function that returns the symbol instead.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens
On 14 Feb 2014, at 11:14, C. Scott Ananian  wrote:

> Note that `Array.from(str)` and `str[Symbol.iterator]` overlap
> significantly.  In particular, it's somewhat awkward to iterate over
> code points using `String#symbolAt`; it's much easier to use
> `substr()` and then use the StringIterator.

`String#at` is not meant for iterating over code points – that’s what the 
`StringIterator` is for.

`String#at` is exactly like `String#codePointAt` except it returns strings 
(containing the symbol) instead of numbers (representing the code point value). 
It can be used to get the symbol at a given code unit position in a string 
(similar to how `String#codePointAt` can be used to get the code point at a 
given code unit position in a string).
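Since `String#at` as described here is `codePointAt` with a string result, it can be sketched in terms of ES6 built-ins. This is a hypothetical standalone `symbolAt` helper, not the proposal's reference implementation; ES2022 later standardized an unrelated `String.prototype.at` (relative code unit indexing), so the sketch avoids that name:

```js
// Returns the symbol (one or two code units) starting at code unit `position`.
function symbolAt(string, position) {
  const codePoint = string.codePointAt(position);
  // `codePointAt` returns undefined when `position` is out of range.
  return codePoint === undefined ? '' : String.fromCodePoint(codePoint);
}

console.log(symbolAt('a𝌆', 1) === '𝌆');      // true: whole symbol
console.log(symbolAt('𝌆', 1) === '\uDF06');  // true: lone trail surrogate
```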


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens
On 14 Feb 2014, at 11:11, Domenic Denicola  wrote:

> This was the method that was only useful if you pass `0` to it?

I’ll just avoid the infinite loop here by pointing to earlier posts in this 
thread where this was discussed before: 

 and 
.

This method is just as useful as `String.prototype.codePointAt`. If that method 
is included, so should `String.prototype.at` be. If `String.prototype.at` is found 
not to be useful, `String.prototype.codePointAt` should be removed too.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2014-02-14 Thread Mathias Bynens
Allen mentioned that `String#at` might not make it to ES6 because nobody in 
TC39 is championing it. I’ve now asked Rick if he would be the champion for 
this, and he agreed. (Thanks again!)

Looking over the ‘TC39 progress’ document at , it seems most of the work is 
already taken care of: the use case was discussed 
in this thread, the proposal has a complete spec text, and there’s an example 
implementation/polyfill with unit tests. See .

Is there anything else I can do to help get this included as a non-TC39-member?


Re: comment overflow

2014-02-10 Thread Mathias Bynens
On 10 Feb 2014, at 10:30, Michael Dyck  wrote:

> On a more meta level, do the process plans for ES7 include any new
> mechanisms for:
> (a) submitting comments on spec drafts, or
> (b) reducing the number of errors in spec drafts to begin with?

If only the spec were maintained in a plain text-based format (like Markdown or 
even HTML) it would be easy to host its repository on, say, GitHub, which would 
enable commenting on inline diffs (= perfect for pointing out small typos 
etc.). That way, it would also be possible to link to specific lines in a 
specific revision of the spec. Those things would already avoid a lot of 
overhead currently present when filing bugs IMHO. 

This brings us back to the good old let’s-stop-using-a-Word-document discussion.


Re: Ecmascript.org

2014-01-31 Thread Mathias Bynens

> I was wondering who was in charge of the ecmascript.org web site.

$ whois ecmascript.org
[snip]
Registrant Organization:Mozilla Corporation
Registrant Street: 650 Castro St Ste 300
Registrant City:Mountain View
Registrant State/Province:CA
Registrant Postal Code:94041
Registrant Country:US
Registrant Phone:+1.6509030800
Registrant Phone Ext:
Registrant Fax:
Registrant Fax Ext:
Registrant Email:hostmas...@mozilla.com
[snip]

Mozillians can probably tell you more.


Re: `String.prototype.contains(regex)`

2013-12-23 Thread Mathias Bynens
On 18 Dec 2013, at 23:02, Benjamin (Inglor) Gruenbaum  wrote:

> If anything, I'd expect all of them to throw when passed multiple arguments 
> for forward compatibility. It might be useful to check multiple values in 
> contains/endsWith/startsWith or constrain it in some way. 

The reason `String.prototype.{starts,ends}With` throw when passed a regular 
expression is forward compatibility:

> Note 2. Throwing an exception if the first argument is a RegExp is specified 
> in order to allow future editions to define extends that allow such argument 
> values.

It seems that `contains` was forgotten about when 
https://bugs.ecmascript.org/show_bug.cgi?id=498#c3 was fixed, so I’ve filed 
https://bugs.ecmascript.org/show_bug.cgi?id=2407 asking to make 
`String.prototype.contains(regex)` throw as well.
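For reference: in the final ES2015 specification this method shipped as `String.prototype.includes`, and it performs the same RegExp check as `startsWith` and `endsWith`. A quick check, assuming an ES2015+ engine (the helper `throwsTypeError` is illustrative):

```js
// Returns true if calling `fn` throws a TypeError.
function throwsTypeError(fn) {
  try {
    fn();
  } catch (error) {
    return error instanceof TypeError;
  }
  return false;
}

const results = {
  startsWith: throwsTypeError(() => 'abc'.startsWith(/a/)),
  endsWith: throwsTypeError(() => 'abc'.endsWith(/c/)),
  includes: throwsTypeError(() => 'abc'.includes(/b/)),
  // An explicit conversion still works: String(/b/) is '/b/'.
  includesString: 'a/b/c'.includes(String(/b/))
};

console.log(results); // every value is true
```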


`String.prototype.contains(regex)`

2013-12-18 Thread Mathias Bynens
Both `String.prototype.startsWith` and `String.prototype.endsWith` throw a 
`TypeError` if the first argument is a RegExp:

> Throwing an exception if the first argument is a RegExp is specified in order 
> to allow future editions to define extends that allow such argument values.

However, this is not the case for `String.prototype.contains`, even though it’s 
a very similar method. As per the latest ES6 draft, 
`String.prototype.contains(regex)` behaves like 
`String.prototype.contains(String(regex))`. This seems inconsistent. What’s the 
reason for this inconsistency?


Re: Function.prototype.apply() & Function.prototype.call() with `undefined` or `null` as `thisArg`

2013-12-10 Thread Mathias Bynens
Turns out this is a bug in the spec: 
https://bugs.ecmascript.org/show_bug.cgi?id=2370


Function.prototype.apply() & Function.prototype.call() with `undefined` or `null` as `thisArg`

2013-12-10 Thread Mathias Bynens
From  and :

> The `thisArg` value is passed without modification as the `this` value. This 
> is a change from Edition 3, where a `undefined` or `null` `thisArg` is 
> replaced with the global object and `ToObject` is applied to all other values 
> and that result is passed as the `this` value.

It seems like modern engines still have the ES3 behavior:

function foo() {
  console.log(this);
  return this;
};
foo.call(undefined) === undefined; // `false`, expected `true`

I’ve tested this in SpiderMonkey/Firefox, Carakan/Opera (Presto), JSC/Safari, and 
V8/Chrome. They all show FAIL in this test case:

data:text/html,function foo() { console.log(this); return this; }; 
document.write(foo.call(undefined) === undefined %3F 'PASS' %3A 
'FAIL');

Is this…

1. a wilful violation of the ES5 spec for back-compat reasons, or…
2. just an oversight, i.e. this never got implemented, or…
3. a misreading of the spec on my part?

If 1 is the case, the ES6 spec should match reality by reverting the change 
introduced in ES5.
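For comparison, a sketch of where the substitution happens in ES5: `call`/`apply` pass `thisArg` through unchanged, and it's entering the code of a _non-strict_ callee that replaces `undefined`/`null` with the global object and applies `ToObject` to primitives (ES5 §10.4.3). A strict-mode callee sees the value as passed. This assumes an engine with `globalThis` (ES2020+); `new Function` is used because it always creates a non-strict function:

```js
// Non-strict callee: created via `new Function`, which never inherits strictness.
const sloppy = new Function('return this;');

// Strict callee: the directive applies to this function body only.
function strict() {
  'use strict';
  return this;
}

console.log(sloppy.call(undefined) === globalThis);  // true: global object substituted
console.log(typeof sloppy.call(42) === 'object');    // true: primitive wrapped
console.log(strict.call(undefined) === undefined);   // true: passed through
console.log(strict.call(42) === 42);                 // true: not wrapped
```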


Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Mathias Bynens
On 22 Nov 2013, at 11:20, Allen Wirfs-Brock  wrote:

> Did you check the ES6 draft grammar[1]?  The answer to that should be fairly 
> obvious there and if it isn't it would be good to know so we can try to make 
> it clearer in the spec.
> 
> [1]: http://people.mozilla.org/~jorendorff/es6-draft.html#sec-patterns 

It’s pretty clear that (1) is equivalent to (2). I guess (3) is equivalent to 
(1) and (2) because of the following:

RegExpUnicodeEscapeSequence[U] ::
[+U] LeadSurrogate \u TrailSurrogate

…but I was looking for confirmation.


Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-22 Thread Mathias Bynens
One more related question: are these three regular expression literals 
equivalent?

1. `/[💩-💫]/u`: raw astral symbols
2. `/[\u{1F4A9}-\u{1F4AB}]/u`: astral symbols represented using Unicode code 
point escape sequences
3. `/[\uD83D\uDCA9-\uD83D\uDCAB]/u`: astral symbols represented as a surrogate 
pair
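A quick way to spot-check the equivalence, assuming an ES2015+ engine: anchor each class and run the same inputs through all three. This tests a few code points around the range boundaries; it is not a proof.

```js
// The three character classes from the question, all with the `u` flag.
const rawSymbols = /^[💩-💫]$/u;
const codePointEscapes = /^[\u{1F4A9}-\u{1F4AB}]$/u;
const surrogatePairEscapes = /^[\uD83D\uDCA9-\uD83D\uDCAB]$/u;
const patterns = [rawSymbols, codePointEscapes, surrogatePairEscapes];

// U+1F4A9..U+1F4AB are in range; U+1F4AC and a lone lead surrogate are not.
const inputs = ['💩', '\u{1F4AA}', '💫', '\u{1F4AC}', '\uD83D'];

for (const input of inputs) {
  const verdicts = patterns.map((pattern) => pattern.test(input));
  console.log(JSON.stringify(input), verdicts); // all three verdicts agree
}
```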


Re: Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-21 Thread Mathias Bynens
On 21 Nov 2013, at 15:07, Erik Arvidsson  wrote:

> That would unfortunately not be backwards compatible since /\u{123}/ is a 
> valid RegExp in ES5.1.

Ah, doh! I was thinking in terms of strings: modern engines throw errors for 
things like `'\u{123}'`. Thanks for the explanation.
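To make the compatibility problem concrete: in current engines (which implement the Annex B web-compatibility grammar), the same regex source means two different things depending on the flag.

```js
// Without `u`: `\u` not followed by four hex digits is an identity escape for
// the letter "u", and `{3}` is then a quantifier, so this matches "uuu".
const legacy = /^\u{3}$/;
// With `u`: `\u{3}` is a code point escape for U+0003.
const unicode = /^\u{3}$/u;

console.log(legacy.test('uuu'));     // true
console.log(legacy.test('\u0003'));  // false
console.log(unicode.test('\u0003')); // true
console.log(unicode.test('uuu'));    // false
```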


Using Unicode code point escape sequences in regular expressions without the `u` flag

2013-11-21 Thread Mathias Bynens
If I’m reading the latest draft correctly, `RegExpUnicodeEscapeSequence`s 
aren’t allowed in regular expressions without the `u` flag. Why is that?

AFAICT, the only situations that require looking at code points rather than 
UCS-2/UTF-16 code units in order to support full Unicode are:

* the regex is case-insensitive;
* the regex contains a character class;
* the regex uses `.`;
* the regex uses a quantifier.

I’d suggest allowing `\u{xx}`-style escape sequences everywhere, and simply 
changing the behavior of the resulting regular expression depending on the `u` 
flag. There’s no good reason to disallow e.g. `/\u{20}/` or even `/\u{1F4A9}/`.


Re: [Json] BOMs

2013-11-21 Thread Mathias Bynens
On 21 Nov 2013, at 09:41, Bjoern Hoehrmann  wrote:

> Is there any chance, by the way, to change `JSON.stringify` so it does
> not output strings that cannot be encoded using UTF-8? Specifically,
> 
>  JSON.stringify(JSON.parse("\"\uD800\""))
> 
> would need to escape the surrogate instead of emitting it literally.

Previous discussion: 
http://esdiscuss.org/topic/code-points-vs-unicode-scalar-values#content-14
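This was later addressed by the ES2019 “well-formed `JSON.stringify`” change: lone surrogates are now output as lowercase `\u` escapes, while valid surrogate pairs are still emitted literally. Assuming an ES2019+ engine:

```js
const lone = JSON.parse('"\\uD800"'); // a string containing a lone surrogate
console.log(lone.length);             // 1
console.log(JSON.stringify(lone));    // "\ud800" (escaped, so the output is valid UTF-8)
console.log(JSON.stringify('💩'));    // "💩" (a valid pair stays literal)
```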


Re: Working with grapheme clusters

2013-10-27 Thread Mathias Bynens
On 26 Oct 2013, at 14:39, Bjoern Hoehrmann  wrote:

> * Norbert Lindenberg wrote:
>> On Oct 25, 2013, at 18:35 , Jason Orendorff  
>> wrote:
>> 
>>> UTF-16 is designed so that you can search based on code units
>>> alone, without computing boundaries. RegExp searches fall in this
>>> category.
>> 
>> Not if the RegExp is case insensitive, or uses a character class, or ".", or a
>> quantifier - these all require looking at code points rather than UTF-16 code
>> units in order to support the full Unicode character set.
> 
> If you have a regular expression over an alphabet like "Unicode scalar
> values" it is easy to turn it into an equivalent regular expression over
> an alphabet like "UTF-16 code units".

FWIW, [Regenerate](http://mths.be/regenerate) is a JavaScript library that can 
be used for this. A few examples from 
:

> Here’s how to create a regular expression that matches any Unicode scalar value:
> 
> >> regenerate()
>  .addRange(0x0, 0x10FFFF) // all Unicode code points
>  .removeRange(0xD800, 0xDBFF) // minus high surrogates
>  .removeRange(0xDC00, 0xDFFF) // minus low surrogates
>  .toRegExp()
> /[\0-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]/


Similarly, to polyfill `.` in a Unicode-enabled ES6 regex:

> When the `u` flag is set, `.` is equivalent to the following 
> backwards-compatible regular expression pattern:
> 
> >> regenerate()
>  .addRange(0x0, 0x10FFFF) // all Unicode code points
>  .remove(  // minus `LineTerminator`s (http://ecma-international.org/ecma-262/5.1/#sec-7.3):
>0x000A, // Line Feed 
>0x000D, // Carriage Return 
>0x2028, // Line Separator 
>0x2029  // Paragraph Separator 
>  )
>  .toString();
> 
> '[\0-\x09\x0B\x0C\x0E-\u2027\u202A-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF]'
> 
> >> /foo(?:[\0-\x09\x0B\x0C\x0E-\u2027\u202A-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])bar/.test('foo💩bar')
> true
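The generated patterns can be checked directly. A small sketch, with the patterns copied from the output above and wrapped in `^…$` anchors for whole-string tests (the variable names are illustrative). No `u` flag is used, since the point of these patterns is that they work in ES5 engines:

```js
// Sequence of Unicode scalar values, i.e. a string with no lone surrogates.
const scalarValues = /^(?:[\0-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF])*$/;

// "foo", then exactly one code point that is not a line terminator, then "bar":
// an ES5-compatible stand-in for /^foo.bar$/u.
const fooAnyBar = /^foo(?:[\0-\x09\x0B\x0C\x0E-\u2027\u202A-\uD7FF\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])bar$/;

console.log(scalarValues.test('foo💩bar'));  // true
console.log(scalarValues.test('a\uD800b'));  // false: lone surrogate
console.log(fooAnyBar.test('foo💩bar'));     // true: one astral code point
console.log(fooAnyBar.test('foo\nbar'));     // false: line terminator
```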


Re: Working with grapheme clusters

2013-10-24 Thread Mathias Bynens
On 24 Oct 2013, at 16:22, Anne van Kesteren  wrote:

> On Thu, Oct 24, 2013 at 3:02 PM, Claude Pache  wrote:
>> As a side note, I ask whether the `String.prototype.symbolAt`/`String.prototype.at`
>> as proposed in a recent thread, and the `String.prototype[@@iterator]` as
>> currently specified, are really what people need, or if they would mistakenly
>> use them with the intended meaning of `String.prototype.graphemeAt` and
>> `String.prototype.graphemes` as discussed in the present message?
>> 
>> Thoughts?
> 
> If we want to make it easier for developers to work with text, we should 
> offer them functionality at the grapheme cluster level and not distract 
> everyone with code units and code points. Thanks for making a proposal!

I’d welcome grapheme helper methods (even though the ES6 string methods already 
make it easier to deal with grapheme clusters than ever before), but I strongly 
disagree the string iterator should be changed. I think the use case of 
iterating over code points is much more common.

Imagine you’re writing a JavaScript library that escapes a given string using 
HTML character references, or as a CSS identifier, or in any other format. In 
those cases, you don’t care about grapheme clusters; you care about code points, 
because those are the units you end up escaping individually.
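A minimal sketch of such an escaper (the function name is hypothetical). Because `for…of` yields whole code points, U+1F4A9 becomes one reference, `&#x1F4A9;`; a per-code-unit loop would produce `&#xD83D;&#xDCA9;` instead, and surrogate character references are a parse error in HTML that decodes to U+FFFD.

```js
// Escape every code point as a hexadecimal HTML character reference.
function toHexCharacterReferences(text) {
  let result = '';
  for (const symbol of text) {
    const hex = symbol.codePointAt(0).toString(16).toUpperCase();
    result += '&#x' + hex + ';';
  }
  return result;
}

console.log(toHexCharacterReferences('a💩')); // &#x61;&#x1F4A9;
```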


Re: Working with grapheme clusters

2013-10-24 Thread Mathias Bynens
On 24 Oct 2013, at 16:02, Claude Pache  wrote:

> Therefore, I propose the following basic operations to operate on grapheme 
> clusters:

Out of curiosity, is there any programming language that operates on grapheme 
clusters (rather than code points) by default? FWIW, code point iteration is 
what I’d expect in any language.

>   text.graphemeAt(0) // get the first grapheme of the text
> 
>   // shorten a text to its first hundred graphemes
>   var shortenText = ''
>   let numGraphemes = 0
>   for (let grapheme of text) {
>   numGraphemes += 1
>   if (numGraphemes > 100) {
>   shortenText += '…'
>   break
>   }
>   shortenText += grapheme
>   }

So, you would want to change the string iterator’s behavior too?

> As a side note, I ask whether the `String.prototype.symbolAt`/`String.prototype.at`
> as proposed in a recent thread, and the 
> `String.prototype[@@iterator]` as currently specified, are really what people 
> need, or if they would mistakenly use them with the intended meaning of 
> `String.prototype.graphemeAt` and `String.prototype.graphemes` as discussed 
> in the present message?

I don’t think this would be an issue. The new `String` methods and the iterator 
are well-defined and documented in terms of *code points*.

IMHO combining marks are easy enough to match and special-case in your code if 
that’s what you need. You could use a regular expression to iterate over all 
grapheme clusters in the string:

// Based on the example on
// http://mathiasbynens.be/notes/javascript-unicode#accounting-for-other-combining-marks
var regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;

var zalgo = 
'Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞';

zalgo.match(regexGraphemeCluster);
[
  "Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍",
  "A̴̵̜̰͔ͫ͗͢",
  "L̠ͨͧͩ͘",
  "G̴̻͈͍͔̹̑͗̎̅͛́",
  "Ǫ̵̹̻̝̳͂̌̌͘",
  "!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"
]
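Building on that regex, a small sketch that counts clusters rather than listing them (the function name `countClusters` is hypothetical). The regex only recognizes the four combining-mark blocks it lists, so this approximates grapheme clusters; it is not an implementation of UAX #29.

```js
// Base character (or surrogate pair) followed by any run of combining marks.
const regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;

function countClusters(text) {
  const matches = text.match(regexGraphemeCluster);
  return matches === null ? 0 : matches.length;
}

console.log('man\u0303ana'.length);         // 7 code units
console.log(countClusters('man\u0303ana')); // 6: "n" + U+0303 counts once
```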


Re: Making the identifier identification strawman less restrictive

2013-10-22 Thread Mathias Bynens
On 14 Oct 2013, at 23:21, Erik Arvidsson  wrote:

> I'm concerned about the latest version of this on the wiki. The
> edition parameter requires that we ship 2 tables today. This seems
> like it might change to 3 in ES7 and n in ES(n+4). I think the only
> reasonable requirement is that it matches what the engine actually
> uses. For tools it seems better for them to include this table. I
> don't want all runtimes to have to pay for something that only tools
> need.

This strawman is only useful for tools. If tools need to implement this 
themselves, this basically means the strawman is rejected, right?


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-20 Thread Mathias Bynens
On 19 Oct 2013, at 12:54, Domenic Denicola  wrote:

> My proposed cowpaths:
> 
> ```js
> Object.mixin(String.prototype, {
>  realCharacterAt(i) {
>let index = 0;
>for (var c of this) {
>  if (index++ === i) {
>return c;
>  }
>}
>  }
>  get realLength() {
>let counter = 0;
>for (var c of this) {
>  ++counter;
>}
>return counter;
>  }
> });
> ```

Good stuff!

To account for [lookalike symbols due to combining marks] [1], just add a call 
to `String.prototype.normalize`:

Object.mixin(String.prototype, {
  get realLength() {
    let counter = 0;
    for (var c of this.normalize('NFC')) {
      ++counter;
    }
    return counter;
  }
});

assert('ma\xF1ana'.realLength == 'man\u0303ana'.realLength);

[1]: http://mathiasbynens.be/notes/javascript-unicode#accounting-for-lookalikes
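The same idea as a standalone function rather than a prototype extension (the name is illustrative), assuming an ES2015+ engine. Note that NFC only merges sequences that have a precomposed form, so it does not collapse every combining sequence.

```js
function realLength(text) {
  // Compose lookalike sequences first (e.g. n + U+0303 becomes U+00F1),
  // then count code points via the string iterator.
  return [...text.normalize('NFC')].length;
}

console.log(realLength('ma\xF1ana') === realLength('man\u0303ana')); // true
console.log(realLength('💩'));                                       // 1
```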


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Mathias Bynens
On 19 Oct 2013, at 00:53, Domenic Denicola  wrote:

> On 19 Oct 2013, at 01:12, "Mathias Bynens"  wrote:
>> `String.prototype.codePointAt` or `String.prototype.at` come in handy in 
>> case you only need to get the first code point or symbol in a string, for 
>> example.
> 
> Are they useful for anything else, though? For example, if I wanted to get 
> the second symbol in a string, how would I do that?

Yeah, that’s the problem with these methods. Additional user code is required 
to handle non-zero `position` arguments, unless you’re sure the `position` is 
actually the start of a code point (and not in the middle of a surrogate pair). 
I guess there are situations where that’s a certainty, for example when you’re 
dealing with a string in which the user selected some text.

This brings us back to the earlier discussion of whether something like 
`String.prototype.codePoints` should be added: 
http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string It 
could be a getter or a generator… Or does `for…of` iteration handle this use 
case adequately?
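To illustrate the difference: with a code unit index, `codePointAt(1)` on two U+1F4A9 symbols lands on the trail surrogate of the first one, while iterating by code point (here via `Array.from`, which uses the string iterator) gets the second symbol directly, at the cost of building an array:

```js
const pair = '💩💩'; // four code units, two code points

// Code unit index 1 is the trail surrogate of the first U+1F4A9.
const atIndexOne = pair.codePointAt(1);

// The second *symbol*, via the code point iterator.
const secondSymbol = Array.from(pair)[1];

console.log(atIndexOne.toString(16)); // dca9
console.log(secondSymbol === '💩');   // true
```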


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-19 Thread Mathias Bynens
On 19 Oct 2013, at 12:15, Bjoern Hoehrmann  wrote:

> Certainly not common enough to warrant a two-character method on the
> native string type. Odds are people will use it incorrectly in an
> attempt to make their code look concise […]

Are you saying that changing the name to something that is longer than `at` 
would solve this problem?

> […] not understanding that it'll retrieve a substring of .length 1 or 2,
> possibly consisting of a lone surrogate, based on a 16 bit index that
> might fall in the middle of a character; the problematic cases are
> fairly rare, so it's hard to notice improper use of `.at` in automated
> testing or in code review.

People are using `String.prototype.charAt()` incorrectly too, expecting it to 
return whole symbols instead of surrogate halves wherever possible. How would 
_not_ introducing a method that avoids this problem help?


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 17:51, Joshua Bell  wrote:

> Given that you can only use the proposed String.prototype.at() properly for 
> indexes > 0 if you know the index of a non-BMP character or lead surrogate by 
> some other means, or if you will test the return value for a trailing 
> surrogate, is it really an advantage over using codePointAt / fromCodePoint?
> 
> The name "at" is so tempting I'm imagining naive scripts of the form for (i = 
> 0; i < s.length; ++i) { r += s.at(i); } which will work fine until they get a 
> non-BMP input at which point they're suddenly duplicating the trailing 
> surrogates.
> 
> Pushing people towards for-of iteration and even Allen's 
> Array.from('𝌆𝌆𝌆')[1] seems safer; users who need more subtle things have 
> codePointAt / fromCodePoint available and hopefully the knowledge to use them.

Just because new features can be used incorrectly doesn’t mean the feature 
isn’t useful. `for…of` on strings and `String.prototype.at` are two very 
different things for two very different use cases. It’s a matter of using the 
right tool for the job, IMHO.

In your example (iterating over all code points in a string), `for…of` should 
be used.

`String.prototype.codePointAt` or `String.prototype.at` come in handy in case 
you only need to get the first code point or symbol in a string, for example.
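The naive loop from the quoted message, made concrete with a hypothetical `symbolAt` stand-in for the proposed method (not ES2022's unrelated `String.prototype.at`). Stepping one code unit at a time re-reads the trail surrogate, whereas `for…of` visits each code point once:

```js
// Hypothetical stand-in for the proposed method.
function symbolAt(string, position) {
  const codePoint = string.codePointAt(position);
  return codePoint === undefined ? '' : String.fromCodePoint(codePoint);
}

const s = 'a𝌆b';

// One step per code unit: index 2 yields the lone trail surrogate U+DF06.
let naive = '';
for (let i = 0; i < s.length; ++i) {
  naive += symbolAt(s, i);
}

// One step per code point.
let viaForOf = '';
for (const symbol of s) {
  viaForOf += symbol;
}

console.log(naive === 'a𝌆\uDF06b'); // true: duplicated trail surrogate
console.log(viaForOf === s);        // true
```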


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 15:12, Andrea Giammarchi  wrote:

> so my counter-question would be: is there any way to do that in core so that 
> we can “💩💩💩”.split() it so that we can have an ArrayLike that with [1] gives 
> back the single “💩” and not the whole thing ?

This brings us back to the earlier discussion of whether something like 
`String.prototype.codePoints` should be added: 
http://esdiscuss.org/topic/how-to-count-the-number-of-symbols-in-a-string I 
think it would be useful.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
Please ignore my previous email; it has been answered already. (It was a draft 
I wrote up this morning before I lost my internet connection.)

On 18 Oct 2013, at 11:57, Allen Wirfs-Brock  wrote:

> Given that we have charAt, charCodeAt and codePointAt,  I think the most 
> appropriate name for such a method would be 'at':
>  '𝌆'.at(0)

Love it!

> The issue when this sort of method has been discussed in the past has been 
> what to do when you index at a trailing surrogate position:
> 
> '𝌆'.at(1)
> 
> do you still get '𝌆' or do you get the equivalent of 
> String.fromCharCode('𝌆'[1]) ?

In my proposal it would return the equivalent of 
`String.fromCharCode('𝌆'.charCodeAt(1))`, i.e. `'𝌆'[1]`, the lone trailing 
surrogate. I think that’s the most sane behavior in that case. This also mimics 
the way `String.prototype.codePointAt` works in such a case.

Here’s a prollyfill for `String.prototype.at` based on my earlier proposal: 
https://github.com/mathiasbynens/String.prototype.at Tests: 
https://github.com/mathiasbynens/String.prototype.at/blob/master/tests/tests.js


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 11:05, Anne van Kesteren  wrote:

> On Fri, Oct 18, 2013 at 4:58 PM, Mathias Bynens  wrote:
>> I disagree. In those situations you should just iterate over the string 
>> using `for…of`.
> 
> That seems to iterate over code units as far as I can tell.
> 
> for (var x of "💩")
>  print(x.charCodeAt(0))
> 
> invokes print() twice in Gecko.

Woah, that doesn’t seem very useful. Is that a bug, or the way it’s supposed to 
work? I thought it was supposed to only iterate over whole code points (i.e. 
only print once for each code point, not once for each surrogate half).
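For the record, the final ES2015 specification settled this: the string iterator yields code points, so a surrogate pair comes out as one string and a lone surrogate as its own one-element string. In an ES2015+ engine, Anne's example therefore runs the loop body once:

```js
const pileOfPoo = '💩'; // U+1F4A9, two code units

let iterations = 0;
for (const x of pileOfPoo) {
  iterations += 1;
  console.log(x.codePointAt(0).toString(16)); // 1f4a9
}

console.log(iterations);             // 1
console.log(pileOfPoo.length);       // 2 (code units)
console.log([...'\uD800a'].length);  // 2 (lone surrogate, then "a")
```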


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 10:48, Anne van Kesteren  wrote:

> On Fri, Oct 18, 2013 at 1:46 PM, Mathias Bynens  wrote:
>> Similarly, `String.prototype.charCodeAt` is fixed by 
>> `String.prototype.codePointAt`.
> 
> When you phrase it like that, I see another problem with
> codePointAt(). You can't just replace existing usage of charCodeAt()
> with codePointAt() as that would fail for input with paired
> surrogates. E.g. a simple loop over a string that prints code points
> would print both the code point and the trail surrogate code point for
> a surrogate pair.

I disagree. In those situations you should just iterate over the string using 
`for…of`.

`.symbolAt()` can be a useful replacement for `.charAt()` in case you only need 
to get the first symbol in the string. The same goes for `.codePointAt()` vs. 
`.charCodeAt()`.
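When an index-based loop is needed anyway (for example to track positions), `codePointAt` still works as long as the index advances past the trail surrogate. A minimal sketch with hypothetical function names, contrasted with the naive drop-in replacement Anne describes:

```js
// Collect code points by index, skipping the trail surrogate of each pair.
function codePointsOf(text) {
  const result = [];
  for (let i = 0; i < text.length; ) {
    const codePoint = text.codePointAt(i);
    result.push(codePoint);
    i += codePoint > 0xFFFF ? 2 : 1; // astral code points use two code units
  }
  return result;
}

// Naive replacement of charCodeAt: also reports the trail surrogate.
function naiveCodePointsOf(text) {
  const result = [];
  for (let i = 0; i < text.length; ++i) {
    result.push(text.codePointAt(i));
  }
  return result;
}

console.log(codePointsOf('a💩'));      // [ 97, 128169 ]
console.log(naiveCodePointsOf('a💩')); // [ 97, 128169, 56489 ]
```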


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 10:25, Rick Waldron  wrote:

> String.prototype.elementAt?

This may be confusing too, since the spec uses the term “element” for code units, 
not code points.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 10:39, Domenic Denicola  wrote:

> Doesn't Unicode have some name for "visual representation of a code point"? 
> Maybe it's "symbol"?

Not that I know of. I guess “Character” 
(http://www.unicode.org/glossary/#character) comes close, but we can’t really 
use that because `String.prototype.charAt` already exists. FWIW, I always use 
the term “symbol” to refer to a string that represents a single code point.

IMHO it’s not _really_ confusing to name this new method `symbolAt` because 
it’s defined on `String.prototype`, which indicates that it acts on strings and 
has nothing to do with ES6 Symbols. That said, I welcome better suggestions :)


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
Here’s my proposal. Feedback welcome, as well as suggestions for a better name 
(if any).

## String.prototype.symbolAt(pos)

NOTE: Returns a String containing the code point starting at element position 
`pos` in the String `value` resulting from converting the `this` object to a 
String: a single element, or two elements if `pos` is the start of a surrogate 
pair. If there is no element at that position, the result is the empty String. 
The result is a String value, not a String object.

When the `symbolAt` method is called with one argument `pos`, the following 
steps are taken:

01. Let `O` be `CheckObjectCoercible(this value)`.
02. Let `S` be `ToString(O)`.
03. `ReturnIfAbrupt(S)`.
04. Let `position` be `ToInteger(pos)`.
05. `ReturnIfAbrupt(position)`.
06. Let `size` be the number of elements in `S`.
07. If `position < 0` or `position ≥ size`, return the empty String.
08. Let `first` be the code unit at index `position` in the String `S`.
09. Let `cuFirst` be the code unit value of the element at index `0` in the 
String `first`.
10. If `cuFirst < 0xD800` or `cuFirst > 0xDBFF` or `position + 1 = size`, then 
return `first`.
11. Let `cuSecond` be the code unit value of the element at index `position + 
1` in the String `S`.
12. If `cuSecond < 0xDC00` or `cuSecond > 0xDFFF`, then return `first`.
13. Let `second` be the code unit at index `position + 1` in the string `S`.
14. Let `cp` be `(cuFirst – 0xD800) × 0x400 + (cuSecond – 0xDC00) + 0x10000`.
15. Return the elements of the UTF-16 Encoding (clause 6) of `cp`.
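To make the steps concrete, a rough JavaScript transcription. The function name and the `thisValue` parameter are illustrative, and step 14 uses the standard surrogate pair formula `(cuFirst - 0xD800) * 0x400 + (cuSecond - 0xDC00) + 0x10000`:

```js
function symbolAt(thisValue, pos) {
  // Steps 1-3: CheckObjectCoercible, then ToString (template literals use ToString).
  if (thisValue === null || thisValue === undefined) {
    throw new TypeError('symbolAt called on null or undefined');
  }
  const S = `${thisValue}`;
  // Steps 4-5: ToInteger (NaN becomes 0, fractions truncate toward zero).
  const position = Math.trunc(Number(pos)) || 0;
  // Steps 6-7: out-of-range positions yield the empty String.
  const size = S.length;
  if (position < 0 || position >= size) {
    return '';
  }
  // Steps 8-10: anything but a lead surrogate, or the last element, stands alone.
  const cuFirst = S.charCodeAt(position);
  if (cuFirst < 0xD800 || cuFirst > 0xDBFF || position + 1 === size) {
    return S[position];
  }
  // Steps 11-12: a lead surrogate not followed by a trail surrogate stands alone.
  const cuSecond = S.charCodeAt(position + 1);
  if (cuSecond < 0xDC00 || cuSecond > 0xDFFF) {
    return S[position];
  }
  // Steps 13-15: decode the pair, then return its UTF-16 encoding.
  const cp = (cuFirst - 0xD800) * 0x400 + (cuSecond - 0xDC00) + 0x10000;
  return String.fromCodePoint(cp);
}

console.log(symbolAt('𝌆', 0) === '𝌆'); // true
```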

NOTE: The `symbolAt` function is intentionally generic; it does not require 
that its `this` value be a String object. Therefore it can be transferred to 
other kinds of objects for use as a method.


Re: `String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
On 18 Oct 2013, at 09:21, Rick Waldron  wrote:

> I think the idea is good, but the name may be confusing with regard to 
> Symbols (maybe not?)

Yeah, I thought about that, but couldn’t figure out a better name. “Glyph” or 
“Grapheme” wouldn’t be accurate. Any suggestions?

Anyway, if everyone agrees this is a good idea I’ll get started on fleshing out 
a proposal. We can then use this thread to bikeshed about the name.


`String.prototype.symbolAt()` (improved `String.prototype.charAt()`)

2013-10-18 Thread Mathias Bynens
ES6 fixes `String.fromCharCode` by introducing `String.fromCodePoint`.

Similarly, `String.prototype.charCodeAt` is fixed by 
`String.prototype.codePointAt`.

Should there be a method that is like `String.prototype.charAt` except it deals 
with astral Unicode symbols wherever possible?

>> '𝌆'.charAt(0) // U+1D306
'\uD834' // the first surrogate half for U+1D306

>> '𝌆'.symbolAt(0) // U+1D306
'𝌆' // U+1D306

Has this been discussed before? If there’s any interest I’d be happy to create 
a strawman.

Mathias  
http://mathiasbynens.be/


Re: How is let compatibility resolved?

2013-10-14 Thread Mathias Bynens
On 2 Oct 2013, at 10:45, Petka Antonov  wrote:

> In current version, this works just fine:
> 
>var let = 6;

Note that `let` was reserved in strict mode (only) in ES5, meaning that even as 
per ES5 that snippet only works in sloppy mode.
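One way to see this in a current engine is to compile the snippet twice with the `Function` constructor: once as-is, and once with a `"use strict"` directive. A function body created this way is sloppy unless it opts into strict mode itself. `parses` is a hypothetical helper used only for this illustration:

```javascript
// Returns true if `source` compiles as a function body, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // sloppy unless the body has its own "use strict"
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const sloppy = parses('var let = 6; return let;');
const strict = parses('"use strict"; var let = 6; return let;');
console.log(sloppy, strict); // true false
```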

Fwd: Making the identifier identification strawman less restrictive

2013-10-11 Thread Mathias Bynens
Forwarding Marijn’s message since he’s not subscribed to es-discuss.

Begin forwarded message:

> From: Marijn Haverbeke 
> Subject: Re: Making the identifier identification strawman less restrictive
> Date: 10 October 2013 11:13:34 CEST
> To: Norbert Lindenberg 
> Cc: Mathias Bynens , es-discuss , 
> Anton Kovalyov , Yusuke SUZUKI , 
> ariya.hida...@gmail.com, Jeremy Ashkenas , 
> mi...@bazon.net
> 
> I have no particular opinion about this. Identifiers with obscure
> characters tend to be so rare that I don't expect to have any trouble
> with this except for constructed conformance tests. Since you'll
> probably be the people who are going to construct such tests, I'll
> leave you to figure out what's sane.


Fwd: Making the identifier identification strawman less restrictive

2013-10-09 Thread Mathias Bynens
Forwarding Anton’s message since he’s not subscribed to es-discuss.

Begin forwarded message:

> From: Anton Kovalyov 
> Subject: Re: Making the identifier identification strawman less restrictive
> Date: 9 October 2013 22:17:50 CEST
> To: Mathias Bynens 
> Cc: Norbert Lindenberg , es-discuss 
> , Yusuke SUZUKI , Ariya 
> Hidayat , Jeremy Ashkenas , 
> Marijn Haverbeke , mi...@bazon.net
> 
> Hi,
> 
> If someone running their code in an ES5 environment could run into 
> problems when using Unicode 6.3, JSHint needs to warn about it. Today it 
> doesn’t, mostly because I’m really fuzzy on the differences between Unicode 
> versions and I don’t have much time to dig into that, so I’m relying on 
> incoming patches.
> 
> Hope that helps at all. Let me know if you need more info or if I 
> misunderstood the question.
> 
> Anton


Re: Making the identifier identification strawman less restrictive

2013-10-09 Thread Mathias Bynens
CC’ing the creators of the tools we’ve been talking about to get their input. 
Hi guys! Please start reading here: 
http://esdiscuss.org/topic/making-the-identifier-identification-strawman-less-restrictive.

On 9 Oct 2013, at 07:48, Norbert Lindenberg  
wrote:

> - For a code transformation tool, such as CoffeeScript, I agree that you 
> probably don't want to introduce any artificial restrictions, so you want to 
> use the latest Unicode version possible. Step 10 of the proposed algorithm 
> ("let unicode be the Unicode version supported by the implementation in 
> ECMAScript identifiers") is intended to cover that case.

But that makes it an implementation-dependent impure function, which is 
unacceptable for code transformation tools like CoffeeScript and parsers like 
Esprima, Acorn, or UglifyJS. They’d support certain identifiers in engine A but 
not in engine B, without any control over it. If this is how 
`String.isIdentifier{Start,Part}` works I think these tools will stick to their 
custom identifier identification methods, which would defeat the purpose of the 
entire strawman. (Ariya, Marijn, Mihai: any thoughts?)

>> For these reasons, I’d suggest changing the identifier identification 
>> proposal as follows. […]
> 
> That would create several problems:
> 
> - The Unicode version for ES 5 would be above that for ES 6 (step 9).

I would love to see that changed too as per 
http://javascript.spec.whatwg.org/#unicode-database-version, but that’s an 
issue with the main ES spec. https://bugs.ecmascript.org/show_bug.cgi?id=2071

> - Tools like JSHint, if they want to ensure compatibility with all ES 5 
> implementations, would have to lie and specify ES 3.

They don’t at the moment. @Anton, any thoughts?

> - Step 11 would allow all Unicode code points that are matched by the 
> IdentifierStart production, including supplementary code points, which ES 5 
> does not permit in identifiers. (Note that Unicode 3.0, the version 
> referenced by the ES 3 and ES 5 specs, was the last one that did not define 
> any supplementary characters, so the spec as proposed doesn't have that 
> problem).

Step 11 says “If cp is matched by the IdentifierStart production in edition 
`edition` of the ECMAScript Language Specification using Unicode version 
`unicode`, then return `true`” so this is not a problem either way. ES5 
`IdentifierStart` doesn’t include supplementary code points, like you said, 
because of the way ES5 defines “character”.
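To illustrate that last point: ES5 treats each UTF-16 code unit as a "character". A supplementary letter such as U+10400 DESERET CAPITAL LETTER LONG I therefore reaches the ES5 grammar as two surrogate halves, and neither half is a letter. The sketch below approximates ES5's `UnicodeLetter` (categories Lu, Ll, Lt, Lm, Lo, Nl) with the regular expression `/^[\p{L}\p{Nl}]$/u`. It assumes an engine with Unicode property escapes (ES2018+), and the names are hypothetical:

```javascript
// Approximation of ES5's UnicodeLetter: categories Lu, Ll, Lt, Lm, Lo and Nl.
const isUnicodeLetter = (ch) => /^[\p{L}\p{Nl}]$/u.test(ch);

const deseret = '\u{10400}'; // one code point, two code units (\uD801\uDC00)

// ES5-style: inspect the string one code unit at a time.
const perCodeUnit = Array.from({ length: deseret.length }, (_, i) =>
  isUnicodeLetter(deseret[i]));

// ES6-style: string iteration yields whole code points.
const perCodePoint = [...deseret].map(isUnicodeLetter);

console.log(perCodeUnit);  // [ false, false ]
console.log(perCodePoint); // [ true ]
```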

> - Implementations that don't support Unicode 6.3 yet, e.g., because they rely 
> on Unicode information provided by the operating system, would not be able to 
> comply with the spec.

Which implementations do that? The ones I’ve seen all use custom-generated 
Unicode data files. Is this really an issue?


Re: FYI: Ecma-404 approved and published

2013-10-08 Thread Mathias Bynens
On 8 Oct 2013, at 23:39, "Mark S. Miller"  wrote:
> JSON must not change. If it refers to "the latest Unicode, whatever that is", 
> then it is potentially subject to disruption by (admittedly unlikely) future 
> changes to Unicode.

By that logic, it should have referred to either Unicode v5.0.0 or v4.1.0, 
because those were the latest available versions back in July 2006, as per 
http://www.unicode.org/history/publicationdates.html.

On 8 Oct 2013, at 23:51, Allen Wirfs-Brock  wrote:
> If you look at the actual dependencies, it hardly matters, as they are upon 
> things that are very hard to imagine ever changing.
> 
> The dependencies are:
> 1) the definition of "code point": 
> http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf#G2212 
> 2) the actual code point to abstract character associations for the 
> "ASCII characters" mentioned in the spec. 
> 3) the UTF-16 encoding algorithm used for non-BMP code points
> 4) ?? is there anything else?

Not as far as I can tell.
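Dependency 3 is small enough to show in full. The sketch below writes a code point above U+FFFF as the pair of `\u` escapes that JSON text uses, with the standard UTF-16 surrogate arithmetic. It is a minimal sketch, and the function name `jsonEscapeCodePoint` is hypothetical:

```javascript
// Writes one Unicode code point as the \u escape(s) JSON text would use for it.
function jsonEscapeCodePoint(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point: ' + cp);
  }
  const hex = (unit) => '\\u' + unit.toString(16).toUpperCase().padStart(4, '0');
  if (cp <= 0xFFFF) {
    return hex(cp); // BMP code point: a single code unit
  }
  const offset = cp - 0x10000;           // a 20-bit value
  const high = 0xD800 + (offset >> 10);  // lead surrogate carries the top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // trail surrogate carries the low 10 bits
  return hex(high) + hex(low);
}

const escaped = jsonEscapeCodePoint(0x1D306);
console.log(escaped); // \uD834\uDF06
console.log(JSON.parse('"' + escaped + '"') === '\u{1D306}'); // true
```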

On 8 Oct 2013, at 23:51, Allen Wirfs-Brock  wrote:
> I suspect the version specificity could be removed in the future.


Yay!

Re: FYI: Ecma-404 approved and published

2013-10-08 Thread Mathias Bynens
On 8 Oct 2013, at 22:19, Rick Waldron  wrote:
> On Tue, Oct 8, 2013 at 4:10 PM, Mathias Bynens  wrote:
> > As for Unicode, it explicitly refers to Unicode 6.2.0, even though version 
> > 6.3.0 was released last week.
> 
> The document was written in July, which was before last week.

No need to get snarky.

Why not just refer to http://www.unicode.org/versions/latest/, i.e. the latest 
available Unicode version? The version number doesn’t really matter for JSON as 
all it cares about is the concept of “code points”, the range of which is fixed.

Sorry for not raising this earlier; I must’ve missed the call for feedback 
in/before July.

Re: FYI: Ecma-404 approved and published

2013-10-08 Thread Mathias Bynens
On 8 Oct 2013, at 19:59, Allen Wirfs-Brock  wrote:
> The Ecma General Assembly has approved by letter ballot Ecma-404: THE JSON 
> Data Interchange Format.
> See http://www.ecma-international.org/publications/standards/Ecma-404.htm 

As for Unicode, it explicitly refers to Unicode 6.2.0, even though version 
6.3.0 was released last week.

Making the identifier identification strawman less restrictive

2013-10-06 Thread Mathias Bynens
This is about the identifier identification strawman: 
http://wiki.ecmascript.org/doku.php?id=strawman:identifier_identification

For tooling, it’s better to have a false positive than to have a false 
negative. In the case of identifier identification, it’s more useful to flag an 
identifier that is permitted as per the latest Unicode version as valid instead 
of rejecting it, even if it’s perhaps not supported in some engines that use 
data tables based on older Unicode versions.

In general, tools try to be lenient rather than restrictive in the input they 
accept. The strawman’s list of ECMAScript 5 parsers that handle non-ASCII 
symbols in identifiers backs this up: instead of Unicode 3.0.0 data, they use 
more recent Unicode versions, in an attempt to handle as many technically valid 
identifiers as possible.

* Esprima and Acorn parse identifiers as per Unicode 6.3.0.
* UglifyJS v1 and v2 use Unicode 6.1.0, which, as far as ECMAScript 5.1 
identifiers go, is identical to Unicode 6.3.0.

For these reasons, I’d suggest changing the identifier identification proposal 
as follows. Step 8 currently says:

> If `edition` is `3` or `5`, let `unicode` be `3.0`.

Change that into step 8a:

> If `edition` is `3`, let `unicode` be `3.0`.

Then, add a new step `8b`:

> If `edition` is `5`, let `unicode` be `6.3`.

Mathias  
http://mathiasbynens.be/

P.S. I’ve created an identifier identification prollyfill 
(https://github.com/mathiasbynens/identifier-identification) based on the 
current strawman. I’ll happily modify it if the strawman gets updated in any 
way.


Re: On `String.prototype.codePointAt` and `String.fromCodePoint`

2013-09-24 Thread Mathias Bynens
> I think I'm convinced that String.fromCodePoint()'s design is correct,
> especially since the rendering subsystem deals with code points too.

Glad to hear.

> String.prototype.codePointAt() however still feels wrong since you
> always need to iterate from the start to get the correct code *unit*
> offset anyway so why would you use it rather than the code *point*
> iterator that is planned for inclusion?

I think there are valid use cases for both.

For example, `String.prototype.codePointAt()` makes it easy to get only the 
code point at the first position, i.e. `str.codePointAt(0)`. `for…of` iterates 
over all code points in the string by default.

One key difference is that `String.prototype.codePointAt` is polyfillable in 
ES3/ES5, while `for…of` isn’t. This makes it easier to switch to 
`String.prototype.codePointAt` in existing code that is (incorrectly) using 
`String.prototype.charCodeAt` to loop over all code points in a string.
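Here is an example of that migration. A loop that walks a string with `codePointAt` only needs to advance by two code units after an astral symbol. The sketch below gives the same result as iterating with `for…of`. It uses modern syntax for brevity, and both helper names are hypothetical:

```javascript
// Collects the code points of `str` using only codePointAt (no iterator needed).
function codePoints(str) {
  const result = [];
  for (let index = 0; index < str.length; ) {
    const cp = str.codePointAt(index);
    result.push(cp);
    index += cp > 0xFFFF ? 2 : 1; // astral symbols occupy two code units
  }
  return result;
}

// The same thing with for…of, which iterates by code point.
function codePointsViaIterator(str) {
  const result = [];
  for (const symbol of str) {
    result.push(symbol.codePointAt(0));
  }
  return result;
}

console.log(codePoints('a\u{1D306}b')); // [ 97, 119558, 98 ]
```

Lone surrogates come out as their own code unit value in both versions, so the two functions agree on any input string.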

On `String.prototype.codePointAt` and `String.fromCodePoint`

2013-09-24 Thread Mathias Bynens
Patches implementing `String.prototype.codePointAt` and `String.fromCodePoint` 
are available for both SpiderMonkey 
(https://bugzilla.mozilla.org/show_bug.cgi?id=918879) and V8 
(https://code.google.com/p/v8/issues/detail?id=2840).

One spec bug remains to be fixed, though: 
. It seems pretty clear the 
intent is to return `undefined` and not `NaN` (the algorithms in both the 
proposal and the ES6 draft agree on it), but it would be good to have this 
confirmed.
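This is also how the methods ended up in ES2015. For an out-of-range position, `codePointAt` returns `undefined`, while the older `charCodeAt` returns `NaN`. A quick check in any ES2015 engine:

```javascript
const str = 'abc';
console.log(str.charCodeAt(3));   // NaN
console.log(str.codePointAt(3));  // undefined
console.log(str.codePointAt(-1)); // undefined
```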

Is it a good idea for engines to start implementing these methods, or is their 
design still being discussed? The definitions of these methods have been in the 
ES6 draft for a long time (since July 2012) without any changes. Does that 
indicate stability? How sure are we that they will end up in the final ES6 spec?

Mathias  
http://mathiasbynens.be/

Re: Backwards compatibility and U+2E2F in `Identifier`s

2013-09-18 Thread Mathias Bynens
On 18 Sep 2013, at 21:05, Anne van Kesteren  wrote:

> On Mon, Aug 19, 2013 at 5:25 AM, Mathias Bynens  wrote:
>> After comparing the output, I noticed that both regular expressions are 
>> identical except for the following: ECMAScript 5 allows U+2E2F VERTICAL 
>> TILDE in `IdentifierStart` and `IdentifierPart`, but ECMAScript 6 / Unicode 
>> TR31 doesn’t.
> 
> Per ES6 identifiers start with code points whose category is ID_Start
> which per http://www.unicode.org/reports/tr31/ includes Lm which per
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt is true for
> U+2E2F. So why exactly is it disallowed?

`ID_Start` includes code points in the `Lm` category indeed, but then later 
explicitly disallows `Pattern_Syntax` and `Pattern_White_Space` code points. As 
it says on the page you linked to:

> In set notation, this is 
> [[:L:][:Nl:]--[:Pattern_Syntax:]--[:Pattern_White_Space:]] plus stability 
> extensions.

U+2E2F has the `Pattern_Syntax` property and is thus not a valid `ID_Start` 
code point.
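Engines that support Unicode property escapes in regular expressions (ES2018+) make this easy to check. The snippet below is only an illustration. It tests U+2E2F against `Lm`, `Pattern_Syntax`, `ID_Start` and `ID_Continue`. It also tests the ES5-style letter class (Lu, Ll, Lt, Lm, Lo, Nl), under which the character was allowed:

```javascript
const verticalTilde = '\u2E2F'; // U+2E2F VERTICAL TILDE

console.log(/\p{Lm}/u.test(verticalTilde));             // true: a modifier letter
console.log(/[\p{L}\p{Nl}]/u.test(verticalTilde));      // true: so ES5 allowed it
console.log(/\p{Pattern_Syntax}/u.test(verticalTilde)); // true
console.log(/\p{ID_Start}/u.test(verticalTilde));       // false: Pattern_Syntax is excluded
console.log(/\p{ID_Continue}/u.test(verticalTilde));    // false
```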


Re: Code points vs Unicode scalar values

2013-09-11 Thread Mathias Bynens
On 10 Sep 2013, at 18:30, Allen Wirfs-Brock  wrote:

> On Sep 10, 2013, at 12:14 AM, Mathias Bynens wrote:
> 
>> FWIW, here’s a real-world example of a case where this behavior is 
>> annoying/unexpected to developers: http://cirw.in/blog/node-unicode
> 
> This suggests to me that the problem is in JSON.stringify's Quote operation.  
> I can see an argument that Quote should convert all unpaired surrogates to 
> \u escapes.  I wonder if changing Quote to do this would break anything…

*If* this turns out to be a non-breaking change, it would make sense to have 
`JSON.stringify` escape any non-ASCII symbols, as well as any non-printable 
ASCII symbols, similar to `jsesc`’s `json` option [1]. This would improve 
portability of the serialized data in case it was saved to a misconfigured 
database, saved to a file with a non-UTF-8 encoding, served to a browser 
without `charset=utf-8` in the `Content-Type` header, et cetera.

[1] http://mths.be/jsesc#json
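Here is a minimal sketch of what such escaping could look like, as a post-processing step on top of `JSON.stringify`. `asciiSafeStringify` is a hypothetical name, and this is not jsesc's implementation. It relies on one property of `JSON.stringify` output without an indentation argument: any non-ASCII or non-printable character can only appear inside a string literal, where a `\u` escape is always valid. Since ES2019, `JSON.stringify` itself already escapes lone surrogates.

```javascript
// Serializes `value` as JSON that contains only printable ASCII characters.
function asciiSafeStringify(value) {
  const json = JSON.stringify(value); // no indentation: no raw whitespace outside strings
  if (json === undefined) return json; // e.g. JSON.stringify(undefined)
  // Without the u flag the class matches single code units, so an astral
  // symbol becomes a pair of \u escapes, which is still valid JSON.
  return json.replace(/[^\x20-\x7E]/g, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).padStart(4, '0'));
}

console.log(asciiSafeStringify({ 'caf\u00E9': '\u{1D306}\x7F' }));
// {"caf\u00e9":"\ud834\udf06\u007f"}
```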

Re: Code points vs Unicode scalar values

2013-09-10 Thread Mathias Bynens
FWIW, here’s a real-world example of a case where this behavior is 
annoying/unexpected to developers: http://cirw.in/blog/node-unicode

Re: Question about allowed characters in identifier names

2013-09-05 Thread Mathias Bynens

On 5 Sep 2013, at 19:37, Norbert Lindenberg  
wrote:

> On Sep 5, 2013, at 1:06 , Mathias Bynens  wrote:
> 
>> On 26 Aug 2013, at 04:08, Norbert Lindenberg 
>>  wrote:
>> 
>>> On Aug 24, 2013, at 23:43 , Mathias Bynens  wrote:
>>> 
>>>> I would suggest adding something like `String.isIdentifier` which accepts 
>>>> a multi-symbol string or an array of code points to the strawman. Seems 
>>>> useful to be able to do `String.isIdentifier('foobar')`
>>> 
>>> What would be the use case(s) for that?
>> 
>> Tools like http://mothereff.in/js-escapes.
> 
> I see nothing on that page about identifiers.

Sorry, wrong link. I meant this one: http://mothereff.in/js-variables

> Note that String methods in general don't know anything about Unicode escapes 
> - those are handled by the ECMAScript or JSON parsers.

Of course.
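A rough approximation of such a check is possible today with Unicode property escapes. The helper below, `isIdentifierName`, is hypothetical and is not the strawman's `String.isIdentifier`. It follows the ES2015+ `IdentifierName` grammar for raw code points. In line with the note above, it does not decode `\u` escapes. It also does not reject reserved words. Its answers depend on the Unicode version built into the engine it runs in.

```javascript
// ES2015+ IdentifierName for raw code points:
//   start: ID_Start, $ or _
//   rest:  ID_Continue, $, ZWNJ (U+200C) or ZWJ (U+200D)
const IDENTIFIER_NAME = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

function isIdentifierName(str) {
  return IDENTIFIER_NAME.test(str);
}

console.log(isIdentifierName('foobar'));        // true
console.log(isIdentifierName('\u0CA0_\u0CA0')); // true (ಠ_ಠ)
console.log(isIdentifierName('1up'));           // false
console.log(isIdentifierName('\\u0061'));       // false: escapes are not decoded
```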
