On 17.12.2011 20:19, Lasse Reichstein wrote:
On Sat, Dec 17, 2011 at 12:12 PM, Dmitry Soshnikov
<dmitry.soshni...@gmail.com>  wrote:
Hi,

I was recently working with Ruby code and again found useful its
(actually Perl's) "approximately equal" operator: =~

The operator is just sugar for the `test' method of RegExp.

if (/ecma/.test("ecmascript")) {
...
if ("ecmascript" ~= /ecma/) {
So you save three characters (one, if you had paren-free invocation).
I personally don't find it more readable.

Yep! (and the argument about three characters isn't essential) ;)

To me, ~= reads better than .test(), but that's just my opinion and I can't insist. I just found it very convenient in other languages.

And the other thing is "RegExp-substringing" using bracket notation:
string[RegExp, startIndex].

"ecmascript"[/ecma/, 0]; // "ecma"
That's already valid syntax (stupid code, but valid). The result is "e".
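
A minimal sketch of why the result is "e": inside the brackets the comma operator evaluates the RegExp, discards it, and yields 0, so the expression is an ordinary index lookup. The variable names are illustrative only.

```javascript
// The comma operator evaluates both operands and yields the last one,
// so the bracket expression below reduces to an index of 0.
var viaComma = "ecmascript"[/ecma/, 0];
var viaIndex = "ecmascript"[0];

console.log(viaComma === viaIndex); // both are the first character
```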

Oh, my bad. Yes, it's already valid. Well, then we may consider other options. Have to think.

This is actually the sugar for:

"ecmascript".match(/ecma/)[0]; // "ecma"
You would want to handle the case where match returns null.

Add:
  String.prototype.get = function(re, n) {
    var res = re.exec(this);
    return res ? res[n] : null;
  };
and you have:

   "ecmascript".get(/ecma/, 0) == "ecma"

(feel free to make it non-enumerable).
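
One way the non-enumerable variant might look, as a sketch: `get` is the helper name from the message above, not a built-in, and the `hit`/`miss` variables are illustrative. Properties created with Object.defineProperty are non-enumerable unless stated otherwise.

```javascript
// Sketch only: `get` is the helper suggested above, not a standard method.
// Object.defineProperty leaves `enumerable` false by default, so the
// helper does not show up in for-in loops over strings.
Object.defineProperty(String.prototype, "get", {
  value: function (re, n) {
    var res = re.exec(String(this));
    // exec returns null when nothing matches, so guard before indexing.
    return res ? res[n] : null;
  },
  writable: true,
  configurable: true
});

var hit = "ecmascript".get(/ecma/, 0);
var miss = "ecmascript".get(/java/, 0);
var isEnumerable = Object.prototype.propertyIsEnumerable.call(String.prototype, "get");
```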

My fault, I didn't describe it clearly. In string[regexp, startIndex], the second item is exactly a start index -- the position in the string from which to start the search. It's not related to an index into the `match' result. Anyway, this syntax is already taken.

E.g. a simple lexer:

var code = "var a = 10;";
var cursor = 0;

while (cursor < code.length) {

    var chunk = code[cursor .. -1]; // sugar for slice: code.slice(cursor, code.length);

    if (identifier = chunk[/\A([a-z]\w*)/, 1]) {
        // handle identifier token
    }

    else if (number = chunk[/\A([0-9]+)/, 1]) {
        // handle numbers
    }
...
Thoughts?
I don't think the advantage of slightly shorter code is worth the
extra syntactic complexity from adding two new constructions.

I love these arguments ;) But in fact -- of course it's worth it. Especially if the shortness makes the code easier and more convenient to write.

By the way, is there any syntactic complexity in the "~=" operator?

Especially since they only work with RegExps. If it was more generic,
in some way, it might be more reasonable to make operators for it.

And it's not even more readable (IMO) than:

    var chunk = code.substring(cursor);
    if (identifier = getMatch(chunk, /\A([a-z]\w*)/, 1)) {
        // handle identifier token
    } else if (number = getMatch(chunk, /\A([0-9]+)/, 1)) {
        // handle numbers
    }

Of course, since you're already used to it. Had people already had such operators, nobody would write these function calls.

and for efficiency, I'd avoid the substring, and use single
invocations of global regexps.

That's another topic; you could still cache the regexps while using the proposed operators.

This seems like something that can easily be abstracted into a helper
function, and come out looking even better.

   var code;  // some string.
   var cursor;  // a position.
   var idMatch = /[a-z]\w*/ig;
   var numMatch =  /[0-9]+/g;
   // ...
   function check(re, n) {
     n = n || 0;
     re.lastIndex = cursor;
     var res = re.exec(code);
     if (res) {
       cursor = re.lastIndex;
       return res[n];
     }
     return null;
   }

   // and inside some loop:
   ...
   if (identifier = check(idMatch, 0)) {
      // handle identifier
   } else if (numeral = check(numMatch, 0)) {
      // handle numeral
   }
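
A self-contained sketch of how such a helper could drive a small lexer. Two assumptions differ from the snippet above: it uses the sticky flag (`y`, standardized later in ES2015) instead of `g`, so a match must begin exactly at `cursor` rather than anywhere after it, and the `tokenize` function, the rule table and the token shape are hypothetical names chosen for illustration.

```javascript
// Sketch only: tokenize, rules and the token objects are illustrative.
function tokenize(code) {
  var cursor = 0;
  var tokens = [];
  // Sticky regexps match only at lastIndex, so nothing gets skipped.
  var rules = [
    { type: "whitespace", re: /\s+/y },
    { type: "identifier", re: /[a-z]\w*/iy },
    { type: "number", re: /[0-9]+/y },
    { type: "punctuator", re: /[=;]/y }
  ];

  // Same shape as the check helper above: try a pattern at the cursor
  // and advance the cursor past the match on success.
  function check(re) {
    re.lastIndex = cursor;
    var res = re.exec(code);
    if (res) {
      cursor = re.lastIndex;
      return res[0];
    }
    return null;
  }

  while (cursor < code.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var text = check(rules[i].re);
      if (text !== null) {
        if (rules[i].type !== "whitespace") {
          tokens.push({ type: rules[i].type, text: text });
        }
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError("Unexpected character at position " + cursor);
    }
  }
  return tokens;
}

var tokens = tokenize("var a = 10;");
```

Note that this sketch has no keyword handling, so `var` comes out as an ordinary identifier token.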


But this might come from me preferring to hide regexps away inside
abstractions. Using a RegExp is an implementation detail - it's just
one way to find something in a string, and there might be other, and
you might want to change implementation over time. Hard-coding regexps
into an interface gives them too much exposure, and making extra
operators just for regexps also puts too much focus on them.

Yes, this is also true; usually in such cases it's better to abstract things and provide some getters. But it was just an example to show the proposal, not a discussion of lexer implementation.

If the language is built as a text processor, like Perl being heavily
influenced by AWK, it makes sense to have RegExps as a primary and
preferred feature. In ECMAScript, which has a more general-purpose
design, I don't think they should be given preferred treatment. A
class with methods is perfectly fine for what they do.

Perhaps, but I don't see why we can't have strong and powerful regexp constructions too.

If ECMAScript had raw strings, the RegExp literal wouldn't even be necessary.

How so? Can you explain?
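
One possible reading of the raw-strings remark, sketched with `String.raw` (which did not exist at the time of this thread and was added in ES2015): in an ordinary string every backslash must be doubled before it reaches the RegExp constructor, whereas a raw string passes the pattern through unchanged, much as a literal does. This is an interpretation, not something stated in the thread, and the variable names are illustrative.

```javascript
// Three ways to build the same pattern for a simple decimal number.
var fromLiteral = /\d+\.\d+/;
var fromEscaped = new RegExp("\\d+\\.\\d+");          // backslashes doubled
var fromRaw = new RegExp(String.raw`\d+\.\d+`);       // pattern written as-is

console.log(fromLiteral.source === fromRaw.source);
console.log(fromEscaped.test("3.14"), fromRaw.test("3.14"));
```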

Dmitry.
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
