Hi Julian,

Don't forget that it is a bit unfair to compare like that, for two reasons:

1) exec needs to set re.lastIndex accordingly, i.e. set it to 0 at the start and correctly upon a match.
2) The result array has two extra properties, index and input, that you don't set.
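Hannes's two points can be applied to the hard-coded stub in JavaScript. The sketch below is minimal. The constructor name HardcodedBaRegExp and its global parameter are hypothetical and are not Julian's RegExpJS. The sketch only handles /^ba/ and /^ba/g, and it follows the current ECMAScript rules, under which only global (or sticky) regexps update lastIndex. Older spec editions differed in details, so treat it as an approximation of the extra work the built-in exec does, not as a drop-in replacement.

```javascript
// Hypothetical hard-coded stand-in for /^ba/ (or /^ba/g when `global` is
// true) that does the bookkeeping the built-in RegExp.prototype.exec does:
// it sets `index` and `input` on the result array and maintains lastIndex.
// Assumes lastIndex is always a non-negative integer.
function HardcodedBaRegExp(global) {
  this.global = Boolean(global);
  this.lastIndex = 0;
}

HardcodedBaRegExp.prototype.exec = function (str) {
  str = String(str);
  // A non-global regexp always searches from 0; a global one resumes at lastIndex.
  var start = this.global ? this.lastIndex : 0;

  // Without the m flag, '^' only matches at position 0, so a search that
  // starts anywhere else cannot succeed.
  if (start === 0 && str.startsWith('ba')) {
    var result = ['ba'];
    result.index = 0;    // where the match begins
    result.input = str;  // the string that was searched
    if (this.global) {
      this.lastIndex = 2; // just past the match
    }
    return result;
  }

  if (this.global) {
    this.lastIndex = 0;   // a failed global search resets lastIndex
  }
  return null;
};

// Usage: behaves like /^ba/g on the same input.
var demo = new HardcodedBaRegExp(true);
demo.exec('bab'); // ['ba'], demo.lastIndex === 2
demo.exec('bab'); // null, demo.lastIndex === 0
```

With this bookkeeping in place, a benchmark comparing the stub against the built-in /^ba/ measures work that is closer to comparable.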

There are possibly more extra things that need to happen that are not caused by Yarr at all...

Now, one of the important improvements your version has is, I think, that it doesn't need to flatten the input string, whereas Yarr does that by default. That's one of the benefits of using JS: it has more information. Another one is that we don't need to jump out of JS into C++ and then into Yarr JIT code again.

Best
Hannes

On Sun, Jan 5, 2014 at 1:08 PM, Julian Viereck <[email protected]> wrote:
> Hi Hannes,
>
> thanks a lot for your reply :)
>
>> I'm not sure what you have tried, but I tried your hardcoded version.
>
> I tried to make my testing more transparent and uploaded my code to a GitHub repo:
>
> https://github.com/jviereck/regexp.js-octane
>
>> Though I would suggest running the numbers again, since the numbers differ so much from mine.
>
> Looking at the numbers, I think they are fine if we assume you have a more powerful PC that, by default, gets a score roughly 2x mine. Your scores before and after differ by ~200 points, while mine differ by ~100, which matches the 2x speed difference.
>
>> we see 2 signatures in "Exec". So it is less specialized (not much, just an extra if to distinguish the paths at the "exec" call). I'm sure if all regexps were transformed to "RegExpJS" we would get that back. It would only see 1 signature again.
>
> Thanks a lot for this hint! Based on this input, I have created a new "Exec2" function, which is an exact copy of the "Exec" function, but "Exec2" is only used for executing the re0 regular expression [1]. Using the hard-coded RegExpJS function for re0 [2] resulted in these numbers:
>
> before: 1582.7 (https://github.com/jviereck/regexp.js-octane/tree/e925606d0850b5c94d1622f7cfdcd2ab2c08e767)
> after: 1632.7 (https://github.com/jviereck/regexp.js-octane/tree/0630eec8e656f3df5effc27114ba80ffe970d53e)
>
> These numbers are the average of 10 runs.
> There seems to be a speedup using the hard-coded JS version.
>
> These results look more promising. However, they should be treated with care, as getting /^ba/ to work is quite simple and the implementation makes very good use of JS functions (e.g. String.prototype.startsWith), while a more complicated example including backtracking might yield different results.
>
> Do you think it is worth implementing a hard-coded version of the second regular expression tested by Octane:
>
> var re1 = /(((\w+):\/\/)([^\/:]*)(:(\d+))?)?([^#?]*)(\?([^#]*))?(#(.*))?/;
>
> to see how good the performance can get?
>
> Best,
>
> - Julian
>
> [1]: https://github.com/jviereck/regexp.js-octane/commit/0d6e01d36a7d5dc24c385e3437e6b740dbd9da78#diff-0
>
> [2]: https://github.com/jviereck/regexp.js-octane/commit/0630eec8e656f3df5effc27114ba80ffe970d53e
>
> On 05/01/14 12:13, hv1989 wrote:
>
> Hi Julian,
>
> I'm not sure what you have tried, but I tried your hardcoded version (i.e. defining RegExpJS ourselves, with the ^ba hack):
>
> - octane1.0-regexp:
> before: 4510
> after: 4658
>
> - octane2.0-regexp:
> before: 2585
> after: 2390
>
> So in octane1.0 that is indeed an improvement. For octane2.0 it is not, and that has a reason. In octane2.0 all calls to "exec()" go through a wrapper, "Exec()", that does some extra testing to make sure the result is correct. Using TypeInformation we can find out that this is only called with "RegExp" as the first parameter, so we can optimize that. Now with "new RegExpJS(/^ba/);" we see 2 signatures in "Exec". So it is less specialized (not much, just an extra if to distinguish the paths at the "exec" call). I'm sure if all regexps were transformed to "RegExpJS" we would get that back. It would only see 1 signature again.
>
> Now about RegExp.JS bringing such a big loss: that is possible. Yarr isn't bad, and in octane-regexp we are only stuck in the interpreter for 3%, and even in that case the interpreter isn't that slow.
> We wouldn't win much on octane-regexp if we could JIT everything (which is the problem for other benchmarks like jQuery and Peacekeeper). It would bring at most a 4% gain for octane-regexp. Though I would suggest running the numbers again, since the numbers differ so much from mine.
>
> Best Hannes
>
> On Sun, Jan 5, 2014 at 11:31 AM, <[email protected]> wrote:
>
> On Thursday, January 2, 2014 6:47:58 PM UTC+1, Nicolas Pierron wrote:
>
> On 01/02/2014 07:31 AM, Nicolas B. Pierron wrote:
>
> I should have written that in the past tense …
>
> https://github.com/jviereck/regexp.js
>
> So far I hadn't done any performance numbers for RegExp.JS. I looked into this, and thanks to the help of Till I got the Octane benchmark running in the JS shell [1].
>
> Before converting the entire Octane RegExp benchmark to run using RegExp.JS, I thought I'd just try the first RegExp tested in the benchmark. In terms of code changes, this means:
>
> diff --git a/regexp.js b/regexp.js
> - var re0 = /^ba/;
> + var re0 = new RegExpJS(/^ba/);
>
> Just changing this one RegExp made the score on my machine drop from ~1480 to 77 (!!!) using the RegExp.JS library (& my.mood = :( ).
>
> Okay, so maybe RegExp.JS is doing something completely wrong, which is why I tried another dumb approach and defined:
>
> function RegExpJS(reg) { }
>
> RegExpJS.prototype.exec = function(str) {
>   if (str.startsWith('ba')) {
>     return ['ba'];
>   } else {
>     return null;
>   }
> };
>
> This RegExpJS object ONLY works HARDCODED with the first regexp of the Octane benchmark (/^ba/); cheating, I know, but let's see where this gets us in terms of performance. Running the regexp.js benchmark with this RegExpJS definition and the modification |var re0 = new RegExpJS(/^ba/);| resulted in a score of ~1340. Better than 77, but still a huge drop compared to 1480 from changing only one RegExp in the benchmark!
>
> (If you wonder whether replacing the |if (str.startsWith('ba'))| call with |if (str[0] == 'b' && str[1] == 'a') {| helps: no, that doesn't make any difference in terms of performance :/)
>
> ---
>
> Without knowing anything about the SpiderMonkey JS internals, this very small benchmark raises the following questions for me:
>
> 1) Is the YARR implementation so much faster than anything written in plain JS (even if the JS is highly optimized for the RegExp and matches the string in the best possible way)?
> 2) Is there a performance bug in SpiderMonkey that makes even the plain RegExpJS running only /^ba/ so slow?
>
> Cheers,
>
> - Julian
>
> [1] Using the js shell provided at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ dated 04-Jan-2014 11:50.
>
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> [email protected]
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
>
> --
>
> - Julian
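Julian's question about a hard-coded version of re1 can be explored with a sketch like the following. It depends on one property of the pattern. Every component of re1 is optional, so the regexp always matches at index 0. Nothing after a greedy piece can fail, so no backtracking is ever needed, and a single left-to-right scan reproduces the capture groups. The function names (execUrlRegExp and the character-class helpers) are hypothetical. The code is not taken from the regexp.js-octane repository, and it does not show whether a hand-written matcher would beat Yarr.

```javascript
// Hypothetical hand-written matcher equivalent to the Octane regexp
//   re1 = /(((\w+):\/\/)([^\/:]*)(:(\d+))?)?([^#?]*)(\?([^#]*))?(#(.*))?/
// Every component is optional, so re1 always matches starting at index 0,
// and because nothing after a greedy piece can fail, no backtracking is
// needed: each piece just scans as far as it can.

function isWordChar(c) { // \w
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9') || c === '_';
}
function isDigit(c) { // \d
  return c >= '0' && c <= '9';
}
function isLineTerminator(c) { // characters that `.` does not match
  return c === '\n' || c === '\r' || c === '\u2028' || c === '\u2029';
}

function execUrlRegExp(str) {
  str = String(str);
  var n = str.length;
  var pos = 0;
  // m[0] is the whole match, m[1]..m[11] the capture groups; groups that
  // do not participate stay undefined, as with the built-in exec.
  var m = new Array(12).fill(undefined);

  // Group 1: optional "scheme://host:port" prefix.
  var w = 0;
  while (w < n && isWordChar(str[w])) w++;
  if (w > 0 && str.startsWith('://', w)) {
    m[3] = str.slice(0, w);               // scheme
    pos = w + 3;
    m[2] = str.slice(0, pos);             // scheme + "://"
    var hostStart = pos;
    while (pos < n && str[pos] !== '/' && str[pos] !== ':') pos++;
    m[4] = str.slice(hostStart, pos);     // host
    if (pos + 1 < n && str[pos] === ':' && isDigit(str[pos + 1])) {
      var portStart = pos + 1;
      var end = portStart;
      while (end < n && isDigit(str[end])) end++;
      m[6] = str.slice(portStart, end);   // port digits
      m[5] = str.slice(pos, end);         // ":" + port
      pos = end;
    }
    m[1] = str.slice(0, pos);
  }

  // Group 7: path, everything up to the first '?' or '#'.
  var pathStart = pos;
  while (pos < n && str[pos] !== '?' && str[pos] !== '#') pos++;
  m[7] = str.slice(pathStart, pos);

  // Groups 8 and 9: optional "?query", up to the first '#'.
  if (str[pos] === '?') {
    var queryStart = pos;
    pos++;
    while (pos < n && str[pos] !== '#') pos++;
    m[8] = str.slice(queryStart, pos);
    m[9] = str.slice(queryStart + 1, pos);
  }

  // Groups 10 and 11: optional "#fragment", up to the end of the line.
  if (str[pos] === '#') {
    var fragmentStart = pos;
    pos++;
    while (pos < n && !isLineTerminator(str[pos])) pos++;
    m[10] = str.slice(fragmentStart, pos);
    m[11] = str.slice(fragmentStart + 1, pos);
  }

  m[0] = str.slice(0, pos);
  m.index = 0;      // the match always starts at 0
  m.input = str;
  return m;
}
```

The result has the same shape as the built-in exec result: 12 entries, undefined for groups that did not participate, plus index and input. It therefore also covers the result-array bookkeeping Hannes mentions.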

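Julian's Exec/Exec2 change gives each receiver type its own copy of the wrapper. The re.exec(...) call site inside each copy then only ever sees one kind of object, which is what Hannes means by Exec seeing one signature instead of two. The sketch below is minimal and assumes a wrapper body that simply sums the lengths of the matched substrings. Octane's real Exec does additional checking, and FakeRegExp, ExecNative and ExecFake are hypothetical names. Whether an engine actually generates better code for the split version depends on its JIT; the 1582.7 vs 1632.7 numbers in the thread are the only measurement here.

```javascript
// Hypothetical stand-in for the hard-coded RegExpJS object (only handles /^ba/).
function FakeRegExp() {
  this.lastIndex = 0;
}
FakeRegExp.prototype.exec = function (str) {
  return str.startsWith('ba') ? ['ba'] : null;
};

// Shared wrapper: its re.exec(...) call site receives both built-in RegExp
// objects and FakeRegExp objects, i.e. two receiver types.
function Exec(re, string) {
  re.lastIndex = 0;
  var array = re.exec(string);
  var sum = 0;
  if (array) {
    for (var i = 0; i < array.length; i++) {
      if (array[i]) sum += array[i].length;
    }
  }
  return sum;
}

// Split wrappers: identical bodies, but each is only ever called with one
// receiver type. They are written out as separate functions rather than
// produced by a shared factory, since closures created from one function
// may share type feedback in some engines.
function ExecNative(re, string) { // only built-in RegExp objects
  re.lastIndex = 0;
  var array = re.exec(string);
  var sum = 0;
  if (array) {
    for (var i = 0; i < array.length; i++) {
      if (array[i]) sum += array[i].length;
    }
  }
  return sum;
}
function ExecFake(re, string) { // only FakeRegExp objects (like "Exec2")
  re.lastIndex = 0;
  var array = re.exec(string);
  var sum = 0;
  if (array) {
    for (var i = 0; i < array.length; i++) {
      if (array[i]) sum += array[i].length;
    }
  }
  return sum;
}

// Usage: each regexp object goes through its own wrapper.
var re0 = /^ba/;
var re0js = new FakeRegExp();
ExecNative(re0, 'banana'); // 2
ExecFake(re0js, 'banana'); // 2
```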
