Hi,
On Sun, Apr 21, 2013 at 8:37 AM, Nicolas B. Pierron <[email protected]> wrote:
> Hi,
>
> On 04/20/2013 02:41 AM, 吴伟/Wei WU wrote:
>> On Fri, Apr 19, 2013 at 10:01 AM, Nicolas B. Pierron <[email protected]> wrote:
>>>
>>> - Clarifying our heuristics, to be able to make guesses while we are compiling in IonMonkey, and recompile without the guarded assumptions if the bailout paths are too costly. Our current view is mostly black & white, and only one bad use case can destroy the performance. We need to introduce some gray view, saying that we are compiling for the likely sub-set and accept bailouts as part of the normal lifetime of a script. As of today, IonMonkey is too much like a normal compiler; we should make it an assumption-based compiler.
>>
>> I found a bug (#825268, https://bugzilla.mozilla.org/show_bug.cgi?id=825268) that may be related to this project. According to the description of that bug, I realized that my understanding of the term 'heuristics' is relatively naive (the strategy used by an optimization algorithm to modify the intermediate representation, based on a few expressions calculated from the structure of the source code), and I think the heuristics you mentioned are more generic than that.
>>
>> If I understand correctly, 'compiling for the likely sub-set' means that we can compile multiple versions of a method and execute one of them based on which one's assumptions are satisfied. For example:
>>
>> function g(x){
>>   ...use x...
>> }
>> function f(){
>>   for (var i = 0; i < 10000; i++){
>>     if (i % 1000){
>>       g(i);
>>     } else {
>>       g('string');
>>     }
>>   }
>> }
>>
>> Currently IonMonkey compiles g(x) with the guard "assert(typeof x === int)" and the jit code will be invalidated periodically. If g(x) can be compiled with the assumption "assert(typeof x in {int, string})" then the bailouts would be avoided.
>>
>> Am I right?
>
> If g is not inlined, IonMonkey will compile with a guard to ensure that x is an Int. As soon as a call is made with a string, we will discard the code, and later recompile it with a larger type set (int & string). The problem is that doing operations on something which can be either an int or a string will always be slow, as we fall back on VM function calls. In your example the Int case can be executed in the baseline compiler while the string case can remain in Ion-compiled code. Doing a bailout might still be worth it compared to the price of a recompilation and a slower execution for the most-likely use case.
>
> The question which remains behind is: "Should we keep or discard the code?" To be able to answer this question we need some kind of heuristic. And before making heuristics, we need a meaningful metric. The only metric that is meaningful in such cases is time, but not ordinary time, as we are at the mercy of the scheduler. Bug 825268 suggests cleaning up the way we are using the use-count to make it a reliable & *deterministic* source of time based on the execution of scripts.
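Just to check that I follow the keep-or-discard question: I imagine such a heuristic could compare the bailout frequency against the use count, roughly like the sketch below. All names and thresholds here are invented by me for illustration; nothing in it is existing SpiderMonkey code.

// Hypothetical sketch only -- none of these names exist in SpiderMonkey.
// The idea: treat the script's use count as a deterministic "clock" and
// compare how often we bailed out against how often we ran, instead of
// discarding Ion code on the first bad type.
#include <cstdint>

struct ScriptCounters {
    uint32_t useCount;      // times the script was entered (the "clock")
    uint32_t bailoutCount;  // times we left Ion code through a bailout
};

enum class Decision { KeepIonCode, DiscardAndRecompile };

// Keep the Ion code as long as bailouts stay a small fraction of all
// executions; only recompile with a wider type set once the bailout
// paths dominate. The 10% threshold is an arbitrary placeholder.
Decision keepOrDiscard(const ScriptCounters &c) {
    if (c.useCount < 100)                    // too little data to decide
        return Decision::KeepIonCode;
    if (c.bailoutCount * 10 > c.useCount)    // more than ~10% bailouts
        return Decision::DiscardAndRecompile;
    return Decision::KeepIonCode;
}

Using the use count as the clock keeps the decision deterministic across runs, which seems to be the point of bug 825268.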
>>> - Adding profile guided optimization: the idea would be to profile which branches are used and to prune branches which are unused, either while generating the MIR graph, or as a second optimization phase working on the graph. A trivial way of doing so can just look at the jump targets which have been visited so far. A more clever one might register the transitions and use a Markov chain to determine relations between branches, and duplicate a sub-part of the graph to improve constant propagation and the range analysis. Before doing anything fancy like the more-clever case we still need to check that this is worth it and see the benefit of other optimizations if we fold the graph.
>>
>> I'm interested in this idea and I'm willing to implement it. A simple and fast algorithm may be a good choice for GSoC, while the advanced methods require more investigation. I haven't found any branch profiling code in the interpreter, so the instrumenting mechanism must be finished first, or we can leverage the baseline compiler to generate self-instrumented jit code. Profiling data can be saved separately or in MIR nodes as an annotation. In the second case, more passes may benefit from it easily.
>
> Indeed, there is no code for profiling yet. The easiest way to think about it is to do like the write barrier of the incremental GC, i.e. having a buffer (potentially circular, unlike the write barrier's) which is filled with jump targets. Thus, every time we jump to another location in the code we push the offset of that location into this buffer, and later we reuse this buffer to re-organize basic blocks in IonMonkey and to duplicate / prune basic blocks if needed.

There are two possible ways to store branch profiling data: a count array, or a target buffer. I've read LLVM's implementation of branch (edge) profiling and found that it uses a vector (array) to save the information. LLVM allocates a counter per edge (two per conditional branch); when a block jumps to another, the corresponding counter is incremented by 1. These frequencies are then transformed into probabilities and consumed by some transform passes. Jikes RVM stores branch profiling data in a similar way. One possible problem is that they use block IDs to index the branch counters, and we don't have such a thing in JavaScript bytecode. On the other hand, a branch target buffer maintains a sequence of branch targets, so the indexing problem can be avoided. Furthermore, it makes it possible to determine relations between branches (see the rough sketch below). The main problem I have considered is that the cost of calculating branch probabilities might be high, since we must traverse the buffer to summarize the results.

I have encountered some other issues. Could you give me some suggestions?

- Instrumenting the interpreter is fairly easy, while I'm not sure it is possible to instrument the baseline compiler in a similar way. I think it's better to support them both.
- Overhead. Instrumenting every conditional jump may degrade performance, so an 'accepted overhead rate' should be considered. Also, it should be possible to switch it on/off via command line arguments.
- Storing/restoring profiling data on disk. Branch profiling data can be stored and restored in LLVM and Jikes RVM, but I don't think it is necessary in SpiderMonkey.
- Designing the interface of branch profiling. Since I was unable to find design guidelines for SpiderMonkey, I will propose a draft interface in my proposal and update it in the future.

I am planning to submit a proposal for this project. Would you like to be my mentor during this summer?
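Here is the rough sketch of the target-buffer idea mentioned above. Everything below is my own invention for illustration -- the type names, the buffer size, and the hook that records a jump do not exist in SpiderMonkey.

// Hypothetical sketch of a circular branch target buffer, in the spirit of
// the incremental GC write barrier buffer. None of these names are real
// SpiderMonkey types; they only illustrate the data layout I have in mind.
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct BranchRecord {
    uint32_t scriptId;   // which script the jump belongs to
    uint32_t fromOffset; // bytecode offset of the branch
    uint32_t toOffset;   // bytecode offset of the taken target
};

class BranchTargetBuffer {
    static const std::size_t kCapacity = 4096; // placeholder size
    BranchRecord records_[kCapacity];
    std::size_t next_ = 0;
    bool wrapped_ = false;

  public:
    // Called from the interpreter (or baseline code) on every taken branch.
    void recordJump(uint32_t scriptId, uint32_t from, uint32_t to) {
        records_[next_] = {scriptId, from, to};
        if (++next_ == kCapacity) {
            next_ = 0;
            wrapped_ = true; // old entries are overwritten, keeping recent history
        }
    }

    // At compile time, summarize the buffer into per-branch taken counts.
    // This is the traversal cost I mentioned: O(buffer size) per query.
    std::unordered_map<uint64_t, uint32_t> takenCounts(uint32_t scriptId) const {
        std::unordered_map<uint64_t, uint32_t> counts;
        std::size_t n = wrapped_ ? kCapacity : next_;
        for (std::size_t i = 0; i < n; i++) {
            const BranchRecord &r = records_[i];
            if (r.scriptId != scriptId)
                continue;
            uint64_t key = (uint64_t(r.fromOffset) << 32) | r.toOffset;
            counts[key]++;
        }
        return counts;
    }
};

By contrast, the LLVM-style count array skips the summarizing step but needs a stable index (a block or edge ID) allocated up front, which is the part I am unsure how to map onto our bytecode.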
>> Two bugs (#410994, https://bugzilla.mozilla.org/show_bug.cgi?id=410994, and #419344, https://bugzilla.mozilla.org/show_bug.cgi?id=419344) have mentioned PGO but may be irrelevant to this idea.
>
> Indeed, they are irrelevant for doing PGO on JavaScript.
>
>>> Other cases of smaller projects might be:
>>>
>>> - Improving our Alias Analysis to take advantage of the type set (this might help the kraken benchmarks a lot, by factoring out array accesses).
>>
>> My knowledge about Alias Analysis is limited; I know the principles and read [4] one year ago. I am not sure how type sets can help subsequent optimizations. Would you please provide me some more information (wiki, blogs, or papers) about it?
>> [4]: http://dl.acm.org/citation.cfm?id=277670
>
> JavaScript is an untyped language, which means that any variable access can alias (reference) the same value as another variable. Currently IonMonkey's alias analysis is not clever and it does not use the type of an object to know if one write might alias another. If two variables are holding objects with different types, then we have the guarantee that their values cannot alias each other.
>
> One JavaScript function which looks like this:
>
> function f(a, arr) {
>   for (var i = 0; i < arr.length; i++) {
>     arr[i].b = 1;
>     a.b = 0;
>   }
> }
>
> Can be written in a well-typed language as:
>
> void f(Foo *a, Vector<Bar> *arr) {
>   for (int i = 0; i < arr->length(); i++) {
>     (*arr)[i].b = 1;
>     a->b = 0;
>   }
> }
>
> where Foo is a different type than Bar. This implies that the two b properties are part of different objects, which means that the previous JavaScript function can be transformed to:
>
> function f(a, arr) {
>   a.b = 0; /* Assume shape of a is different than the shape of arr[i] */
>   for (var i = 0; i < arr.length; i++) {
>     arr[i].b = 1;
>   }
> }
>
> In the case of kraken, the problem is that the alias analysis does not consider the type of Arrays and considers that the values they contain can be anything, including themselves. If the alias analysis were using the type information we should be able to figure out that M[i] does not alias M[i][j], and thus factor the M[i] access across setters of M[i][j].
>
>>> - Improve dummy functions used for asm.js boundaries. Asm.js needs to communicate with the DOM, and to do so it needs some trampoline functions which are used as an interface with the DOM API. Such trampolines might transform typed arrays into strings or objects and serialize the result back into typed arrays.
>>
>> I haven't read OdinMonkey's code yet, but I will look into it next week. Speeding up asm.js on ARM is one of my interests.
>
> I do not know much about the status of OdinMonkey on ARM. A few weeks ago the concern was just to be able to compile it, as the ahead-of-time compilation was taking a lot of memory at the start-up of the application; I wonder if this is still a concern now that we removed the generation of the bytecode.
>
> What I was referring to were mostly bugs similar to Bug 860574 [1].
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=860574
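One more thought on the alias analysis point above: if I understand the kraken case, the goal would be to let Ion hoist the M[i] load the way a C++ compiler can when the row type is known not to alias its elements. A made-up well-typed analogue (not taken from any real benchmark):

// Made-up well-typed analogue of the kraken pattern. Because Row and
// double cannot alias each other, the row load can be hoisted out of the
// inner loop; the hope is that type-set-aware alias analysis would let
// Ion do the same for M[i] across stores to M[i][j].
#include <cstddef>
#include <vector>

typedef std::vector<double> Row;

void scale(std::vector<Row> &M, double k) {
    for (std::size_t i = 0; i < M.size(); i++) {
        Row &row = M[i];                     // loaded once per outer iteration
        for (std::size_t j = 0; j < row.size(); j++)
            row[j] *= k;                     // stores cannot invalidate 'row'
    }
}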
> --
> Nicolas B. Pierron

--
Best wishes,
Wei Wu (吴伟)

