On 11 Apr 2011, at 12:17, David Chisnall wrote:

> On 11 Apr 2011, at 10:38, Quentin Mathé wrote:
> 
>> On 4 Apr 2011, at 00:05, David Chisnall wrote:
>> 
>>> I've committed this code in Languages/ObjC2JS.  It builds as a clang 
>>> plugin, which uses clang to generate an abstract syntax tree and then 
>>> walks it emitting JavaScript.
>> 
>> I was wondering what your take is on the LLVM bitcode interpreter approach 
>> taken by the Emscripten project. See 
>> https://github.com/kripken/emscripten/wiki
> 
> It's an interesting project, but it's not really an interesting approach.  I 
> looked at compiling LLVM bitcode to JavaScript first, but it's just a 
> horrible match.  Neither the basic memory nor basic flow control primitives 
> in LLVM have analogues in JavaScript, making it quite messy.
> 
> They also have the disadvantage that they throw away the high-level semantics 
> from the start.  If we used this approach, we'd compile libobjc2 to bitcode, 
> then compile Objective-C to bitcode, then interpret this in JavaScript.  A 
> method lookup would end up being a few dozen LLVM instructions, all of which 
> would be interpreted.  A good JS VM would JIT-compile the interpreter, but 
> JIT compiling the interpreted code is an order of magnitude harder.  They use 
> the Closure Compiler to do this transform (which means that their compiler is 
> two or three orders of magnitude more complex than mine), but it still 
> wouldn't be able to map the Objective-C object model into the JavaScript one. 
>  The method lookup uses some shift and lookup operations that don't mesh well 
> with the JavaScript model for integers, nor with the JavaScript memory model, 
> so even a good compiler starting from this level will generate bad code.
> 
> In contrast, a message send in my code is a lookup in a JavaScript dictionary 
> and a call.  Both of these are primitive JavaScript operations and, more 
> importantly, they are ones whose performance affects every single JavaScript 
> program.  A half-decent JavaScript implementation will do things like inline 
> method calls from my approach.
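[As a rough, hypothetical sketch of the dispatch described above — the names (`objc_msgSend`, the selector strings) are illustrative, not the actual ObjC2JS runtime API:]

```javascript
// Illustrative only: an Objective-C message send modelled as a JavaScript
// property (dictionary) lookup followed by a plain call. Both operations
// are primitive JavaScript, so the VM can optimise them as usual.
function objc_msgSend(receiver, selector, ...args) {
  const method = receiver[selector];    // dictionary / slot lookup
  if (method === undefined) {
    throw new Error("unrecognized selector: " + selector);
  }
  return method.apply(receiver, args);  // ordinary JavaScript call
}

// A toy "object" whose methods are ordinary JS slots:
const greeter = {
  "greet:": function (name) { return "Hello, " + name; }
};

const result = objc_msgSend(greeter, "greet:", "world");
```
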

ok, I understand much better now.

>>> From what I understand, there is more room for optimization when emitting 
>>> JS code from the Clang AST. For example, C or ObjC loop constructs can be 
>>> remapped to JS loops, and ObjC message sends may be easier to optimize 
>>> (e.g. by inserting type-feedback optimizations written in JS).
>> 
>> However, this means that another code generator will have to be written to 
>> compile the LK languages to JS, whereas Emscripten would involve no extra 
>> code generator.
> 
> It requires about one line of code for each AST node - hardly any effort at 
> all.  LK's AST is much simpler than clang's, and clang's was pretty easy.

I thought it would be more complex; that sounds like a reasonable redundancy, then.

> 
>> Given that they claim there is a lot of room for optimization in their 
>> approach (see 
>> https://github.com/kripken/emscripten/raw/8a6e2d67c156d9eaedf88b752be4d1cf4242e088/docs/paper.pdf),
>>  why did you choose to emit JS code from the Clang AST?
> 
> In general, it's always better to give the optimiser as much information as 
> possible, so generating JavaScript code that is semantically similar to the 
> source is better.
> 
> Oh, and if their code really does work as they described in the paper, then 
> it's not just wrong, it's badly wrong.  Things like overflow semantics in C 
> are not correctly modelled (they are with mine).

I read on their website (or perhaps in the paper) that correct overflow 
semantics for C and C++ remain to be implemented.
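[For context, a hedged sketch of what modelling C's 32-bit overflow in JavaScript involves — JS numbers are 64-bit doubles, so 32-bit wraparound must be emulated by hand; the function name is illustrative:]

```javascript
// JavaScript numbers are doubles, so a plain `a + b` does not wrap at
// 32 bits the way C integer arithmetic does. `| 0` coerces the result
// to a signed 32-bit integer (two's-complement wrap); `>>> 0` would
// give the unsigned equivalent.
function addInt32(a, b) {
  return (a + b) | 0;  // wraps like a 32-bit signed add
}

addInt32(0x7fffffff, 1);  // -2147483648, not 2147483648
```

Without the explicit `| 0`, `0x7fffffff + 1` would simply be `2147483648` in JavaScript, silently diverging from the C semantics.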

>> Would message send optimizations be possible with Emscripten? Just harder 
>> than with your approach?
> 
> It's not really possible.  I have a very lightweight Objective-C runtime 
> implemented in JavaScript, which makes Objective-C objects into JavaScript 
> objects and just adds a lightweight class model on top.  All instance 
> variable and method lookups are done as pure JavaScript slot lookups.  This 
> makes it trivial for the JavaScript VM to optimise the code - it just needs 
> to use the same optimisations as it uses for normal JavaScript.  In effect, 
> it means that the compiler doesn't need to do any optimisations, it just 
> needs to give the JS VM enough information to be able to do them.
> 
> It also lets us do some nice high-level optimisations.  For example, we can 
> replace GSString and GSArray with JSString and JSArray, which work by just 
> adding an isa field to Array.prototype and String.prototype, meaning that all 
> of the effort that goes into optimising the array and string implementations 
> in JavaScript will directly benefit us - any code that uses NSArray or 
> NSString gets to be (almost) as fast as code using JavaScript arrays and 
> strings directly.
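[A hypothetical sketch of the `isa`-on-`Array.prototype` idea described above — the method names are illustrative, not the actual GNUstep/ObjC2JS bridging code:]

```javascript
// Illustrative only: giving native JavaScript arrays an isa field and
// NSArray-style methods directly on Array.prototype, so a bridged
// "NSArray" is just an ordinary JS array and inherits every optimisation
// the VM applies to arrays. (Extending Array.prototype like this also
// makes the slots show up in for..in loops; fine for a sketch.)
Array.prototype.isa = "NSArray";
Array.prototype["count"] = function () { return this.length; };
Array.prototype["objectAtIndex:"] = function (i) { return this[i]; };

const arr = [10, 20, 30];   // already a valid "NSArray"
arr["count"]();             // 3
arr["objectAtIndex:"](1);   // 20
```
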
> 
>> Could your choice also be related to the fact that LLVM seems to lack some 
>> elements needed to build a VM that would interpret the LLVM bitcode? 
>> Especially when you consider the live optimizations (e.g. profiling and 
>> inlining method calls while running) that a Java VM supports?
> 
> Essentially, the problem is translating a high-level language (e.g. 
> Objective-C) into a high-level language (JavaScript).  Going via a low-level 
> language (LLVM IR) means that you throw away a lot of useful information and 
> then have to try to re-infer it later.  It's a fundamentally flawed approach. 
>  It will work, if you put enough effort into it, but that doesn't make it 
> sensible.

ok.
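[To make the earlier point concrete — a hypothetical sketch of a lightweight class model layered on plain JavaScript objects, where methods live on the prototype and instance variables are ordinary slots; `defineClass` and the class names are invented for illustration:]

```javascript
// Illustrative only: a minimal class model on top of JS prototypes.
// Method and instance-variable lookups are plain property accesses,
// so the VM optimises them exactly as it does normal JavaScript.
function defineClass(superproto, methods) {
  const proto = Object.create(superproto);  // inherit via prototype chain
  Object.assign(proto, methods);            // methods are ordinary slots
  return proto;
}

const NSObjectProto = defineClass(Object.prototype, {
  "description": function () { return "<" + this.isa + ">"; }
});

const PointProto = defineClass(NSObjectProto, {
  "x": function () { return this._x; }  // ivar access is a slot lookup
});

const p = Object.create(PointProto);
p.isa = "Point";
p._x = 3;

p["x"]();            // 3 — plain property lookup + call
p["description"]();  // "<Point>" — found via the prototype chain
```
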

Thanks for the detailed explanation :-)

On a related note, there was a blog post recently about Newspeak on JS: 
http://gbracha.blogspot.com/2011/03/truthiness-is-out-there.html

Cheers,
Quentin.


_______________________________________________
Etoile-dev mailing list
[email protected]
https://mail.gna.org/listinfo/etoile-dev
