On 11 Apr 2011, at 12:17, David Chisnall wrote:

> On 11 Apr 2011, at 10:38, Quentin Mathé wrote:
>
>> On 4 Apr 2011, at 00:05, David Chisnall wrote:
>>
>>> I've committed this code in Languages/ObjC2JS. It builds as a clang
>>> plugin, which uses clang to generate an abstract syntax tree and then
>>> walks it emitting JavaScript.
>>
>> I was wondering what your take is on the LLVM bitcode interpreter approach
>> taken by the Emscripten project. See
>> https://github.com/kripken/emscripten/wiki
>
> It's an interesting project, but it's not really an interesting approach. I
> looked at compiling LLVM bitcode to JavaScript first, but it's just a
> horrible match. Neither the basic memory primitives nor the basic
> flow-control primitives in LLVM have analogues in JavaScript, making it
> quite messy.
>
> They also have the disadvantage that they throw away the high-level
> semantics from the start. If we used this approach, we'd compile libobjc2
> to bitcode, then compile Objective-C to bitcode, then interpret this in
> JavaScript. A method lookup would end up being a few dozen LLVM
> instructions, all of which would be interpreted. A good JS VM would
> JIT-compile the interpreter, but JIT-compiling the interpreted code is an
> order of magnitude harder. They use the Closure Compiler to do this
> transform (which means that their compiler is two or three orders of
> magnitude more complex than mine), but it still wouldn't be able to map the
> Objective-C object model onto the JavaScript one. The method lookup uses
> some shift and lookup operations that don't mesh well with the JavaScript
> model for integers, nor with the JavaScript memory model, so even a good
> compiler starting from this level will generate bad code.
>
> In contrast, a message send in my code is a lookup in a JavaScript
> dictionary and a call. Both of these are primitive JavaScript operations
> and, more importantly, they are ones whose performance affects every single
> JavaScript program.
> A half-decent JavaScript implementation will do things like inline method
> calls from my approach.
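[Editor's note: a minimal sketch of what David describes, with illustrative names only, not the actual ObjC2JS output. The point is that a send lowers to one dictionary lookup plus one call, the two operations every JavaScript engine already optimises heavily.]

```javascript
// Illustrative sketch: a selector becomes a property key, so a message
// send is one dictionary lookup followed by one call. "msgSend" and the
// object layout here are hypothetical, not ObjC2JS's real generated code.
var obj = {
  name: "example",
  // selector "description" -> method slot
  "description": function () { return "<Example: " + this.name + ">"; }
};

function msgSend(receiver, selector) {
  var method = receiver[selector];            // dictionary lookup
  var args = Array.prototype.slice.call(arguments, 2);
  return method.apply(receiver, args);        // plain JavaScript call
}

console.log(msgSend(obj, "description")); // prints: <Example: example>
```

Because both steps are ordinary property access and invocation, a JIT that inlines normal JavaScript method calls can inline these sends too.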
ok, I understand much better now.

>> From what I understand, there is more room for optimisation when emitting
>> JS code from the Clang AST. For example, C or ObjC loop constructs can be
>> remapped to JS loops, and ObjC message sends might be easier to optimise
>> (e.g. by inserting type-feedback optimisations written in JS).
>>
>> However, this means that another code generator will have to be written to
>> compile the LK languages to JS, whereas Emscripten would involve no extra
>> code generator.
>
> It requires about one line of code for each AST node - hardly any effort at
> all. LK's AST is much simpler than clang's, and clang's was pretty easy.

I thought it would be more complex; that sounds like a reasonable redundancy, then.

>> Given that they claim there is a lot of room for optimisation in their
>> approach (see
>> https://github.com/kripken/emscripten/raw/8a6e2d67c156d9eaedf88b752be4d1cf4242e088/docs/paper.pdf),
>> why did you choose to emit JS code from the Clang AST?
>
> In general, it's always better to give the optimiser as much information as
> possible, so generating JavaScript code that is semantically similar to the
> source is better.
>
> Oh, and if their code really does work as they described in the paper, then
> it's not just wrong, it's badly wrong. Things like overflow semantics in C
> are not correctly modelled (they are with mine).

I read on their website (or in the paper, perhaps) that correct overflow semantics for C and C++ remain to be implemented.

>> Would message send optimisations be possible with Emscripten? Just harder
>> than with your approach?
>
> It's not really possible. I have a very lightweight Objective-C runtime
> implemented in JavaScript, which makes Objective-C objects into JavaScript
> objects and just adds a lightweight class model on top. All instance
> variable and method lookups are done as pure JavaScript slot lookups.
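[Editor's note: a rough sketch of such a lightweight class model, with hypothetical names - this is not David's actual runtime. Methods and an isa-like marker live on a prototype, so every instance-variable and method lookup is an ordinary JavaScript slot lookup, and inheritance falls out of the prototype chain.]

```javascript
// Hypothetical sketch of a lightweight class model over plain JS objects.
// "defineClass" and the class names are illustrative, not the real runtime.
function defineClass(name, superproto, methods) {
  var proto = Object.create(superproto || Object.prototype);
  proto.isa = name;                         // lightweight class marker
  for (var sel in methods) { proto[sel] = methods[sel]; }
  return proto;
}

var NSObjectProto = defineClass("NSObject", null, {
  "class": function () { return this.isa; }
});

var PointProto = defineClass("Point", NSObjectProto, {
  // method lookup = slot lookup along the prototype chain
  "magnitude": function () {
    return Math.sqrt(this.x * this.x + this.y * this.y); // ivar access = slot lookup
  }
});

var p = Object.create(PointProto);
p.x = 3; p.y = 4;
console.log(p["magnitude"](), p["class"]()); // prints: 5 Point
```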
> This makes it trivial for the JavaScript VM to optimise the code - it just
> needs to use the same optimisations as it uses for normal JavaScript. In
> effect, it means that the compiler doesn't need to do any optimisations, it
> just needs to give the JS VM enough information to be able to do them.
>
> It also lets us do some nice high-level optimisations. For example, we can
> replace GSString and GSArray with JSString and JSArray, which work by just
> adding an isa field to Array.prototype and String.prototype, meaning that
> all of the effort that goes into optimising the array and string
> implementations in JavaScript will directly benefit us - any code that uses
> NSArray or NSString gets to be (almost) as fast as code using JavaScript
> arrays and strings directly.
>
>> Could your choice also be related to the fact that LLVM seems to lack some
>> elements needed to build a VM that would interpret the LLVM bitcode?
>> Especially when you consider the live optimisations (e.g. profiling and
>> inlining method calls while running) that a Java VM supports?
>
> Essentially, the problem is translating a high-level language (e.g.
> Objective-C) into a high-level language (JavaScript). Going via a low-level
> language (LLVM IR) means that you throw away a lot of useful information
> and then have to try to re-infer it later. It's a fundamentally flawed
> approach. It will work, if you put enough effort into it, but that doesn't
> make it sensible.

ok. Thanks for the detailed explanation :-)

On a related note, there was a blog post about Newspeak on JS recently:
http://gbracha.blogspot.com/2011/03/truthiness-is-out-there.html

Cheers,
Quentin.

_______________________________________________
Etoile-dev mailing list
[email protected]
https://mail.gna.org/listinfo/etoile-dev
