Hi!

Ever since I learned that current Emscripten depends on a custom branch
of LLVM, I've been wondering about this.

   - What changes make the Emscripten branch different from upstream?
   - Why are those changes necessary?
   - Which of these changes are necessary at compile time, and which at link time?
   - Is the plan to keep this split, or to unify things back again?
   - What does NaCl have to do with all of this?

I know the answers to my questions are somewhere out there, but digging
through tons of emails for them doesn't feel very rewarding. Is there some
kind of documentation for all of this? If so, feel free to answer all of my
questions with a simple link to it.

I've started trying to answer my first question: what changed? Looking at the most
recent merge from LLVM upstream through NaCl into Emscripten, I see changes
by NaCl to 694 files
<https://github.com/kripken/emscripten-fastcomp/compare/02916381eb87518ba6d541b065141a72b60561da...ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8#files_bucket>
and changes by Emscripten to 42 files
<https://github.com/kripken/emscripten-fastcomp/compare/ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8...78afe0bb182d056d5d5e24894c3f66e5b6fe0c7f#files_bucket>.
The former is too much for GitHub to display, but after filtering the list
of affected files, I see things like the newly introduced
lib/Target/JSBackend
<https://github.com/kripken/emscripten-fastcomp/tree/ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8/lib/Target/JSBackend>
directory. As far as I can see, that directory is mostly authored by
Emscripten developers. So does that mean that the NaCl project integrated the
Emscripten codebase into their own? I thought they were shipping LLVM IR to
be executed by Chrome without a detour through a JavaScript implementation.
The changes to Emscripten that are not included in NaCl span many commits
but few modifications, so is that just stuff NaCl hasn't merged into their
code yet? So far I haven't found a merge from Emscripten to NaCl, but
that's probably because I can't think of the right git commands to ask
for it, and the graph is too big to scroll through manually.

The clang repository shows a similar picture on a smaller scale. Upstream
to NaCl
<https://github.com/kripken/emscripten-fastcomp-clang/compare/8eceb42a00eee3f45ff5e3c8665c32e4ac9feeb4...097061e2dec7091d5256f0e80909bf1641087b97#files_bucket>
shows changes to 30 files, many of them mentioning Emscripten. The changes
from there to Emscripten
<https://github.com/kripken/emscripten-fastcomp-clang/compare/097061e2dec7091d5256f0e80909bf1641087b97...6bef7274efd0513418674ce0f215c46d188957da#files_bucket>
are almost negligible. This time, however, the commits to the NaCl repo are
by people I haven't noticed as core Emscripten contributors before. Are
they? If not, what is the motivation for adding Emscripten-specific code
there?

As to why all these changes are necessary: I assume that a backend written
in C++ is “obviously” the fastest way to generate target-specific code
from IR. But I'd have thought it possible to implement such a
backend as a plug-in, or as a separate program linking against the LLVM
libs. Why was the source tree modified instead?
Is it simply the easiest way to ensure that all versions, paths and so on
match up? Or is it because this should end up in vanilla LLVM one day, so
it might as well be written in that code tree and kept in sync with it?
Or is vanilla LLVM simply not flexible enough to allow such a backend
without modifications elsewhere as well?

The distinction between compile time and link time is particularly
important when dealing with other front ends. What if I want to compile
Fortran, D, or any other language to JavaScript? Is it enough to generate
LLVM IR using any front end and feed that to Emscripten? Can I safely
ignore the warning that I'm linking IR generated for different target
triples? Or should I try to build each such front end against Emscripten's
LLVM library? Would that even be enough, or are the modifications to clang
substantial enough that compiling other languages this way would be
problematic? (I've recently investigated ways to compile LAPACK
<https://github.com/kripken/emscripten/issues/998#issuecomment-110954873>
using Dragonegg <http://dragonegg.llvm.org/>, and even though Dragonegg looks
pretty much dead
<https://github.com/llvm-mirror/llvm/commit/3d09e0e55093ecb569a5c700> and
flang <https://github.com/isanbard/flang> is still in its infancy, I wonder
whether this could work for more than the single example function I've
tried so far. My first contributions <https://github.com/kripken/emscripten/pull/841>
to Emscripten were made when I tried to feed ldc output to it, and I'm still
interested in compiling D to JavaScript.)

The question of whether Emscripten changes will (aim to) be merged into
LLVM has been raised on this list before
<https://groups.google.com/d/msg/emscripten-discuss/_kIjDA3oKDY/U4UsxyIhBPcJ>,
but the response only talked about rebasing, which, as I see it, isn't the
same thing. Are there plans to one day have all the JS-specific changes in
vanilla LLVM, and to have Emscripten simply compile against that? If not,
is some alternative like a plugin envisioned? Having multiple
copies of LLVM lying around doesn't feel good in the long term.

I hope my curiosity hasn't annoyed anyone too much. Thanks for reading, and 
thank you even more for any insights you can provide.

Greetings,
  Martin von Gagern


-- 
You received this message because you are subscribed to the Google Groups 
"emscripten-discuss" group.