Divergence from LLVM upstream

martin . vgagern Sat, 13 Jun 2015 09:02:21 -0700

Hi!

Ever since I learned that current Emscripten depends on a custom branch
of LLVM, I've been wondering about this.

- What changes make the Emscripten branch different from upstream?
- Why are those changes necessary?
- Which of these changes are necessary at compile time, and which at
link time?
- Is the plan to keep this split, or to unify things back again?
- What does NaCl have to do with all of this?

I know the answers to my questions are somewhere out there, but digging
through tons of mails for that doesn't feel very rewarding. Is there some
kind of documentation for all of this? If so, feel free to answer all of my
questions with a simple link to that documentation.

I've started answering my first question: what changed? Looking at the most
recent merge from LLVM upstream through NaCl into Emscripten, I see changes
by NaCl to 694 files
<https://github.com/kripken/emscripten-fastcomp/compare/02916381eb87518ba6d541b065141a72b60561da...ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8#files_bucket>,

and changes by Emscripten to 42 files
<https://github.com/kripken/emscripten-fastcomp/compare/ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8...78afe0bb182d056d5d5e24894c3f66e5b6fe0c7f#files_bucket>.

The former is too much for GitHub to display, but after filtering the list
of affected files, I see things like the newly introduced
lib/Target/JSBackend
<https://github.com/kripken/emscripten-fastcomp/tree/ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8/lib/Target/JSBackend>

directory. That directory is mostly authored by Emscripten developers, as
far as I can see. So does that mean that the NaCl project integrated the
Emscripten codebase into their own? I thought they were shipping LLVM IR to
be executed by Chrome without detour through a JavaScript implementation.
The changes in Emscripten which are not included in NaCl span many commits
but few modifications, so is that just stuff NaCl hasn't merged into their
code yet? So far I haven't found a merge from Emscripten to NaCl, but
that's probably because I can't think of the correct git commands to ask
for this, and the graph is just too big for manual scrolling.
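
For reference, here is a minimal sketch of how such a query might be phrased.
It only wraps two standard git invocations (`git log --merges` and
`git merge-base --is-ancestor`); the helper names and the remote name
`nacl/master` are made up for illustration, and I have not checked the
results against these repositories.

```python
import subprocess


def merges_between(old, new):
    """Build a git command listing merge commits reachable from `new` but not `old`."""
    return ["git", "log", "--merges", "--oneline", f"{old}..{new}"]


def is_ancestor(commit, branch):
    """Build a git command that exits with status 0 if `commit` is an ancestor of `branch`."""
    return ["git", "merge-base", "--is-ancestor", commit, branch]


def run(cmd, repo="."):
    """Run a prepared git command inside the checkout at `repo` (hypothetical helper)."""
    return subprocess.run(cmd, cwd=repo, capture_output=True, text=True)


# Example (assumes a local clone with the NaCl repository added as a remote
# called "nacl", which is an assumption and not something fastcomp sets up):
#   run(merges_between("ce5573729064e6e54e1c05e1f5e5bd0a65fc6fe8",
#                      "78afe0bb182d056d5d5e24894c3f66e5b6fe0c7f")).stdout
#   run(is_ancestor("78afe0bb182d056d5d5e24894c3f66e5b6fe0c7f",
#                   "nacl/master")).returncode == 0
```

The first command would list the merges on the Emscripten side since the
last NaCl merge; the second would tell whether a given Emscripten commit
has ever ended up in the NaCl branch.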

The clang repository shows a similar picture on a smaller scale. Upstream
to NaCl
<https://github.com/kripken/emscripten-fastcomp-clang/compare/8eceb42a00eee3f45ff5e3c8665c32e4ac9feeb4...097061e2dec7091d5256f0e80909bf1641087b97#files_bucket>

shows changes to 30 files, many of them mentioning Emscripten. The changes
from that to Emscripten
<https://github.com/kripken/emscripten-fastcomp-clang/compare/097061e2dec7091d5256f0e80909bf1641087b97...6bef7274efd0513418674ce0f215c46d188957da#files_bucket>

are almost negligible. This time, however, the commits to the NaCl repo are
by people I haven't noticed as core Emscripten contributors before. Are
they? If not, what is the motivation for adding Emscripten-specific code
there?

As to why all these changes are necessary: I assume that a C++-written
backend is “obviously” the fastest way to generate target-specific code
from IR. I'd have assumed that it should be possible to implement such a
backend as a plug-in, or as a separate program linking against the LLVM
libs. Why hasn't that been done, and why was the source tree modified instead?
Is it because that's simply the easiest way to ensure all versions and
paths and stuff match up? Or is it because this should end up in vanilla
LLVM one day, so it might as well be written to that code tree and kept in
sync with it? Or is vanilla LLVM simply not flexible enough to allow for
such a backend without modifications to other places as well?

The distinction between compile time and link time is particularly
important when dealing with other front ends. What if I want to compile
Fortran, D, or any other language to JavaScript? Is it enough to generate
LLVM code using any frontend, and feed that to Emscripten? Can I safely
ignore the warning that I'm linking IR code generated for different
triples? Or should I try to compile each such frontend against the LLVM
library? Would that even be enough, or are the modifications to clang
essential enough that compiling other languages this way would be
problematic? (I've recently investigated ways to compile LAPACK
<https://github.com/kripken/emscripten/issues/998#issuecomment-110954873>
using Dragonegg <http://dragonegg.llvm.org/>, and even though Dragonegg looks
pretty much dead
<https://github.com/llvm-mirror/llvm/commit/3d09e0e55093ecb569a5c700> and
flang <https://github.com/isanbard/flang> still a fetus, I wonder whether
this could work for more than the single example function I've tried so
far. My first contributions <https://github.com/kripken/emscripten/pull/841>
to Emscripten were when I tried to feed ldc output to it, and I'm still
interested in compiling D to JavaScript.)

The question of whether Emscripten changes will (aim to) be merged into
LLVM has been raised on this list before
<https://groups.google.com/d/msg/emscripten-discuss/_kIjDA3oKDY/U4UsxyIhBPcJ>,
but the response only talked about rebasing, which, as I see it, isn't the
same thing. Are there plans to one day have all the JS-specific changes in
vanilla LLVM, and to have Emscripten simply compile against that? If not,
is some alternative like a plugin or similar envisioned? Having multiple
copies of LLVM lying around doesn't feel good in the very long term.

I hope my curiosity hasn't annoyed anyone too much. Thanks for reading, and
thank you even more for any insights you can provide.

Greetings,
Martin von Gagern
