Re: [pypy-dev] Compiling PyPy interpreter without GC

Kunshan Wang Thu, 19 Mar 2015 20:52:57 -0700

Hi Maciej,

I am a PhD student at the Australian National University, a colleague
of John Zhang, and the chief designer of the Mu project, a micro virtual
machine (http://microvm.org). Let me introduce the project to this
mailing list.

TL;DR: Implementing a managed language is hard. Existing VMs are either
too big or too monolithic. Mu, the micro VM, handles only concurrency,
JIT and GC, which the average language designer probably doesn't want to
work with (and is unlikely to have the expertise for). A "client"
program can then build on Mu to implement its language.

Our group at the ANU coined the concept of "micro virtual machines", or
"micro VMs", as a parody of the concept of the "microkernel" in the OS
literature. Mu is a concrete micro VM, in the same way that seL4 is a
concrete microkernel. Mu was previously called MicroVM or µVM, but we
changed the name to distinguish "our particular micro VM" from "the
general concept of micro virtual machines".

Our motivation: Many managed programming language implementations suck.
CPython, for example, uses a GIL, naive reference-counting GC and no JIT.
People may want to create alternative implementations. Currently they
either build on another (macro) VM or build a new VM from scratch.
Existing VMs (like the JVM, CLR, etc.) usually provide too much
abstraction. They introduce semantic gaps and a lot of unnecessary
dependencies, while still not working well for languages they were not
designed for (see how PyPy outperforms Jython). Others (including PyPy)
build a whole new VM, doing everything from scratch. There are many
"high-performance VM" projects like PyPy (LuaJIT, V8, JavaScriptCore and
HHVM, to name a few), but their most important low-level parts (JIT, GC,
...) are not reused across projects.

We proposed a third option. A "micro virtual machine" is minimal,
providing only the abstractions of concurrency, execution (JIT) and
garbage collection, which we identify as the three major concerns that
make language implementation hard. Another program which we call a
"client" sits above the micro VM and implements the concrete language.

Here are some facts about Mu, our concrete micro VM:

1. It has a specification which defines the behaviour, and multiple
implementations are allowed. We also have a reference implementation.

2. The architecture has two layers: a micro VM at the bottom, and a
"client" on top of it. The client interacts with the micro VM through an
API.

3. Its type system is at roughly the level of C, but with object
references, which is similar to the level of the LL type system in PyPy.
It has fixed-size integers, floating-point numbers, references, structs,
arrays, ..., but no Java-like object hierarchy. (The micro VM is minimal
and does not know about OOP. The client implements its own types using
structs, arrays, ...)

4. Its instruction set is LLVM-like and uses SSA form, but it is
designed for managed languages.

5. Heap memory allocation is a primitive instruction. Heap memory is
garbage-collected.

6. Mu has a garbage collector. All references in Mu can be identified,
whether they are in the heap, on the stack, in global variables or held
by the client (in which case the reference is exposed as an opaque handle, as in
JNI). With all references identified, Mu can perform precise (accurate,
exact) garbage collection.

7. Mu has threads that can run simultaneously (intended to be
implemented as OS threads, although the spec does not require it). Mu
also has a C++11-like memory model. (Yes, Mu has RELAXED, CONSUME,
ACQUIRE, RELEASE, ACQ_REL and SEQ_CST to scare your children.)

8. Mu's unit of code loading is the "bundle" (like a ".class" file in
Java). Its format is called "Mu IR" (like LLVM IR), which contains code
and top-level definitions. The client can define new code at run time.
(You can JIT-compile high-level programs to Mu IR at run time.)

9. Mu has the "TRAP" instruction, which pauses the execution of a Mu IR
program and executes a "handler" in the client. The handler can
introspect the state of the execution (including local variables) and
perform OSR (on-stack replacement: removing a stack frame and pushing a
new frame, possibly for a newly compiled function). After the handler
returns, the client decides where the thread should continue.

10. Mu has a simple exception handling mechanism. It does not rely on
system libraries.
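Items 5 and 6 are what make precise collection possible: because every reference slot is known, the collector never has to guess whether a word is a pointer. The toy mark-and-sweep heap below is a Python sketch of that idea under stated assumptions, not Mu's collector; ToyHeap, Obj, alloc, ref_fields and collect are hypothetical names. Each object declares which of its fields hold references, and the roots are the three places listed in item 6: stack slots, global variables and handles held by the client.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    fields: list       # field values; reference fields hold object ids or None
    ref_fields: tuple  # indices of the fields that hold references


class ToyHeap:
    """Toy precise mark-and-sweep heap (illustration only, not Mu's GC)."""

    def __init__(self):
        self.objects = {}            # object id -> Obj
        self.next_id = 1
        # Root sets: every reference outside the heap is registered somewhere.
        self.stack_slots = []        # references held in stack frames
        self.globals = {}            # references held in global variables
        self.client_handles = set()  # references exposed to the client as handles

    def alloc(self, fields, ref_fields=()):
        """Heap allocation as a primitive: returns a reference (object id)."""
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = Obj(list(fields), tuple(ref_fields))
        return oid

    def _roots(self):
        yield from self.stack_slots
        yield from self.globals.values()
        yield from self.client_handles

    def collect(self) -> int:
        """Mark everything reachable from the roots, free the rest.

        Returns the number of objects freed.
        """
        marked = set()
        worklist = [r for r in self._roots() if r is not None]
        while worklist:
            oid = worklist.pop()
            if oid in marked:
                continue
            marked.add(oid)
            obj = self.objects[oid]
            # Precise scan: only declared reference fields are followed, so a
            # plain integer that happens to equal an object id is ignored.
            for i in obj.ref_fields:
                child = obj.fields[i]
                if child is not None:
                    worklist.append(child)
        dead = [oid for oid in self.objects if oid not in marked]
        for oid in dead:
            del self.objects[oid]
        return len(dead)


if __name__ == "__main__":
    heap = ToyHeap()
    unreachable = heap.alloc(["payload"])
    # keeper's only field is a plain integer that equals unreachable's id.
    keeper = heap.alloc([unreachable])
    leaf = heap.alloc([7])
    node = heap.alloc([leaf], ref_fields=(0,))
    heap.stack_slots.append(keeper)
    heap.client_handles.add(node)
    print(heap.collect(), sorted(heap.objects))
```

A conservative collector would have to treat keeper's integer field as a possible pointer and retain the object it happens to name; the precise scan frees it.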

John Zhang is working on making RPython a front-end language of Mu (in
other words, making Mu a back end of RPython). We think the LL type
system is roughly at the same level as Mu, but since Mu already has
reference types, a heap allocation instruction and an internal garbage
collector, RPython no longer needs to insert low-level GC implementation
details (like read/write barriers, GC-safe points, stack maps and so
on). However, Mu does not expose memory directly as bytes. For this
reason, some implementation strategies are no longer applicable in Mu.
For example, an array copy must be done element by element, and memcpy
cannot be used.
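A minimal Python sketch of such an element-wise copy, assuming Python lists stand in for Mu arrays; array_copy is a hypothetical helper, not part of RPython or Mu. Each assignment in the loop corresponds to one element load and one element store. As a design choice, it copies backwards when the source and destination are the same array and the destination starts later, which gives the same result memmove would for overlapping ranges.

```python
def array_copy(src: list, src_start: int, dst: list, dst_start: int, length: int) -> None:
    """Copy `length` elements one at a time (element-wise, memmove-like).

    Python lists stand in for Mu arrays; each assignment models one element
    load followed by one element store.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if (src_start < 0 or dst_start < 0
            or src_start + length > len(src)
            or dst_start + length > len(dst)):
        raise IndexError("copy range out of bounds")
    if src is dst and src_start < dst_start:
        # Overlapping shift towards higher indices: copy backwards so that
        # no element is overwritten before it has been read.
        for i in range(length - 1, -1, -1):
            dst[dst_start + i] = src[src_start + i]
    else:
        for i in range(length):
            dst[dst_start + i] = src[src_start + i]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    array_copy(a, 0, a, 1, 4)  # overlapping copy within the same array
    print(a)                   # [1, 1, 2, 3, 4]
```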

Regards,
Kunshan Wang

On 18/03/2015 7:57 pm, Maciej Fijalkowski wrote:
> Hi John.
> 
> Can you describe the microVM and its capabilities? Chances are it
> captures things at the wrong level (I have a longer response in mind,
> but I'll wait for you to describe it, in case I'm plain wrong)
> 
> What do you mean by "provides a GC"? Does it mean you just call malloc
> and you never have to call free?
> 
> Generally speaking we don't suggest you translate pypy as a first
> step, but instead write tests (equivalent to what's in
> translator/c/test) and check aspects of translation one bit at a time.
> That said, dependency on rweakref even when disabled is a bug, can you
> post a full traceback?
> 
> Cheers,
> fijal
> 
> On Wed, Mar 18, 2015 at 2:01 AM, John Zhang <u5157...@uds.anu.edu.au> wrote:
>> Hi all,
>>     I'm working on developing a MicroVM backend for PyPy. It's a virtual
>> machine under active research and development by my colleagues at ANU. It
>> aims to capture GC, threading and JIT in the virtual machine, relieving
>> language implementers of that burden.
>>
>>     Since MicroVM provides GC, I need to remove GC from the PyPy
>> interpreter. As I was trying to compile it with the following command:
>>     pypy $PYPY/rpython/bin/rpython \
>>           -O0 \
>>           --gc=none \
>>           --no-translation-rweakref \
>>           --annotate \
>>           --rtype \
>>           --translation-backendopt-none \
>>           $PYPY/pypy/goal/targetpypystandalone.py
>>     It raises an error during the annotation stage, saying that it's not able
>> to find a module called '_rweakref'.
>>     Does anyone know what the problem might be, and how one might go and
>> solve it?
>>
>>     Appreciate greatly,
>>     John Zhang
>> _______________________________________________
>> pypy-dev mailing list
>> pypy-dev@python.org
>> https://mail.python.org/mailman/listinfo/pypy-dev


