Hi all,

As you may already know, I've been working on a project I've called
Ocean.  The summary is that I want to create a source-level
replacement for GNU C that provides language safety, introspection,
user-extensible syntax/semantics, and sane concurrency support.  I
will be using the COLA infrastructure to build the compiler, and I
want to help produce useful modules that can be integrated in the
basic COLA system.

I intend to provide the majority of Ocean under the GPLv2, but I will
be amenable to relicensing portions of it under MIT (especially
infrastructure that might be useful to COLA in general).  I want to
have an open development process, and will be using GitHub to host the
project.

The core language would be mostly-backwards-compatible with C (and in
the future, possibly C++ and/or Objective-C).  However, the ABI is
rather different, so in order to use Ocean with a C library, you would
have to recompile that library.  Some wizard pointer manipulation
would fail to compile, or fail at run time, under Ocean (since it is a
safe language), but I think that would not be a problem for most
applications, especially if they can put an "#ifdef __OCEAN__" in the
needed places.

Below is a laundry list of the features I would like to provide.  I
would appreciate any feedback you have to offer.

Thanks,

-- 
Michael FIG <[EMAIL PROTECTED]> //\
   http://michael.fig.org/    \//

Ocean Core Language
*******************

Michael FIG <[EMAIL PROTECTED]>, 2008-09-05

These are the features of Ocean's ABI that work with Ocean's default
C-like core language.  Much of the design here is inspired by the COLA
and Erlang systems.

* Wide oops (object-oriented pointers)

There is no such thing as a pointer containing an arbitrary address.
Every pointer is an oop: it points to a valid object + offset, and is
associated with a compile-time size as in C (with the exception of the
"void *" oop).  This is accomplished by making each oop a double word:
the high word holds the base address of an object, and the low word
holds an offset to be added.  The compiler forbids the conversion of
any value type to an oop, but allows oop-to-integer conversions for
compatibility with C.  Oop arithmetic has the same semantics as C
pointer arithmetic, but properly preserves the base object address
instead of mixing it with the offset.
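
To make the base/offset split concrete, here is a minimal sketch in
plain C of what a wide oop and its arithmetic might look like.  The
names "oop_t", "oop_from_object", "oop_add" and "oop_to_integer" are
hypothetical and only for illustration; they are not part of Ocean,
and the real compiler would generate this code rather than expose it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wide oop: two machine words, base kept apart from offset. */
typedef struct {
    uintptr_t base;   /* high word: base address of the object */
    intptr_t  offset; /* low word: byte offset to be added */
} oop_t;

/* Build an oop that points at the start of an existing object. */
static oop_t oop_from_object(void *object)
{
    oop_t p = { (uintptr_t)object, 0 };
    return p;
}

/* Pointer arithmetic as in C: scale by the element size, but only the
   offset changes, so the base object address is preserved. */
static oop_t oop_add(oop_t p, ptrdiff_t n, size_t elem_size)
{
    assert(elem_size > 0); /* a "void *" oop has no element size */
    p.offset += n * (ptrdiff_t)elem_size;
    return p;
}

/* Oop-to-integer conversion, which is allowed for C compatibility. */
static uintptr_t oop_to_integer(oop_t p)
{
    return p.base + (uintptr_t)p.offset;
}
```

There is deliberately no integer-to-oop function in the sketch, which
mirrors the rule that value types cannot be converted to oops.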

* Runtime object metadata

Every object has an oop header, stored in the double word immediately
preceding the base pointer, that refers to a metadata object.  The
metaobject describes any extra oop object headers (such as are needed
to preserve the allocated size, field layout, object vtable, locks,
versioning, ownership, etc).  All functions also have an oop metadata
header.
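
As a rough illustration of the header placement, the sketch below
allocates an object with a two-word header directly in front of its
base address.  Everything here ("metaobject_t", "header_t",
"object_alloc", "object_meta", "object_free") is a hypothetical name
chosen for the example, and the metaobject contents are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical metaobject; a real one would describe extra headers. */
typedef struct metaobject {
    const char *name;          /* placeholder: human-readable type name */
    size_t      extra_headers; /* placeholder: count of further headers */
} metaobject_t;

/* The header is itself an oop (base + offset) naming the metaobject. */
typedef struct {
    const metaobject_t *meta;
    ptrdiff_t           offset;
} header_t;

/* Allocate a zero-filled object; the header sits just before the base. */
static void *object_alloc(const metaobject_t *meta, size_t size)
{
    header_t *header = calloc(1, sizeof(header_t) + size);
    if (header == NULL)
        return NULL;
    header->meta = meta;
    header->offset = 0;
    return header + 1; /* base address handed to the program */
}

/* Step back one double word from the base to find the metaobject. */
static const metaobject_t *object_meta(const void *base)
{
    assert(base != NULL);
    return ((const header_t *)base - 1)->meta;
}

/* Release the object together with its header. */
static void object_free(void *base)
{
    if (base != NULL)
        free((header_t *)base - 1);
}
```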

* Object safety

Every metaobject declares a read/write barrier, which the compiler
forces clients to use.  An optional metaobject layer can, for example,
validate all object access (no indexing off the beginning of an object
or the end of an array or assigning a value type to a pointer member).
More aggressive layers can implement object permissions, such as
denying access based on the caller's context.  These barriers can also
provide hooks for garbage collection.

There are also no uninitialized variables.  Stack variables are zeroed
before the frame is entered, just like heap allocations.
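
The validating layer mentioned above could be sketched in plain C
along the following lines.  This is only an assumption about how a
bounds-checking barrier might behave: "checked_oop_t", "access_ok",
"barrier_read" and "barrier_write" are hypothetical names, and the
size is carried in the oop here purely to keep the example short
(Ocean would find it through the object's headers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical oop that also carries the object's allocated size. */
typedef struct {
    unsigned char *base;   /* start of the object */
    size_t         size;   /* allocated size in bytes */
    ptrdiff_t      offset; /* byte offset from the base */
} checked_oop_t;

/* True when [offset, offset + len) lies entirely inside the object. */
static bool access_ok(checked_oop_t p, size_t len)
{
    return p.offset >= 0
        && (size_t)p.offset <= p.size
        && len <= p.size - (size_t)p.offset;
}

/* Read barrier: copy out only if the access stays in bounds. */
static bool barrier_read(checked_oop_t p, void *out, size_t len)
{
    assert(out != NULL);
    if (!access_ok(p, len))
        return false;
    memcpy(out, p.base + p.offset, len);
    return true;
}

/* Write barrier: store only if the access stays in bounds. */
static bool barrier_write(checked_oop_t p, const void *in, size_t len)
{
    assert(in != NULL);
    if (!access_ok(p, len))
        return false;
    memcpy(p.base + p.offset, in, len);
    return true;
}
```

A stricter layer could add a permission check to "access_ok" without
changing the callers, which is the point of routing every access
through the barrier.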

* Discriminated unions

A layout function must be declared for every union that contains both
value types and oops, so that its oops can be correctly located.
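
For example, a layout function for a tagged union might look like the
sketch below.  The signature is an assumption made for illustration
(the syntax for declaring layout functions is not settled), and
"cell_t", "cell_tag_t" and "cell_layout" are hypothetical names.  A
plain "void *" stands in for an oop.

```c
#include <assert.h>
#include <stddef.h>

/* Discriminant that says which union member is live. */
typedef enum { CELL_INT, CELL_REF } cell_tag_t;

typedef struct {
    cell_tag_t tag;
    union {
        long  number; /* value type */
        void *ref;    /* stands in for an oop */
    } u;
} cell_t;

/* Hypothetical layout function: writes the byte offsets of the live
   oops in *c into oop_offsets (at most max entries) and returns how
   many were written. */
static size_t cell_layout(const cell_t *c, size_t *oop_offsets, size_t max)
{
    assert(c != NULL && oop_offsets != NULL);
    if (c->tag == CELL_REF && max > 0) {
        /* u.ref starts at the beginning of the union. */
        oop_offsets[0] = offsetof(cell_t, u);
        return 1;
    }
    return 0; /* the union currently holds a value type: no oops */
}
```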

* Malloc support

Malloc returns a zero-filled object with "unknown layout" in its oop
header.  A call to the Ocean primitive "layout(struct MyStruct, ptr)"
updates the object at "ptr" to have the layout corresponding to the
named type.  The following code fragment can allow C compatibility:

   #ifndef __OCEAN__
   # define layout(TYPE, PTR) ((TYPE *)(PTR))
   #endif

The C++ "new" operator will probably be introduced as a
non-C-compatible Ocean extension.

* Precise GC

Copying garbage collection is possible because every object has an
associated layout, so pointers can be identified and updated whether
on the stack or in malloced memory.  Oops make this possible by
allowing the garbage collector to alter the base but not the offset of
each pointer when relocating an object.
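
The relocation step can be sketched as follows.  This is a toy model
under stated assumptions, not a collector: "oop_t" and "relocate" are
hypothetical names, and the roots are given as a flat array instead of
being discovered through layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wide oop: base address plus byte offset. */
typedef struct {
    unsigned char *base;
    ptrdiff_t      offset;
} oop_t;

/* Copy an object to its new home, then forward every root that refers
   to it.  Only the base is rewritten; the offset is left alone, so
   interior pointers stay interior after the move. */
static void relocate(unsigned char *from, unsigned char *to, size_t size,
                     oop_t *roots, size_t nroots)
{
    assert(from != NULL && to != NULL);
    memcpy(to, from, size);
    for (size_t i = 0; i < nroots; i++) {
        if (roots[i].base == from)
            roots[i].base = to;
    }
}
```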

* Tasklets and kernel threads (NxM threading)

Every function receives a hidden oop argument to chain stack frames
together and provide stack and thread-specific data.  Tasklet creation
and manipulation functions are available.

Tasklets can be declared as having a reserved kernel thread (i.e. no
other tasklet runs on that thread).  The compiler inserts rescheduling
requests into code so that CPU-bound tasklets don't block other
tasklets.  Otherwise, rescheduling is requested at every "receive"
(see below).  Tasklets by default run in a thread pool, with at
least one thread, and at most the number of physical cores allocated
to the application by the system administrator (default all cores)
minus the reserved threads.  I/O requests (i.e. blocking system calls)
are performed by sending a message to a tasklet that is running in a
special I/O thread pool, then waiting to receive a result message from
the I/O thread.
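
The bound on the default pool size can be written out as a small
function.  This reads "at least one thread" as a floor that applies
even when reserved threads use up every allocated core, which is an
assumption; "pool_max_threads" and its parameter names are
hypothetical.

```c
#include <assert.h>

/* Upper bound on the default tasklet pool: allocated cores minus the
   reserved threads, but never fewer than one thread. */
static unsigned pool_max_threads(unsigned allocated_cores,
                                 unsigned reserved_threads)
{
    unsigned result = 1;
    if (allocated_cores > reserved_threads)
        result = allocated_cores - reserved_threads;
    assert(result >= 1);
    return result;
}
```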

The generated code tracks stack usage so that tasklets can be created
with a tiny stack object, and larger stack objects can be added as
necessary.  Large stack objects could be reclaimed if the stack space
becomes unused.

* Message passing

Every tasklet has a private mailbox.  Tasklets can send messages
asynchronously, and wait for messages from other sources with a
timeout in milliseconds.  The "send" construct recursively changes the
ownership of an oop and places it in the specified tasklet's mailbox.
The "receive" construct loops through messages in the current
tasklet's mailbox, evaluating the body until a "break".  If there was
no "break" and the timeout is nonzero, it waits that many milliseconds
for more incoming messages for the body to process.  If the timeout
expires without reaching a "break", then the message is set to NULL.
If there was a "break", the current message is removed from the
mailbox.  Message order is preserved by the "receive" construct.

   /* Append MY_MSG to tasklet1's mailbox. */
   send(tasklet1, my_msg);

   void *msg;
   receive (msg, 0) /* Only process messages already in our queue (0ms). */
   {
      /* If the message matches, exit the receive clause. */
      if (((MyMsg *)msg)->zot == 123) break;
   }
   receive (msg, 1000) break; /* Receive any message within one second. */
   receive (NULL, 1000); /* Wait one second. */
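
For readers who want to see the zero-timeout case spelled out, here is
a rough emulation in plain C.  It is a single-threaded sketch with a
fixed-size mailbox, and a predicate returning true plays the role of
"break".  The names "mailbox_t", "mailbox_send", "mailbox_receive_now"
and "zot_is_123" are hypothetical; ownership transfer and timeouts are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_CAPACITY 16 /* arbitrary bound, for the sketch only */

typedef struct {
    void  *messages[MAILBOX_CAPACITY]; /* kept in arrival order */
    size_t count;
} mailbox_t;

typedef struct { int zot; } MyMsg;

/* Like "send": append the message; returns false if the box is full. */
static bool mailbox_send(mailbox_t *box, void *msg)
{
    assert(box != NULL);
    if (box->count == MAILBOX_CAPACITY)
        return false;
    box->messages[box->count++] = msg;
    return true;
}

/* Like "receive (msg, 0)": scan queued messages in order.  The first
   one the predicate accepts is removed and returned; the rest keep
   their order.  Returns NULL if nothing matched. */
static void *mailbox_receive_now(mailbox_t *box, bool (*matches)(void *))
{
    assert(box != NULL && matches != NULL);
    for (size_t i = 0; i < box->count; i++) {
        void *msg = box->messages[i];
        if (matches(msg)) {
            for (size_t j = i + 1; j < box->count; j++)
                box->messages[j - 1] = box->messages[j];
            box->count--;
            return msg;
        }
    }
    return NULL;
}

/* Example predicate, mirroring the test in the fragment above. */
static bool zot_is_123(void *msg)
{
    return ((MyMsg *)msg)->zot == 123;
}
```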

* Software Transactional Memory

Rather than using locks, STM allows the programmer to start a
transaction, read and write objects without affecting other tasklets,
then atomically commit or roll back the transaction.  Again, this is
made possible with wide oops and read/write barriers.  Each object can
be "owned" by a given tasklet (no other tasklet is allowed to touch it
directly: the write barrier prevents it), and if there is no owner the
STM metaobject layer only allows write access from within a
transaction.

* Metaprogramming

Everything within an "#ifdef __OCEAN_META__" is evaluated as COLA code
at compile time.  This is how the Ocean compiler can be modified to
extend syntax or semantics.

   #ifdef __OCEAN_META__
   (printf "this is COLA code!\n")
   #endif

End.

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
