On Jun 14, 2014, at 2:45 PM, Robbert van Dalen wrote:
> [...]
>>> 4) can tuple strengthening strategies be re-used for other classes (such as 
>>> “optional”) without incorporating the tuple as a field?
>> 
>> A guillemet group can already be tied to an argument of type <cheese…|0..1>, 
>> which can be accessed via "x[1] else [tofu]".  Technically, the tuple still 
>> has to be constructed, but when we start inlining non-primitives (hopefully 
>> in a few months), the intermediate tuples will never actually have to get 
>> built.  But that's just transient objects.  The storage doesn't really 
>> matter for objects that'll only last a microsecond.  If we really have a 
>> *lot* of optional values that have to reside in memory simultaneously, they 
>> might best be represented by tuple types like <cheese…|0..1>.  Or some sort 
>> of explicit optional primitive type with its own family of descriptors.
> 
> now i wonder what’s the memory footprint of a single element tuple?

Oh... I just noticed I forgot to include the descriptor field for each 
AvailObject, so the 160 should be increased by four more Java pointers, at 8 
bytes each, giving 192 bytes for the entire optional.

Like all AvailObjects, a tuple has the usual Java header and two slots, one for 
the array of AvailObjects and one for the array of ints.  An ordinary tuple 
(not one containing, say, a character or a nybble) has one object slot and one 
integer slot, so both arrays are necessary.  When there are zero object slots 
or integer slots in an AvailObject, the single shared empty array of objects or 
the single shared empty array of ints is used instead of creating one 
especially for that AvailObject.  However, since we need one slot of each for 
the one-element ordinary tuple (whose descriptor is an ObjectTupleDescriptor), 
we have:

        3 object headers (AvailObject, AvailObject [], int []) @ 8 bytes each
        3 slots in the AvailObject itself (descriptor, objectSlots, intSlots) @ 8 bytes each
        1 slot in the int [] @ 4 bytes (to cache the tuple's hash value)
        1 slot in the AvailObject [] @ 8 bytes (the tuple element)

So a one-element tuple takes 60 bytes.  On the other hand, the zero-element 
tuple is shared and takes no additional space per use (the slot holding it 
would be accounted with whatever structure owns the slot).
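As a quick sanity check on the arithmetic above, here is a throwaway sketch 
(just labeled line items, not VM code; the 44-byte decomposition assumes 
compressed pointers halve the four pointer-sized slots):

```java
// Quick check of the byte accounting above.  Names are labels for the
// line items only; nothing here is actual Avail VM code.
public class TupleFootprint {
    public static void main(String[] args) {
        int headers = 3 * 8;       // AvailObject, AvailObject[], int[] headers
        int objectSlots = 3 * 8;   // descriptor, objectSlots, intSlots pointers
        int hashSlot = 1 * 4;      // cached hash in the int[]
        int elementSlot = 1 * 8;   // the tuple element in the AvailObject[]
        int total = headers + objectSlots + hashSlot + elementSlot;
        System.out.println(total);          // 60 bytes
        // Assumed: compressed pointers shrink the four pointer-sized slots
        // (descriptor, objectSlots, intSlots, element) by 4 bytes each.
        System.out.println(total - 4 * 4);  // 44 bytes
    }
}
```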

Compressed pointers would shave it to 44 bytes.  The custom representation 
we'll be using with LLVM will look like:

        ...before the object...
        [tuple element #1] @ 8 bytes
        [middle-header] @ 8 bytes
        [int slot #1 for hash] @ 4 bytes
        ...after the object...

= 20 bytes.

The middle-header is aligned the same as a pointer, but has the lowest bit set 
to distinguish it from the (4-byte aligned) pointers that precede it.  It 
contains a size field and a table subscript for the descriptor.  All pointer 
slots come before the middle-header, and all int slots come after.  This allows 
us to scan a space:  Process pointers until you hit a header whose low bit is 
set, then use the embedded size field to skip N ints, then repeat.  All 
pointers point to the middle-headers of objects.  The descriptor subscript can 
be extracted at offset 0 (with a mask and/or shift, to be determined), which 
can then be looked up in an array of all descriptors.  Object slots occur at 
negative indices (header - 8, header - 16, etc.), with the count encoded in the 
header, if necessary.  Int slots occur at positive indices (header + 8, header 
+ 12, etc.), with the count always encoded the same way in the header (to allow 
quick scanning).  Right-to-left scanning of a space is not supported.
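The scan described above can be sketched as follows.  The header packing (low 
bit as the tag, the next 15 bits as the int-slot count, descriptor subscript 
above that) is my own invention for illustration, since the real mask/shift is 
still to be determined; the space is simulated as a long[] of 8-byte words, so 
the 4-byte int slots are widened to whole words for simplicity:

```java
// Hypothetical sketch of left-to-right scanning of a space laid out as:
// pointer slots, then a middle-header with its low bit set, then int slots.
public class SpaceScanner {
    // Invented packing: bit 0 = tag, bits 1-15 = int-slot count,
    // bits 16+ = descriptor table subscript.
    static long makeHeader(int intSlots, int descriptorIndex) {
        return 1L | ((long) intSlots << 1) | ((long) descriptorIndex << 16);
    }

    static boolean isHeader(long word) {
        return (word & 1L) != 0;  // pointers are 4-byte aligned: low bit clear
    }

    static int intSlotCount(long header) {
        return (int) ((header >>> 1) & 0x7FFF);
    }

    /** Count objects: scan pointers until a header, then skip its int slots. */
    static int countObjects(long[] space) {
        int count = 0;
        int i = 0;
        while (i < space.length) {
            while (i < space.length && !isHeader(space[i])) {
                i++;  // a pointer slot belonging to the next object
            }
            if (i >= space.length) break;
            count++;
            i += 1 + intSlotCount(space[i]);  // skip header and its int slots
        }
        return count;
    }

    public static void main(String[] args) {
        long[] space = {
            0x1000L,            // object 1: one pointer slot (even, untagged)
            makeHeader(1, 7),   // object 1: middle-header, 1 int slot follows
            42L,                // object 1: the cached hash
            makeHeader(2, 3),   // object 2: no pointer slots, 2 int slots
            99L, 100L           // object 2: int slots
        };
        System.out.println(countObjects(space));  // prints 2
    }
}
```

Note that the int slots are never inspected for the tag bit; the size field in 
the header jumps straight over them, which is what makes the one-direction 
scan cheap.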

If we choose a compressed pointer representation instead, we should be able to 
shrink it to 12 bytes (4-byte pointers and headers), which is certainly 
competitive with existing 32-bit memory layouts.  We could even squish it down 
to 8 bytes (16 with big pointers) by eliminating the hash field, but then we 
would have to replace the tuple with an indirection when we wanted its hash 
value -- or we'd have to compute it every time, like in all those lame 
languages.  At least we'd be guaranteed to get the same value every time.


> don’t get me wrong: i like avail’s strong stance towards (runtime) types. 
> because by doing so, avail blurs the difference between compile and runtime - 
> which is very powerful.
> but what are the true benefits to keep hold of types at runtime?
> avail usually doesn’t erase types, but at what (memory) cost?

For tracking types at runtime, most values don't go to more effort than 
possessing a descriptor plus the usual data fields that you'd expect.  This is 
a bit surprising, considering that we effectively track precise types, but the 
key here is that we track *perfectly* precise types.  The type of any value is 
an enumeration type whose elements consist of exactly that one value.  
Therefore we can always get to the exact type whenever we want it by simply 
constructing it.  Types don't have identity, so we're not forced to stash a 
type pointer in the object.  The types don't form an explicit graph structure 
with actual pointers between them.  Instead, the descriptors describe how to 
check whether a value is a member of a type, and how a type is related to 
another type.
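A toy model of that idea (class and method names invented for illustration, 
not Avail's actual VM classes): the type of a value is the one-element 
enumeration containing exactly it, so membership and subtyping collapse to 
equality checks, and the "type" can be built on demand rather than stored in 
the object.

```java
// Toy model of perfectly precise types: the type of v is the enumeration
// {v}, constructed whenever it's wanted.  Names are invented.
final class InstanceType {
    private final Object instance;
    InstanceType(Object instance) { this.instance = instance; }

    // Membership: v has this type exactly when v equals the sole instance.
    boolean isInstance(Object v) { return instance.equals(v); }

    // Subtyping between one-element enumerations is also just equality.
    boolean isSubtypeOf(InstanceType other) {
        return instance.equals(other.instance);
    }

    public static void main(String[] args) {
        Object value = "cheese";
        InstanceType t = new InstanceType(value);    // built on demand
        System.out.println(t.isInstance("cheese"));  // true
        System.out.println(t.isInstance("tofu"));    // false
    }
}
```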

In a functional language, the best you can ever hope to get is tagged unions 
whose membership is explicitly listed in one place.  The tag can in theory be 
crammed into the pointers in some cases, but I assume a separate word is 
dedicated to this purpose instead.  So a list has to be a Cons struct or a Nil 
struct, and nothing else, ever, no matter how you attempt to extend or reuse a 
program.  Given those insane limitations, you can actually get by with less 
than one bit of information to specify this rudimentary "type" field.  Just 
allocate a unique memory location to act as the Nil object, since it contains 
no state or identity.  Then the tag test simply compares the address of the 
struct itself against the Nil struct's address.  But these two "types" of 
objects form an island in the type system (after erasure).  There are no 
circumstances in which it's even syntactically possible to have something 
that's Cons-like, Nil-like, or Foo-like.  Cons and Nil are basically a type 
system unto themselves, completely isolated from all other types, so it's not 
surprising that such a uselessly limited type scheme can be represented 
efficiently.  In that sense, one could just as correctly say that Java has an 
even better representation of zero-bit tags for its primitive types like char 
and int.  There can never be a place in Java code that doesn't know statically 
whether something is an int or not – not only does genericity not extend 
there, but neither does the rest of the type system!  A goal of Avail is to allow all 
values to have types that are interoperable with all the other types.  I don't 
mean they're compatible, or that a cheeseburger can be passed where an int is 
required, like in JavaScript and Smalltalk.  What I mean is that there are 
operations *on* types, that work for all types.  And that genericity as a 
concept operates with all types.  So in Avail you can write higher-order 
functions that operate on arbitrary kinds of values without resorting to 
cloning source code – and unlike in functional languages, in Avail you can do 
*useful* things with the objects, like add numbers together, create iterators 
on arbitrary collections, or compare things for equality without knowing which 
of exactly three types (tags) they might be.  Functional languages require you 
to describe... modules?  Is that the terminology they use?  Anyhow, they 
require you to declare a mumble-mumble to say how you're adapting some struct 
to behave like some other struct via some homomorphic map of operations.  It's 
really just feature renaming, of course.  And lots of code cloning.  And extra 
runtime data structures eating up space – not that I think that matters in a 
language that has to be tricked, over and over, every time you want to do 
something as simple as loop.
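The zero-bit Nil tag described above can be sketched in Java 17 (sealed types 
and records standing in for the functional-language structs; names are mine):

```java
// Sketch of the "less than one bit" tag: Nil is a unique singleton, so the
// emptiness test is just a reference (address) comparison -- no tag word.
sealed interface ConsList permits Cons, Nil {}

record Cons(Object head, ConsList tail) implements ConsList {}

final class Nil implements ConsList {
    static final Nil INSTANCE = new Nil();
    private Nil() {}  // no state; its identity is just its unique address

    public static void main(String[] args) {
        ConsList list = new Cons("x", Nil.INSTANCE);
        System.out.println(list == Nil.INSTANCE);                  // false
        System.out.println(((Cons) list).tail() == Nil.INSTANCE);  // true
    }
}
```

The comparison against `Nil.INSTANCE` is the entire "type test", which is 
exactly why such an isolated two-member scheme can be represented so cheaply.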

A statically typed system assigns a type approximation to each expression of 
the language.  Sometimes those type approximations are not types (genericity 
with erasure), but they should be related to the actual types in some way.  
Some languages like C and early Ada constrained values to have exact types.  
That's still static typing, but it prevents object-oriented use of such a 
system by forbidding runtime polymorphism.

A dynamically typed system is a system in which objects have types that are not 
closed.  They may be potentially limited to a subset of the type system (if 
static typing is also present), but not *necessarily* limited to those types 
which are listed in some particular piece of code.  The ability to extend at 
least some of the types of a system is essential, as is the necessity of having 
expressions in the language whose precise runtime type is not known.  C does 
not have this, but C++ does.  JavaScript has this, but has no static typing 
whatsoever.  Functional languages do not have this, since their types are tiny 
islands of tightly-coupled tags masquerading as types.  Machine language has 
neither static nor dynamic types for the most part, although technically... 
integral data is usually tagged statically with size information.

Avail has both static and dynamic typing, of course, and never erases type 
information.  Erasure always shows up as a compromise – it makes the compiler 
writer's job easier (especially when retrofitting a crappy form of genericity 
onto an existing crappy type system like Java's), but its limitations show up 
almost every day when you use a language that has erasure.


> avail usually doesn’t erase types, but at what (memory) cost?
> my guess is: to enable the runtime reflection/introspection of objects, and 
> their genuine types and thus intent.

While we do type manipulation at compile time, types are just values like 
anything else in Avail.  But I don't feel like we're doing a lot of reflection 
per se, since I normally think of reflection as an "escape" mechanism from a 
language to accomplish something that wasn't designed in.


>> Over the last few months Todd has proposed a reasonably safe version of the 
>> null concept 
> 
> that’s very interesting. explicit null typing is the holy grail of safety, 
> etc.

We've had the @Nullable and @NotNull annotations in Java/Eclipse as a terrific 
model of how not to do it.  Let's see, primitive types, fiat object types, 
arrays, erased genericity, annotations, and now specifically null annotations.  
That's six Java type systems that have almost nothing to do with each other, 
each of which fails pretty badly even at its own purpose.


>>> my guess is that using a tuple as a field is (currently) wasteful, but that 
>>> it is the only easy way to strengthen “optional” via tuple strengthening.
>> 
>> The tuple type above should be sufficient.
> 
> i ask again: what’s the memory footprint of a single element tuple?
> if it is too big, i would suggest adding an “optional” primitive to the vm, 
> as a viable option :)

Just to reiterate, it's currently 60 but will be 20 without doing anything more 
than reworking AvailObject within LLVM.  AvailObject has been a C struct, a 
Smalltalk class, a C++ class mostly generated from Smalltalk at the push of a 
button, and a Java class.  Its representation has changed more subtly dozens of 
times without breaking the semantics of Avail, and I'm not expecting the jump 
to LLVM to be any more hazardous.  We've fought hard to get the descriptor 
classes as stable and efficient as they are, and a mere substrate change isn't 
going to change that.  We know where the value is in the VM, and we're not 
going to pull a Netscape (throwing away all their code!), no matter how 
drastically we rework the VM.

And as an alternative to wrapping things in a tuple of size 0..1, one could 
just omit a field entirely from the containing object.  We know how to make 
dispatching in that kind of situation very fast, so at this point it's just a 
simple matter of programming.  So if performance per gigahertz (or per second, 
watt, or dollar) isn't a deep criterion for success for an Avail program at the 
moment, I suggest that when you draw back your bow, you aim your practice of 
Avail at the place that we'll be in a few years.  Decent performance *without* 
all the usual compromises (devil's bargains) found in other languages.  Don't 
pre-compromise the code you write today just because we didn't wait a few more 
years to let the world know what we're creating.  :-)

-Mark
