On Jun 14, 2014, at 2:45 PM, Robbert van Dalen wrote:
> [...]
>>> 4) can tuple strengthening strategies be re-used for other classes (such as
>>> “optional”) without incorporating the tuple as a field?
>>
>> A guillemet group can already be tied to an argument of type <cheese…|0..1>,
>> which can be accessed via "x[1] else [tofu]". Technically, the tuple still
>> has to be constructed, but when we start inlining non-primitives (hopefully
>> in a few months), the intermediate tuples will never actually have to get
>> built. But that's just transient objects. The storage doesn't really
>> matter for objects that'll only last a microsecond. If we really have a
>> *lot* of optional values that have to reside in memory simultaneously, they
>> might best be represented by tuple types like <cheese…|0..1>. Or some sort
>> of explicit optional primitive type with its own family of descriptors.
>
> now i wonder what’s the memory footprint of a single element tuple?
Oh... I just noticed I forgot to include the descriptor field for each
AvailObject, so the 160 should grow by four more Java pointers, at 8 bytes
each, making it 192 bytes for the entire optional.
Like all AvailObjects, a tuple has the usual Java header and two slots, one for
the array of AvailObjects and one for the array of ints. An ordinary tuple
(not one containing, say, characters or nybbles) has one object slot per
element and one integer slot for the cached hash, so both arrays are
necessary. When there are zero object slots
or integer slots in an AvailObject, the single shared empty array of objects or
the single shared empty array of ints is used instead of creating one
especially for that AvailObject. However, since we need one slot of each for
the one-element ordinary tuple (whose descriptor is an ObjectTupleDescriptor),
we have:
3 object headers (AvailObject, AvailObject [], int []) @ 8 bytes each = 24 bytes
3 slots in the AvailObject itself (descriptor, objectSlots, intSlots) @
8 bytes each = 24 bytes
1 slot in the int [] @ 4 bytes (to cache the tuple's hash value) = 4 bytes
1 slot in the AvailObject [] @ 8 bytes (the tuple element) = 8 bytes
So a one-element tuple takes 60 bytes. On the other hand, the zero-element
tuple is shared and takes no additional space per use (the slot holding it
is accounted to whatever structure owns the slot).
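As a sanity check, the accounting above can be tallied in a short sketch (the
constant names are illustrative, not actual VM identifiers):

```python
# Illustrative tally of the current Java-based one-element tuple footprint.
# Sizes come from the accounting above; the names here are made up.
POINTER = 8   # 64-bit Java reference (no compressed pointers)
HEADER = 8    # per-object header size assumed in the accounting above
INT = 4

footprint = (
    3 * HEADER     # AvailObject, AvailObject [], int [] headers
    + 3 * POINTER  # descriptor, objectSlots, intSlots fields
    + 1 * INT      # cached hash value in the int []
    + 1 * POINTER  # the single tuple element in the AvailObject []
)
print(footprint)   # 60 bytes, matching the total above
```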
Compressed pointers would shave it to 44 bytes. The custom representation
we'll be using with LLVM will look like:
...before the object...
[tuple element #1] @ 8 bytes
[middle-header] @ 8 bytes
[int slot #1 for hash] @ 4 bytes
...after the object...
= 20 bytes.
The middle-header is aligned the same as a pointer, but has the lowest bit set
to distinguish it from the (4-byte aligned) pointers that precede it. It
contains a size field and a table subscript for the descriptor. All pointer
slots come before the middle-header, and all int slots come after. This allows
us to scan a space: Process pointers until you hit a header whose low bit is
set, then use the embedded size field to skip N ints, then repeat. All
pointers point to the middle-headers of objects. The descriptor subscript can
be extracted at offset 0 (with a mask and/or shift, to be determined), which
can then be looked up in an array of all descriptors. Object slots occur at
negative indices (header - 8, header - 16, etc.), with the count encoded in the
header, if necessary. Int slots occur at positive indices (header + 8, header
+ 12, etc.), with the count always encoded the same way in the header (to allow
quick scanning). Right-to-left scanning of a space is not supported.
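A toy simulation of that scan, assuming a uniform 4-byte word (as in the
compressed-pointer variant) so that pointers, headers, and ints are all one
word each. The exact header encoding (low bit as the marker, the field widths
for the int-slot count and descriptor subscript) is invented for illustration:

```python
# Toy forward scan over the middle-header layout described above.
# Pointer slots precede each middle-header; int slots follow it.

def make_header(descriptor_index, int_slots):
    # Low bit set marks a middle-header; the other fields are packed
    # arbitrarily: int-slot count in bits 1-8, descriptor subscript above.
    return (descriptor_index << 9) | (int_slots << 1) | 1

def scan(heap):
    """Yield (header_offset, descriptor_index, int_slot_count) per object."""
    i = 0
    while i < len(heap):
        # Process pointer slots (low bit clear, since pointers are at least
        # 2-byte aligned) until we hit a word with its low bit set.
        while heap[i] & 1 == 0:
            i += 1
        header = heap[i]
        int_slots = (header >> 1) & 0xFF
        yield i, header >> 9, int_slots
        i += 1 + int_slots  # skip the header and its int slots, then repeat

# A one-element tuple: [element pointer][middle-header][hash int] = 3 words.
heap = [0x1000, make_header(7, 1), 0xCAFE]
print(list(scan(heap)))  # [(1, 7, 1)]
```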
If we choose a compressed pointer representation instead, we should be able to
shrink it to 12 bytes (4-byte pointers and headers), which is certainly
competitive with existing 32-bit memory layouts. We could even squish it down
to 8 bytes (16 with big pointers) by eliminating the hash field, but then we
would have to replace the tuple with an indirection when we wanted its hash
value -- or we'd have to compute it every time, like in all those lame
languages. At least we'd be guaranteed to get the same value every time.
> don’t get me wrong: i like avail’s strong stance towards (runtime) types.
> because by doing so, avail blurs the difference between compile and runtime -
> which is very powerful.
> but what are the true benefits to keep hold of types at runtime?
> avail usually doesn’t erase types, but at what (memory) cost?
For tracking types at runtime, most values don't go to more effort than
possessing a descriptor plus the usual data fields that you'd expect. This is
a bit surprising, considering that we effectively track precise types, but the
key here is that we track *perfectly* precise types. The type of any value is
an enumeration type whose elements consist of exactly that one value.
Therefore we can always get to the exact type whenever we want it by simply
constructing it. Types don't have identity, so we're not forced to stash a
type pointer in the object. The types don't form an explicit graph structure
with actual pointers between them. Instead, the descriptors describe how to
check whether a value is a member of a type, and how a type is related to
another type.
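The "perfectly precise type" idea can be caricatured in a few lines: a value's
exact type is the enumeration containing just that value, built on demand
rather than stored on the value. This is a sketch, not Avail's actual
descriptor machinery:

```python
# Sketch: exact types as on-demand enumerations of their instances.
class Enumeration:
    def __init__(self, instances):
        self.instances = frozenset(instances)

    def is_instance(self, value):
        return value in self.instances

    def is_subtype_of(self, other):
        # An enumeration is a subtype when its instances are a subset.
        return self.instances <= other.instances

def exact_type(value):
    # Nothing is stashed in the value; the type is constructed when asked for.
    return Enumeration([value])

five = exact_type(5)
small = Enumeration([1, 2, 3, 4, 5])
print(five.is_instance(5))        # True
print(five.is_subtype_of(small))  # True
```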
In a functional language, the best you can ever hope to get is tagged unions
whose membership is explicitly listed in one place. The tag can in theory be
crammed into the pointers in some cases, but I assume a separate word is
dedicated for this purpose instead. So a list has to be a Cons struct or a Nil
struct, and nothing else, ever, no matter how you attempt to extend or reuse a
program. Given those insane limitations, you can actually get by with less
than one bit of information to specify this rudimentary "type" field. Just
allocate a unique memory location to act as the Nil object, since it contains
no state or identity. Then the tag test simply compares the address of the
struct itself against the Nil struct's address. But these two "types" of
objects form an island in the type system (after erasure). There are no
circumstances in which it's even syntactically possible to have something
that's Cons-like, Nil-like, or Foo-like. Cons and Nil are basically a type
system unto themselves, completely isolated from all other types, so it's not
surprising that such a uselessly limited type scheme can be represented
efficiently. In that sense, one could just as correctly say that Java has an
even better representation of zero-bit tags for its primitive types like char
and int. There can never be a place in Java code that doesn't know statically
whether something is an int or not – not only doesn't genericity extend there,
but neither does the rest of the type system! A goal of Avail is to allow all
values to have types that are interoperable with all the other types. I don't
mean they're compatible, or that a cheeseburger can be passed where an int is
required, like in JavaScript and Smalltalk. What I mean is that there are
operations *on* types, that work for all types. And that genericity as a
concept operates with all types. So in Avail you can write higher-order
functions that operate on arbitrary kinds of values without resorting to
cloning source code – and unlike in functional languages, in Avail you can do
*useful* things with the objects, like add numbers together, create iterators
on arbitrary collections, or compare things for equality without knowing which
of exactly three types (tags) they might be. Functional languages require you
to describe... modules? Is that the terminology they use? Anyhow, they
require you to declare a mumble-mumble to say how you're adapting some struct
to behave like some other struct via some homomorphic map of operations. It's
really just feature renaming, of course. And lots of code cloning. And extra
runtime data structures eating up space – not that I think that matters in a
language that has to be tricked, over and over, every time you want to do
something as simple as loop.
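The "less than one bit" trick described above, with a unique Nil object whose
address *is* the tag test, looks roughly like this (a sketch; the names are
invented):

```python
# Sketch: Nil as a single shared object with no state; the "type test"
# for the Cons/Nil union is just an identity (address) comparison.
class Cons:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail):
        self.head, self.tail = head, tail

NIL = object()  # the one shared Nil; only its address matters

def length(xs):
    n = 0
    while xs is not NIL:  # the entire "tag" check
        n += 1
        xs = xs.tail
    return n

print(length(Cons(1, Cons(2, NIL))))  # 2
```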
A statically typed system assigns a type approximation to each expression of
the language. Sometimes those type approximations are not types (genericity
with erasure), but they should be related to the actual types in some way.
Some languages like C and early Ada constrained values to have exact types.
That's still static typing, but it prevents object-oriented use of such a
system by forbidding runtime polymorphism.
A dynamically typed system is a system in which objects have types that are not
closed. They may be potentially limited to a subset of the type system (if
static typing is also present), but not *necessarily* limited to those types
which are listed in some particular piece of code. The ability to extend at
least some of the types of a system is essential, as is the necessity of having
expressions in the language whose precise runtime type is not known. C does
not have this, but C++ does. JavaScript has this, but has no static typing
whatsoever. Functional languages do not have this, since their types are tiny
islands of tightly-coupled tags masquerading as types. Machine language has
neither static nor dynamic types for the most part, although technically...
integral data is usually tagged statically with size information.
Avail has both static and dynamic typing, of course, and never erases type
information. Erasure always shows up as a compromise – it makes the compiler
writer's job easier (especially when retrofitting a crappy form of genericity
onto an existing crappy type system like Java's), but its limitations show up
almost every day when you use a language that has erasure.
> avail usually doesn’t erase types, but at what (memory) cost?
> my guess is: to enable the runtime reflection/introspection of objects, and
> their genuine types and thus intent.
While we do type manipulation at compile time, types are just values like
anything else in Avail. But I don't feel like we're doing a lot of reflection
per se, since I normally think of reflection as an "escape" mechanism from a
language to accomplish something that wasn't designed in.
>> Over the last few months Todd has proposed a reasonably safe version of the
>> null concept
>
> that’s very interesting. explicit null typing is the holy grail of safety,
> etc.
We've had the @Nullable and @NotNull annotations in Java/Eclipse as a terrific
model of how not to do it. Let's see, primitive types, fiat object types,
arrays, erased genericity, annotations, and now specifically null annotations.
That's six Java type systems that have almost nothing to do with each other,
each of which fails pretty badly even at its own purpose.
>>> my guess is that using a tuple as a field is (currently) wasteful, but that
>>> it is the only easy way to strengthen “optional” via tuple strengthening.
>>
>> The tuple type above should be sufficient.
>
> i ask again: what’s the memory footprint of a single element tuple?
> if it is too big, i would suggest adding an “optional” primitive to the vm,
> as a viable option :)
Just to reiterate, it's currently 60 but will be 20 without doing anything more
than reworking AvailObject within LLVM. AvailObject has been a C struct, a
Smalltalk class, a C++ class mostly generated from Smalltalk at the push of a
button, and a Java class. Its representation has changed more subtly dozens of
times without breaking the semantics of Avail, and I'm not expecting the jump
to LLVM to be any more hazardous. We've fought hard to get the descriptor
classes as stable and efficient as they are, and a mere substrate change isn't
going to change that. We know where the value is in the VM, and we're not
going to pull a Netscape (throwing away all their code!), no matter how
drastically we rework the VM.
And as an alternative to wrapping things in a tuple of size 0..1, one could
just omit a field entirely from the containing object. We know how to make
dispatching in that kind of situation very fast, so at this point it's just a
simple matter of programming. So if performance per gigahertz (or per second,
watt, or dollar) isn't a deep criterion for success for an Avail program at the
moment, I suggest that when you draw back your bow, you aim your practice of
Avail at the place that we'll be in a few years. Decent performance *without*
all the usual compromises (devil's bargains) found in other languages. Don't
pre-compromise the code you write today just because we didn't wait a few more
years to let the world know what we're creating. :-)
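For reference, the 0..1-tuple optional mentioned earlier (the "x[1] else
[tofu]" idiom) translates roughly to the following Python stand-in, which is
not Avail syntax:

```python
# Python stand-in for Avail's <T…|0..1> optional-as-tuple idiom.
# An optional is a tuple of zero or one elements; "x[1] else default"
# becomes indexing with a fallback (Python indexes from 0, Avail from 1).
def first_else(opt, default):
    return opt[0] if opt else default

present = ("cheese",)  # one-element tuple: a value is present
absent = ()            # the shared empty tuple: no value, no extra storage

print(first_else(present, "tofu"))  # cheese
print(first_else(absent, "tofu"))   # tofu
```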
-Mark