On Sat, Mar 28, 2009 at 8:24 PM, Marvin Humphrey wrote:
> On Fri, Mar 27, 2009 at 08:21:54AM -0400, Michael McCandless wrote:
>
>> On Fri, Mar 27, 2009 at 1:08 AM, Marvin Humphrey wrote:
>>
>> > I think I have an approach that's going to allow us to eliminate FastObj:
>> > We lazily create the host object, and treat a NULL host_obj as
>> > semantically equivalent to a refcount of 1.
>
> I'm happy to report that this approach succeeded. FastObj is now history. :)
Most excellent! Though we do still need to figure out how tracing-GC languages fit in.
>> Much of this is beyond me, but...
>
> Hopefully we will soon get to the point where that's no longer the case.
>
> Our KS prototype now has only one object model. I'll describe how it works
> for refcounting hosts like Perl and Python.
>
> ---
>
> Every Obj is a struct with a VTable and a host object as its first two
> members:
>
> struct Obj {
>     VTable *vtable;
>     void *host_obj;
> };
>
> struct VArray {
>     VTable *vtable;
>     void *host_obj;
>     Obj **elems;
>     u32_t size;
>     u32_t capacity;
> };
>
> When any Lucy object is created, self->host_obj is NULL.
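Putting the same two members first in every struct is what lets VA_destroy() hand a VArray to Obj_destroy() with a plain (Obj*) cast. A minimal sketch of that header-first layout, assuming u32_t is a 32-bit unsigned typedef; Obj_has_host() and header_layout_matches() are hypothetical helpers, not Lucy API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32_t;             /* assumed 32-bit unsigned typedef */
typedef struct VTable VTable;       /* contents irrelevant for this sketch */

typedef struct Obj {
    VTable *vtable;
    void   *host_obj;
} Obj;

typedef struct VArray {
    VTable *vtable;                 /* same leading members as Obj... */
    void   *host_obj;
    Obj   **elems;                  /* ...followed by the subclass's own */
    u32_t   size;
    u32_t   capacity;
} VArray;

/* Hypothetical generic helper that only knows the Obj header. */
int
Obj_has_host(const Obj *self)
{
    assert(self != NULL);
    return self->host_obj != NULL;
}

/* Hypothetical check: the shared members sit at identical offsets. */
int
header_layout_matches(void)
{
    return offsetof(VArray, vtable) == offsetof(Obj, vtable)
        && offsetof(VArray, host_obj) == offsetof(Obj, host_obj);
}
```

CPython's PyObject_HEAD relies on the same idiom. Strictly speaking, ISO C only guarantees the common-initial-sequence rule for structs accessed through a union, so this is a near-universal convention rather than a language guarantee.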
>
> Here are some simplified sample constructors for Lucy::Obj and for our
> variable-sized array class, Lucy::Util::VArray:
>
> Obj*
> Obj_new(void)
> {
>     Obj *self = (Obj*)malloc(sizeof(Obj));
>     self->vtable = (VTable*)VTable_Inc_RefCount(&OBJ);
>     self->host_obj = NULL;
>     return self;
> }
>
> VArray*
> VA_new(u32_t capacity)
> {
>     VArray *self = (VArray*)malloc(sizeof(VArray));
>     self->vtable = (VTable*)VTable_Inc_RefCount(&VARRAY);
>     self->host_obj = NULL;
>     self->elems = (Obj**)calloc(capacity, sizeof(Obj*));
>     self->size = 0;
>     self->capacity = capacity;
>     return self;
> }
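A compilable version of these constructors, assuming a hypothetical minimal VTable that is just a name plus an integer refcount (the real Lucy VTable is not shown in this thread). Here VTable_Inc_RefCount() returns VTable* directly, so no cast is needed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32_t;

/* Hypothetical minimal VTable: a class name plus an integer refcount. */
typedef struct VTable {
    const char *name;
    u32_t       refcount;
} VTable;

typedef struct Obj {
    VTable *vtable;
    void   *host_obj;
} Obj;

typedef struct VArray {
    VTable *vtable;
    void   *host_obj;
    Obj   **elems;
    u32_t   size;
    u32_t   capacity;
} VArray;

/* Statically allocated class VTables, named after their classes. */
VTable OBJ    = { "Lucy::Obj", 1 };
VTable VARRAY = { "Lucy::Util::VArray", 1 };

VTable*
VTable_Inc_RefCount(VTable *vtable)
{
    assert(vtable != NULL);
    vtable->refcount++;
    return vtable;
}

Obj*
Obj_new(void)
{
    Obj *self = (Obj*)malloc(sizeof(Obj));
    if (self == NULL) {
        abort();
    }
    self->vtable   = VTable_Inc_RefCount(&OBJ);
    self->host_obj = NULL;   /* no host object until one is needed */
    return self;
}

VArray*
VA_new(u32_t capacity)
{
    VArray *self = (VArray*)malloc(sizeof(VArray));
    if (self == NULL) {
        abort();
    }
    self->vtable   = VTable_Inc_RefCount(&VARRAY);
    self->host_obj = NULL;
    /* calloc(count, size) zero-fills, so every slot starts out empty. */
    self->elems    = (Obj**)calloc(capacity, sizeof(Obj*));
    if (self->elems == NULL && capacity > 0) {
        abort();
    }
    self->size     = 0;
    self->capacity = capacity;
    return self;
}
```

Each constructor bumps its class's VTable refcount, so after one Obj_new() and one VA_new(4), OBJ.refcount and VARRAY.refcount have both gone from 1 to 2 in this sketch.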
>
> Note that the VTable for the Obj class is OBJ, and the VTable for VArray is
> VARRAY. The same pattern holds true for other classes: TermScorer's VTable is
> TERMSCORER, etc.
OK.
Are VTables also considered Objs? Ie, you are Inc/DecRef'ing them --
will a VTable be destroyed when it DecRefs to 0?
Do VTables only store methods? Or can they store fields as well?
Can an arbitrary Obj at runtime become a VTable for another Obj?
(True "prototype" programming language). Seems like "no", because an
arbitrary Obj is not allowed to add new members wrt its parent (only
new VTables can do so).
Does VARRAY (VTable for VArray objects) hold a reference to OBJ? How
are these trees of VTables init'd?
And Lucy objs are single-inheritance.
(NOTE: I'm just "probing" with these questions... I'm certainly not
implying that Lucy needs all of these capabilities).
> Here are corresponding destructors for Obj and VArray:
>
> void
> Obj_destroy(Obj *self)
> {
>     VTable_Dec_RefCount(self->vtable);
>     free(self);
> }
>
> void
> VA_destroy(VArray *self)
> {
>     u32_t i;
>     for (i = 0; i < self->size; i++) {
>         if (self->elems[i]) {
>             Obj_Dec_RefCount(self->elems[i]);
>         }
>     }
>     free(self->elems);
>     Obj_destroy((Obj*)self); /* super */
> }
OK, it's making sense.
So a VArray is allowed to have C NULLs in its elems array (vs., say, Java,
which always inits the array to hold Java nulls). Is there an
explicit object in Lucy that represents null (Java), None (Python),
etc.?
> Two items of note about the destructors:
>
> First, note that the destructor for VArray invokes the destructor of its
> parent class, Obj. This superclass call makes it possible for us to add
> members to a base class without having to manually edit all subclasses.
Great.
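A compilable sketch of the destructor chain, assuming a hypothetical minimal VTable (a name plus an integer refcount). Since no host objects exist in this sketch, Obj_Dec_RefCount() simply destroys its argument, which is what a NULL host_obj implies:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32_t;

/* Hypothetical minimal VTable: a class name plus an integer refcount. */
typedef struct VTable {
    const char *name;
    u32_t       refcount;
} VTable;

typedef struct Obj {
    VTable *vtable;
    void   *host_obj;
} Obj;

typedef struct VArray {
    VTable *vtable;
    void   *host_obj;
    Obj   **elems;
    u32_t   size;
    u32_t   capacity;
} VArray;

VTable OBJ    = { "Lucy::Obj", 1 };
VTable VARRAY = { "Lucy::Util::VArray", 1 };

void
VTable_Dec_RefCount(VTable *vtable)
{
    assert(vtable->refcount > 0);
    vtable->refcount--;   /* static VTables are never freed here */
}

void
Obj_destroy(Obj *self)
{
    VTable_Dec_RefCount(self->vtable);
    free(self);
}

/* With no host objects in this sketch, every Obj has the implicit
 * refcount of 1, so decrementing it means destroying it. */
void
Obj_Dec_RefCount(Obj *self)
{
    assert(self->host_obj == NULL);
    Obj_destroy(self);
}

void
VA_destroy(VArray *self)
{
    u32_t i;
    for (i = 0; i < self->size; i++) {
        if (self->elems[i] != NULL) {   /* empty slots hold NULL */
            Obj_Dec_RefCount(self->elems[i]);
        }
    }
    free(self->elems);
    Obj_destroy((Obj*)self);   /* superclass destructor releases VARRAY */
}
```

Destroying a VArray that holds one Obj and one NULL slot returns both VTable refcounts to their starting values. VARRAY is released by Obj_destroy() because self->vtable points at it.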
> Second, there is no mention whatsoever of self->host_obj in the destructor.
> That's because there are only two paths into the destructor, and both of them
> avoid the need for Lucy core code to worry about the cached host object.
>
> 1) The cached host object was never created so it doesn't need to be
> cleaned up.
> 2) Destroy() is being invoked from host-space via e.g. Python's "__del__"
> method, and after it returns the host will clean up the host object
> itself.
I'm a bit confused: what if you have a Lucy obj that's got a cached
host obj, such that the host obj is not referred to anywhere in the
host language, but is referred to in Lucy, and Lucy finally decrefs
its last reference. How is the cycle broken in that case? (Ie,
Destroy should be invoked via Lucy).
> Obj declares four methods which each host must implement:
>
> Get_RefCount
> Inc_RefCount
> Dec_RefCount
> To_Host
>
> Mike, since you're familiar with Python, I'll have a go at implementing those
> methods for the Python bindings.
>
> First, the accessor for the object's refcount, which is shared by the Lucy
> object and the Python object. If self->host_obj is NULL, then the refcount is
> 1. Otherwise, we delegate responsibility for tracking the refcount to the
> Python object cached in self->host_obj.
>
> u32_t
> Obj_get_refcount(Obj *self)
> {
>     if (self->host_obj == NULL) {
>         return 1; /* NULL host_obj implies a refcount of 1. */
>     }
>     else {
>         PyObject *py_object = (PyObject*)self->host_obj;
>         return (u32_t)py_object->ob_refcnt;
>     }
> }
>
> Next, the method which increments the refcount. Calling this method even once
> guarantees that a Python object will be created, since the first time it is
> called, the refcount will progress from 1 to 2, and we need a place to put
> that number.
>
> This means that there are two ways to indicate a refcount of 1. Either we
> have a newly created Lucy object with a NULL self->host_obj which *implies* a
> refcount of 1, or we have a cached host object which had a refcount of 2 or
> more at some point, but which has fallen back down to an *explicit* refcount
> of 1.
OK. That 1 refCount "belonging" to Lucy. This is essentially an
efficient way to represent the common case of "only Lucy has a single
reference to this object".
> Obj*
> Obj_inc_refcount(Obj *self)
> {
>     if (self->host_obj == NULL) {
>         self->host_obj = Obj_To_Host(self);
>     }
>     Py_INCREF((PyObject*)self->host_obj);
>     return self;
> }
>
> Once the host object is cached, it never goes away -- it's there for the life
> of the Lucy object.
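The lazy creation can be checked with a hypothetical MockHost in place of the PyObject and a hypothetical MockHost_new() in place of the lazy branch of To_Host(). In this sketch the first increment creates the host object at 1 and immediately bumps it to 2; later increments reuse the cached object:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32_t;

typedef struct Obj Obj;

/* Hypothetical stand-in for the PyObject that wraps a Lucy object. */
typedef struct MockHost {
    u32_t refcnt;
    Obj  *lucy_obj;   /* back-pointer to the wrapped Lucy object */
} MockHost;

struct Obj {
    void *vtable;     /* unused in this sketch */
    void *host_obj;
};

/* Hypothetical stand-in for the lazy branch of To_Host(). */
MockHost*
MockHost_new(Obj *owner)
{
    MockHost *host = (MockHost*)malloc(sizeof(MockHost));
    if (host == NULL) {
        abort();
    }
    host->refcnt   = 1;
    host->lucy_obj = owner;
    return host;
}

Obj*
Obj_inc_refcount(Obj *self)
{
    assert(self != NULL);
    if (self->host_obj == NULL) {
        /* First increment: create the host object. Its starting count
         * of 1 carries over the implicit reference Lucy already held. */
        self->host_obj = MockHost_new(self);
    }
    /* The equivalent of Py_INCREF: 1 -> 2 on the first call. */
    ((MockHost*)self->host_obj)->refcnt++;
    return self;
}
```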
>
> Next, the method to decrement the refcount. Note that we only call Destroy()
> directly if self->host_obj is NULL. If we've created a Python object, then we
> count on it to invoke the __del__ method when its refcount falls to 0; we will
> have defined __del__ to invoke Destroy().
I'm still confused because it seems like there should be times when
Lucy needs to destroy a Lucy obj that had at some point crossed the
bridge, yet the host retained no reference. Ie the destruction may
need to initiate from the Lucy side of the bridge.
> u32_t
> Obj_dec_refcount(Obj *self)
> {
>     if (self->host_obj == NULL) {
>         /* NULL host object implies a refcount of 1. That's dropping to 0
>          * as a result of this call, so it's time to invoke Destroy(). */
>         Obj_Destroy(self);
>         return 0;
>     }
>     else {
>         /* If the PyObject's ob_refcnt falls to 0, then the destructor will
>          * be invoked from Python-space via the "__del__" method. Read the
>          * count first, since the object may be gone after Py_DECREF. */
>         PyObject *py_object = (PyObject*)self->host_obj;
>         u32_t remaining = (u32_t)py_object->ob_refcnt - 1;
>         Py_DECREF(py_object);
>         return remaining;
>     }
> }
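Tracing both paths through a mock helps with the case raised in the replies above, where Lucy rather than the host drops the last reference to an object whose host object is cached. In this model the decrement takes the host count to 0, the host's deallocation hook calls Destroy(), and destruction still happens, just routed through the host. The sketch assumes a hypothetical MockHost and MockHost_decref(), which stand in for the PyObject and for Py_DECREF plus __del__:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32_t;

typedef struct Obj Obj;

/* Hypothetical stand-in for a PyObject that wraps a Lucy object. */
typedef struct MockHost {
    u32_t refcnt;
    Obj  *lucy_obj;
} MockHost;

struct Obj {
    void *vtable;     /* unused in this sketch */
    void *host_obj;
};

/* Counts destructor calls so both paths can be observed. */
int destroy_calls = 0;

void
Obj_destroy(Obj *self)
{
    destroy_calls++;
    free(self);       /* note: host_obj is never touched here */
}

/* Mock of Py_DECREF plus the host's deallocation hook: when the last
 * host reference goes away, destroy the Lucy object, then the wrapper. */
void
MockHost_decref(MockHost *host)
{
    assert(host->refcnt > 0);
    if (--host->refcnt == 0) {
        Obj_destroy(host->lucy_obj);
        free(host);
    }
}

u32_t
Obj_dec_refcount(Obj *self)
{
    MockHost *host;
    u32_t remaining;
    if (self->host_obj == NULL) {
        /* Implicit refcount of 1 drops to 0: destroy directly. */
        Obj_destroy(self);
        return 0;
    }
    /* Read the count before decrementing: if it reaches 0, the hook
     * frees both objects and neither may be touched afterwards. */
    host = (MockHost*)self->host_obj;
    remaining = host->refcnt - 1;
    MockHost_decref(host);
    return remaining;
}
```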
>
> The last method we need to define is To_Host(), which, in the parlance of the
> Python C API docs, will return a "new reference".
>
> (I'm not sure that this implementation is correct, but it should convey
> the gist.)
>
> void*
> Obj_to_host(Obj *self)
> {
>     if (self->host_obj) {
>         /* The Python object is already cached, so incref it and return. */
>         Py_INCREF((PyObject*)self->host_obj);
>         return self->host_obj;
>     }
>     else {
>         /* Lazily create Python object. */
>         self->host_obj = PyCObject_FromVoidPtr((void*)self, Obj_Destroy);
>     }
> }
(missing a return on the else clause, but I get it).
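For reference, a version with the missing return added. PyCObject_FromVoidPtr() belongs to the Python 2 C API (it was removed in Python 3.2 in favour of PyCapsule), so this sketch substitutes a hypothetical MockHost for the Python object. Both branches hand back a "new reference", as the original intends:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32_t;

typedef struct Obj Obj;

/* Hypothetical stand-in for the Python wrapper object. */
typedef struct MockHost {
    u32_t refcnt;
    Obj  *lucy_obj;
} MockHost;

struct Obj {
    void *vtable;     /* unused in this sketch */
    void *host_obj;
};

void*
Obj_to_host(Obj *self)
{
    MockHost *host;
    assert(self != NULL);
    if (self->host_obj != NULL) {
        /* Already cached: hand out one more reference. */
        host = (MockHost*)self->host_obj;
        host->refcnt++;
        return host;
    }
    /* Lazily create and cache the wrapper, starting at refcount 1 like
     * a freshly created PyObject. */
    host = (MockHost*)malloc(sizeof(MockHost));
    if (host == NULL) {
        abort();
    }
    host->refcnt   = 1;
    host->lucy_obj = self;
    self->host_obj = host;
    return host;      /* the return the else-branch needs */
}
```

How that starting count of 1 relates to Lucy's own implicit reference is left exactly as in the sketch above.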
>> won't there be multiple references in C to a given Lucy object, each of
>> which would need to incRef the RC?
>
> Yes. As soon as the refcount has to be increased above 1, we lazily create a
> host object to hold the refcount.
>
> ---
>
> Leaving aside the question of tracing GC hosts for now... does the
> cached-host-object delegated refcounting model seem sufficiently clear to you
> for use within Lucy Python bindings?
>
> The rest of the Lucy library doesn't need to know about the host object
> caching -- it just uses the opaque refcounting API, which looks like plain
> old integer refcounting from the outside.
Yes, makes sense (except for the "Lucy destroys object that has cached
host object" case). So this is a great approach, in that a host obj
is not created immediately on creating a Lucy obj. However, it's
still "falsely" created, in order to track refCount > 1 from within
Lucy, even when the obj never crosses the bridge. I wonder in
practice how often that actually happens -- my guess is the
RefCount==1 case is the "long tail" in a typical snapshot of a future
running Lucy app.
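The "falsely" created host object shows up clearly in a full lifecycle run. A self-contained sketch of the refcount methods against a hypothetical MockHost: the count goes from an implicit 1 to 2, back down to an explicit 1 with the host object still cached, and then to 0, where the mock deallocation hook destroys the Lucy object:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32_t;

typedef struct Obj Obj;

/* Hypothetical stand-in for the host (Python) object. */
typedef struct MockHost {
    u32_t refcnt;
    Obj  *lucy_obj;
} MockHost;

struct Obj {
    void *vtable;     /* unused in this sketch */
    void *host_obj;
};

int objects_destroyed = 0;

Obj*
Obj_new(void)
{
    Obj *self = (Obj*)malloc(sizeof(Obj));
    if (self == NULL) {
        abort();
    }
    self->vtable   = NULL;
    self->host_obj = NULL;
    return self;
}

void
Obj_destroy(Obj *self)
{
    objects_destroyed++;
    free(self);
}

u32_t
Obj_get_refcount(const Obj *self)
{
    return self->host_obj == NULL
        ? 1 : ((const MockHost*)self->host_obj)->refcnt;
}

Obj*
Obj_inc_refcount(Obj *self)
{
    if (self->host_obj == NULL) {
        MockHost *host = (MockHost*)malloc(sizeof(MockHost));
        if (host == NULL) {
            abort();
        }
        host->refcnt   = 1;       /* the implicit reference, made explicit */
        host->lucy_obj = self;
        self->host_obj = host;
    }
    ((MockHost*)self->host_obj)->refcnt++;
    return self;
}

u32_t
Obj_dec_refcount(Obj *self)
{
    MockHost *host = (MockHost*)self->host_obj;
    u32_t remaining;
    if (host == NULL) {
        Obj_destroy(self);
        return 0;
    }
    assert(host->refcnt > 0);
    remaining = --host->refcnt;
    if (remaining == 0) {
        /* Mock of the host's dealloc hook calling Destroy(). */
        Obj_destroy(self);
        free(host);
    }
    return remaining;
}
```

From the caller's side only Obj_new(), Obj_inc_refcount(), Obj_dec_refcount() and Obj_get_refcount() are used, which is the "plain old integer refcounting" view; the cached host object is invisible except as the allocation that survives the drop back to 1.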
If only host languages let us override what decRef does for a given
obj... then we could break the tight cycles ourselves and only allocate
a host obj when needed.
Mike