RE: [RfC] vtable->dump

Gordon Henriksen Thu, 04 Sep 2003 16:35:54 -0700

Dan Sugalski wrote:

> Note that a seen hash isn't particularly threadsafe here, at least not
> in any useful way, since we have to make sure the structure we're
> traversing doesn't change during traversal or any threadsafety we put
> in is useless since we're potentially dumping corrupt data.


So? The caller is going to have to address this anyway; dumping the root
set of a live multithreaded program is NEVER going to be a
runtime-guaranteed safe and reliable operation, even if the runtime does
acquire fine-grained locks on aggregates at the risk of deadlock.
(Unless you're dumping the WHOLE program state, including continuations
for all active threads. In THAT case, though, the runtime can simply
halt all of the threads at safe points and then dump state. ["Simply."
Hah!] But still no fine-grained locks.)
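The stop-the-world alternative in the parenthetical above can be sketched in Python. This is a minimal illustration under stated assumptions, not Parrot's design. The class and method names (`SafepointCoordinator`, `safepoint`, `leave`, `stop_the_world`) are hypothetical, and the sketch assumes a single dumping thread. Each mutator polls only at points where its own data is consistent. The dump runs only after every live mutator has parked, so no per-aggregate locks are taken at all.

```python
import threading


class SafepointCoordinator:
    """Hypothetical stop-the-world helper (a sketch, not Parrot's design).

    Mutators call safepoint() only where their data is consistent and
    leave() when they finish. A single dumper calls stop_the_world().
    """

    def __init__(self, n_mutators):
        self._cond = threading.Condition()
        self._running = n_mutators  # mutators that have not called leave()
        self._parked = 0            # mutators currently waiting in safepoint()
        self._stop = False          # True while the dumper wants the world stopped

    def safepoint(self):
        with self._cond:
            if not self._stop:
                return
            self._parked += 1
            self._cond.notify_all()  # let the dumper recount parked threads
            while self._stop:
                self._cond.wait()
            self._parked -= 1

    def leave(self):
        # A finished mutator must deregister, or the dumper would wait forever.
        with self._cond:
            self._running -= 1
            self._cond.notify_all()

    def stop_the_world(self, action):
        with self._cond:
            self._stop = True
            while self._parked < self._running:
                self._cond.wait()
            try:
                return action()  # every live mutator is parked at a safe point
            finally:
                self._stop = False
                self._cond.notify_all()


def demo(n_threads=4, pushes=5000, dumps=20):
    a, b = [], []
    coord = SafepointCoordinator(n_threads)

    def mutator(tid):
        try:
            for i in range(pushes):
                a.append((tid, i))  # no lock around the pair...
                b.append((tid, i))
                coord.safepoint()   # ...but only this point is "safe"
        finally:
            coord.leave()

    threads = [threading.Thread(target=mutator, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    snapshots = [coord.stop_the_world(lambda: (list(a), list(b))) for _ in range(dumps)]
    for t in threads:
        t.join()
    return snapshots
```

In `demo()`, each mutator pushes the same value onto two shared lists with no lock. It reaches a safe point only after both pushes, so every snapshot holds the same elements in both lists. The order may differ between the two lists because mutators interleave.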

Fine-grained automatic locking is only useful in limited scenarios where
the aggregate alone actually provides the required semantics. In the
vast majority of cases, coarser synchronization primitives need to be
used. This is why Java abandoned implicitly synchronized collections
(Vector and Hashtable gave way to the unsynchronized ArrayList and
HashMap), and why the CLR never adopted them in the first place. As the
runtime, Parrot can't know about those coarser locks or the protocol
for acquiring them.

Imagine:

    my $lock;
    my @a;
    my @b;

    sub threadsafe_double_push ($var) {
        sync $lock {
            push @a, $var;
            push @b, $var;
        }
    }

    sub serialize_arrays {
        dump \@a, \@b;
    }

Even if push and dump both synchronize on the aggregates, dump can still
emit inconsistent values, because push @a has released its lock on @a
before push @b locks @b; a dump that runs in that window sees $var in @a
but not in @b. And dump doesn't know squat about $lock, so it can't close
that window itself. Only the program author has enough information to
write serialize_arrays safely as:

    sub serialize_arrays {
        sync $lock {
            dump \@a, \@b;
        }
    }
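Here is a runnable Python analog of the two versions above, as a minimal sketch. `threading.Lock` stands in for `$lock`, and `pickle.dumps` stands in for `dump`. `LockedList` and `dump_with_aggregate_locks` are hypothetical helpers that model runtime-managed per-aggregate locking. `interleaved_snapshot()` replays the bad schedule deterministically rather than relying on a lucky race.

```python
import pickle
import threading


class LockedList:
    """Hypothetical list that takes its own lock on every push: the
    per-aggregate ("fine-grained") locking a runtime could apply by itself."""

    def __init__(self):
        self.lock = threading.Lock()
        self.items = []

    def push(self, value):
        with self.lock:
            self.items.append(value)


def dump_with_aggregate_locks(*lists):
    # Lock every aggregate being dumped, as a runtime could do automatically.
    for lst in lists:
        lst.lock.acquire()
    try:
        return pickle.dumps([lst.items for lst in lists])
    finally:
        for lst in reversed(lists):
            lst.lock.release()


def interleaved_snapshot():
    # Replays the bad schedule: the dump lands between the two pushes
    # of a single threadsafe_double_push call.
    a, b = LockedList(), LockedList()
    a.push(1)                               # push @a, $var (releases a.lock)
    snap = dump_with_aggregate_locks(a, b)  # dump runs in the window
    b.push(1)                               # push @b, $var
    return pickle.loads(snap)


pair_lock = threading.Lock()  # plays the role of $lock


def threadsafe_double_push(a, b, var):
    with pair_lock:
        a.push(var)
        b.push(var)


def serialize_arrays(a, b):
    # Only the program knows pair_lock guards the pair, so it takes it here.
    with pair_lock:
        return dump_with_aggregate_locks(a, b)


def coarse_snapshots(n_writers=4, per_writer=2000, n_dumps=50):
    a, b = LockedList(), LockedList()

    def writer(w):
        for i in range(per_writer):
            threadsafe_double_push(a, b, (w, i))

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(n_writers)]
    for t in threads:
        t.start()
    snaps = [pickle.loads(serialize_arrays(a, b)) for _ in range(n_dumps)]
    for t in threads:
        t.join()
    return snaps
```

`interleaved_snapshot()` returns `[[1], []]` even though every individual push and the dump were locked. `coarse_snapshots()` only ever sees identical lists. This is because both of the writer's pushes and the dump all take the same program-level lock.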

I actually have to question the usefulness of runtime-managed
serialization. Most serialization libraries provide an interface or base
class which serializable classes must implement, and it's not at all
uncommon (and often necessary) for objects to omit transient state or
caches from their serialized forms. The traversal HAS to be able to call
back into Parrot code in order to implement that, and what you're
suggesting CAN'T.
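Python's `pickle` is one concrete example of that callback protocol, and Java's `Serializable` with `transient` fields is another. A minimal sketch follows; the `Document` class and its fields are hypothetical. The pickler calls back into the object's `__getstate__` to decide what to write, and into `__setstate__` on load to rebuild what was dropped. Here that omits a cache, plus a lock that `pickle` refuses to serialize at all.

```python
import pickle
import threading


class Document:
    """Hypothetical class with persistent and transient state."""

    def __init__(self, text):
        self.text = text               # persistent: part of the serialized form
        self._word_count = None        # transient: a cache, cheap to rebuild
        self._lock = threading.Lock()  # transient: locks cannot be pickled

    def word_count(self):
        with self._lock:
            if self._word_count is None:
                self._word_count = len(self.text.split())
            return self._word_count

    def __getstate__(self):
        # pickle calls back into the object to ask what to serialize.
        state = self.__dict__.copy()
        del state["_word_count"]
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # ...and calls back again on load so transient fields can be rebuilt.
        self.__dict__.update(state)
        self._word_count = None
        self._lock = threading.Lock()


def lock_is_picklable():
    # Shows why omitting the lock is necessary, not just an optimization.
    try:
        pickle.dumps(threading.Lock())
        return True
    except TypeError:
        return False
```

A generic runtime walker that never calls user code could do neither step. It would either write the cache or fail on the lock.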


> While a seen hash is DOD-interruptable (which, admittedly, the scheme
> I'm preferring isn't, or is with a *lot* of care) it's also slower and
> requires a lot more resources when running. What I'd prefer to do
> doesn't require any headers during its run, nor additional memory 
> past the memory (which we can safely GC) used to hold the serialized 
> data.
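For comparison, a seen-hash traversal looks roughly like this in Python. It is a sketch, not Parrot's dump: `dump` is a hypothetical function over nested Python lists, keyed on `id()`. The `seen` dict gains one entry per distinct aggregate, which is the kind of extra working memory the quoted message objects to. That same table is what lets shared and cyclic aggregates be emitted once and referenced thereafter.

```python
def dump(root):
    """Flatten a possibly cyclic graph of Python lists into records.

    seen maps id(aggregate) -> tag, so a shared or cyclic aggregate is
    emitted once and later visits become ("ref", tag) back-references.
    Returns (records, number of entries in the seen hash).
    """
    seen = {}  # the "seen hash": one entry per distinct aggregate visited
    out = []

    def visit(node):
        if not isinstance(node, list):
            out.append(("value", node))
            return
        # id() is stable only while the object lives; root keeps it alive.
        tag = seen.get(id(node))
        if tag is not None:
            out.append(("ref", tag))
            return
        tag = len(seen)
        seen[id(node)] = tag
        out.append(("list", tag, len(node)))
        for child in node:
            visit(child)  # recursive for clarity; deep nesting hits the limit

    visit(root)
    return out, len(seen)
```

For `a = [1]; a.append(a)`, `dump(a)` yields `[("list", 0, 2), ("value", 1), ("ref", 0)]` with one seen-hash entry. Because the traversal is ordinary code, it could also call per-object hooks along the way.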

What you're suggesting also has significant side effects: it halts
hypothetical multithreaded programs, suspends DoD, prevents the
traversal mechanism from calling back into Parrot code, and requires
fine-grained locks, which are extremely expensive and have been
summarily abandoned to great acclaim in all prior works... and for all
that, it still doesn't provide a useful form of thread safety in the
general case anyhow.

At some point, you have to let the runtime do its job and RUN. It
doesn't have to solve every problem entirely internally with witchcraft
and voodoo. Its fundamental job is, after all, providing a Turing-complete
problem-solving environment to its clients.

--
 
Gordon Henriksen
IT Manager
ICLUBcentral Inc.
[EMAIL PROTECTED]
