Zoran, I just took a look at your tcl_sv code. I do not believe that it
is thread-safe, but perhaps you can convince me.  Here's my rationale.

Prior to 3.1+ad9, the nsd8x implementation of ns_share worked like this:
when you ns_shared a variable, it called Tcl_DuplicateObj on the value
of the variable and saved the value in a shared hash table. Later, when
you ask for the variable value, there was a callback that fetched the
saved value from the shared hash table and copies it to the variable
(using Tcl_DuplicateObj). If you set the variable value, a callback
copied the new value to the shared hash table (using Tcl_DuplicateObj).

The locking around all this activity was correct for simple objects.
However, consider the case of an object with a tclListType internalRep.
In that case, Tcl_DuplicateObj creates a shallow copy of the list: it
creates a new array of Tcl_Obj*, but populates that array with pointers
to the same Tcl_Objs as the original list, and increments the refCounts
of those contained objects.

This means that you can easily end up with Tcl_Objs referenced
by multiple interpreters simultaneously. This would work if
Tcl_IncrRefCount and Tcl_DecrRefCount were protected with a mutex, but
they aren't. So there's a race condition in both of those functions.
Occasionally two Tcl interpreters will simultaneously try to update
the reference count of a single Tcl_Obj. This leads either to garbage
accumulation, or accessing freed storage.

Because there's no portable way to make Tcl_IncrRefCount and
Tcl_DecrRefCount atomic short of adding mutexes to them, I changed
ns_share and its callbacks to stringify object values. (They call
Tcl_GetStringFromObj. This change appeared in 3.1+ad9 and 3.2.) This way
a shared list is stored as a simple string and we don't have to worry
about Tcl_Objs being shared across interpreters.

>From what I see around lines 339 and 460 of sv.c, you have exactly the
same bug that ns_share had.

I don't think it's sufficient to check for an tclListType internalRep
and only stringify in that case. You have to be sure that
Tcl_DuplicateObj will create a deep copy of the object. I suspect that
tclByteCodeType, tclCmdNameType, tclNsNameType, tclProcBodyType, and
tclRegexpType must be stringified, not to mention Tcl_ObjTypes added by
extensions. At best you can check for internalReps for which it is safe
to call Tcl_DuplicateObj (tclByteArrayType, tclBooleanType, tclIntType,
tclDoubleType, and tclStringType). I'm not sure it's worth it.

This is why the ns_cache module provides both thread-private caches
and shared caches. A thread-private cache stores values as Tcl_Objs. A
shared cache stores values as strings.

Reply via email to