Re: Help Fixing Bug in Durable Reference Creation Inside DoSync Transaction

2010-10-03 Thread Alyssa Kwan
I've come up with a solution to the durable ref creation/declaration
in dosync block bug.  The creation/declaration exclusively locks the
logical dref by key to force other transactions to retry.  I welcome
feedback and code review from anyone interested in out-of-the-box
durability of STM identities.

git://github.com/kwanalyssa/clojure.git

Thanks!
Alyssa

On Oct 3, 12:54 am, Alyssa Kwan alyssa.c.k...@gmail.com wrote:
 I've noticed a bug in my implementation of the durable ref, or dref
 construct.

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Help Fixing Bug in Durable Reference Creation Inside DoSync Transaction

2010-10-02 Thread Alyssa Kwan
I've noticed a bug in my implementation of the durable ref, or dref
construct.  This is an attempt to add durability to the STM.  drefs
have all of the ACI guarantees of refs, with the additional property
of being durable, so changes to the state of the identity are
durable.  The API is that there exists a logical namespace keyed by
store name - a string - and a key - an object, which then corresponds
to a value.  Calling (dref store key initVal) creates a durable
ref with the state of initVal under the store name store and key
key, which is immediately written to the underlying BDB JE
database.  Subsequent invocations of (dref) with the same store and
key refer to the same logical object; initVal is ignored.  This is
also true if the subsequent (dref) call is in a later JVM instance;
initVal is ignored and the durable value saved in the BDB JE database
is used.  (dref) is therefore a drop-in replacement for (ref).

This enhancement to the Clojure core is accessible here:
git://github.com/kwanalyssa/clojure.git.

There's some slight enhancements since I last ANN'd this:  the rest of
the Clojure primitives are now supported:  BigDecimals and Ratios; and
any object type acceptable as values are also acceptable as keys.
This opens up the possibility of easily chaining key-value pairs, and
having built in, transactional, durable memoization.  I'm currently
working on adding the persistent data structures such that they're
also persistent on disk, minimizing I/O; and making the persistent
structures lazily-loaded and evicted.  This means potentially having
petabyte-size maps occupying a fraction of that in memory and
seamlessing swapping sections of the keyset and attached entryset
based on usage.  The possibilities for multidimensional databases (the
use case I'm enhancing Clojure to serve), or even simply as massive,
transactional, memoized caches, is awesome.

But there is a bug.

It has to do with creating/declaring durable refs inside of dosync
blocks.  To wit:  (dosync (def a (dref store key 0)) (ref-set a (/
1 0))).

This block will obviously fail (i.e. throw an exception).  What then,
is the var a referring to?  It must refer to the dref that was
declared/created.  But what is the state of that dref identity?

In the non-durable version:  (dosync (def a (ref 0)) (ref-set a (/ 1
0))), the var a points to a ref with a state of 0.  That's because the
ref was implicitly created in it's own single-operation transaction
(like a DB autocommit) that precedes the transaction declared by the
(dosync).  It is possible because the ref has 2 properties:  it is
local, nothing outside the dosync block can concurrently refer to the
ref - the var a is ThreadLocal; and the ref is obviously being
created, it is not pointing to a ref that exists before the dosync
block.

However, the durable version doesn't have those two properties.
Because the dref namespace is global, another transaction concurrently
can be referring to that dref:  either a single-operation create/
declare - (dref store key aDifferentInitVal), or the same create/
declare in a different dosync block.  Also, the (dref) statement may
refer to a dref that already exists in the store:  the (dref) isn't
creating something new, but refers to something already in the DB, in
which case the initVal is ignored.  This makes the transaction
semantics much harder.

In my current implementation, I punt on some of these issues.  Imagine
a scenario where there are 2 vars pointing to the same logical dref.
var a is pointing to (dref store key initVal) and so is var b.
dosync block A containing an (alter) on var a is executing
concurrently as dosync block B, which contains an (alter) on var b.
Although these may be two different POJOs, they have the same store
and key, so they are the same logical object.  Maintaining ACI
guarantees means that acquiring a read lock on the identity referred
to by var a must acquire the same lock on the identity object referred
to by var b.  The same is true with write locks.  If dosync block A
already has a write lock on the logical dref under both var a and var
b, dosync block B must back off and retry after dosync block A either
finishes or is cancelled/errored out for good.  This is a property of
having multiple POJOs potentially pointing to the same logical
identity because of a global namespace.  I punt on this issue by
ensuring that all drefs declared with store key have the same POJO
by using a weak cache.  Because they have the same POJO, a lock on one
(dref) object is a lock on all.

But this leads to a problem.  To illustrate, imagine 3 scenarios:

In the first scenario, the dref doesn't exist in the DB.  The (dref)
statement is creating a new dref.  There are two dosync blocks
concurrently executing:  dosync block A and dosync block B.  dosync
block A just happens to begin executing before dosync block B.
However, the (dref) statement in dosync A just happens to execute
after the (dref) statement in dosync block B.  Because the