On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jo...@sicking.cc> wrote:
> On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro <pablo.cas...@microsoft.com> 
> wrote:
>>
>> From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] 
>> On Behalf Of Jonas Sicking
>> Sent: Friday, July 02, 2010 4:00 PM
>>
>>>> We ran into an complicated issue while implementing IndexedDB. In short, 
>>>> what should happen if an object store is modified while a cursor is 
>>>> iterating it? >> Note that the modification can be done within the same 
>>>> transaction, so the read/write locks preventing several transactions from 
>>>> accessing the same table isn't helping here.
>>>>
>>>> Detailed problem description (this assumes the API proposed by mozilla):
>>>>
>>>> Consider a objectStore "words" containing the following objects:
>>>> { name: "alpha" }
>>>> { name: "bravo" }
>>>> { name: "charlie" }
>>>> { name: "delta" }
>>>>
>>>> and the following program (db is a previously opened IDBDatabase):
>>>>
>>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var result 
>>>> = []; trans.objectStore("words").openCursor().onsuccess = function(e) {
>>>>   cursor = e.result;
>>>>   result.push(cursor.value);
>>>>   cursor.continue();
>>>> }
>>>> trans.objectStore("words").get("delta").onsuccess = function(e) {
>>>>   trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 }); }
>>>>
>>>> When the cursor reads the "delta" entry, will it see the 'myModifiedValue' 
>>>> property? Since we so far has defined that the callback order is defined 
>>>> to be >> the request order, that means that put request will be finished 
>>>> before the "delta" entry is iterated by the cursor.
>>>>
>>>> The problem is even more serious with cursors that iterate indexes.
>>>> Here a modification can even affect the position of the currently iterated 
>>>> object in the index, and the modification can (if i'm reading the spec 
>>>> correctly) >> come from the cursor itself.
>>>>
>>>> Consider the following objectStore "people" with keyPath "name"
>>>> containing the following objects:
>>>>
>>>> { name: "Adam", count: 30 }
>>>> { name: "Bertil", count: 31 }
>>>> { name: "Cesar", count: 32 }
>>>> { name: "David", count: 33 }
>>>> { name: "Erik", count: 35 }
>>>>
>>>> and an index "countIndex" with keyPath "count". What would the following 
>>>> code do?
>>>>
>>>> results = [];
>>>> db.objectStore("people",
>>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = function 
>>>> (e) {
>>>>   cursor = e.result;
>>>>   if (!cursor) {
>>>>     alert(results);
>>>>     return;
>>>>   }
>>>>   if (cursor.value.name == "Bertil") {
>>>>     cursor.update({name: "Bertil", count: 34 });
>>>>   }
>>>>   results.push(cursor.value.name);
>>>>   cursor.continue();
>>>> };
>>>>
>>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the cursor 
>>>> would stay on the "Bertil" object as it is moved in the index? Or would it 
>>>> alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate "Bertil" 
>>>> again at its new position in the index?
>>
>> My first reaction is that both from the expected behavior of perspective 
>> (transaction is the scope of isolation) and from the implementation 
>> perspective it would be better to see live changes if they happened in the 
>> same transaction as the cursor (over a store or index). So in your example 
>> you would iterate one of the rows twice. Maintaining order and membership 
>> stable would mean creating another scope of isolation within the 
>> transaction, which to me would be unusual and it would be probably quite 
>> painful to implement without spilling a copy of the records to disk (at 
>> least a copy of the keys/order if you don't care about protecting from 
>> changes that don't affect membership/order; some databases call these keyset 
>> cursors).
>>
>>>>
>>>> We could say that cursors always iterate snapshots, however this 
>>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does 
>>>> that.
>>
>> Actually, even with MVCC you'd see your own changes, because they happen in 
>> the same transaction so the buffer pool will use the same version of the 
>> page. While it may be possible to reuse the MVCC infrastructure, it would 
>> still require the introduction of a second scope for stability.
>
> It's quite implementable using append-only b-trees. Though it might be
> much to ask that implementations are forced to use that.
>
> An alternative to what I suggested earlier is that all read operations
> use "read committed". I.e. they always see the data as it looked at
> the beginning of the transaction. Would this be more compatible with
> existing MVCC implementations?
>

Hmm, so if you modified the object store and then, later in the same
transaction, used a cursor to iterate the object store, the cursor
would not see the earlier modifications? That's not very intiutive to
me...or did I misunderstand?


> I'd imagine this should be as easy to implement as SNAPSHOT_READ.
>
>>>> We could also say that cursors iterate live data though that can be pretty 
>>>> confusing and forces the implementation to deal with entries being added 
>>>> and >> removed during iteration, and it'd be tricky to define all edge 
>>>> cases.
>>
>> Would this be any different from the implementation perspective than dealing 
>> with changes that happen through other transactions once they are committed? 
>> Typically at least in non-MVCC systems committed changes that are "further 
>> ahead" in a cursor scan end up showing up even when the cursor was opened 
>> before the other transaction committed.
>
> IndexedDB doesn't allow write transactions to a given store to start
> while there are read transactions using that store, so that doesn't
> seem to be a problem. Unless I'm misunderstanding something?
>

That was my understanding too: while a cursor was open, you can't
allow a write transaction to start.

>>>> It's certainly debatable how much of a problem any of these edgecases are 
>>>> for users. Note that all of this is only an issue if you modify and read 
>>>> from the >> same records *in the same transaction*. I can't think of a 
>>>> case where it isn't trivial to avoid these problems by separating things 
>>>> into separate transactions.
>>>> However it'd be nice to avoid creating foot-guns for people to play with 
>>>> (think of the children!).
>>>>
>>>> However we still need to define *something*. I would suggest that we 
>>>> define that cursors iterate snapshots. It seems the cleanest for users and 
>>>> easiest >> to define. And once implementations add MVCC support it should 
>>>> be easy to implement. I think we've come up with a decent plan for how to 
>>>> do implement it in sqlite even without proper MVCC, so it should be doable 
>>>> even then.
>>
>> Besides the expectations aspects, I worry that doing this means that opening 
>> a cursor means incurring in substantial cost for all cases (e.g. creating a 
>> keyset or something like that).
>
> I agree we definitely don't want that. We are working on an
> implementation which is backed by a SQL database and completely
> incapable of MVCC, so it seems doable. However I don't yet know how
> much of a complexity and performance penalty that carries.
>

On the other hand, as you said earlier, maybe allowing the live
changes to be visible in the cursor is not such a big problem as apps
can work around these edge cases?

Thanks,
Andrei

Reply via email to