On Sat, Jul 3, 2010 at 1:14 AM, Jonas Sicking <jo...@sicking.cc> wrote: > On Fri, Jul 2, 2010 at 4:40 PM, Pablo Castro <pablo.cas...@microsoft.com> > wrote: >> >> From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] >> On Behalf Of Jonas Sicking >> Sent: Friday, July 02, 2010 4:00 PM >> >>>> We ran into an complicated issue while implementing IndexedDB. In short, >>>> what should happen if an object store is modified while a cursor is >>>> iterating it? >> Note that the modification can be done within the same >>>> transaction, so the read/write locks preventing several transactions from >>>> accessing the same table isn't helping here. >>>> >>>> Detailed problem description (this assumes the API proposed by mozilla): >>>> >>>> Consider a objectStore "words" containing the following objects: >>>> { name: "alpha" } >>>> { name: "bravo" } >>>> { name: "charlie" } >>>> { name: "delta" } >>>> >>>> and the following program (db is a previously opened IDBDatabase): >>>> >>>> var trans = db.transaction(["words"], READ_WRITE); var cursor; var result >>>> = []; trans.objectStore("words").openCursor().onsuccess = function(e) { >>>> cursor = e.result; >>>> result.push(cursor.value); >>>> cursor.continue(); >>>> } >>>> trans.objectStore("words").get("delta").onsuccess = function(e) { >>>> trans.objectStore("words").put({ name: "delta", myModifiedValue: 17 }); } >>>> >>>> When the cursor reads the "delta" entry, will it see the 'myModifiedValue' >>>> property? Since we so far has defined that the callback order is defined >>>> to be >> the request order, that means that put request will be finished >>>> before the "delta" entry is iterated by the cursor. >>>> >>>> The problem is even more serious with cursors that iterate indexes. >>>> Here a modification can even affect the position of the currently iterated >>>> object in the index, and the modification can (if i'm reading the spec >>>> correctly) >> come from the cursor itself. >>>> >>>> Consider the following objectStore "people" with keyPath "name" >>>> containing the following objects: >>>> >>>> { name: "Adam", count: 30 } >>>> { name: "Bertil", count: 31 } >>>> { name: "Cesar", count: 32 } >>>> { name: "David", count: 33 } >>>> { name: "Erik", count: 35 } >>>> >>>> and an index "countIndex" with keyPath "count". What would the following >>>> code do? >>>> >>>> results = []; >>>> db.objectStore("people", >>>> READ_WRITE).index("countIndex").openObjectCursor().onsuccess = function >>>> (e) { >>>> cursor = e.result; >>>> if (!cursor) { >>>> alert(results); >>>> return; >>>> } >>>> if (cursor.value.name == "Bertil") { >>>> cursor.update({name: "Bertil", count: 34 }); >>>> } >>>> results.push(cursor.value.name); >>>> cursor.continue(); >>>> }; >>>> >>>> What does this alert? Would it alert "Adam,Bertil,Erik" as the cursor >>>> would stay on the "Bertil" object as it is moved in the index? Or would it >>>> alert "Adam,Bertil,Cesar,David,Bertil,Erik" as we would iterate "Bertil" >>>> again at its new position in the index? >> >> My first reaction is that both from the expected behavior of perspective >> (transaction is the scope of isolation) and from the implementation >> perspective it would be better to see live changes if they happened in the >> same transaction as the cursor (over a store or index). So in your example >> you would iterate one of the rows twice. Maintaining order and membership >> stable would mean creating another scope of isolation within the >> transaction, which to me would be unusual and it would be probably quite >> painful to implement without spilling a copy of the records to disk (at >> least a copy of the keys/order if you don't care about protecting from >> changes that don't affect membership/order; some databases call these keyset >> cursors). >> >>>> >>>> We could say that cursors always iterate snapshots, however this >>>> introduces MVCC. Though it seems to me that SNAPSHOT_READ already does >>>> that. >> >> Actually, even with MVCC you'd see your own changes, because they happen in >> the same transaction so the buffer pool will use the same version of the >> page. While it may be possible to reuse the MVCC infrastructure, it would >> still require the introduction of a second scope for stability. > > It's quite implementable using append-only b-trees. Though it might be > much to ask that implementations are forced to use that. > > An alternative to what I suggested earlier is that all read operations > use "read committed". I.e. they always see the data as it looked at > the beginning of the transaction. Would this be more compatible with > existing MVCC implementations? >
Hmm, so if you modified the object store and then, later in the same transaction, used a cursor to iterate the object store, the cursor would not see the earlier modifications? That's not very intiutive to me...or did I misunderstand? > I'd imagine this should be as easy to implement as SNAPSHOT_READ. > >>>> We could also say that cursors iterate live data though that can be pretty >>>> confusing and forces the implementation to deal with entries being added >>>> and >> removed during iteration, and it'd be tricky to define all edge >>>> cases. >> >> Would this be any different from the implementation perspective than dealing >> with changes that happen through other transactions once they are committed? >> Typically at least in non-MVCC systems committed changes that are "further >> ahead" in a cursor scan end up showing up even when the cursor was opened >> before the other transaction committed. > > IndexedDB doesn't allow write transactions to a given store to start > while there are read transactions using that store, so that doesn't > seem to be a problem. Unless I'm misunderstanding something? > That was my understanding too: while a cursor was open, you can't allow a write transaction to start. >>>> It's certainly debatable how much of a problem any of these edgecases are >>>> for users. Note that all of this is only an issue if you modify and read >>>> from the >> same records *in the same transaction*. I can't think of a >>>> case where it isn't trivial to avoid these problems by separating things >>>> into separate transactions. >>>> However it'd be nice to avoid creating foot-guns for people to play with >>>> (think of the children!). >>>> >>>> However we still need to define *something*. I would suggest that we >>>> define that cursors iterate snapshots. It seems the cleanest for users and >>>> easiest >> to define. And once implementations add MVCC support it should >>>> be easy to implement. I think we've come up with a decent plan for how to >>>> do implement it in sqlite even without proper MVCC, so it should be doable >>>> even then. >> >> Besides the expectations aspects, I worry that doing this means that opening >> a cursor means incurring in substantial cost for all cases (e.g. creating a >> keyset or something like that). > > I agree we definitely don't want that. We are working on an > implementation which is backed by a SQL database and completely > incapable of MVCC, so it seems doable. However I don't yet know how > much of a complexity and performance penalty that carries. > On the other hand, as you said earlier, maybe allowing the live changes to be visible in the cursor is not such a big problem as apps can work around these edge cases? Thanks, Andrei