Re: Futures and transactions

2013-04-09 Thread Joran Greef
The problem with IndexedDB transactions appears as soon as you need to do any 
kind of streaming, where there is the potential for the stream write buffer to 
fill up, e.g. syncing over the network:

1. Get references to objects within a collection within a transaction.
2. Compare these to objects over the network.
3. Start writing objects to the network, waiting for the network to drain 
(assuming web sockets) before writing more data.

While this is essentially a long-lived read transaction, this won't work with 
IDB.

Some have argued that the design goal was to avoid long-lived transactions, but 
there is a difference between long-lived read transactions and long-lived write 
transactions.

For MVCC transactions, which I think IDB was once supposed to be aiming for, 
there is by definition no problem with long-running readers, since they do not 
block each other or writers; they simply read the database at a snapshot in 
time.

The browser is starting to support stream APIs, and I think with that, we need 
transactions that can be retained. That is, keep the same semantics as per 
IDB transactions, but with an additional method retain(milliseconds) that 
would keep the transaction alive for a certain amount of time.
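
To make the retain(milliseconds) idea concrete, here is a minimal sketch of the semantics I have in mind. It is only a model: RetainableTransaction, retain, onIdle and the explicit clock argument are all hypothetical names, and nothing like this exists in IndexedDB today.

```javascript
// Hypothetical model of a "retainable" transaction. None of these names exist
// in IndexedDB; the clock is passed in explicitly so the behaviour is
// deterministic and easy to reason about.
class RetainableTransaction {
  constructor(now) {
    this.state = "active";     // "active" | "committed"
    this.retainedUntil = now;  // no retention until retain() is called
  }

  // Ask the (imaginary) engine to keep the transaction alive until now + ms.
  retain(ms, now) {
    if (this.state !== "active") {
      throw new Error("transaction is " + this.state);
    }
    this.retainedUntil = Math.max(this.retainedUntil, now + ms);
  }

  // Called when the caller returns control with no pending requests.
  // Ordinary IDB semantics would commit here; a retained transaction stays
  // open until its retention window has elapsed.
  onIdle(now) {
    if (this.state === "active" && now >= this.retainedUntil) {
      this.state = "committed";
    }
    return this.state;
  }
}

// Usage: a reader waits for a socket to drain between reads.
const tx = new RetainableTransaction(0);
tx.retain(500, 0);            // keep alive for 500 ms
console.log(tx.onIdle(100));  // "active": the retention window is still open
console.log(tx.onIdle(500));  // "committed": the window has elapsed
```

The point of the sketch is that auto-commit stays the default; retain only postpones it, so a forgotten transaction still ends on its own.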

Joran Greef



Re: Sandbox

2012-09-17 Thread Joran Greef
Apps (native/web) need direct access to bare metal.

Browser vendors need to move away from the "we do all the thinking and 
designing and implementing" top-down model of innovation.

Browser vendors need to provide minimal core OS APIs and get out of the way and 
let open source grow around and do the rest.

For too long now the typical response to this kind of proposal has been "how do 
you propose solving the security problems?"

That is to say, we should not do any of this unless we can perfectly solve the 
security problems. As if they can be perfectly solved.

And so our most perfect solution has been to completely cripple web apps:

No TCP.

No UDP.

No POSIX.

No Hardware.

Tim Berners-Lee raised this point first awhile back on Public Web Apps: 
http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0464.html

As a user, I want to write a web app. I trust it. I want to give it UDP, TCP 
and POSIX anointing. I want it to use the resources of my machine to act on my 
behalf and assist me in my work. The browser won't let me. Why?



Re: Sandbox

2012-09-17 Thread Joran Greef
On 17 Sep 2012, at 2:33 PM, Florian Bösch pya...@gmail.com wrote:

 Security is a pretty serious concern if you're distributing apps without any 
 oversight to billions of users automatically upon a single link click.

You are conflating web apps (trusted, installed) with web pages (single link 
click).

 No TCP.
 Wrong, see websockets which upgrade to plain old TCP after the handshake.

No, WebSockets are not plain old TCP.

 
 No UDP.
 Coming with WebRTC in the form of unreliable data channels.

WebRTC is above UDP. It's not UDP. WebRTC is a massive conglomeration of 
protocols and codecs and opinions.

 No POSIX.
 Why would you need cross-OS posix standards and operating system shells when 
 you already have a browser which abstracts cross-OS APIs in its own fashion?

How do you fsync in a browser?

 Tim Berners-Lee raised this point first awhile back on Public Web Apps: 
 http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0464.html
 I believe his point was subtly different. He was arguing for vendors to come 
 up with ways to solve the usecases he mentioned, not arguing to just blast 
 the OS at the JS developer and let the ensuing security armageddon sort 
 itself out.

No, not at all. Nowhere did he ask for browser vendors to solve the use cases 
he mentioned.



Drag Drop Web Apps

2012-08-10 Thread Joran Greef
Given the advance of HTML 5, and in the interest of developing web apps with 
average functionality, would it now be possible to:

1. Drag files and folders into a web app?
2. Drag files and folders out of a web app?
3. Drag a spreadsheet out of a web app onto the icon of Excel in the dock and 
have it open in Excel?
4. Monitor that same spreadsheet's content (originally provided by the web app) 
for changes when the user edits it and presses CTRL+S?

Or is it only possible to drag things into a browser window but not back out 
and nothing else?

Can a user drag a piece of data into a browser window… and then drag it back 
out?

For example, a user may want to use a Contacts web app, and drag a contact out 
the browser window as a piece of vcard data and land this onto the Contacts app 
in the dock, which would then import the contact, all in a single mouse gesture?

Or is it not possible to provide that kind of user experience?

For example, a user may want to use a PDF web app, and transfer a piece of PDF 
data to the Preview app, but be forced to click a link to download the PDF, 
click the very small Keep button next to the "This type of file can harm your 
computer. Do you want to keep anyway?" warning, and then drag the PDF onto the 
Preview app, and then go to the Downloads folder to delete the download. That 
is at least 5 mouse clicks and then a CMD+backspace to accomplish what (from 
the user's point of view at least) should have taken only one drag and drop.

And then this may be vendor specific, but if a user created a piece of PDF data 
and dragged it into the browser window in the first place, does it still make 
sense to warn them that "this type of file can harm your computer"?

The browser takes on too much responsibility for things it can't possibly 
reason about, and seeks not enough advice from the user where it could. It 
often seems that the browser is built to lecture the user, rather than the 
other way round. I use the browser every day at work, and sometimes you have to 
ask yourself who is serving whom: does the user serve the browser, or does the 
browser serve the user?



Non-persistent in-memory storage accessible by same domain tabs

2012-05-24 Thread Joran Greef
Web applications need a way to communicate between two same domain
tabs without polling LocalStorage and without hitting the disk.

It would be useful to have an in-memory get/set/compare_and_set hash
table exposed to scripts running in same domain tabs, that is discarded
by the browser when those tabs are closed.

Use cases:

1. Coordinate replication between tabs for an offline app, i.e. one
tab takes responsibility for syncing a user's data to and from
IndexedDB.
2. Sign out from one tab triggers sign out from all other tabs.
3. If something like LevelDB were exposed directly to JS, one could
implement MVCC on top using the shared hash.
4. Library authors would be able to implement their own cross-tab postMessage.

It's difficult to implement these use cases with LocalStorage except
at a coarse polling resolution, and risky even then, because
LocalStorage lacks a compare and set primitive.
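
As a rough illustration of the primitive being asked for, here is an in-process stand-in. The names SharedTable, compareAndSet and tryBecomeLeader are made up for this sketch; a real version would have to be provided by the browser and shared between tabs, which plain script cannot do.

```javascript
// In-process stand-in for the proposed shared table. A real version would be
// supplied by the browser and visible to every same-domain tab; all names
// here are hypothetical.
class SharedTable {
  constructor() {
    this.map = new Map();
  }
  get(key) {
    return this.map.get(key); // undefined when the key is absent
  }
  set(key, value) {
    this.map.set(key, value);
  }
  // Store `next` only if the current value is exactly `expected`.
  // Returns true when the swap happened.
  compareAndSet(key, expected, next) {
    if (this.map.get(key) !== expected) return false;
    this.map.set(key, next);
    return true;
  }
}

// Use case 1: at most one tab takes responsibility for syncing.
function tryBecomeLeader(table, tabId) {
  return table.compareAndSet("sync-leader", undefined, tabId);
}

const table = new SharedTable();
console.log(tryBecomeLeader(table, "tab-A")); // true: no leader yet
console.log(tryBecomeLeader(table, "tab-B")); // false: tab-A holds the key
console.log(table.get("sync-leader"));        // tab-A
```

With only get and set, two tabs can both read "no leader" and both claim the role; the compare and set step is what closes that window.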



IndexedDB: Binary Keys

2012-05-21 Thread Joran Greef
IndexedDB supports binary values as per the structured clone algorithm
as implemented in Chrome and Firefox.

IndexedDB needs to support binary keys (ArrayBuffer, TypedArrays).

Many popular KV stores accept binary keys (BDB, Tokyo, LevelDB). The
Chrome implementation of IDB is already serializing keys to binary.

JS is moving more and more towards binary data across the board
(WebSockets, TypedArrays, FileSystemAPI). IDB is not quite there if it
does not support binary keys.

Binary keys are more efficient than Base 64 encoded keys, e.g. a 128
bit key in base 256 is 16 bytes, but 22 bytes in base 64.

I am working on a production system storing 3 million keys in IndexedDB.
In about 6 months it will be storing 60 million keys in IndexedDB.

Without support for binary keys, that's 360 MB of wasted storage
(60,000,000 * (22 - 16) bytes), not to mention the wasted CPU overhead
spent Base64 encoding and decoding keys.
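
The arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch that assumes unpadded Base64 and one byte of storage per encoded character; an implementation that stores strings as UTF-16 would waste more.

```javascript
// Length of the Base64 encoding of `byteLength` bytes, without "=" padding.
function base64Length(byteLength) {
  return Math.ceil((byteLength * 8) / 6);
}

const rawBytes = 128 / 8;                    // a 128 bit key is 16 bytes
const encodedBytes = base64Length(rawBytes); // 22 characters, assumed 1 byte each
const keyCount = 60000000;
const wastedBytes = keyCount * (encodedBytes - rawBytes);

console.log(rawBytes, encodedBytes, wastedBytes); // 16 22 360000000
```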



IndexedDB: Retrieving a slice of a record value.

2012-04-17 Thread Joran Greef
It would be great if there was a way to use IndexedDB to get just a
slice of a record value, not the entire value.

For example, when storing many large binary values, there may be
useful meta or header info at the start or end of each value, which
could be retrieved directly.

It would be a waste to have to store this data twice, or to read the
entire value.



Re: Installing Web Apps

2012-02-17 Thread Joran Greef
The problem is we're framing the discussion in terms of installing web apps.

We're answering the wrong question.

The real question is whether we want to start seeing powerful applications 
running in the browser.

If we do, then we'll figure out a way to get there. Be it installing, 
permissions, or letting apps use as much storage as they need, but just 
giving me a way to keep tabs on what they're using so I can uninstall them if I 
want. Or letting apps use as much bandwidth or CPU or whatever they need, but 
just giving me a way to keep tabs. Or if I'm really security conscious there 
could be a firewall to let me as user defend certain system calls or 
whitelist specific apps but only if I want.

But none of that is really the issue now. The issue now is that some are 
unimaginatively saying "what, browser in a browser?". It's the "nobody would 
ever want a personal computer" attitude and this needs to change so that the 
next unforeseen innovation can take place.

What do you want to build in the browser?

1. Dropbox (e.g. drag and drop files into the browser, click a link in the app 
to open them in native applications such as Excel, poll the file for changes 
from the browser and sync the chunks that changed)?
2. Web browser?
3. Proxyless POP and SMTP clients that don't waste server bandwidth and let 
users go direct?
4. Spotify client?
5. Skype client?

I want to be building all of the above.




Re: Enable Compression Of A Blob To .zip File

2011-11-30 Thread Joran Greef
It would be great to have a native binding to Zlib and Snappy exposed to 
JavaScript in the browser. Zlib covers the expensive disk use-cases, Snappy 
covers the expensive CPU use-cases.

Also a native binding to basic crypto primitives, even if that means just SHA1 
to start, and even if the Node.js crypto API is copied verbatim.

TypedArrays in current implementations are too slow to help with these, as 
far as I have tried.




Re: [IndexedDB] Transaction Auto-Commit

2011-08-02 Thread Joran Greef
I have been spending time on IDB lately and wanted to give feedback as to the 
transaction auto-commit interface:

I am trying to write a wrapper around IDB to match the interface of my 
server-side data store, which allows you to:

1. Request a read or write transaction asynchronously.
2. GET, MGET, EXISTS or SET against that transaction asynchronously.
3. COMMIT when done to release and commit the transaction or ABORT to release 
but not commit the transaction.
4. Have many concurrent read transactions.
5. Have one write transaction at a time (without blocking readers - MVCC).
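
To show the shape of that interface, here is a minimal in-memory model. It is neither IndexedDB nor my actual data store; Store, Transaction and every method name below are assumptions made for the sketch, and a second concurrent writer is simply rejected where a fuller version would queue it.

```javascript
// Minimal in-memory model of the interface listed above (hypothetical names).
class Store {
  constructor() {
    this.committed = new Map(); // latest committed snapshot, never mutated in place
    this.writerActive = false;
  }
  // Readers always start immediately; only one writer may be open at a time.
  async begin(mode) {
    if (mode === "write") {
      if (this.writerActive) throw new Error("a write transaction is already open");
      this.writerActive = true;
    }
    return new Transaction(this, mode);
  }
}

class Transaction {
  constructor(store, mode) {
    this.store = store;
    this.mode = mode;
    this.snapshot = store.committed; // readers keep this snapshot for life
    this.writes = new Map();         // buffered SETs, private to this transaction
    this.open = true;
  }
  async get(key) {
    this.assertOpen();
    return this.writes.has(key) ? this.writes.get(key) : this.snapshot.get(key);
  }
  async mget(keys) {
    return Promise.all(keys.map((key) => this.get(key)));
  }
  async exists(key) {
    this.assertOpen();
    return this.writes.has(key) || this.snapshot.has(key);
  }
  async set(key, value) {
    this.assertOpen();
    if (this.mode !== "write") throw new Error("read-only transaction");
    this.writes.set(key, value);
  }
  async commit() {
    this.assertOpen();
    if (this.mode === "write") {
      // Publish a new snapshot; open readers still hold the old Map.
      this.store.committed = new Map([...this.snapshot, ...this.writes]);
    }
    this.release();
  }
  async abort() {
    this.assertOpen();
    this.release(); // buffered writes are simply dropped
  }
  release() {
    if (this.mode === "write") this.store.writerActive = false;
    this.open = false;
  }
  assertOpen() {
    if (!this.open) throw new Error("transaction is closed");
  }
}

// Usage: a reader opened before a write keeps seeing its own snapshot.
async function demo() {
  const store = new Store();
  const w1 = await store.begin("write");
  await w1.set("a", 1);
  await w1.commit();
  const reader = await store.begin("read");
  const w2 = await store.begin("write");
  await w2.set("a", 2);
  await w2.commit();
  const fresh = await store.begin("read");
  return [await reader.get("a"), await fresh.get("a")];
}
demo().then((values) => console.log(values)); // old reader sees 1, new reader sees 2
```

Every call is asynchronous and the transaction stays open across awaits until COMMIT or ABORT, which is exactly the part that does not map onto IDB's auto-commit.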

As you can imagine, IDB does not support this, since it forces you to issue 
requests against an IDB transaction synchronously (from the viewpoint of the 
rest of the application). In other words, once you have obtained an IDB 
transaction, it is automatically released when your code returns control so 
there is no way to do something such as get a value from IDB, do something 
taking a millisecond or two such as reading from WebSQL and then writing a 
value back to IDB, all within the same IDB transaction. You'd have to use 
multiple IDB transactions which would be fine if the user only had your 
application open in one tab, but not in multiple tabs.

To get around this, I thought one could use optimistic concurrency control to 
write a nonce to IDB whenever a write transaction is requested from my IDB 
wrapper, use separate IDB transactions, and when writing, generate a conflict 
error if the nonce has changed.
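
Roughly, the nonce scheme looks like the sketch below. A plain Map stands in for IDB, a counter stands in for the nonce, and OptimisticWrapper and its methods are hypothetical names; in a real wrapper each backend access would be its own short IDB transaction, and the final check and write would need to share one transaction to be atomic.

```javascript
// Sketch of the nonce check. `backend` is a Map standing in for IDB.
class OptimisticWrapper {
  constructor(backend) {
    this.backend = backend;
    if (!backend.has("__nonce")) backend.set("__nonce", 0);
  }
  // Requesting a write transaction stamps a fresh nonce and remembers it.
  beginWrite() {
    const nonce = this.backend.get("__nonce") + 1;
    this.backend.set("__nonce", nonce);
    return { nonce, pending: new Map() };
  }
  set(tx, key, value) {
    tx.pending.set(key, value); // buffered until commit
  }
  // Writing fails with a conflict if another writer stamped a newer nonce.
  commit(tx) {
    if (this.backend.get("__nonce") !== tx.nonce) {
      throw new Error("conflict: nonce changed");
    }
    for (const [key, value] of tx.pending) this.backend.set(key, value);
  }
}

// Usage: two tabs share one backend; the later writer invalidates the earlier.
const shared = new Map();
const tabA = new OptimisticWrapper(shared);
const tabB = new OptimisticWrapper(shared);
const txA = tabA.beginWrite(); // nonce 1
const txB = tabB.beginWrite(); // nonce 2, so txA is now stale
tabB.set(txB, "k", "from B");
tabB.commit(txB);
tabA.set(txA, "k", "from A");
try {
  tabA.commit(txA);
} catch (e) {
  console.log(e.message); // conflict: nonce changed
}
```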

The problem is it's significantly slower to do each GET, MGET, EXISTS, or SET 
on a separate IDB transaction. I think it works out to an extra millisecond or 
two overhead. If you're doing 10 or 20 operations, however small, that's an 
extra 10-20ms wasted overhead.

So then I thought I would request an IDB transaction when a transaction is 
requested from my wrapper, and then check the active flag when it's needed, and 
if active is set to false then re-request the transaction. The trouble is that 
the active flag does not appear to be exposed to JS as far as I can see.

Then I tried using a try/catch whenever an object store is requested from an 
IDB transaction so as to reset the IDB transaction if it's expired. Chrome 
returns NOT_ALLOWED_ERR instead of ...INACTIVE… as it should. But I also 
found that the UA sometimes updates the active flag when my code has not 
returned control so there's a race condition somewhere in there I think, which 
may make this trick impossible. It works fine if I schedule a delay between 
operations of 10ms or more. When it gets down to 1ms though, it starts failing 
every now and then.

I tried the same thing using transaction.oncomplete to set my own active flag, 
but this did not work either.

Throughout, IDB in Chrome performs at least an order of magnitude slower than 
the same code running against an in-house MVCC database on the same machine. 
Firefox is significantly slower than Chrome. Would anyone know what the LevelDB 
benchmark would look like if run through IDB on Chrome?

 Note that reads are also blocked if the long-running transaction is a 
 READ_WRITE transaction.

Is it acceptable for a writer to block readers? What if one tab is downloading 
a gigabyte of user data (using a workload-configurable Merkle tree scheme), and 
another tab for the same application needs to show data?

On 25 Jul 2011, at 8:38 PM, Jonas Sicking wrote:

On Mon, Jul 25, 2011 at 6:28 AM, Joran Greef jo...@ronomon.com wrote:
 Regarding transactions in the IndexedDB specification (3.1.7 Transaction):
 
 Once a transaction no longer can become active, and if the transaction 
 hasn't been aborted, the implementation must automatically attempt to 
 commit it. This usually happens after all requests placed against the 
 transaction has been executed and their returned results handled, but no 
 new requests has been placed against the transaction.
 
 What does "no longer can become active" mean?

Well, generally it's exactly the text you are quoting: "after all
requests placed against the transaction has been executed and their
returned results handled, but no new requests has been placed against
the transaction."

If you want the full exact definition, look for all the places that
reference the active flag for transactions.

 Authors can still cause transactions to run for a long time, however this 
 is generally not a usage pattern which is recommended and can lead to bad 
 user experience in some implementations.
 
 How exactly can an author still cause a transaction to span several 
 asynchronous events?

All transactions span all the asynchronously firing events that are
fired against the requests placed against the transaction. So as long
as you're scheduling requests against the transaction it'll keep
running. So if you schedule a million requests against a transaction,
it'll take a while for it to finish. That's all the above quote says.

 For example, start a transaction, read a value, use

[IndexedDB] Transaction Auto-Commit

2011-07-25 Thread Joran Greef
Regarding transactions in the IndexedDB specification (3.1.7 Transaction):

 Once a transaction no longer can become active, and if the transaction 
 hasn't been aborted, the implementation must automatically attempt to commit 
 it. This usually happens after all requests placed against the transaction 
 has been executed and their returned results handled, but no new requests 
 has been placed against the transaction.

What does "no longer can become active" mean?

 Authors can still cause transactions to run for a long time, however this 
 is generally not a usage pattern which is recommended and can lead to bad 
 user experience in some implementations.

How exactly can an author still cause a transaction to span several 
asynchronous events? For example, start a transaction, read a value, use that 
value to do something asynchronous outside of IDB (perhaps for a millisecond or 
two or up to a second), and then write the result of that back to the 
transaction?

If it is indeed possible for an author to prolong a transaction, does that mean 
the UA is implementing a delay to give transactions with asynchronous 
dependencies the chance to add requests?

Surely an explicit commit in this case would be preferable for performance 
reasons (with a UA timeout protecting against developer forgetfulness)? Then 
again, if a developer forgot an explicit commit, it would only block writes for 
his particular application.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 8:56 AM, Jonas Sicking wrote:
 
 1. Treat object values as opaque (necessary to avoid 
 deserialization/serialization overhead, this is mandatory for storing 
 anything over 50,000 objects on a device like an iPad or iPhone).
 
 Please explain this in more detail as I have no idea what you mean by
 treat as opaque. Are you saying that we should not allow storing
 objects but rather only allow storing strings? If not, surely any type
 of object needs to be serialized upon storage. If you are simply
 suggesting forbidding storing objects, then this doesn't seem like a
 blocker. Simply store a string and we won't serialize anything.
 
 I'm also interested in what you are basing the claim on overhead on.
 Have you profiled a IndexedDB implementation? If so, which? And if
 Firefox, did you do so before or after we switched away from using a
 JSON serializer?

Yes, it must accept a string value and store that directly. The "opaque" 
terminology comes from some of the BDB papers.

I tested both Chrome and Firefox implementations 3 weeks ago. Both were an 
order of magnitude slower than using SQLite as a key-value store (storing 
strings as blobs). You can use whatever serializer you like, but it will always 
be slower than avoiding serialization completely (this is possible by the way, 
my application does not deserialize objects received from the server before 
storing them). Even if your serializer takes only 1ms per serialize call, 
that's 50 seconds for 50,000 objects. For my use-case that is unacceptable, 
considering that SQLite is available in Chrome and Safari. I will encourage my 
users to use those browsers and continue developing for SQLite until IndexedDB 
resolves this issue.
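
The estimate above is simple arithmetic, and the difference between the two storage paths can be sketched in a few lines. The Map here is only a stand-in for a key-value store, and storeOpaque, storeStructured and timeMs are hypothetical helper names; no timings are claimed, since they depend entirely on the device.

```javascript
// Estimated total cost of a fixed per-object serialization step, in seconds.
function serializationSeconds(objectCount, msPerObject) {
  return (objectCount * msPerObject) / 1000;
}
console.log(serializationSeconds(50000, 1)); // 50

// Opaque path: a string received from the server is stored as-is.
function storeOpaque(store, key, jsonText) {
  store.set(key, jsonText); // no parse, no stringify
}

// Structured path: the value is parsed and re-serialized on the way in.
function storeStructured(store, key, jsonText) {
  store.set(key, JSON.stringify(JSON.parse(jsonText)));
}

// Wall-clock time of `fn` in milliseconds (coarse, for comparison only).
function timeMs(fn) {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

// Usage: compare the two paths for 50,000 records on the device at hand.
const payload = JSON.stringify({ id: 1, name: "example", tags: ["a", "b"] });
const opaqueMs = timeMs(() => {
  const store = new Map();
  for (let i = 0; i < 50000; i++) storeOpaque(store, i, payload);
});
const structuredMs = timeMs(() => {
  const store = new Map();
  for (let i = 0; i < 50000; i++) storeStructured(store, i, payload);
});
console.log({ opaqueMs, structuredMs });
```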

How would you support indices (see below) if you say "Simply store a string and 
we won't serialize anything"?

 2. Enable indices to be modified at time of putting/deleting objects (index 
 references provided by application at time of putObject/deleteObject call).
 
 I don't believe that this is a blocker. You can simply modify the
 object you are storing to add properties and then index of these
 properties. What you are suggesting only has the advantage that it
 allows storing objects without modifying them. While that can be
 important, it isn't a blocker to at least creating a prototype
 implementation.

How would you index objects passed to putObject as a string (see above)? Plus 
you have the unnecessary object creation overhead. How fast is it to create 
50,000 objects on an iPad? What would that do to the GC and why would you want 
to do that if you don't need to?

I would like to see Mozilla do as they say: re-implement SQLite on 
IndexedDB themselves, just as fast and memory efficient as the 
original, before suggesting that this is possible and that the web should 
therefore be deprived of SQLite. Furthermore, Mozilla should stop using SQLite 
for all internal use, and rely solely on IndexedDB instead. That is essentially 
the request that Mozilla are making of web developers today.

It's clear that scores of web developers are upset with the decision to 
deprecate WebSQL. It's not clear that IndexedDB provides anything close in 
terms of actual raw performance. This surprised me greatly since I assumed 
IndexedDB would naturally leverage established indexed key-value ideas (for 
instance, to quote BDB: "In Berkeley DB, the key and value in a record are 
opaque to Berkeley DB") which would give it an edge over SQLite.

Pragmatically speaking, would it really be so hard for Mozilla to join Chrome, 
Safari and Opera and provide an embedding of SQLite along with IndexedDB?

If IndexedDB is as good as you suggest it is, then I am sure developers will 
flock to it, and you won't need to speculate as to whether or not SQLite will 
take over the web and then break backwards compatibility (despite a stated 
objective and proven track record of not doing so). And if SQLite did ever 
break backwards compatibility then developers would have IndexedDB. And if 
applications relying on SQLite are abandoned by their authors and broken as a 
result of not upgrading, then arguably those applications should be deprecated 
and not SQLite.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 6:26 PM, Shawn Wilsher wrote:

 On 4/4/2011 8:07 AM, Joran Greef wrote:
 SQLite has a fantastic track record of maintaining backwards compatibility.
 Sort of.  They didn't between SQLite 2 and SQLite 3.  There also have been 
 some (albeit minor) backwards compatibility issues with SQLite 3.x releases.  
 The most serious of which deal with performance characteristics changing 
 because they changed how the optimizer works.
 
 These type of things are acceptable to deal with in browser code because you 
 can change your code unlike on the web (unless you want to have different 
 code for each browser, and then each browser version).  It's that, or 
 browsers can ship one version of SQLite for all eternity.
 
 Cheers,
 
 Shawn

We only need one fixed version of SQLite to be shipped across Chrome, Safari, 
Opera, Firefox and IE. That in itself would represent a tremendous goal for 
IndexedDB to target and to try and achieve. When it actually does, and 
surpasses the fixed version of SQLite, those developers requiring the raw 
performance and reliability of SQLite could then switch over.

It is too soon to deprecate SQLite in the browser. IndexedDB is only getting 
started. It is beta and nowhere near the performance and test coverage of 
SQLite.

A fixed version of SQLite across browsers would be helpful at this stage. If 
Mozilla could lead the way on this it would be fantastic. Perhaps that would 
satisfy all parties on these issues?

It would also give IndexedDB implementors sufficient incentive to optimize 
their implementations, and developers the safety net of SQLite until such time 
as they do.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 6:49 PM, Shawn Wilsher wrote:

 On 4/4/2011 10:18 AM, Joran Greef wrote:
 How would you create an index on an existing object store in IndexedDB 
 containing more than 50,000 objects on an iPad, without incurring any object 
 deserialization/serialization overhead, without being an order of magnitude 
 slower than SQLite, and without bringing the iPad to its knees? If you can 
 do it with even one IndexedDB implementation out there then kudos and hats 
 off to you. :)
 You keep bringing this point up, but only a naive implementation of IndexedDB 
 would bring a device to its knees (or a poorly implemented thread scheduler, 
 which I don't expect the iPad to have).  The API is asynchronous, which means 
 it doesn't need to (nor should it) happen on any thread that the UI is being 
 drawn on.
 
 You still have a point about it possibly taking longer, but even then, that 
 will be implementation dependent.
 
 Cheers,
 
 Shawn
 

I bring up the iPad example because I had experience with a LocalStorage 
implementation (I think it was Safari) loading the contents of LocalStorage 
into memory synchronously on first access, blocking the UI thread. I am 
probably wrong on this one but I think I remember reading on Web Apps that this 
was one of the motivations behind limiting LocalStorage quota to around 10mb. 
At the time I was one of those who believed that LocalStorage would support 
storage of at least 10 GB as a matter of course. I hope you can understand my 
slight distrust of subsequent storage APIs (other than those of proven track 
record) in this light.

It would still take longer (easily 30-50 seconds per 50,000 objects more than 
an opaque key-value store built on SQLite) even if the IndexedDB implementation 
was asynchronous. The developer would also have a tough time reasoning about 
when index migrations would be finished, since IndexedDB offers no control over 
the migration process and provides no way to modify index memberships directly. 
For those that care about these things, IndexedDB does not provide sufficient 
low-level storage primitives.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 7:14 PM, Shawn Wilsher wrote:

 On 4/6/2011 9:44 AM, Joran Greef wrote:
 We only need one fixed version of SQLite to be shipped across Chrome, 
 Safari, Opera, Firefox and IE. That in itself would represent a tremendous 
 goal for IndexedDB to target and to try and achieve. When it actually does, 
 and surpasses the fixed version of SQLite, those developers requiring the 
 raw performance and reliability of SQLite could then switch over.
 I don't believe any browser vendor would be interested in shipping two 
 different version of SQLite (one for internal use, and one for the web).  I 
 can say, with certainty, that Mozilla is not.
 
 Cheers,
 
 Shawn

If Mozilla enjoys using the latest version of SQLite (and I assume they are not 
planning on replacing internal SQLite embeddings with IndexedDB - not at this 
stage at least), then web developers deserve the latest version.

Ship the latest version of SQLite (even with the -moz prefix). Developers 
targeting HTML 5 are used to API changes, waiting on browsers and trying to 
reason about broken implementations. The library writers will quickly grow over 
any SQLite version changes should they even ever arise.

Would you run the Mozilla production database on any browser's implementation 
of IndexedDB? How can you expect developers to run their production client code 
on IndexedDB? It's simply not ready and will not be for at least a year or two 
or three. How likely is it that SQLite (given its history) will remove the 
SELECT, INSERT, UPDATE, DELETE statements before then?


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 7:24 PM, Tab Atkins Jr. wrote:

 When a security bug is encountered, either the browsers update to a
 new version of sqlite (if it's already been fixed), thus potentially
 breaking sites, or they patch sqlite and then upgrade to the patched
 version, thus potentially breaking sites, or they fork sqlite and
 patch the error only in their forked version, still potentially
 breaking sites but also forking the project.  The only thing that is
 *not* a valid possibility is the browsers staying on the single fixed
 version, thus continuing to expose their users to the security bug.
 
 ~TJ

Browser vendors are moving to shorter and shorter release cycles. People have 
stopped viewing these things through the IE6-here-forever lens. Browsers are 
starting to update themselves automatically, even nightly. If a security issue 
were to be found, it would be highly unlikely that its patch would break any 
SQL interface of SQLite.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 7:42 PM, Boris Zbarsky wrote:

 On 4/6/11 10:30 AM, Joran Greef wrote:
 If Mozilla enjoys using the latest version of SQLite (and I assume they are 
 not planning on replacing internal SQLite embeddings with IndexedDB - not at 
 this stage at least), then web developers deserve the latest version.
 
 This is not obvious a priori, for what it's worth.

The point was made with reference to Mozilla expecting web developers to run 
production client code on IndexedDB, when Mozilla themselves run production 
code on SQLite.

Boris, Jonas and Shawn, we could talk round and round in circles. It seems 
you're not too concerned by any of the performance and design problems re: 
IndexedDB that I have raised. You ask for proposals but it's clear you're not 
sold on these issues. If you were, I am sure you would be among the first to 
provide them.

Do you have real-world experience developing web-based applications, targeting 
mobile and desktop, with offline support for storing, indexing, migrating and 
synchronizing several million objects? Or are we all arguing in the realm of 
conjecture ("it should be able to") without having encountered any of these 
issues ourselves, or having any basis for our claims?


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-05 Thread Joran Greef
On 06 Apr 2011, at 2:53 AM, Pablo Castro wrote:

 The goal of IndexedDB has always been to enable things like RelationalDB and 
 CouchDB to be built on top, while maintaining a reasonable level of 
 functionality for those that wanted to use it directly. I really like the 
 idea of thinking of RelationalDB as something that's built as a library on 
 top of IndexedDB. Are there specific tweaks we can make to IndexedDB so it 
 can be a good lower-layer for RelationalDB, such that RelationalDB could be 
 built as a pure JavaScript library?
 
 Thanks
 -pablo

1. Treat object values as opaque (necessary to avoid 
deserialization/serialization overhead, this is mandatory for storing anything 
over 50,000 objects on a device like an iPad or iPhone).
2. Enable indices to be modified at time of putting/deleting objects (index 
references provided by application at time of putObject/deleteObject call).
3. Provide a simpler, more powerful locking mechanism, opaque to IndexedDB, to 
provide finer-grained application-specific locking (e.g. have we just entered 
into a sync process with the master database).
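
Points 1 and 2 can be sketched as follows. This is an in-memory model only: OpaqueStore, putObject, deleteObject and lookup are hypothetical names, and unlike the proposal (where the application supplies index references on delete as well) the sketch remembers the references given at put time to keep the example short.

```javascript
// In-memory model of an opaque key-value store with application-managed indices.
class OpaqueStore {
  constructor() {
    this.values = new Map();  // key -> opaque string, never parsed by the store
    this.indexes = new Map(); // index name -> Map(index key -> Set of primary keys)
    this.refs = new Map();    // key -> index references supplied at put time
  }
  // `indexRefs` is a list of [indexName, indexKey] pairs chosen by the application.
  putObject(key, value, indexRefs = []) {
    this.deleteObject(key); // drop stale index entries first
    this.values.set(key, value);
    this.refs.set(key, indexRefs);
    for (const [name, indexKey] of indexRefs) {
      if (!this.indexes.has(name)) this.indexes.set(name, new Map());
      const index = this.indexes.get(name);
      if (!index.has(indexKey)) index.set(indexKey, new Set());
      index.get(indexKey).add(key);
    }
  }
  deleteObject(key) {
    for (const [name, indexKey] of this.refs.get(key) || []) {
      const keys = this.indexes.get(name).get(indexKey);
      keys.delete(key);
      if (keys.size === 0) this.indexes.get(name).delete(indexKey);
    }
    this.refs.delete(key);
    this.values.delete(key);
  }
  // Primary keys filed under `indexKey` in the named index.
  lookup(name, indexKey) {
    const index = this.indexes.get(name);
    return index && index.has(indexKey) ? [...index.get(indexKey)] : [];
  }
}

// Usage: values stay as strings; only the application knows what they contain.
const store = new OpaqueStore();
store.putObject("c1", '{"name":"Ann","city":"Cape Town"}', [["city", "Cape Town"]]);
store.putObject("c2", '{"name":"Ben","city":"Cape Town"}', [["city", "Cape Town"]]);
console.log(store.lookup("city", "Cape Town")); // keys c1 and c2
store.deleteObject("c1");
console.log(store.lookup("city", "Cape Town")); // only c2 remains
```

Because the store never looks inside a value, adding or changing an index is a matter of the application rewriting references, not of the store deserializing every object.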

If I may say so, it does seem odd that some would stress the difficulties of 
speccing merely the interface of something like SQLite, and then advise others 
to re-implement it entirely. If there was a specific BTree API in 
the browser and a powerful asynchronous LocalStorage mechanism this might be 
something for the brave, but IndexedDB is a little too tightly coupled to its 
own interface agenda at the moment to make this goal possible.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 4:39 PM, Jonas Sicking wrote:

 Hence it would still be the case that we would be relying on the
 SQLite developers to maintain a stable SQL interpretation...

SQLite has a fantastic track record of maintaining backwards compatibility.

IndexedDB has as yet no track record, no consistent implementations, no 
widespread deployment, only measurably poor performance and a lukewarm indexing 
and querying API.

If anything it's the other way round. You have yet to convince developers that 
IndexedDB will be faster, more stable, more powerful, more memory efficient 
than SQLite and with better test coverage at that.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 5:26 PM, Keean Schupke wrote:

 This is ignoring the possibility that something like RelationalDB could be 
 used, where a well defined common subset of SQL can be used (and I use 
 well-defined in the formal sense). This would allow a relatively thin wrapper 
 on top of most SQL implementations and would allow SQLite (or BDB) to be used 
 as the backend.

Yes, if an implementation of RelationalDB arrives which is solid and fast with 
support for set operations that would be great. The important thing is that we 
have two competing APIs (and preferably a strong API with a great track record).


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 6:10 PM, Mikeal Rogers wrote:

 it's not very hard to write the abstraction you're talking about on top of 
 IndexedDB, and until you do it i'm going to have a hard time taking you 
 seriously because it's clearly doable.


You assume I have not written the abstraction I am talking about on top of 
IndexedDB?

 the constructs in IndexedDB are pretty low level but sufficient if you know 
 how to implement databases. performance is definitely an issue, but making 
 these constructs faster would be much easier than trying to tweak an off the 
 shelf SQL implementation to your use case.


How exactly would you make a schema-enforcing interface faster than a stateless 
interface?

How would you implement application-managed indices on top of IndexedDB without 
being slower than SQLite?

How would you implement set operations on indices in IndexedDB without being 
slower or less memory efficient than SQLite?

How would you create an index on an existing object store in IndexedDB 
containing more than 50,000 objects on an iPad, without incurring any object 
deserialization/serialization overhead, without being an order of magnitude 
slower than SQLite, and without bringing the iPad to its knees? If you can do 
it with even one IndexedDB implementation out there then kudos and hats off to 
you. :)

I understand your point of view. I once thought the same. You would think that 
IndexedDB would be more than satisfactory for these things. The question is 
whether IndexedDB provides adequate and performant database primitives, to the 
same degree as SQLite (and of course SQL is merely an interface to database 
storage primitives, I do not recall saying otherwise).

You can build IndexedDB on top of SQLite (as some browsers are indeed doing), 
but you cannot build SQLite on IndexedDB.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 6:04 PM, Tab Atkins Jr. wrote:

 It's new.

Do you think it would be wise then to advocate doing away with SQLite before 
IndexedDB has had a chance to prove itself? Surely two competing APIs would be 
the fastest way to bring IndexedDB up to speed?

 Ironically, the poor performance is because it's using sqlite as a
 backing-store in the current implementation.  That's being fixed by
 replacing sqlite.

Yes I am aware of this. There are some design flaws in IndexedDB. For instance, 
it does not regard objects as opaque (as would a typical key-value store), 
which means that creating an index on an existing object store would require 
deserializing/serializing every object therein. Doing that for 50,000 objects 
on an iPad would be breathtaking.
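
To make the cost concrete, here is a minimal sketch rather than a measurement. It assumes objects are held as JSON strings (a stand-in for whatever serialization an implementation actually uses), and the names `buildIndexByKeyPath` and `buildIndexFromEntries` are hypothetical, not part of IndexedDB. The key-path builder has to parse every stored object; the opaque builder only consumes (id, value) pairs that the application already knows.

```javascript
// Hypothetical in-memory stand-in for an object store: id -> serialized JSON string.
const stored = new Map([
  ["u1", JSON.stringify({ name: "Ann", city: "Cape Town" })],
  ["u2", JSON.stringify({ name: "Bob", city: "Durban" })],
  ["u3", JSON.stringify({ name: "Cy", city: "Cape Town" })],
]);

// Key-path style: the store must look inside every object to build the index.
function buildIndexByKeyPath(store, keyPath) {
  const index = new Map(); // index value -> Set of object ids
  let parsed = 0;
  for (const [id, json] of store) {
    const object = JSON.parse(json); // one deserialization per stored object
    parsed += 1;
    const value = object[keyPath];
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(id);
  }
  return { index, parsed };
}

// Opaque style: the application supplies (id, value) pairs; nothing is parsed.
function buildIndexFromEntries(entries) {
  const index = new Map();
  for (const [id, value] of entries) {
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(id);
  }
  return { index, parsed: 0 };
}

const byKeyPath = buildIndexByKeyPath(stored, "city");
const byEntries = buildIndexFromEntries([
  ["u1", "Cape Town"],
  ["u2", "Durban"],
  ["u3", "Cape Town"],
]);
```

Both calls produce the same index; the difference is that the first performs one parse per stored object, which is the work that grows with the size of the store.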

I have written object stores on top of SQLite and they are already an order of 
magnitude faster than IndexedDB with a more powerful and memory efficient API 
to boot.

 Kinda the point, in that the power/complexity of SQL confuses a huge
 number of developers, who end up coding something which doesn't
 actually use the relational model in any significant way, but still
 pays the cost of it in syntax.

I was not referring to SQL but to the underlying primitives exposed through the 
SQL interface. For example, set operations on indices, or the ability to index 
objects with array values.
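
As an illustration of those two primitives, the following is a rough sketch under simple assumptions: an index is modelled as a Map from index value to a Set of object ids, and `indexArrayValues`, `intersect` and `union` are hypothetical helper names, not functions from SQLite or IndexedDB.

```javascript
// Index an object under every element of an array-valued property.
function indexArrayValues(index, id, values) {
  for (const value of values) {
    if (!index.has(value)) index.set(value, new Set());
    index.get(value).add(id);
  }
}

// Intersection: walk the smaller set and probe the larger one.
function intersect(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  return new Set([...small].filter((id) => large.has(id)));
}

// Union: every id that appears in either set.
function union(a, b) {
  return new Set([...a, ...b]);
}

const tags = new Map(); // tag -> Set of post ids
indexArrayValues(tags, "post1", ["db", "web"]);
indexArrayValues(tags, "post2", ["web"]);
indexArrayValues(tags, "post3", ["db", "web", "sql"]);

const dbAndWeb = intersect(tags.get("db"), tags.get("web")); // post1 and post3
const webOrSql = union(tags.get("web"), tags.get("sql")); // post1, post2 and post3
```

Only ids are touched here; the objects themselves never need to be loaded or deserialized to answer the query.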



Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-02 Thread Joran Greef
On Sat, Apr 2, 2011 at 00:42:40, Glenn Maynard wrote:

 You can certainly ask if they're interested in doing so, not for our
 benefit (whoever our means), but for the benefit of the Web as a whole,
 and there's nothing at all rude in asking.  I'd say the opposite: it's rude
 to assume they wouldn't be interested, rather than asking and letting them
 come to their own decision.  (I don't know where the notion of forcing
 them to do anything came from.)

I have been reading up more on the history of SQLite. It is a stellar 
implementation, just to highlight a few points:

1. Most of the SQLite source code is devoted purely to testing and 
verification. An automated test suite runs millions and millions of test cases 
involving hundreds of millions of individual SQL statements and achieves 100% 
branch test coverage.

2. SQLite can also be made to run in minimal stack space (4KiB) and very 
little heap (100KiB), making SQLite a popular database engine choice on memory 
constrained gadgets such as cellphones, PDAs, and MP3 players.

3. Faster than popular client/server database engines for most common 
operations.

4. Supports terabyte-sized databases and gigabyte-sized strings and blobs.

5. The developers continue to expand the capabilities of SQLite and enhance 
its reliability and performance while maintaining backwards compatibility with 
the published interface spec, SQL syntax, and database file format.

It is easier to build a performant IndexedDB on SQLite than to build a 
performant SQLite on IndexedDB. Maybe that is something to think about. 
Developers need working database primitives, more than they need convenience.

There may be conjectural reasons for Mozilla not implementing WebSQL, but the 
track record of SQLite is hard to ignore. Mozilla is already embedding SQLite 
for other uses, and appears to be a sponsor of the project.

SQLite may not be a specification in our sense of the word, but in a Web 
sense of the word, it is so widely deployed already that it would be hard not 
to call it a standard.



Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-02 Thread Joran Greef
 I am incredibly uncomfortable with the idea of putting the
 responsibility of the health of the web in the hands of one project.
 In fact, one of the main reasons I started working at Mozilla was to
 prevent this.
 
 / Jonas

I agree with you. All the more reason to support both WebSQL and IndexedDB. It 
is not a case of either/or. It would be healthy to have competing APIs.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 1:01 AM, Jonas Sicking wrote:

 Anyhow, I do think that the idea of passing in index values at the
 same time as a entry is created/modified is an interesting idea. And I
 have said so in the past on this list. It's definitely something we
 should consider for v2.

 Oh, and if we did this, I wouldn't really know how to support things
 like collations. Neither if you did collations using built in sets of
 locales (like in Pablo's recent proposal), nor if you used some sort
 of callback to do collation.
 
 / Jonas

That's fine. You don't need to figure it out. Just look at how stateless 
databases have done it (or not done it) and do likewise.

I submit to you that there is inadequate understanding of the concerns raised, 
hence the lack of urgency in trying to address them. That there is even a need 
for a V2 is symptomatic of this.

It may be a good idea to start looking at these things not as interesting 
ideas but as essential database concepts.

If someone were trying to build some kind of transactional indexed key value 
store for the web, and they wanted to do a truly great job of it, they would 
certainly want to learn everything they could from databases that have made 
contributions to the field.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 9:53 AM, Jonas Sicking wrote:

 I previously have asked for a detailed proposal, but so far you have
 not supplied one but instead keep referring to other unnamed database
 APIs.

I have already provided an adequate interface proposal for putObject and 
deleteObject.

I have already referenced at least Redis and Tokyo Cabinet as examples of 
stateless database interfaces, on numerous occasions.

 For example, you've asked for callbacks to
 implement collations, but what do we do if those callbacks don't
 return consistent results?

I have not once asked for callbacks, let alone callbacks to implement 
collations. You have jumped to this conclusion from my previous post, and 
missed the point of it entirely.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 9:34 AM, Jeremy Orlow wrote:

 We have made an effort to understand other contributions to the field.
 
 I'm not convinced that these are essential database concepts and having 
 personally spent quite some time working with the API in JS and implementing 
 it, I feel pretty confident that what we have for v1 is pretty solid.  There 
 are definitely some things I wouldn't mind re-visiting or looking at closer, 
 possibly even for v1, but they all seem reasonable to study further for v2 as 
 well.
 
 We've spent a lot of time over the last year and a half talking about 
 IndexedDB.  But now it's shipping in Firefox 4 and soon Chrome 11.  So 
 realistically v1 is not going to change much unless we are convinced that 
 what's there is fundamentally broken.
 
 We intentionally limited the scope of v1, which is why we know there'll be a 
 v2.  We can't solve all the problems at once, and the difficulty of speccing 
 something is typically exponential to the size of the API.
 
 Maybe a constructive way to discuss this would be to look at what use cases 
 will be difficult or impossible to achieve with the current design?

Application-managed indices for starters. I would consider that to be essential 
when designing indexed key/value stores, and I would consider that to be the 
contribution made by almost every other indexed key/value store to date. If we 
have to use IDB the way FriendFeed used MySQL to achieve application-managed 
indices then I would argue that the API is in fact fundamentally broken and 
we would be better off with an embedding of SQLite by Mozilla.

Regarding "the difficulty of speccing something is typically exponential to the 
size of the API", if people want to build a Rube Goldberg device then they must 
deal with the spec issues of that.

If we were provided with the primitives for an indexed key/value store with 
application-managed indices (as Nikunj suggested at the time), we would have 
been well out of the starting blocks by now, and issues such as computed 
indexes, indexing array values etc. would have been non-issues.

Summary:

1. There's a problem.
2. It can still be fixed with a minimum of fuss.
3. This requires an adjustment to the putObject and deleteObject interfaces 
(see previous threads).


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 12:52 PM, Keean Schupke wrote:

 I totally agree with everything so far...
 
 3. This requires an adjustment to the putObject and deleteObject interfaces 
 (see previous threads).
 
 I disagree that a simple API change is the answer. The problem is 
 architectural, not just a superficial API issue.

Yes, for IndexedDB to be stateless with respect to application schema, one 
would need to:

1. Provide the application with a first-class means to manage indexes at time 
of putting/deleting objects.
2. Treat objects as opaque (remove key path, structured clone mechanisms, 
application must provide an id and JSON value to put/delete calls, reduces 
serialization/deserialization overhead where application already has the object 
as a string).
3. Remove setVersion (redundant, application migrates objects and indexes using 
transactions as it needs to).
4. Remove createIndex.

This would rip so much from the spec as to reduce it to a bunch of tatters, 
defining nothing more than an interface for index/key/value primitives in terms 
of well-established interfaces.

Essentially, we need LocalStorage with asynchronous IO (based on Node's 
callback style), large quota support, and a BTree API. Failing that, a decent 
FileSystem API on which to build these.
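
For illustration only, a LocalStorage-like store with Node-style callbacks might look roughly like the sketch below. `AsyncStorage` and its method names are assumptions made up for this example; it is an in-memory stand-in, not a real browser API, and it says nothing about quota or the BTree layer.

```javascript
// Hypothetical AsyncStorage: an in-memory stand-in, not a real browser API.
class AsyncStorage {
  constructor() {
    this.items = new Map(); // key -> string value
  }

  // callback(error) fires on a later tick, never synchronously.
  setItem(key, value, callback) {
    setTimeout(() => {
      this.items.set(key, String(value));
      callback(null);
    }, 0);
  }

  // callback(error, value) receives null when the key is absent.
  getItem(key, callback) {
    setTimeout(() => {
      callback(null, this.items.has(key) ? this.items.get(key) : null);
    }, 0);
  }

  // callback(error) fires once the key has been removed.
  removeItem(key, callback) {
    setTimeout(() => {
      this.items.delete(key);
      callback(null);
    }, 0);
  }
}

const storage = new AsyncStorage();
storage.setItem("user1", JSON.stringify({ name: "Joran" }), (setError) => {
  if (setError) throw setError;
  storage.getItem("user1", (getError, value) => {
    if (getError) throw getError;
    console.log(value); // the stored string, handed back untouched
  });
});
```

The point of the shape is that no call blocks the caller: every result arrives through a callback whose first argument is the error.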


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 7:27 PM, Jeremy Orlow wrote:

 1. Provide the application with a first-class means to manage indexes at 
 time of putting/deleting objects.
 
 I'm OK with doing this for v1 if the others are.  It doesn't seem like that 
 big of an addition and it would give a decent amount of additional 
 flexibility.

Thanks Jeremy that would be great.

 (reduces serialization/deserialization overhead where application already 
 has the object as a string)
 
 I'm not sure why you think this would reduce overhead.

How long would it take an iPad to JSON deserialize/serialize 500 / 5,000 / 
50,000 / 500,000 / 5,000,000 2KB objects? That's a reasonable device and those 
are reasonable workloads. In its present state, IndexedDB needs to do this 
every time setVersion is called with a createIndex in there... you see the 
problem is there's no way for the application to control this. The application 
would arguably be able to find better ways of migrating indexes than using key 
paths which necessitate deserialization/serialization to be performed on the 
client. For instance, you could use batch jobs on the server to do this on 
behalf of clients, and this would make sense especially where many 
clients/devices share the same objects. With IndexedDB this is not possible. 
With pure storage primitives it would have been possible. This is just one 
use-case, and for every one of these there will be plenty more.

 Like I said above, although I think we should make it possible to operate 
 more statelessly, I don't see a reason we need to remove stuff like this. 
 Some users will find it more convenient to work this way.

Agreed on both counts. It is clearly too late to remove it now. But it may be a 
good idea in future to keep the focus on providing low-level primitives rather 
than convenience features, since the latter often get in the way of the former.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-03-31 Thread Joran Greef
 This is painful to read.  WebSQL development died because SQLite, the most 
 widely-deployed database software in the world, was too good?  That sounds 
 like a catastrophic failure of the W3C process.
 
 -- 
 Glenn Maynard

Hear.

I am starting to think that Mozilla will step up and provide an embedding of 
SQLite, even if it has to only think of it as such. It will have to.

People would rather use a working database than something crippled albeit 
specced (see LocalStorage or IndexedDB).

It was things like XHR in all their unspecced glory that brought the web to 
where it is today.


Re: Mail List Etiquette [Was: WebSQL] Any future plans, or has IndexedDB replaced WebSQL?]

2011-03-31 Thread Joran Greef
Thank you Art.

To clarify, I have heard from a contributor to the specification in question 
who referred to LocalStorage himself as little more than a toy, expressing 
his frustrations at the specification. It is well known that most LocalStorage 
implementations do not support more than 10 MB, some load the entire contents 
into memory synchronously on first access, and there were some issues around 
locking that were not addressed as far as I recall. LocalStorage does not work 
as advertised. Many developers, including myself, got excited, spent hours with 
it, only to see these issues left unresolved. It would be true to say that most 
LocalStorage implementations are crippled in this sense. No one need be 
offended since specification and implementation are two separate things. I do 
wish however, that the specification would have addressed large quota support, 
and encouraged certain implementation practices, and in this sense I feel that 
not enough was done. The same with WebSQL. And recently I learned that IDB 
prevents applications from managing indices? These things are disappointing to 
us developers. I think we have a right to be critical on these issues where 
criticism is due. If the specification is inadequate, or burdened by politics, 
we should be free to say so (respectfully and professionally of course, but 
also honestly and directly and with the right measure of urgency), without 
fear of offending anyone or being policed for it.

Joran Greef

On 31 Mar 2011, at 9:37 PM, Arthur Barstow wrote:

 This is painful to read.  WebSQL development died because SQLite, the most 
 widely-deployed database software in the world, was too good?  That sounds 
 like a catastrophic failure of the W3C process.
 
 -- 
 Glenn Maynard
 Hear.
 
 I am starting to think that Mozilla will step up and provide an embedding of 
 SQLite, even if it has to only think of it as such. It will have to.
 
 People would rather use a working database than something crippled albeit 
 specced (see LocalStorage or IndexedDB).
 
 It was things like XHR in all their unspecced glory that brought the web to 
 where it is today.

Joran - as one of the moderators of public-webapps, I find your comments above 
offensive to those that work on the specs you mention.

All - this is a reminder that all e-mails on this list are expected to be 
respectful and professional.

Please see the following for more information about the etiquette and usage of 
this list:

  http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/1216.html

-Regards, Art Barstow






Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 10:07 PM, Shawn Wilsher wrote:

 On 3/31/2011 11:47 AM, Joran Greef wrote:
 Let those who introduced these design flaws be among the first to take 
 responsibility and fix them.
 You aren't being constructive, and that's a surefire way to be ignored.  You 
 have yet to convince the working group that these are design flaws in the 
 first place.
 
 /sdwilsh

Agreed. I am actively using the API with real-world data and I am providing 
feedback. You are welcome to use it or not. It is not for me to convince 
anyone. As I said, if people think there is a problem, let those who introduced 
it fix it.

Joran Greef


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-26 Thread Joran Greef
 On 26 Mar 2011, at 10:14 AM, Nikunj Mehta wrote:
 
 What is the minimum that can be in IDB? I am guessing the following:
 
 1. Sorted key-opaque value transactional store
 2. Lookup of keys by values (or parts thereof)

Yes, this is what we need. In programmer speak: objects (opaque strings), sets 
(hash indexes), sorted sets (range indexes).

 I know of no efficient way of doing callbacks with JS. Moreover, avoiding 
 indices completely seems to miss the point.

Callbacks are unnecessary. This is what you would want to do as a developer 
using the current form of IDB:

objectStore.putObject({ name: "Joran", emails: ["jo...@gmail.com", 
"jo...@ronomon.com"] }, { id: 'arbitraryObjectIdProvidedByTheApplication', 
indexes: ["emails=jo...@gmail.com", "emails=jo...@ronomon.com", "name=Joran"] 
});

IDB would then store the user object using the id provided by the application, 
and make sure it's referenced by this id in the emails=jo...@gmail.com, 
emails=jo...@ronomon.com, name=Joran index references provided (creating 
these indexes along the way if need be). The application is responsible for 
passing in the extra id and indexes options to putObject.

Supporting range indexes would be a question of expanding the above to let the 
developer pass in a sort score along with the index reference.
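
A minimal in-memory sketch of how such an interface could behave follows. It is an assumption-laden model, not the IndexedDB API: the `ObjectStore` class, the `{ ref, score }` form for a sort score, the argument shape of `deleteObject` and the `query` helper are all hypothetical, and the email address is a placeholder.

```javascript
// Hypothetical model of the proposed interface; not the IndexedDB API.
class ObjectStore {
  constructor() {
    this.objects = new Map(); // id -> value, never inspected by the store
    this.indexes = new Map(); // index reference -> Map(id -> sort score)
  }

  // The application names the id and every index reference to update.
  putObject(value, { id, indexes = [] }) {
    this.objects.set(id, value);
    for (const entry of indexes) {
      const { ref, score = 0 } = typeof entry === "string" ? { ref: entry } : entry;
      if (!this.indexes.has(ref)) this.indexes.set(ref, new Map()); // create on demand
      this.indexes.get(ref).set(id, score);
    }
  }

  // The application also names the index references to remove the id from.
  deleteObject({ id, indexes = [] }) {
    this.objects.delete(id);
    for (const entry of indexes) {
      const ref = typeof entry === "string" ? entry : entry.ref;
      const members = this.indexes.get(ref);
      if (!members) continue;
      members.delete(id);
      if (members.size === 0) this.indexes.delete(ref); // drop empty indexes
    }
  }

  // Ids referenced by an index, ordered by ascending sort score.
  query(ref) {
    const members = this.indexes.get(ref) || new Map();
    return [...members].sort((a, b) => a[1] - b[1]).map(([id]) => id);
  }
}

const store = new ObjectStore();
store.putObject({ name: "Joran", emails: ["user@example.com"] }, {
  id: "user1",
  indexes: ["name=Joran", "emails=user@example.com", { ref: "signups", score: 20110326 }],
});
store.putObject({ name: "Ann" }, {
  id: "user2",
  indexes: ["name=Ann", { ref: "signups", score: 20110320 }],
});

const signupOrder = store.query("signups"); // user2 first, then user1
store.deleteObject({ id: "user1", indexes: ["name=Joran", "emails=user@example.com", "signups"] });
```

Nothing is declared in advance in this model: an index exists only while at least one object references it, and the store never looks inside the stored value.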

 Next, originally, I also had floated the idea of application managed indices, 
 but implementors thought of it as cruft.

I can understand how application managed indices would lead to less work on the 
part of the spec committee. There seems to be some perverse human 
characteristic that likes to make easy things difficult. Ships will sail around 
the world but the Flat Earth Society will flourish.

 I, for one, am not enamored by key paths. However, I am also morbidly aware 
 of the perils in JS land when using callback like mechanisms. Certainly, I 
 would like to hear from developers like you how you find IDB if you were to 
 not use any createIndex at all. Or at least that you would like to manage 
 your own indices.

I am begging to be able to manage my indices. I know my data. I do not want to 
use any createIndex to declare indexes in advance of when I may or may not use 
them. What advantage would that give me? I want to create/update indexes only 
when I put or delete objects and I want to have control over which indexes to 
update accordingly. With one small change to the putObject and deleteObject 
interfaces, in the form of the indexes option, we can make that possible.

We need these primitives in IDB: opaque strings, sets, sorted sets. Ideally, 
IDB need simply store these things and provide the standard interfaces (see 
Redis) to them along with a transactional mechanism. That's the perfect 
low-level API on which to build almost any database wrapper.
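
As a rough illustration of the third primitive, a sorted set with a range query can be sketched as below. The `SortedSet` class is hypothetical and only loosely modelled on the Redis sorted-set idea; a real implementation would keep members in a BTree or similar structure rather than sorting on every query.

```javascript
// Hypothetical SortedSet: members are opaque ids, each with a numeric score.
class SortedSet {
  constructor() {
    this.scores = new Map(); // member -> score
  }

  // Adding an existing member simply updates its score.
  add(member, score) {
    this.scores.set(member, score);
  }

  remove(member) {
    return this.scores.delete(member);
  }

  // Members whose score lies in [min, max], in ascending score order.
  rangeByScore(min, max) {
    return [...this.scores]
      .filter(([, score]) => score >= min && score <= max)
      .sort((a, b) => a[1] - b[1])
      .map(([member]) => member);
  }
}

const byAge = new SortedSet();
byAge.add("ann", 31);
byAge.add("bob", 25);
byAge.add("cy", 40);
byAge.add("bob", 27); // updates bob's score

const late20sToMid30s = byAge.rangeByScore(26, 35); // bob, then ann
```

The application decides what the score means (a timestamp, an age, a rank), so the store needs no knowledge of the objects being indexed.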


[IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-20 Thread Joran Greef

 On 20 Mar 2011, at 4:54 AM, Jonas Sicking wrote:
 
 I don't understand what you are saying about application state though,
 so please do start that as a separate thread.

At present, there's no way for an application to tell IDB what indexes to 
modify w.r.t. an object at the exact moment when putting or deleting that 
object. That's because this behavior is defined in advance using createIndex 
in a setVersion transaction. And then how IDB extracts the referenced value 
from the object is done using an IDB idea of key paths. But right there, in 
defining the indexes in advance (and not when the index is actually modified, 
which is when the object itself is modified), you've captured application state 
(data relationships that should be known only to the application) within IDB. 
Because this is done in advance (because IDB seems to have inherited this 
assumption that this is just the way MySQL happens to do it), there's a 
disconnect between when the index is defined and when it's actually used. And 
because of key paths you now need to spec out all kinds of things like how to 
handle compound keys, multiple values. It's becoming a bit of a spec-fest.

That this bubble of state gets captured in IDB, it also means that IDB now 
needs to provide ways of updating that captured state within IDB when it 
changes in the application (which will happen, so essentially you now have your 
indexing logic stuck in the database AND in the application and the application 
developer now has to try and keep BOTH in sync using this awkward pre-defined 
indexes interface), thus the need for a setVersion transaction in the first 
place. None of this would be necessary if the application could reference 
indexes to be modified (and created if they don't exist, or deleted if they 
would then become empty) AT THE POINT of putting or deleting an object. Things 
like data migrations would also be better served if this were possible since 
this is something the application would need to manage anyway. Do you follow?

The application is the right place to be handling indexing logic. IDB just 
needs to provide an interface to the indexing implementation, but not handle 
extracting values from objects or deciding which indexes to modify. That's the 
domain of the application. It's a question of encapsulation. IDB is crossing 
the boundaries by demanding to know ABOUT the data stored, and not just 
providing a simple way to put an object, and a simple way to put a reference to 
an object to an index, and a simple way to query an index and intersect or 
union an index with another. Essentially an object and its index memberships 
need to be completely opaque to IDB and you are doing the opposite. Take a look 
at the BDB interface. Do you see a setVersion or createIndex semantic in there? 
Take a look at Redis and Tokyo and many other things. Do you see a setVersion 
or createIndex semantic in there? Do these databases have any idea about the 
contents of objects? Any concept of key paths? No, and that's the whole reason 
these databases were created in the first place. I'm sure you have read the BDB 
papers. Obviously this is not the approach of MySQL. But if IDB is trying to be 
MySQL but saying it wants to be BDB then I don't know. In any event, Firefox 
would be brave to also embed SQLite. Let the better API win.

How much simpler could it be? At the end of the day, it's all objects and sets 
and sorted sets, and see Redis' epiphany on this point. IDB just needs to 
provide transactional access to these sets. The application must decide what 
goes in and out of these sets, and must be able to do it when it wants to, not 
some time in advance. I bring this up because I once wrote the exact same kind 
of database that you are writing now (where one thinks it would be good if the 
database did NOT treat objects as opaque... that the database should be smart 
about the contents of objects and share control for how objects relate to each 
other etc.) and I have since seen how much better, simpler, faster the 
alternative is. So unless you have formidable reasons for maintaining the 
status quo in light of the above, even if you don't understand this concept of 
application state getting stuck in IDB, and even though you advocate that 
WebSQL is not deprecated and that we can consider LocalStorage to be an 
alternative, then it is my hope that you will heed this and make something of 
it. I'm sorry if this is not the kind of feedback you want to hear at this 
stage, but IDB needs to be good for more than just HTML 5 todo list demos.


Re: [IndexedDB] Compound and multiple keys

2011-03-19 Thread Joran Greef
 On 16 Mar 2011, at 7:59 PM, Jonas Sicking wrote:
 
 The best way to do this is likely to start a new thread (as the changes you 
 are
 suggesting isn't limited to Compound and multiple keys), and put a
 draft proposal there.
 
 It by no means has to be perfect (it took us a long time to polish IDB
 into what it is today), but it needs to be more detailed than what you
 are saying above.
 

More thoughts:

Firstly, my proposal for handling compound and multiple keys has already been 
put forward in a previous thread (i.e. adding the option to specify indexes to 
be modified when putting/deleting objects) so I see no need to create yet 
another thread.

Secondly, in terms of IDB storing parts of application state, it is clear that 
this is a problem that needs to be addressed. I think you have said as much 
yourself? If so, then those drafting the IDB specification must take 
responsibility for fixing this, since it is an issue they created in the first 
place. Unless, of course they do not really believe it to be an issue, in which 
case it would be a filibuster to ask for a draft proposal.


Re: [IndexedDB] Compound and multiple keys

2011-03-16 Thread Joran Greef
 On 3/9/2011 09:45:51 Shawn Wilsher wrote:
 
 That makes sense since the original proposal was heavily based on BDB. 
 It's shifted a bit as we have made tweaks to improve it for the web.
 
 Cheers
 
 Shawn

I agree. If I may add my two cents worth: one thing that IDB has not yet 
learned from BDB is statelessness. At the moment IDB requires a bit of 
application state to be mixed up in IDB (i.e. by predefining indexes as opposed 
to allowing the application to specify indexes to be modified when putting or 
deleting objects). So it's not a pure data+indexes store, it's actually a 
data+indexes+application state store. This is making IDB more complex than it 
needs to be and is making the IDB interface less powerful (things like compound 
keys etc. would already be possible if IDB were stateless). For instance, if 
IDB is to store application state, then the spec needs to define what happens 
when the application state changes. If IDB were stateless, this would not be 
necessary. After the web having had no options for offline storage for so many 
years, it is probably safe to say that web applications do not need help with 
things like migrations, pre-defined schemas or anything fancy or helpful like 
that, they just need a pure data+indexes solution (but they need this to be 
comprehensive: at least set operations supported on indexes, and indexes 
defined by the application when putting or deleting objects and NOT before). In 
my honest opinion, IDB is not yet there and from the discussions does not seem 
to be headed in that direction. It's trying to make unnecessary things easy 
when it really needs to be just a powerful low-level data store with 
first-class indexing. I'm not sure how many users of IDB are actively involved 
in this discussion, but after spending hours on it over the past few months, 
and having built databases over LocalStorage and WebSQL, as a real-world user, 
may I ask that these concerns begin to be addressed?


Re: [IndexedDB] Compound and multiple keys

2011-03-16 Thread Joran Greef
 On 16 Mar 2011, at 7:59 PM, Jonas Sicking wrote:
 
 It seems like you are suggesting pretty big changes. The best way to
 do this is likely to start a new thread (as the changes you are
 suggesting isn't limited to Compound and multiple keys), and put a
 draft proposal there.

Not necessarily. Adding the option to specify indexes to be modified when 
putting or deleting an object would go a long way already, solving the problem 
of compound and multiple keys in the process.

The next step after that, supporting compose-able set operations on indexes, 
would take some work, in terms of figuring out the best interface for doing it, 
hopefully keeping it fairly tightly coupled to the standard set operations 
themselves.

 It by no means has to be perfect (it took us a long time to polish IDB
 into what it is today), but it needs to be more detailed than what you
 are saying above.

Will do. The proposed changes have the potential to reduce the spec and 
implementation of IDB. The problem of IDB being exposed to a dose of 
application state certainly needs to be addressed.

 Also, I should mention that time is running out on major changes. We
 already have two database APIs, WebSQL and IDB, (three if you count
 localStorage), so there both needs to be significant advantages over
 the already existing APIs, and you would do yourself a favor by
 acting fast as the other specifications are gaining momentum literally
 by the day.
 
 / Jonas

Do you really consider LocalStorage to be a database and what do you mean by 
database then? And how can you say that we have a database API in WebSQL if 
it is currently deprecated? Are there plans afoot to embed SQLite in Firefox? 
That would be a great idea by the way.

As far as I am aware, LocalStorage cannot be used as a database. I have tried. 
Most browsers do not permit more than 10 MB and do not provide a means for the 
user to adjust storage quota. Browsers provide no locking mechanism (although 
you could simulate a lock service on top of LocalStorage if you could tolerate 
the latency) and some implementations (Safari as far as I can recall) load the 
entire contents of LocalStorage into memory on first access, blocking the UI. 
As you know, WebSQL is deprecated and only available in WebKit and Opera. 
Chrome as far as I am aware provides no mechanism to adjust WebSQL quota limits.

So that means we actually only have one potential cross-browser database API 
(and not three as you have stated), and that is IDB. It may be a good idea to 
slow down and get it right.




Re: [IndexedDB] Two Real World Use-Cases

2011-03-07 Thread Joran Greef
On 08 Mar 2011, at 7:23 AM, Dean Landolt wrote:

 This doesn't seem right. Assuming your WebSQL implementation had all the same 
 indexes isn't it doing pretty much the same things as using separate 
 objectStores in IDB? Why would it be an order of magnitude slower? I'm sure 
 whatever implementation you're using hasn't seen much optimization but you 
 seem to be implying there's something more fundamental? The only thing I can 
 think of to blame would be the fat in the objectStore interface -- like, for 
 instance, the index building facilities. It seems to me your proposed 
 solution is to add yet more fat to the interface (more complex indexing), but 
 wouldn't it be just as suitable to instead strip down objectStores to their 
 bare essentials to make them more suitable to act as indexes? Then the 
 indexing functionality and all the hard decisions could be punted to 
 libraries where they'd be free to innovate.

Exactly. It's not what one would expect, and an indication of the poor state of 
the IDB implementation (which is essentially a wrapper around SQLite anyway).

If someone is advising that object stores be used to handle indexes then may I 
be the first to raise a red flag and say that IDB is failing us (and it would 
have been better for the spec team to provide a locking mechanism for 
LocalStorage so it could be used in that way). The whole point of IDB as far as 
I can see is to provide transactional indexed access to a key value store.

 Why? You wouldn't necessarily have to store the whole object in each index, 
 just the index key, a value and some pointer to the original source object. 
 Something to resolve this pointer to the source would need to be spec'd (a la 
 couchdb's include_docs), but that's simple. Even better, say it were possible 
 to define a link relation on an object store that can resolve to its source 
 object -- you could define a source link relation and the property to use -- 
 and this would have the added bonus of being more broadly applicable than 
 just linking an index record to its source instance.

Think of the object creation and JSON serialization/deserialization overhead 
for putting 50 indexes and you have got more than enough waste there already.

 We can fix all of this right now very simply:
 
 1. Enable objectStore.put and objectStore.delete to accept a setIndexes 
 option and an unsetIndexes option. The value passed for either option would 
 be an array (string list) of index references.
 
 This would only work for indexes on arrays of strings, right? Things can get 
 much more complicated than that, and when they do you'd have to use an 
 objectStore to do your indexing anyway, right?

No it would work for pretty much anything. The application would be free to 
determine the indexes, and also to convert query parameters into indexes when 
querying. It's essentially computed indexes without the hassles of IDB trying 
to do it (there was an interesting thread last year on the challenges of 
storing an index-computing function in IDB).

 Why is it more theoretically performant than using objectStores in the raw?

It's a more direct interface. Think about it for a second. Using objectStores 
in the raw is interpolating O(n) complexity with multiple function calls, to 
give just one reason. If IDB can receive a list of indexes to add and remove an 
object to and from, then it can also do things like perform a set difference 
first to save unnecessary IO. I have written a database or two with this 
technique and it's certainly faster.
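
A minimal sketch of the set difference mentioned above, assuming the database 
knows which indexes an object currently belongs to. The function name 
diffIndexes and the index names are hypothetical, made up for illustration, 
and are not part of IDB:

```javascript
// Hypothetical helper: names are illustrative, not part of any IDB spec.
// Given the index names an object currently belongs to and the names it
// should belong to after a put, work out the minimal set of index writes.
function diffIndexes(current, next) {
  const currentSet = new Set(current);
  const nextSet = new Set(next);
  return {
    // Indexes to remove the object from: in current but not in next.
    unset: [...currentSet].filter((name) => !nextSet.has(name)),
    // Indexes to add the object to: in next but not in current.
    set: [...nextSet].filter((name) => !currentSet.has(name)),
  };
}

const changes = diffIndexes(
  ["folder:inbox", "flag:unread", "from:alice"],
  ["folder:archive", "from:alice"]
);
// "from:alice" appears in both lists, so it needs no IO at all.
console.log(changes);
```

Only the indexes that actually change are touched, which is where the IO 
saving would come from.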

 I don't necessarily understand the stateful vs. stateless distinction here. I 
 don't see how your proposed solution removes the requirement for IDB to 
 enforce constraints when certain indexes are present. Developers would 
 already be able to use IDB statefully (with predefined schemas) -- they'd 
 just use a library that has a schema mechanism. I doubt such a library for 
 IDB already exists, but it'd be quite easy to port perstore, for instance, 
 which is derived from the IDB API and already has this functionality using 
 json-schema. There will no doubt be many ORM-like libraries that will pop up 
 as soon as IDB starts to stabilize (or as soon as it gets a node.js 
 implementation).

The trouble is you always think a database would be quite easy until you 
actually try to do it yourself. At first when I dug into IDB I didn't think 
there would be any problems that could not be handled in some way. I have 
actually switched back to WebSQL now and will encourage my users to use Safari 
or Chrome as long as these browsers support WebSQL (and I hope Chrome will at 
least finish up by adding a quota interface for WebSQL). IDB right now is like 
a completely neutered slower SQLite without any of the benefits to be expected 
of a transactional indexed KV store. It's really sad.

For examples of stateless databases see the interfaces for Redis (the best 
example, and a perfect target for IDB), Berkeley, Tokyo. For a stateful 

Re: [IndexedDB] Two Real World Use-Cases

2011-03-06 Thread Joran Greef
 On 05 Mar 2011, at 3:50 AM, Jonas Sicking wrote:
 
 What we do need to do sooner rather than later though is allowing
 multiple index values for a given entry using arrays. We also need to
 add support for compound keys. But let's deal with those issues in a
 separate thread.

Multiple index values for a given entry using arrays, as well as compound keys, 
can be handled by letting the application provide an array of index references 
when putting or deleting objects. There is no need to make a Rube Goldberg 
device out of it.

Regards

Joran Greef


Re: [IndexedDB] Two Real World Use-Cases

2011-03-03 Thread Joran Greef
Hi Jonas

I have been trying out your suggestion of using a separate object store to do 
manual indexing (and so support compound indexes or index object properties 
with arrays as values).

There are some problems with this approach:

1. It's far too slow. To put an object and insert 50 index records (typical 
when updating an inverted index) this way takes 100ms using IDB versus 10ms 
using WebSQL (with a separate indexes table and compound primary key on index 
name and object key). For instance, my application has a real requirement to 
replicate 4,000,000 emails between client and server and I would not be 
prepared to accept latencies of 100ms to store each object. That's more than 
the network latency.

2. It's a waste of space.
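
To put the figures in point 1 in perspective, here is a rough estimate. It 
assumes the objects are stored one at a time with no batching or parallelism, 
which is an assumption made only for this illustration:

```javascript
// Figures from point 1 above; sequential, one-object-at-a-time storage is
// an assumption made only for this estimate.
const emails = 4000000;
const msPerPut = { IDB: 100, WebSQL: 10 };

// Total time in hours to store `count` objects at `msEach` milliseconds each.
function totalHours(count, msEach) {
  return (count * msEach) / 1000 / 60 / 60;
}

for (const [name, ms] of Object.entries(msPerPut)) {
  console.log(name, totalHours(emails, ms).toFixed(1), "hours");
}
```

Under that assumption the initial replication takes roughly 111 hours at 
100ms per object versus roughly 11 hours at 10ms per object.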

Using a separate object store to do manual indexing may work in theory but it 
does not work in practice. I do not think it can even be remotely suggested as 
a panacea, however temporary it may be.

We can fix all of this right now very simply:

1. Enable objectStore.put and objectStore.delete to accept a setIndexes option 
and an unsetIndexes option. The value passed for either option would be an 
array (string list) of index references.

2. The object would first be removed as a member from any indexes referenced by 
the unsetIndexes option. Any referenced indexes which would be empty thereafter 
would be removed.

3. The object would then be added as a member to any indexes referenced by the 
setIndexes option. Any referenced indexes which do not yet exist would be 
created.
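
The three steps above can be sketched with an in-memory model. IndexStore, 
keysIn and the index names are hypothetical and exist only for illustration; 
this is not the IndexedDB API, just one way the proposed semantics might 
behave:

```javascript
// Hypothetical in-memory model of the proposal; IndexStore and its
// method names are illustrative and do not exist in IndexedDB.
class IndexStore {
  constructor() {
    this.objects = new Map(); // key -> object
    this.indexes = new Map(); // index reference -> Set of keys
  }

  put(key, object, { setIndexes = [], unsetIndexes = [] } = {}) {
    this.objects.set(key, object);
    this.applyIndexes(key, setIndexes, unsetIndexes);
  }

  // Simplification: this sketch only honours unsetIndexes on delete.
  delete(key, { unsetIndexes = [] } = {}) {
    this.objects.delete(key);
    this.applyIndexes(key, [], unsetIndexes);
  }

  applyIndexes(key, setIndexes, unsetIndexes) {
    // Step 2: remove the object from each index in unsetIndexes and
    // drop any index left empty.
    for (const name of unsetIndexes) {
      const members = this.indexes.get(name);
      if (!members) continue;
      members.delete(key);
      if (members.size === 0) this.indexes.delete(name);
    }
    // Step 3: add the object to each index in setIndexes, creating
    // any index that does not exist yet.
    for (const name of setIndexes) {
      if (!this.indexes.has(name)) this.indexes.set(name, new Set());
      this.indexes.get(name).add(key);
    }
  }

  keysIn(name) {
    return [...(this.indexes.get(name) || [])];
  }
}

const store = new IndexStore();
store.put("email:1", { subject: "Hello" }, {
  setIndexes: ["folder:inbox", "flag:unread"],
});
// Mark as read: leave the unread index, stay in the inbox index.
store.put("email:1", { subject: "Hello" }, { unsetIndexes: ["flag:unread"] });
console.log(store.keysIn("folder:inbox"));
console.log(store.indexes.has("flag:unread"));
```

Note that the store never inspects the object itself; the application alone 
decides which index references apply.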

This would provide the much-needed indexing capabilities presently lacking in 
IDB without sacrificing performance.

It would also enable developers to use IDB statefully (MySQL-like pre-defined 
schemas with the DB taking on the complexities of schema migration and data 
migration) or statelessly (See Berkeley DB with the application responsible for 
the complexities of data maintenance) rather than enforcing an assumption at 
such an early stage.

Regards

Joran Greef


[IndexedDB] Two Real World Use-Cases

2011-03-01 Thread Joran Greef
I have been following the development behind IndexedDB with interest. Thank you 
all for your efforts.

I understand that the initial version of IndexedDB will not support indexing 
array values.

May I suggest an alternative derived from my home-brew server database evolved 
from experience using MySQL, WebSQL, LocalStorage, CouchDB, Tokyo Cabinet and 
Redis?

1. Be able to put an object and pass an array of index names which must 
reference the object. This may remove the need for a complicated indexing spec 
(perhaps the reason why this issue has been pushed into the future) and give 
developers all the flexibility they need.

2. Be able to intersect and union indexes. This covers a tremendous amount of 
ground in terms of authorization and filtering.

These two needs are critical.

Without them, I will either carry on using WebSQL for as long as possible, or 
be forced to use IndexedDB as a simple key value store and layer my own 
indexing on top.

I am writing an email application and have to deal with secondary indexes of up 
to 4,000,000 keys. It would not be ideal to do intersects and unions on these 
indexes in the application layer.

Regards

Joran Greef


Re: [IndexedDB] Two Real World Use-Cases

2011-03-01 Thread Joran Greef
On 01 Mar 2011, at 7:27 PM, Jeremy Orlow wrote:

 1. Be able to put an object and pass an array of index names which must 
 reference the object. This may remove the need for a complicated indexing 
 spec (perhaps the reason why this issue has been pushed into the future) and 
 give developers all the flexibility they need.
 
 You're talking about having multiple entries in a single index that point 
 towards the same primary key?  If so, then I strongly agree, and I think 
 others agree as well.  It's mostly a question of syntax.  A while ago we 
 brainstormed a couple possibilities.  I'll try to send out a proposal this 
 week.  I think this + compound keys should probably be our last v1 features 
 though.  (Though they almost certainly won't make Chrome 11 or Firefox 4, 
 unfortunately, hopefully they'll be done in the next version of each, and 
 hopefully that release will be fairly soon after for both.)

Yes, for example this user object { name: "Joran Greef", emails: 
["jo...@ronomon.com", "jorangr...@gmail.com"] } with indexes on the emails 
property, would be found in the "jo...@ronomon.com" index as well as in the 
"jorangr...@gmail.com" index.

What I've been thinking though is that the problem even with formally 
specifying indexes in advance of object put calls, is that this pushes too much 
application model logic into the database layer, making the database enforce a 
schema (at least in terms of indexes). Of course IDB facilitates migrations in 
the form of setVersion, but most schema migrations are also coupled with 
changes to the data itself, and this would still have to be done by the 
application in any event. So at the moment IDB takes too much responsibility on 
behalf of the application (computing indexes, pre-defined indexes, pseudo 
migrations) and not enough responsibility for pure database operations (index 
intersections and index unions).

I would argue that things like migrations and schemas are best handled by the 
application, even if this is more work for the application, as most people will 
write wrappers for IDB in any event and IDB is supposed to be a core-level API. 
The acid-test must be that the database is oblivious to schemas or anything 
pre-defined or application-specific (i.e. stateless). Otherwise IDB risks being 
a database for newbies who wouldn't use it, and a database that others would 
treat as a KV anyway (see MySQL at FriendFeed).

A suggested interface then for putting or deleting objects, would be: 
objectStore.put(object, [indexname1, indexname2, indexname3]) and then 
IDB would need to ensure that the object would be referenced by the given index 
names. When removing the object, the application would need to provide the 
indexes again (or IDB could keep track of the indexes associated with an 
object).
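
A minimal sketch of this interface, taking the second option where the store 
keeps track of the indexes associated with each object. TrackingStore, its 
methods, the "email:" prefix and the example addresses are all hypothetical 
and are not part of IDB:

```javascript
// Hypothetical sketch: TrackingStore is illustrative, not an IDB interface.
class TrackingStore {
  constructor() {
    this.objects = new Map();   // id -> object (opaque to the store)
    this.indexes = new Map();   // index name -> Set of ids
    this.indexesOf = new Map(); // id -> index names given at put time
  }

  put(id, object, indexNames = []) {
    this.delete(id); // clear memberships left over from any earlier put
    this.objects.set(id, object);
    this.indexesOf.set(id, indexNames);
    for (const name of indexNames) {
      if (!this.indexes.has(name)) this.indexes.set(name, new Set());
      this.indexes.get(name).add(id);
    }
  }

  delete(id) {
    // The store remembers the index names, so the caller need not repeat them.
    for (const name of this.indexesOf.get(id) || []) {
      const members = this.indexes.get(name);
      if (!members) continue;
      members.delete(id);
      if (members.size === 0) this.indexes.delete(name);
    }
    this.indexesOf.delete(id);
    this.objects.delete(id);
  }

  get(indexName) {
    const ids = [...(this.indexes.get(indexName) || [])];
    return ids.map((id) => this.objects.get(id));
  }
}

// The application, not the store, decides which indexes reference the object.
const user = { name: "Example User", emails: ["a@example.com", "b@example.org"] };
const users = new TrackingStore();
users.put("user:1", user, user.emails.map((email) => "email:" + email));

console.log(users.get("email:a@example.com")[0] === user);
console.log(users.get("email:b@example.org")[0] === user);
users.delete("user:1");
console.log(users.indexes.size);
```

The index names are computed by the application at put time, so no index 
definition or computing function has to be persisted in the database.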

Using a function to compute indexes would not work as this would entrap 
application-specific schema knowledge within the function (which would need to 
be persisted) and these may subsequently change in the application, which would 
then need a way to modify the function again. The key is that these things must 
be stateless.

The objects must be opaque to IDB (no need for serialization/deserialization 
overhead at the DB layer). Things like key-paths etc. could be removed and the 
object id just passed in to put or delete calls.

 2. Be able to intersect and union indexes. This covers a tremendous amount of 
 ground in terms of authorization and filtering.
 
 Our plan was to punt some sort of join language to v2.  Could you give a more 
 concrete proposal for what we'd add?  It'd make it easier to see if it's 
 something realistic for v1 or not.

If you can perform intersect or union operations (and combinations of these) on 
indexes (which are essentially sets or sorted sets), then this would be the 
join language. It has the benefit that the interface would then be described in 
terms of operations on data structures (set operations on sets) rather than a 
custom language which would take longer to spec out.

I've written databases over append-only files, S3, WebSQL and even LocalStorage 
(!) and from what I've found with my own applications, you could handle 
everything from multi-tenant authorization to adequate filtering with the 
following operations:

1. intersect([ index1, index2 ])
2. union([ index1, index2 ])
3. intersect([ union([ index1, index2 ]), index3, index4, index5, index6, 
index7 ])

Hopefully, a join language described in terms of pure set operations would be 
much simpler to implement and easier to use and reason with.
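
As a rough illustration of the three operations listed above, here is a 
sketch that models each index as a plain Set of object keys. The function 
signatures and the sample data are assumptions made for this example, not a 
proposed spec:

```javascript
// Hypothetical sketch: indexes are modelled as Sets of object keys.
// union and intersect each take an array of Sets and return a new Set,
// so calls can be nested as in operation 3 above.
function union(sets) {
  const result = new Set();
  for (const set of sets) {
    for (const key of set) result.add(key);
  }
  return result;
}

function intersect(sets) {
  if (sets.length === 0) return new Set();
  // Start from the smallest set so the fewest keys are probed.
  const [smallest, ...rest] = [...sets].sort((a, b) => a.size - b.size);
  return new Set(
    [...smallest].filter((key) => rest.every((set) => set.has(key)))
  );
}

// Illustrative data: a multi-tenant mailbox.
const tenantA = new Set(["m1", "m2", "m3", "m4"]);
const inbox = new Set(["m1", "m2", "m5"]);
const starred = new Set(["m3", "m5"]);
const unread = new Set(["m2", "m3", "m6"]);

// Messages of tenant A that are unread and in either the inbox or starred.
const result = intersect([union([inbox, starred]), tenantA, unread]);
console.log([...result].sort());
```

Because both functions return a Set, authorization (the tenant index) and 
filtering (the remaining indexes) compose in a single expression.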

In fact I think if IDB offered only a single object store and an indexing 
system described above, it would be completely perfect. That's all that's 
needed. No need for a V2. Just a focus on high-performance thereafter.




Web Storage Mutex

2009-12-11 Thread Joran Greef
The use of the storage mutex to avoid race conditions is currently considered 
by certain implementors to be too high a performance burden, to the point where 
allowing data corruption is considered preferable. Alternatives that do not 
require a user-agent-wide per-origin script lock are eagerly sought after.

It's not a question of mutex versus data corruption, but of implementation:

Database storage is served by SQLite. LocalStorage would be better served by 
Tokyo Cabinet: http://1978th.net/tokyocabinet/. I doubt the current 
localStorage implementation is better than the current Tokyo Cabinet 
implementation.

Joran Greef