On Mon, Mar 23, 2009 at 10:41 AM, Jeremy Orlow <jor...@google.com> wrote:
> Ian, are you OK with #1? (I ask since I think you're the only one who's
> been concerned with it so far.)
>

Yes

>
> On Sat, Mar 21, 2009 at 1:08 PM, Michael Nordman <micha...@chromium.org> wrote:
>
>> +chromium-dev.
>>
>> On Sat, Mar 21, 2009 at 9:30 AM, John Abd-El-Malek <j...@chromium.org> wrote:
>>
>>> On Fri, Mar 20, 2009 at 7:04 PM, Jeremy Orlow <jor...@chromium.org> wrote:
>>>
>>>> *If you don't care where various bits of the localStorage
>>>> implementation live and you aren't scared about letting stuff out of the
>>>> sandbox, you can stop reading now.*
>>>>
>>>> Background:
>>>>
>>>> For those who don't know the spec by heart: SessionStorage can be
>>>> thought of as 'tab local' storage space for each origin. LocalStorage is
>>>> shared across all browser windows of the same origin and is persistent.
>>>> All data is stored in key/value pairs where both the key and value are
>>>> strings. It's possible to subscribe to DOM storage events. Events and ease
>>>> of use are why a developer might use localStorage even though the database
>>>> interface exists. The exact spec is here:
>>>> http://dev.w3.org/html5/webstorage/
>>>>
>>>> *Where should the localStorage implementation live?*
>>>>
>>>> I'm planning on implementing localStorage very soon within Chromium.
>>>> Unfortunately, how to do this is not very clear-cut. Here are all the
>>>> possibilities I know of so far. (Note that I'm intentionally ignoring the
>>>> backing file format for now, as that debate will partially depend on how
>>>> it's implemented.)
>>>>
>>>> 1) The most obvious solution is to have the browser process keep
>>>> track of the key/values for each origin and write them to disk. The problem
>>>> with this approach is that we're allowing user-supplied data to exist in
>>>> memory (possibly the stack at times, though we could probably avoid this if
>>>> we tried) outside of a sandbox.
>>>> Ian Fette (and I'm sure others) have pretty big reservations for this
>>>> reason. That said, this is definitely the simplest and cleanest solution,
>>>> so if we can figure out something that we're confident in security-wise,
>>>> this is how I'd like to do it.
>>>>
>>>
>>> I really don't see the big issue here. We already do this with
>>> renderer-supplied data such as FORMs, POST, even really long URLs. The
>>> main point is to ensure that we don't trust that data.
>>>
>>
> Agreed. (I meant to point this out and forgot.)
>
> There should be very little being done with the data within the browser
> process. There won't be any need to trust or even interpret the data--the
> browser process should just be a conduit.
>

Yeah, I might have been overly concerned when I talked with Jeremy, as I
think I misinterpreted exactly what was being done in the browser process.
Sorry about that.

>
>>>> 2) What follows from #1 is simply pulling all the localStorage code
>>>> into its own (sandboxed) process. The problem is that, unless a lot of the
>>>> internet starts using localStorage, it seems disproportionately
>>>> heavyweight. Starting it on demand and killing it off if localStorage
>>>> hasn't been used for a while would mitigate this.
>>>>
>>>> 3) A completely different solution is to use shared memory + the code
>>>> recently written to pass file handles between processes. The shared memory
>>>> would be used to coordinate between processes and to store key/val data.
>>>> One renderer process for each origin will take responsibility for syncing
>>>> data to disk. Event notifications can occur either via IPC (though sharing
>>>> key/val data can NOT, for latency/responsiveness reasons) or shared
>>>> memory--whichever is easier. Obviously the chief problem with this is
>>>> memory usage. I'm sure it'll also be more complex and have a greater
>>>> bug/exploit cross section.
>>>>
>>>> 4) A variation of #3 would be to keep all key/val data in the file and
>>>> only use shared memory for locking (if necessary). I'm not going to
>>>> discuss the implementation details because I don't want us to get hung up
>>>> on them, but the general idea would be for each process to have an open
>>>> file handle for its origin(s) and somehow (shared memory, flock, etc.)
>>>> coordinate with the other processes. This will almost certainly be slower
>>>> than memory (if nothing else, due to system calls) but it'll use less
>>>> memory and possibly be easier to make secure.
>>>>
>>>> 5) One last option is to layer the whole thing on top of the HTML 5
>>>> database layer. Unfortunately, there's no efficient way for this layer to
>>>> support events. Even hooking directly into sqlite won't work since its
>>>> triggers layer apparently only notifies you (i.e. works) if the
>>>> insert/delete/update happens in your own process. Of course sqlite can be
>>>> the backing for any other option, but please, let's hold off on that
>>>> discussion for now.
>>>>
>>>
>>> It seems that either way you have to build your own custom notification
>>> system in order to alert all renderer processes if a URL they have loaded
>>> has updated storage values. Why not use sqlite in each renderer process
>>> then, with this system built on top of it?
>>>
>>
> Well, that's the other option, but I think it makes the system a lot more
> complicated. In addition, #1 fits in much better with WebKit's existing
> model for localStorage (though I'm still not sure if that means I'll be able
> to reuse most of the code or not).
>
> Also, one central broker (i.e. the browser process) will make handling
> http://dev.w3.org/html5/webstorage/#threads correctly much less complex.
> Basically, events, locking, and the storing of data will all be done in the
> same code paths with the same messages, rather than having
> 3 separate systems.
> (I know this is kind of hand-wavy, but I've given it some thought, and I
> really do think it'll blow up complexity-wise. If anyone is doubtful and
> wants me to explain further, I can try.)
>
>>>> *So here are my questions:*
>>>>
>>>> How paranoid should we be about passing a user-created string to the
>>>> browser process and having it send the data on to the renderer and some
>>>> backend like sqlite?
>>>>
>>
>> Good question.
>>
>> 100% of the untrusted web content that ends up in sandboxed processes ends
>> up flowing through the browser process, and much of it gets cached on disk.
>> Every page and embedded resource, script-generated values in form posts,
>> cookie strings, etc.: all of it resides in the browser process for a time
>> and on disk. This content is no different... so how paranoid? Since it's
>> not interpreted, we shouldn't be overly concerned with this.
>>
>> This does not share the same security concerns as the Database or as
>> Workers because the untrusted content is not interpreted in a way that
>> carries security risks in a trusted process. In the Database case, we don't
>> want to interpret untrusted SQL commands out of the sandbox. Ditto workers +
>> script.
>>
>>>> Do we trust sqlite enough to use it outside of a sandbox? (Hopefully,
>>>> because we're already doing this, right? If not, are there other mechanisms
>>>> for storing the data on disk that we do trust?)
>>>>
>>>> Would we feel more comfortable with #1 if the renderer processes somehow
>>>> mangled the keys and values before sending them out? For example, they
>>>> could base64 encode them or even do something non-deterministic so that
>>>> attackers have no guarantee about what the memory would look like that's
>>>> passing through the browser process?
>>>>
>>>> And, most importantly, which option seems best to you? (Or is there an
>>>> option 6 that I missed?) I'd rank them 1, 2, 4, 3 personally.
>>>>
>>
>> #1
>>

Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
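
For readers following the thread, here is a minimal sketch of what option #1 (the browser process as a central broker and conduit) might look like. It is written in Python for brevity, not in Chromium's C++, and it is not the implementation discussed above. The names `StorageBroker`, `StorageEvent`, and `renderer_id` are hypothetical, persistence to disk is omitted, and skipping the renderer that made the change when notifying is an assumption of this sketch. The point it illustrates is that the broker stores and forwards keys and values as opaque strings and never interprets them, and that storage and event dispatch share one code path.

```python
from collections import defaultdict
from typing import Callable, Dict, NamedTuple, Optional


class StorageEvent(NamedTuple):
    """Hypothetical event record delivered to listeners."""
    origin: str
    key: str
    old_value: Optional[str]
    new_value: Optional[str]


Listener = Callable[[StorageEvent], None]


class StorageBroker:
    """Hypothetical stand-in for the browser process in option #1."""

    def __init__(self) -> None:
        # origin -> {key: value}. Values are opaque strings: they are
        # stored and forwarded, never parsed or evaluated.
        self._data: Dict[str, Dict[str, str]] = defaultdict(dict)
        # origin -> {renderer_id: listener}
        self._listeners: Dict[str, Dict[int, Listener]] = defaultdict(dict)

    def subscribe(self, origin: str, renderer_id: int, listener: Listener) -> None:
        self._listeners[origin][renderer_id] = listener

    def get_item(self, origin: str, key: str) -> Optional[str]:
        return self._data.get(origin, {}).get(key)

    def set_item(self, origin: str, renderer_id: int, key: str, value: str) -> None:
        old = self._data[origin].get(key)
        self._data[origin][key] = value
        self._notify(StorageEvent(origin, key, old, value), renderer_id)

    def remove_item(self, origin: str, renderer_id: int, key: str) -> None:
        old = self._data[origin].pop(key, None)
        if old is not None:
            self._notify(StorageEvent(origin, key, old, None), renderer_id)

    def _notify(self, event: StorageEvent, sender_id: int) -> None:
        # Assumption of this sketch: the renderer that made the change
        # is not notified; every other subscriber for the origin is.
        for rid, listener in list(self._listeners[event.origin].items()):
            if rid != sender_id:
                listener(event)
```

A renderer would call `subscribe` once per origin it has loaded and then use `set_item`, `get_item`, and `remove_item`. In a real multi-process design each of these calls would be an IPC message, not a direct method call.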