On Thu, Nov 5, 2009 at 2:01 PM, John Abd-El-Malek <j...@chromium.org> wrote:
> > > On Thu, Nov 5, 2009 at 1:42 PM, Jeremy Orlow <jor...@chromium.org> wrote: > >> On Thu, Nov 5, 2009 at 1:34 PM, John Abd-El-Malek <j...@chromium.org>wrote: >> >>> >>> >>> On Thu, Nov 5, 2009 at 1:15 PM, Jeremy Orlow <jor...@chromium.org>wrote: >>> >>>> On Mon, Nov 2, 2009 at 9:50 PM, John Abd-El-Malek <j...@chromium.org>wrote: >>>> >>>>> Over the last week, I've been making some changes to how threads are >>>>> used in the browser process, with the goal of simplifying cross-thread >>>>> access and improving stability. >>>>> >>>>> *The problem* >>>>> We were using a number of patterns that were problematic: >>>>> >>>>> 1) using ChromeThread::GetMessageLoop >>>>> -this isn't safe, since it could return a valid pointer, but since >>>>> the caller isn't holding on to a lock anymore, the target MessageLoop >>>>> could >>>>> be destructed while it's being used >>>>> >>>>> 2) caching of MessageLoop pointers in order to use them later for >>>>> PostTask and friends >>>>> -this was more efficient previously (more on that later) since using >>>>> ChromeThread::GetMessageLoop involved a lock >>>>> -but it spread logic about the order of thread destruction all over >>>>> the code. Code that moved from the IO thread to the file thread and back, >>>>> in order to avoid doing disk access on the IO thread, ended up having to >>>>> do >>>>> an extra hop through the UI thread on the way back to the IO thread since >>>>> the file thread outlives the Io thread. Of course, most code learnt this >>>>> the hard way, after doing the straight forward IO->file->IO thread hop and >>>>> updating the code after seeing reliability or user crashes >>>>> -it made the browser shutdown fragile and hence difficult to update >>>>> >>>>> 3) file thread hops using RefCountedThreadSafe objects which have >>>>> non-trivial destructors >>>>> -to reduce jank, frequently an object on the UI or IO thread would >>>>> execute some code on the file thread, then post the result back to the >>>>> original thread. We make this easy using RefCountedThreadSafe and >>>>> NewRunnableMethod so this pattern happened often, but it's not always >>>>> safe: NewRunnableMethod holds an extra reference on the object to ensure >>>>> that it doesn't invoke a method on a deleted object. But it's quite >>>>> possible that before the file thread's stack unwinds and it releases the >>>>> extra reference, that the response task on the original thread executed >>>>> and >>>>> released its own additional reference. The file thread is then left with >>>>> the remaining reference and the object gets destructed there. While for >>>>> most objects this is ok, many have non-trivial destructors, with the worst >>>>> being ones that register with the per-thread NotificationService. >>>>> Dangling >>>>> pointers would be left behind and tracking these crashes from ChromeBot or >>>>> the crash dumps has wasted several days at least for me. >>>>> >>>>> 4) having browser code take different code paths if a thread didn't >>>>> exist >>>>> -this could be either deceptively harmless (i.e. execute >>>>> synchronously when it was normally asynchronous), when in fact it makes >>>>> shutdown slower because disk access is moved to the UI thread >>>>> -it could lead to data loss, if tasks are silently not posted because >>>>> the code assumes this only happens in unit tests, when it could occur on >>>>> browser shutdown as well >>>>> >>>>> >>>>> *The solution* >>>>> 1+2: Use ChromeThread::PostTask and friends (i.e. PostDelayedTask, >>>>> DeleteSoon, ReleaseSoon) which are safe and efficient: no locks are >>>>> grabbed >>>>> if the target thread is known to outlive the current thread. The four >>>>> static methods have the same signature as the ones from MessageLoop, with >>>>> the addition of the first parameter to indicate the target thread. >>>>> >>>>> ChromeThread::PostTask(ChromeThread::FILE, FROM_HERE, task); >>>>> >>>>> 3: If your object must be destructed on a specific thread, use a trait >>>>> from ChromeThread: >>>>> >>>>> class Foo : public base::RefCountedThreadSafe<Foo, >>>>> ChromeThread::DeleteOnIOThread> >>>>> >>>> >>>> This is really cool and a great idea, but I'm a little concerned about >>>> what happens when the thread has already been torn down. It seems that it >>>> calls DeleteSoon which calls PostNonNestableTask which calls >>>> PostTaskHelper. >>>> PostTaskHelper deletes the task if the thread is shut down. >>>> >>>> This works fine if something is only supposed to touch 2 threads. But >>>> what if 2 threads simultaneously delete something which is supposed to be >>>> deleted on a third thread that's already been shut down? >>>> >>> >>> I'm not sure which object that you're referring to? The PostTaskHelper >>> will delete the task. But if you have a DeleteSoon task, deleting the task >>> (i.e. because the target thread is gone) doesn't delete the object. >>> >> >> So this means the memory will just be leaked? That won't work for DOM >> Storage since the backing database is shut down in destructors. >> > > If the object doesn't need to be closed on a specific thread, and you want > to ensure that it's deleted on any thread, then ReleaseSoon and > ChromeThread::DeleteOnIOThread aren't going to serve your need. You'll > probably want to have either a singleton or an object like > ResourceDispatcherHost that BrowserProcess can delete directly. > >> >> As for simultaneously deleting an object, if more than 1 thread are >>> accessing an object, they should each have a reference to it. They can't >>> ensure that releasing their reference will cause deletion, since there could >>> be other refences. >>> >> >> Sure, but what if the ref count is not thread safe? >> > > If different threads have references to an object, it should derive from > RefCountedThreadSafe. Doing manual ref counting on one thread and handing > out pointers to other threads is recipe for disaster.. > > >> Or they access an object that's not thread safe. I often do this because >> I only touch those members on specific threads, but if I >> can't guarantee that only one thread could be deleting an object at a time, >> things fall on the floor. >> > > How exactly is your object deleted on more than one thread at a time? Do > you mean that one will try to delete it, and it has a lock and so the second > will early return? But back to the original point, what exactly is breaking > down here? > You're not understanding me correctly, but I think I have my answer. Feel free to ping me in person if you want me to explain what's going on. > > >> >> And what if they both call a method on another class that's not thread >>>> safe? >>>> >>> >>> If they use NewRunnableMethod on an object that's not thread safe, then >>> the result task would only execute on the target thread. If the task >>> couldn't execute because the target thread is gone, then the method won't be >>> invoked. >>> >>> >>>> I ask because this scenario is going to prevent me from using this for >>>> DOM Storage. >>>> >>>> A possible solution would be to guarantee that, if the thread is dead, >>>> destruction happens on the UI thread. At least in this case, I think it >>>> would work. >>>> >>>> Thoughts? >>>> >>> >>> DeleteOnIOThread and the other variants were added specifically to >>> enforce that an object is deleted on a specific thread. Without this, as an >>> example objects that use the NotificationService (which is per thread) won't >>> be able to unregister themselves, which would lead to corruption later since >>> the service will call out to deleted objects. So no, we wouldn't want to >>> delete these objects on the UI thread because the problem would still exist. >>> >> >> My question was based on the assumption I laid out above. Apparently >> that's not true and the memory is just leaked? >> >> >>> >>>> >>>>> 4: I've removed all the special casing and always made the objects in >>>>> the browser code behave in one way. If you're writing a unit test and >>>>> need >>>>> to use an object that goes to a file thread (where before it would proceed >>>>> synchronously), you just need: >>>>> >>>>> ChromeThread file_thread(ChromeThread::FILE, MessageLoop::current()); >>>>> foo->StartAsyncTaskOnFileThread(); >>>>> MessageLoop::current()->RunAllPending(); >>>>> >>>>> There are plenty of examples now in the tests. >>>>> >>>>> >>>>> *Gotchas* >>>>> -PostTask will silently delete a task if the thread doesn't exist. >>>>> This is done to avoid having all the code that uses it have special cases >>>>> if the target thread exists or not, and to make Valgrind happy. As usual, >>>>> the task for DeleteSoon/ReleaseSoon doesn't do anything in its destructor, >>>>> so this won't cause unexpected behavior with them. But be wary of posted >>>>> Task objects which have non-trivial destructors or smart pointers as >>>>> members. I'm still on the fence about this, since while the latter is >>>>> theoretical now, it could lead to problems in the future. I might change >>>>> it >>>>> so that the tasks are not deleted when I'm ready for more Valgrind fun. >>>>> >>>>> If you absolutely must know if a task was posted or not, you can check >>>>> the return value of PostTask and friends. But note that even if the task >>>>> was posted successfully, there's no guarantee that it will run because the >>>>> target thread could already have a QuitTask queued up. >>>>> >>>>> g_browser->io_thread() and file_thread() are still around (the former >>>>> for IPC code, and the latter for Linux proxy code which is in net and so >>>>> can't use ChromeThread). Don't use them unless absolutely necessary. >>>>> >>>>> >>>>> *The Catch* >>>>> Ensuring we don't fall back into our old patterns through code you >>>>> write and review. >>>>> >>>>> >>>>> *More information* >>>>> http://code.google.com/p/chromium/issues/detail?id=25354 >>>>> >>>>> >>>>> >>>>> >>>> >>> >> > --~--~---------~--~----~------------~-------~--~----~ Chromium Developers mailing list: chromium-dev@googlegroups.com View archives, change email options, or unsubscribe: http://groups.google.com/group/chromium-dev -~----------~----~----~----~------~----~------~--~---