Re: GCD killed my performance
Just because you are using Core Foundation interfaces doesn’t mean you aren’t getting Objective-C code behind the scenes.

On Apr 26, 2014, at 2:44 PM, ChanMaxthon wrote:

> Also when you are using custom run loop sources (which is required here) pretty much all code that is interfacing the run loop is CF code, so no Objective-C method calls here (and the few remained ones can be IMP-cached so they are just as fast.)

-- David Duncan

___
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com
Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com
This email sent to arch...@mail-archive.com
Re: GCD killed my performance
On Apr 26, 2014, at 5:36 PM, Jens Alfke wrote:

>> OS X implemented GCD at kernel level which introduced lots of syscalls which are super expensive. The old school method is largely user land so it may help a little by keeping syscalls to a minimum. @synchronized uses Objective-C runtime functions which was once code behind those old school classes.
>
> That's the opposite of what the docs say: "A dispatch semaphore works like a traditional semaphore, except that when the resource is available, it takes less time to acquire a dispatch semaphore. The reason is that Grand Central Dispatch does not call into the kernel for this particular case. It calls into the kernel only when the resource is not available and the system needs to park your thread until the semaphore is signaled." (Concurrency Programming Guide)

They say this for dispatch semaphores, but it doesn’t necessarily apply to dispatch queues. (If I were less lazy, I suppose I could check the source.)
Re: GCD killed my performance
Also, when you are using custom run loop sources (which is required here), pretty much all the code that interfaces with the run loop is CF code, so there are no Objective-C method calls here (and the few remaining ones can be IMP-cached so they are just as fast).
Re: GCD killed my performance
Weird. My old code used to be GCD-heavy and even deadlocked once, but new projects that use old-school methods are faster and have never locked up. My projects are heavy in network I/O. I built a transactional network interface there, which is more or less a clone of Distributed Objects built on HTTP.
Re: GCD killed my performance
On Apr 26, 2014, at 2:10 PM, ChanMaxthon wrote:

> Since you are interfacing with database maybe you can use a little transaction interface which is its own thread and run loop. That may be able to cut down your amount of syscalls. That is, not using GCD but old fashioned NSThread, NSRunLoop (and CFRunLoop) and NSCondition.

I’m pretty sure NSRunLoop has more overhead than a dispatch_queue does. It’s doing more work and it’s using Obj-C messaging.

> OS X implemented GCD at kernel level which introduced lots of syscalls which are super expensive. The old school method is largely user land so it may help a little by keeping syscalls to a minimum. @synchronized uses Objective-C runtime functions which was once code behind those old school classes.

That's the opposite of what the docs say: "A dispatch semaphore works like a traditional semaphore, except that when the resource is available, it takes less time to acquire a dispatch semaphore. The reason is that Grand Central Dispatch does not call into the kernel for this particular case. It calls into the kernel only when the resource is not available and the system needs to park your thread until the semaphore is signaled." (Concurrency Programming Guide)

--Jens
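For anyone wondering what "does not call into the kernel for this particular case" can look like, here is a minimal sketch in portable C. It is not libdispatch's actual code. The names (fast_sem, slow_paths, and so on) are made up for illustration, and a pthread mutex plus condition variable stands in for whatever kernel primitive a real implementation would park on. The idea is that an atomic counter handles the uncontended case, and the blocking machinery is touched only when the counter says someone has to wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical semaphore with a user-space fast path (illustration only). */
typedef struct {
    atomic_long value;       /* > 0: free slots; < 0: minus the number of waiters */
    pthread_mutex_t lock;    /* used on the slow path only */
    pthread_cond_t cond;     /* used on the slow path only */
    long wakeups;            /* wakeups posted but not yet consumed; guarded by lock */
    atomic_long slow_paths;  /* how often the slow path ran; for demonstration */
} fast_sem;

void fast_sem_init(fast_sem *s, long initial) {
    assert(initial >= 0);
    atomic_init(&s->value, initial);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->wakeups = 0;
    atomic_init(&s->slow_paths, 0);
}

void fast_sem_wait(fast_sem *s) {
    /* Fast path: the resource was available, one atomic decrement and done. */
    if (atomic_fetch_sub(&s->value, 1) > 0)
        return;
    /* Slow path: register as a waiter and block until a signal posts a wakeup. */
    atomic_fetch_add(&s->slow_paths, 1);
    pthread_mutex_lock(&s->lock);
    while (s->wakeups == 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->wakeups--;
    pthread_mutex_unlock(&s->lock);
}

void fast_sem_signal(fast_sem *s) {
    /* Fast path: nobody was waiting, one atomic increment and done. */
    if (atomic_fetch_add(&s->value, 1) >= 0)
        return;
    /* Slow path: at least one waiter exists (or is about to block); wake one. */
    atomic_fetch_add(&s->slow_paths, 1);
    pthread_mutex_lock(&s->lock);
    s->wakeups++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}
```

With an initial value of 2, two waits followed by two signals never leave the fast path, so slow_paths stays at zero. Whether dispatch queues (as opposed to dispatch semaphores) get a comparable fast path is a separate question that this sketch does not answer.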
Re: GCD killed my performance
You can write your own dispatch queue using a CFRunLoopSource, adding an NSCondition to it when you need it to be synchronous (that is, to implement dispatch_sync).
Re: GCD killed my performance
Since you are interfacing with a database, maybe you can use a little transaction interface that has its own thread and run loop. That may cut down the number of syscalls. That is, use not GCD but old-fashioned NSThread, NSRunLoop (and CFRunLoop) and NSCondition.

OS X implemented GCD at the kernel level, which introduced lots of syscalls, and those are super expensive. The old-school method is largely userland, so it may help a little by keeping syscalls to a minimum. @synchronized uses Objective-C runtime functions, which were once the code behind those old-school classes.
Re: GCD killed my performance
On Apr 26, 2014, at 12:02 , Kyle Sluder wrote:

> FWIW, I’ve been of the opinion for a while that including dispatch_sync in the API was a mistake. Too often it becomes a crutch used without understanding, resulting in stop-start behavior across threads or cores.

I don’t know if you’ll agree, but it seems to me that there’s a distinction between a *locking* mechanism such as @synchronized, and a *queuing* mechanism, which Jens seems to have demonstrated dispatch_sync to be.

I understand that both mechanisms may at a lower level depend on both queues and locks, but the distinction I’m making is that a locking mechanism is used when we hope that the lock will generally be granted without contention, while a queuing mechanism is used when we expect there will generally be some contending operation in progress.
Re: GCD killed my performance
> On Apr 25, 2014, at 10:35 AM, Seth Willits wrote:
>
>> On Apr 25, 2014, at 8:08 AM, Jens Alfke wrote:
>>
>> I’m ending up at the opposite of the received wisdom, namely:
>> * dispatch_sync is a lot cheaper than dispatch_async
>> * only use dispatch_async if you really need to, or for an expensive operation, because it will slow down all your dispatch_sync calls
>
> Saying "dispatch_async will slow down *all* dispatch_sync calls" throws up red flags for me.
>
> In my mind there's still a possibility that You're Doing It Wrong.

FWIW, I’ve been of the opinion for a while that including dispatch_sync in the API was a mistake. Too often it becomes a crutch used without understanding, resulting in stop-start behavior across threads or cores.

The correct solution is usually to rewrite the code in continuation-passing style, even if that means a cascade of blocks. Microsoft realized this when designing WinRT and _removed all the synchronous APIs_. They also added the async/await keywords to C# to avoid the seventeen-levels-of-indentation problem that plagues ObjC code.

Some people at Apple agree that use of dispatch_sync is a code smell, but that if they didn't include it a hundred different re-implementations would arise, all differently broken.

--Kyle Sluder
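To make "continuation-passing style" concrete, here is a small sketch in plain C. Everything in it is hypothetical: task_queue is a single-threaded stand-in for a dispatch queue, fake_db_read stands in for a real database call, and function pointers play the role that blocks would play in Objective-C. The point is only the shape of the API: the caller hands over "what to do with the result" instead of waiting for the result to come back.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 16

/* A continuation receives the result plus caller-supplied context. */
typedef void (*continuation)(int result, void *ctx);

typedef struct {
    int key;
    continuation then;
    void *ctx;
} read_task;

/* Hypothetical FIFO of pending reads (fixed-size ring buffer). */
typedef struct {
    read_task tasks[MAX_TASKS];
    size_t head;
    size_t count;
} task_queue;

/* Hypothetical lookup standing in for a real database read. */
static int fake_db_read(int key) {
    return key * 10;
}

/* Asynchronous style: record the request and return immediately. */
void db_read_async(task_queue *q, int key, continuation then, void *ctx) {
    assert(q->count < MAX_TASKS);
    size_t slot = (q->head + q->count) % MAX_TASKS;
    q->tasks[slot] = (read_task){ key, then, ctx };
    q->count++;
}

/* Run queued reads in FIFO order, passing each result to its continuation.
   A continuation may itself call db_read_async; the new task runs later
   in the same drain. */
void task_queue_drain(task_queue *q) {
    while (q->count > 0) {
        read_task t = q->tasks[q->head];
        q->head = (q->head + 1) % MAX_TASKS;
        q->count--;
        t.then(fake_db_read(t.key), t.ctx);
    }
}

/* Example continuation: add the result to a caller-supplied total. */
void add_to_total(int result, void *ctx) {
    *(int *)ctx += result;
}
```

Posting reads for keys 1 and 2 with add_to_total leaves the total untouched until task_queue_drain runs, after which it holds 30. In real GCD code the drain would happen on the queue's own thread, and the nesting of continuations is what produces the "cascade of blocks" mentioned above.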
Re: GCD killed my performance
On Apr 25, 2014, at 5:42 PM, Roland King wrote:

> They can be very useful finding places where everything blocks waiting for one piece of code to execute, or you ping madly from thread-to-thread, queue-to-queue.

Thanks, that sounds very useful. I’ll give it a try when I dive back into this.

Today I got the GCD-enabled code almost back up to the performance it had on iOS before it was thread-safe and parallel. That still counts as a victory, because with more cores (as on my Mac) it’s a lot faster. Some of the speed came from not using dispatch_async for trivial blocks, some from fixing bugs in my code, some from allocating fewer objects… I'm slowly chipping away at it and making little gains one at a time.

--Jens
Re: GCD killed my performance
Have you clicked the 'strategy' buttons at the top left of Instruments? For some reason I can only do this after I've recorded a trace, not during. They are easily overlooked but show per-core or per-thread traces. They can be very useful for finding places where everything blocks waiting for one piece of code to execute, or where you ping madly from thread to thread and queue to queue. There are stack traces available at each event. It's a good tool for understanding how dispatch queues are being used and finding out whether your dispatch_sync really is staying on the same CPU or not.
Re: GCD killed my performance
On Apr 25, 2014, at 8:08 AM, Jens Alfke wrote:

> I’m ending up at the opposite of the received wisdom, namely:
> * dispatch_sync is a lot cheaper than dispatch_async
> * only use dispatch_async if you really need to, or for an expensive operation, because it will slow down all your dispatch_sync calls

Saying "dispatch_async will slow down *all* dispatch_sync calls" throws up red flags for me.

In my mind there's still a possibility that You're Doing It Wrong. You haven't completely outlined exactly what you've been doing, so it's very hard to offer any solid advice. For example, you didn't mention which queues you're dispatching to, how often your syncs happen relative to the asyncs, which threads they're being dispatched from, or what in particular is happening within them, and you didn't show any Instruments traces…

I don't doubt you've found something, but your conclusion doesn't paint the whole picture. And as a matter of fact, I think your first email shows this: "On my MacBook Pro this gave me a nice speedup of 50% or more." If it was all down to async universally slowing down all dispatch_sync calls, then wouldn't you expect it to be slower there too?

It seems to me you need a better theory as to why the change you made worked. But really, we're flying blind here.

-- Seth Willits
Re: GCD killed my performance
On Apr 25, 2014, at 1:11 AM, Jonathan Taylor wrote:

> Have you looked at the output from System Trace on both systems? I often find that to be informative.

OK, I tried this and it did turn out to be very informative :) even though I don’t know how to interpret any of the numbers. But just the pretty charts alone told the story:
- With @synchronized there was very little activity in the System Calls or Scheduling tracks.
- With GCD there was a whole ton of activity.

I was surprised there’s this much of a difference, because there’s no actual concurrency in the code at this point! In the commit I’ve rolled back to, all I’ve done is taken my existing single-threaded code and wrapped the C calls with either @synchronized or dispatch_sync. My understanding is that while dispatch_sync is technically switching to a different dispatch queue, if there isn’t any contention it will just do some bookkeeping and run the block on the same thread’s stack. So in this case I wouldn’t expect there to be any actual thread switching going on; except there is.

… So then I searched the project for "dispatch_async" and found that there was actually _one_ call to it, so my statement about "no actual concurrency" above was a lie. The block it runs doesn’t really need to be async; I was just running it that way because I didn’t need it to complete right away. I changed that call to dispatch_sync, and voila! Almost all the thread scheduling and system calls went away; the system trace now looks like the @synchronized one, and the benchmark times are now slightly better than @synchronized!

I guess this makes sense: dispatch_sync is super cheap in the uncontended case, but if there’s a dispatch_async pending, then that one obviously has to run first, and it’s probably been scheduled onto another thread, so the dispatch_sync has to either queue onto that thread or at least do some more-expensive locking to wait for the other thread to finish the async call.

I’m ending up at the opposite of the received wisdom, namely:
* dispatch_sync is a lot cheaper than dispatch_async
* only use dispatch_async if you really need to, or for an expensive operation, because it will slow down all your dispatch_sync calls

I wish there were a big fat super-dense O’Reilly or Big Nerd Ranch book about GCD so I didn’t have to figure all this out on my own...

--Jens
Re: GCD killed my performance
Have you looked at the output from System Trace on both systems? I often find that to be informative.

That might or might not be the way to tell, but have you considered that the very different CPU characteristics might mean that the actual timing and pattern of database commands is different on the iPhone, resulting in e.g. a different level of contention for the queues? Suppose dispatch_sync is fast on an empty queue but has additional overhead on a full queue, resulting in different performance from one platform to another?

On 25 Apr 2014, at 04:42, cocoa-dev-requ...@lists.apple.com wrote:

> I’m writing an Objective-C API around a database library, and trying to add some optimizations. There’s a lot of room for parallelizing, since tasks like indexing involve a combination of I/O-bound and CPU-bound operations. As a first step, I made my API thread-safe by creating a dispatch queue and wrapping the C database calls in dispatch_sync blocks. Then I did some reorganization of the code so different parts run on different queues, allowing I/O and computation to run in parallel.
>
> On my MacBook Pro this gave me a nice speedup of 50% or more.
>
> But when I tested the code on my iPhone 5 today, I found performance had dropped by about a third. Profiling shows that most of the time is being spent in thread/queue management or Objective-C refcount bookkeeping. It looks as though adding GCD introduced a lot of CPU overhead, and the two cores on my iPhone aren’t enough to make up for that, while the eight cores in my MacBook Pro make it worthwhile.
>
> I tried backing out all the restructuring of my code, so there’s no actual parallelism going on, just the dispatch_sync calls. Predictably, performance is even worse; slightly more than half as fast as without them.
>
> So, I’m pretty disappointed. I know that dispatch queues aren’t free, but I wasn’t expecting them to be this expensive! I’m not doing anything silly like wrapping dispatch_sync around trivial calls. The APIs I’m using it on do things like reading and writing values from the persistent store. I was expecting the cost of thread-safety to be lost in the noise compared to that.
Re: GCD killed my performance
On Apr 24, 2014, at 22:49 , Jens Alfke wrote:

> It is, but most of it appears to be memory management _caused_ by GCD, since it goes away when I replace the dispatch calls with @synchronized. GCD is apparently causing a lot of blocks to get copied to the heap.

Well, you know what you’re seeing in Instruments, but this characterization seems improbable. Ignoring the GCD-specific entries, if block copy was causing a large number of retains and releases (if that’s what you meant by "refcount bookkeeping"), I’d expect there to be a large number of block copies, too. If block copies (as I’d also expect) are much more time-consuming than the retains/releases, by comparison you’d barely notice the retain/release times in Instruments. If the retains/releases caused by block copies do dominate, that suggests the block copies are comparatively very cheap, which in turn suggests a horrible bug in block copies.

With luck someone might jump in with a plausible answer, but this is starting to sound TSI-worthy.
Re: GCD killed my performance
On Apr 24, 2014, at 10:30 PM, Quincey Morris wrote:

> Approaching this naively, this result suggests that the block content, while not trivial, is too fine-grained -- is divided too finely. For example, if you’re putting (essentially) one database read/write operation (or even a handful) in each block, perhaps that’s too small a unit of work for GCD.

I’m coming to that conclusion. I thought that a file-based b-tree lookup would be complex enough to drown out that overhead, but maybe not.

> a physical disk access is presumably much slower on average than an access to iOS persistent memory. (Of course, if your MacBook Pro is purely SSD, that might blow that idea out of the water. I don’t know how SSD access speeds compare to iOS memory.)

No, you’ve got it backwards. My 2012 MacBook Pro has a damn fast SSD (one of the Apple on-the-motherboard ones), but the flash storage on iOS devices is really slow, slower than a decent hard disk.

I interpret the difference in performance as showing that, while there’s a lot of extra CPU overhead to using GCD, it’s still a net win when it lets you spread your code out over eight cores instead of one. But when there are only two cores, it seems to be not worth the expense.

> Unless I missed something, all of the responses in this thread went to the GCD issue. But if memory management is showing up "hot" like that too, there may be something else to investigate.

It is, but most of it appears to be memory management _caused_ by GCD, since it goes away when I replace the dispatch calls with @synchronized. GCD is apparently causing a lot of blocks to get copied to the heap.

--Jens
Re: GCD killed my performance
On Apr 24, 2014, at 20:14 , Jens Alfke wrote:

> On my MacBook Pro this gave me a nice speedup of 50% or more.
>
> But when I tested the code on my iPhone 5 today, I found performance had dropped by about a third.

> I know that dispatch queues aren’t free, but I wasn’t expecting them to be this expensive! I’m not doing anything silly like wrapping dispatch_sync around trivial calls. The APIs I’m using it on do things like reading and writing values from the persistent store.

Approaching this naively, this result suggests that the block content, while not trivial, is too fine-grained -- is divided too finely. For example, if you’re putting (essentially) one database read/write operation (or even a handful) in each block, perhaps that’s too small a unit of work for GCD.

That idea is given *some* plausibility by the different outcomes on Mac and iPhone -- a physical disk access is presumably much slower on average than an access to iOS persistent memory. (Of course, if your MacBook Pro is purely SSD, that might blow that idea out of the water. I don’t know how SSD access speeds compare to iOS memory.)

> Profiling shows that most of the time is being spent in thread/queue management or Objective-C refcount bookkeeping.

Unless I missed something, all of the responses in this thread went to the GCD issue. But if memory management is showing up "hot" like that too, there may be something else to investigate.

One option might be to look at what Instruments reports for the hotspots in terms of *counts* rather than times, and compare OS X with iOS. This would necessitate arranging things so you could get counts for known increments of work. If iOS gives comparable counts but much longer times, you will have one way of proceeding; if it gives much larger counts but similar or smaller times per count, you will have another way of proceeding.
Re: GCD killed my performance
Is there any way to batch up more work to do in each block? Then your ratio of real work to overhead would go up.

On Apr 25, 2014, at 12:35 AM, Jens Alfke wrote:

> On Apr 24, 2014, at 9:04 PM, Dave Fernandes wrote:
>
>> What’s the CPU utilization? Are you actually getting full use of them, or
>> are your threads blocked waiting for something?
>
> Fairly high; I think 175% or so (out of 200% possible). The problem is that
> a large fraction of that is taken up with busywork. In the CPU profile list
> from Instruments, the top 25 or so stack frames are all system
> infrastructure; you have to read down quite a ways to find any application
> code at all.
>
> --Jens
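A rough sketch of the batching idea from this message, written in plain C with a pthread mutex instead of GCD so that it builds anywhere pthreads is available. The `Store` type and the `store_put` / `store_put_batch` functions are hypothetical stand-ins, not the database API discussed in this thread. The same shape would apply to dispatch_sync: one block that loops over many operations rather than one block per operation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define STORE_CAPACITY 1024

/* Hypothetical in-memory stand-in for the persistent store. */
typedef struct {
    pthread_mutex_t lock;
    int values[STORE_CAPACITY];
    size_t count;
    unsigned long lock_acquisitions;  /* how many times the sync cost was paid */
} Store;

void store_init(Store *s) {
    pthread_mutex_init(&s->lock, NULL);
    s->count = 0;
    s->lock_acquisitions = 0;
}

/* Unsynchronized write; the caller must already hold s->lock.
   Returns 0 on success, -1 if the store is full. */
static int store_put_locked(Store *s, int value) {
    if (s->count >= STORE_CAPACITY) return -1;
    s->values[s->count++] = value;
    return 0;
}

/* Fine-grained version: one lock round trip per value. */
int store_put(Store *s, int value) {
    pthread_mutex_lock(&s->lock);
    s->lock_acquisitions++;
    int rc = store_put_locked(s, value);
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Batched version: one lock round trip for all n values.
   Stops at the first failing write and returns its error code. */
int store_put_batch(Store *s, const int *values, size_t n) {
    assert(values != NULL || n == 0);
    int rc = 0;
    pthread_mutex_lock(&s->lock);
    s->lock_acquisitions++;
    for (size_t i = 0; i < n && rc == 0; i++) {
        rc = store_put_locked(s, values[i]);
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}
```

Writing 100 values through `store_put` pays the synchronization cost 100 times; writing them through `store_put_batch` pays it once, and the error code from the underlying write is still reported to the caller.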
Re: GCD killed my performance
On Apr 24, 2014, at 9:04 PM, Dave Fernandes wrote:

> What’s the CPU utilization? Are you actually getting full use of them, or are
> your threads blocked waiting for something?

Fairly high; I think 175% or so (out of 200% possible). The problem is that a large fraction of that is taken up with busywork. In the CPU profile list from Instruments, the top 25 or so stack frames are all system infrastructure; you have to read down quite a ways to find any application code at all.

--Jens
Re: GCD killed my performance
What’s the CPU utilization? Are you actually getting full use of them, or are your threads blocked waiting for something?

On Apr 24, 2014, at 11:14 PM, Jens Alfke wrote:

> I’m writing an Objective-C API around a database library, and trying to add
> some optimizations. There’s a lot of room for parallelizing, since tasks like
> indexing involve a combination of I/O-bound and CPU-bound operations. As a
> first step, I made my API thread-safe by creating a dispatch queue and
> wrapping the C database calls in dispatch_sync blocks. Then I did some
> reorganization of the code so different parts run on different queues,
> allowing I/O and computation to run in parallel.
>
> On my MacBook Pro this gave me a nice speedup of 50% or more.
>
> But when I tested the code on my iPhone 5 today, I found performance had
> dropped by about a third. Profiling shows that most of the time is being
> spent in thread/queue management or Objective-C refcount bookkeeping. It
> looks as though adding GCD introduced a lot of CPU overhead, and the two
> cores on my iPhone aren’t enough to make up for that, while the eight cores
> in my MacBook Pro make it worthwhile.
>
> I tried backing out all the restructuring of my code, so there’s no actual
> parallelism going on, just the dispatch_sync calls. Predictably, performance
> is even worse; slightly more than half as fast as without them.
>
> So, I’m pretty disappointed. I know that dispatch queues aren’t free, but I
> wasn’t expecting them to be this expensive! I’m not doing anything silly like
> wrapping dispatch_sync around trivial calls. The APIs I’m using it on do
> things like reading and writing values from the persistent store. I was
> expecting the cost of thread-safety to be lost in the noise compared to that.
>
> Any suggestions on what to try next?
>
> --Jens
Re: GCD killed my performance
On Apr 24, 2014, at 8:42 PM, Ken Thomases wrote:

> You may be aware of this, but dispatch_sync() is not necessary or even
> particularly relevant to thread-safety. The use of a serial queue or,
> possibly, a reader/writer mechanism using barriers, is what achieves thread
> safety.

Initial experimentation showed that dispatch_async was significantly slower than dispatch_sync. This makes sense because dispatch_async has to copy the block (thus allocating an object on the heap and retaining any captured object variables) while dispatch_sync can get away with running the block before the call returns, which avoids all that overhead.

> Using a synchronous call is only necessary if your API has synchronous
> semantics. For example, if a call provides immediate results to the caller.
> Reading from a database would typically have to be synchronous, but writing
> to it can often be asynchronous.

Yeah, I was torn about making the write calls async. But in the underlying C API both the read and write calls return error codes, since there could be disk or memory errors, and I didn’t want to ignore the return codes on the write functions. (My mama didn’t raise no boys to skip proper error handling.)

--Jens
Re: GCD killed my performance
On Apr 24, 2014, at 10:14 PM, Jens Alfke wrote:

> I’m writing an Objective-C API around a database library, and trying to add
> some optimizations. There’s a lot of room for parallelizing, since tasks like
> indexing involve a combination of I/O-bound and CPU-bound operations. As a
> first step, I made my API thread-safe by creating a dispatch queue and
> wrapping the C database calls in dispatch_sync blocks.

You may be aware of this, but dispatch_sync() is not necessary or even particularly relevant to thread-safety. The use of a serial queue or, possibly, a reader/writer mechanism using barriers, is what achieves thread safety.

Using a synchronous call is only necessary if your API has synchronous semantics, for example if a call provides immediate results to the caller. Reading from a database would typically have to be synchronous, but writing to it can often be asynchronous. All you care about is that future reads will always see what was previously written, which the serial queue or barriers will guarantee.

That doesn't necessarily have any bearing on the overhead of GCD vs. the resources of an iPhone, but I thought I'd point it out.

Regards,
Ken
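A minimal sketch of the ordering guarantee described in this message: an asynchronous write followed by a synchronous read on one serial queue. It is plain C with pthreads, not GCD; `SerialQueue` and every function name below are hypothetical, and the single worker thread only imitates a serial dispatch queue. It also omits any reporting of errors from asynchronous tasks, which a real wrapper around an error-returning C API would have to add (for example, through a completion callback).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*TaskFn)(void *ctx);
typedef struct Task {
    TaskFn fn;
    void *ctx;
    bool *done;              /* non-NULL only for synchronous submissions */
    struct Task *next;
} Task;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;  /* broadcast on enqueue, task completion, stop */
    Task *head, *tail;       /* FIFO: tasks run strictly in submission order */
    bool stopping;
    pthread_t thread;
} SerialQueue;

/* Worker loop: pop one task at a time and run it outside the lock. */
static void *serial_queue_main(void *arg) {
    SerialQueue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == NULL && !q->stopping) pthread_cond_wait(&q->changed, &q->lock);
        if (q->head == NULL) break;          /* stopping and fully drained */
        Task *t = q->head;
        q->head = t->next;
        if (q->head == NULL) q->tail = NULL;
        pthread_mutex_unlock(&q->lock);
        t->fn(t->ctx);
        pthread_mutex_lock(&q->lock);
        if (t->done != NULL) *t->done = true;
        pthread_cond_broadcast(&q->changed); /* wake any synchronous caller */
        free(t);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

int serial_queue_start(SerialQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = q->tail = NULL;
    q->stopping = false;
    return pthread_create(&q->thread, NULL, serial_queue_main, q);
}

/* Append a task; if done is non-NULL, block until the worker has run it. */
static int enqueue(SerialQueue *q, TaskFn fn, void *ctx, bool *done) {
    Task *t = malloc(sizeof *t);
    if (t == NULL) return -1;
    t->fn = fn; t->ctx = ctx; t->done = done; t->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL) q->tail->next = t; else q->head = t;
    q->tail = t;
    pthread_cond_broadcast(&q->changed);
    while (done != NULL && !*done) pthread_cond_wait(&q->changed, &q->lock);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

int serial_queue_async(SerialQueue *q, TaskFn fn, void *ctx) { return enqueue(q, fn, ctx, NULL); }

int serial_queue_sync(SerialQueue *q, TaskFn fn, void *ctx) {
    bool done = false;
    return enqueue(q, fn, ctx, &done);
}

/* Let already-queued tasks finish, then join the worker. */
void serial_queue_stop(SerialQueue *q) {
    pthread_mutex_lock(&q->lock);
    q->stopping = true;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
    pthread_mutex_destroy(&q->lock); pthread_cond_destroy(&q->changed);
}

typedef struct { int *slot; int value; } SlotArgs;
static void do_write(void *ctx) { SlotArgs *a = ctx; *a->slot = a->value; }
static void do_read(void *ctx)  { SlotArgs *a = ctx; a->value = *a->slot; }

/* Async write followed by a sync read; returns what the read observed. */
int write_then_read(int new_value) {
    SerialQueue q;
    int slot = 0;
    SlotArgs w = { &slot, new_value }, r = { &slot, -1 };
    if (serial_queue_start(&q) != 0) return -1;
    serial_queue_async(&q, do_write, &w);   /* returns without waiting */
    serial_queue_sync(&q, do_read, &r);     /* queued behind the write */
    serial_queue_stop(&q);
    return r.value;
}
```

Because the queue is first-in, first-out with a single worker, the read cannot run until the earlier write has completed, even though the writer never waited for it.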
Re: GCD killed my performance
Follow-up: I tried replacing every instance of

    dispatch_sync(_queue, ^{ … });

with

    @synchronized(self) { … }

Things got faster again. It looks like @synchronized is a few percent slower than no thread-safety, but _significantly_ faster than dispatch_sync. That seems to contradict what the GCD overview says about dispatch queues being faster than regular locking techniques.

I looked at the disassembly, and @synchronized compiles into calls to objc_sync_enter() and objc_sync_exit(), which in turn call pthread_mutex_lock and pthread_mutex_unlock; Instruments shows all these functions consuming nearly zero CPU time during my benchmark. With GCD, by contrast, the dispatch-queue runtime calls were most of the hottest code in the entire run.

I’m not sure what’s going on here. GCD seems to be pretty well respected by people I trust (I read Mike Ash’s blog posts about it pretty thoroughly while doing my refactoring, for example) and yet my experience with it so far is that the overhead is too high to make all the fun queue-and-block-based programming worthwhile, at least on iOS. :(

--Jens
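For readers who want to see the lock/unlock pattern that the disassembly in this message boils down to, here is a small C sketch. It is not the Objective-C runtime's implementation: `Guarded` and the functions below are hypothetical, and the recursive mutex type is an assumption chosen to mimic the way a thread can re-enter a @synchronized block on the same object without deadlocking.

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE on strict compilers */
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 10000

/* Hypothetical object with a counter that several threads update. */
typedef struct {
    pthread_mutex_t mutex;
    long counter;
} Guarded;

/* Create a recursive mutex so nested lock/unlock pairs on one thread
   do not deadlock. Returns 0 on success or a pthread error code. */
int guarded_init(Guarded *g) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(&g->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    g->counter = 0;
    return rc;
}

/* Rough analogue of: @synchronized(self) { counter += delta; } */
void guarded_add(Guarded *g, long delta) {
    assert(g != NULL);
    pthread_mutex_lock(&g->mutex);
    g->counter += delta;
    pthread_mutex_unlock(&g->mutex);
}

/* Nested locking: takes the mutex, then calls guarded_add, which takes
   it again on the same thread. */
void guarded_add_twice(Guarded *g, long delta) {
    pthread_mutex_lock(&g->mutex);
    guarded_add(g, delta);
    guarded_add(g, delta);
    pthread_mutex_unlock(&g->mutex);
}

static void *worker(void *arg) {
    Guarded *g = arg;
    for (int i = 0; i < ITERATIONS; i++) guarded_add_twice(g, 1);
    return NULL;
}

/* Runs nthreads workers (1 to 16) against one Guarded object and returns
   the final counter, or -1 if setup or thread creation failed. */
long run_workers(int nthreads) {
    pthread_t threads[16];
    Guarded g;
    if (nthreads < 1 || nthreads > 16 || guarded_init(&g) != 0) return -1;
    int started = 0;
    while (started < nthreads &&
           pthread_create(&threads[started], NULL, worker, &g) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(threads[i], NULL);
    long total = g.counter;
    pthread_mutex_destroy(&g.mutex);
    return started == nthreads ? total : -1;
}
```

Each critical section runs on the calling thread and involves no handoff to another thread or queue, which is consistent with the near-zero cost reported above when the lock is mostly uncontended.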
GCD killed my performance
I’m writing an Objective-C API around a database library, and trying to add some optimizations. There’s a lot of room for parallelizing, since tasks like indexing involve a combination of I/O-bound and CPU-bound operations. As a first step, I made my API thread-safe by creating a dispatch queue and wrapping the C database calls in dispatch_sync blocks. Then I did some reorganization of the code so different parts run on different queues, allowing I/O and computation to run in parallel.

On my MacBook Pro this gave me a nice speedup of 50% or more.

But when I tested the code on my iPhone 5 today, I found performance had dropped by about a third. Profiling shows that most of the time is being spent in thread/queue management or Objective-C refcount bookkeeping. It looks as though adding GCD introduced a lot of CPU overhead, and the two cores on my iPhone aren’t enough to make up for that, while the eight cores in my MacBook Pro make it worthwhile.

I tried backing out all the restructuring of my code, so there’s no actual parallelism going on, just the dispatch_sync calls. Predictably, performance is even worse; slightly more than half as fast as without them.

So, I’m pretty disappointed. I know that dispatch queues aren’t free, but I wasn’t expecting them to be this expensive! I’m not doing anything silly like wrapping dispatch_sync around trivial calls. The APIs I’m using it on do things like reading and writing values from the persistent store. I was expecting the cost of thread-safety to be lost in the noise compared to that.

Any suggestions on what to try next?

--Jens