Re: GCD killed my performance

2014-04-26 Thread David Duncan
Just because you are using Core Foundation interfaces doesn’t mean you aren’t 
getting Objective-C code behind the scenes.

On Apr 26, 2014, at 2:44 PM, ChanMaxthon  wrote:

> Also when you are using custom run loop sources (which is required here) 
> pretty much all code that is interfacing the run loop is CF code, so no 
> Objective-C method calls here (and the few remained ones can be IMP-cached so 
> they are just as fast.)
> 
> Sent from my iPad
> 
>> On Apr 27, 2014, at 5:36 AM, Jens Alfke  wrote:
>> 
>> 
>>> On Apr 26, 2014, at 2:10 PM, ChanMaxthon  wrote:
>>> 
>>> Since you are interfacing with database maybe you can use a little 
>>> transaction interface which is its own thread and run loop. That may be 
>>> able to cut down your amount of syscalls. That is, not using GCD but old 
>>> fashioned NSThread, NSRunLoop (and CFRunLoop) and NSCondition.
>> 
>> I’m pretty sure NSRunLoop has more overhead than a dispatch_queue does. It’s 
>> doing more work and it’s using Obj-C messaging.
>> 
>>> OS X implemented GCD at kernel level which introduced lots of syscalls 
>>> which are super expensive. The old school method is largely user land so it 
>>> may help a little by keeping syscalls to a minimum. @synchronized uses 
>>> Objective-C runtime functions which was once code behind those old school 
>>> classes.
>> 
>> That's the opposite of what the docs say: "A dispatch semaphore works like a 
>> traditional semaphore, except that when the resource is available, it takes 
>> less time to acquire a dispatch semaphore. The reason is that Grand Central 
>> Dispatch does not call into the kernel for this particular case. It calls 
>> into the kernel only when the resource is not available and the system needs 
>> to park your thread until the semaphore is signaled.” —Concurrency 
>> Programming Guide
>> 
>> —Jens

--
David Duncan


___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com

Re: GCD killed my performance

2014-04-26 Thread Dave Fernandes

On Apr 26, 2014, at 5:36 PM, Jens Alfke  wrote:

>> OS X implemented GCD at kernel level which introduced lots of syscalls which 
>> are super expensive. The old school method is largely user land so it may 
>> help a little by keeping syscalls to a minimum. @synchronized uses 
>> Objective-C runtime functions which was once code behind those old school 
>> classes.
> 
> That's the opposite of what the docs say: "A dispatch semaphore works like a 
> traditional semaphore, except that when the resource is available, it takes 
> less time to acquire a dispatch semaphore. The reason is that Grand Central 
> Dispatch does not call into the kernel for this particular case. It calls 
> into the kernel only when the resource is not available and the system needs 
> to park your thread until the semaphore is signaled.” —Concurrency 
> Programming Guide

They say this for dispatch semaphores, but it doesn’t necessarily apply to 
dispatch queues. (If I were less lazy, I suppose I could check the source.)

Re: GCD killed my performance

2014-04-26 Thread ChanMaxthon
Also, when you are using custom run loop sources (which is required here), pretty 
much all the code that interfaces with the run loop is CF code, so there are no 
Objective-C method calls here (and the few remaining ones can be IMP-cached, so 
they are just as fast).

Sent from my iPad

> On Apr 27, 2014, at 5:36 AM, Jens Alfke  wrote:
> 
> 
>> On Apr 26, 2014, at 2:10 PM, ChanMaxthon  wrote:
>> 
>> Since you are interfacing with database maybe you can use a little 
>> transaction interface which is its own thread and run loop. That may be able 
>> to cut down your amount of syscalls. That is, not using GCD but old 
>> fashioned NSThread, NSRunLoop (and CFRunLoop) and NSCondition.
> 
> I’m pretty sure NSRunLoop has more overhead than a dispatch_queue does. It’s 
> doing more work and it’s using Obj-C messaging.
> 
>> OS X implemented GCD at kernel level which introduced lots of syscalls which 
>> are super expensive. The old school method is largely user land so it may 
>> help a little by keeping syscalls to a minimum. @synchronized uses 
>> Objective-C runtime functions which was once code behind those old school 
>> classes.
> 
> That's the opposite of what the docs say: "A dispatch semaphore works like a 
> traditional semaphore, except that when the resource is available, it takes 
> less time to acquire a dispatch semaphore. The reason is that Grand Central 
> Dispatch does not call into the kernel for this particular case. It calls 
> into the kernel only when the resource is not available and the system needs 
> to park your thread until the semaphore is signaled.” —Concurrency 
> Programming Guide
> 
> —Jens

Re: GCD killed my performance

2014-04-26 Thread ChanMaxthon
Weird. My old code used to be GCD-heavy and even deadlocked once, but new 
projects that use old-school methods are faster and have never locked up. My 
projects are heavy on network I/O.

I built a transactional network interface there, which is more or less a clone 
of Distributed Objects built on HTTP.

Sent from my iPad

> On Apr 27, 2014, at 5:36 AM, Jens Alfke  wrote:
> 
> 
>> On Apr 26, 2014, at 2:10 PM, ChanMaxthon  wrote:
>> 
>> Since you are interfacing with database maybe you can use a little 
>> transaction interface which is its own thread and run loop. That may be able 
>> to cut down your amount of syscalls. That is, not using GCD but old 
>> fashioned NSThread, NSRunLoop (and CFRunLoop) and NSCondition.
> 
> I’m pretty sure NSRunLoop has more overhead than a dispatch_queue does. It’s 
> doing more work and it’s using Obj-C messaging.
> 
>> OS X implemented GCD at kernel level which introduced lots of syscalls which 
>> are super expensive. The old school method is largely user land so it may 
>> help a little by keeping syscalls to a minimum. @synchronized uses 
>> Objective-C runtime functions which was once code behind those old school 
>> classes.
> 
> That's the opposite of what the docs say: "A dispatch semaphore works like a 
> traditional semaphore, except that when the resource is available, it takes 
> less time to acquire a dispatch semaphore. The reason is that Grand Central 
> Dispatch does not call into the kernel for this particular case. It calls 
> into the kernel only when the resource is not available and the system needs 
> to park your thread until the semaphore is signaled.” —Concurrency 
> Programming Guide
> 
> —Jens

Re: GCD killed my performance

2014-04-26 Thread Jens Alfke

On Apr 26, 2014, at 2:10 PM, ChanMaxthon  wrote:

> Since you are interfacing with database maybe you can use a little 
> transaction interface which is its own thread and run loop. That may be able 
> to cut down your amount of syscalls. That is, not using GCD but old fashioned 
> NSThread, NSRunLoop (and CFRunLoop) and NSCondition.

I’m pretty sure NSRunLoop has more overhead than a dispatch_queue does. It’s 
doing more work and it’s using Obj-C messaging.

> OS X implemented GCD at kernel level which introduced lots of syscalls which 
> are super expensive. The old school method is largely user land so it may 
> help a little by keeping syscalls to a minimum. @synchronized uses 
> Objective-C runtime functions which was once code behind those old school 
> classes.

That's the opposite of what the docs say: “A dispatch semaphore works like a 
traditional semaphore, except that when the resource is available, it takes 
less time to acquire a dispatch semaphore. The reason is that Grand Central 
Dispatch does not call into the kernel for this particular case. It calls into 
the kernel only when the resource is not available and the system needs to park 
your thread until the semaphore is signaled.” —Concurrency Programming Guide
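
To make the quoted claim concrete, here is a minimal sketch of a semaphore with a user-space fast path. It is not libdispatch's code: the type fast_sema and its functions are hypothetical names, and a pthread mutex plus condition variable stand in for whatever kernel primitive really parks the thread. When a slot is free, wait and signal each cost one atomic operation; the slow path runs only when a caller has to block or wake a waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical type for illustration only, not libdispatch's. */
typedef struct {
    atomic_long value;       /* >= 0: free slots; < 0: minus the number of waiters */
    atomic_long slow_paths;  /* instrumentation: times we left the fast path */
    pthread_mutex_t lock;    /* used on the slow path only */
    pthread_cond_t cond;     /* waiting here is the stand-in for parking in the kernel */
    long wakeups;            /* pending wakeups, guarded by lock */
} fast_sema;

void fast_sema_init(fast_sema *s, long initial) {
    assert(initial >= 0);
    atomic_init(&s->value, initial);
    atomic_init(&s->slow_paths, 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->wakeups = 0;
}

void fast_sema_wait(fast_sema *s) {
    /* Fast path: a slot was free, so one atomic decrement is all we pay. */
    if (atomic_fetch_sub_explicit(&s->value, 1, memory_order_acquire) > 0)
        return;

    /* Slow path: no slot was free, so block until a signal hands us one. */
    atomic_fetch_add(&s->slow_paths, 1);
    pthread_mutex_lock(&s->lock);
    while (s->wakeups == 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->wakeups--;
    pthread_mutex_unlock(&s->lock);
}

void fast_sema_signal(fast_sema *s) {
    /* Fast path: nobody was waiting, so one atomic increment is all we pay. */
    if (atomic_fetch_add_explicit(&s->value, 1, memory_order_release) >= 0)
        return;

    /* Slow path: at least one waiter is blocked, so wake exactly one. */
    atomic_fetch_add(&s->slow_paths, 1);
    pthread_mutex_lock(&s->lock);
    s->wakeups++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}
```

With an initial value of 2, two waits followed by two signals never touch the mutex or condition variable, so slow_paths stays at 0. As noted elsewhere in the thread, this says nothing about what dispatch queues do; it only illustrates the semaphore case the guide describes.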

—Jens

Re: GCD killed my performance

2014-04-26 Thread ChanMaxthon
You can write your own dispatch queue using a CFRunLoopSource, and add an 
NSCondition to it when you need it to be synchronous (that is, to implement 
dispatch_sync).

Sent from my iPad

> On Apr 27, 2014, at 4:55 AM, Quincey Morris 
>  wrote:
> 
>> On Apr 26, 2014, at 12:02 , Kyle Sluder  wrote:
>> 
>> FWIW, I’ve been of the opinion for a while that including dispatch_sync in 
>> the API was a mistake.  Too often it becomes a crutch used without 
>> understanding, resulting in stop-start behavior across threads or cores.
> 
> I don’t know if you’ll agree, but it seems to me that there’s a distinction 
> between a *locking* mechanism such as @synchronized, and a *queuing* 
> mechanism, which Jens seems to have demonstrated dispatch_sync to be.
> 
> I understand that both mechanisms may at a lower level depend on both queues 
> and locks, but the distinction I’m making is that a locking mechanism is used 
> when we hope that the lock will generally be granted without contention, 
> while a queuing mechanism is used when we expect there will generally be some 
> contending operation in progress.
> 

Re: GCD killed my performance

2014-04-26 Thread ChanMaxthon
Since you are interfacing with a database, maybe you can use a small transaction 
interface that has its own thread and run loop. That may cut down the number of 
syscalls you make. That is, not using GCD but old-fashioned NSThread, 
NSRunLoop (and CFRunLoop) and NSCondition.

OS X implemented GCD at the kernel level, which introduced lots of syscalls, and 
those are super expensive. The old-school method is largely userland, so it may 
help a little by keeping syscalls to a minimum. @synchronized uses Objective-C 
runtime functions, which were once the code behind those old-school classes.

Sent from my iPad

> On Apr 27, 2014, at 4:55 AM, Quincey Morris 
>  wrote:
> 
>> On Apr 26, 2014, at 12:02 , Kyle Sluder  wrote:
>> 
>> FWIW, I’ve been of the opinion for a while that including dispatch_sync in 
>> the API was a mistake.  Too often it becomes a crutch used without 
>> understanding, resulting in stop-start behavior across threads or cores.
> 
> I don’t know if you’ll agree, but it seems to me that there’s a distinction 
> between a *locking* mechanism such as @synchronized, and a *queuing* 
> mechanism, which Jens seems to have demonstrated dispatch_sync to be.
> 
> I understand that both mechanisms may at a lower level depend on both queues 
> and locks, but the distinction I’m making is that a locking mechanism is used 
> when we hope that the lock will generally be granted without contention, 
> while a queuing mechanism is used when we expect there will generally be some 
> contending operation in progress.
> 

Re: GCD killed my performance

2014-04-26 Thread Quincey Morris
On Apr 26, 2014, at 12:02 , Kyle Sluder  wrote:

> FWIW, I’ve been of the opinion for a while that including dispatch_sync in 
> the API was a mistake.  Too often it becomes a crutch used without 
> understanding, resulting in stop-start behavior across threads or cores.

I don’t know if you’ll agree, but it seems to me that there’s a distinction 
between a *locking* mechanism such as @synchronized, and a *queuing* mechanism, 
which Jens seems to have demonstrated dispatch_sync to be.

I understand that both mechanisms may at a lower level depend on both queues 
and locks, but the distinction I’m making is that a locking mechanism is used 
when we hope that the lock will generally be granted without contention, while 
a queuing mechanism is used when we expect there will generally be some 
contending operation in progress.


Re: GCD killed my performance

2014-04-26 Thread Kyle Sluder
> On Apr 25, 2014, at 10:35 AM, Seth Willits  wrote:
> 
>> On Apr 25, 2014, at 8:08 AM, Jens Alfke  wrote:
>> 
>> I’m ending up at the opposite of the received wisdom, namely:
>> * dispatch_sync is a lot cheaper than dispatch_async
>> * only use dispatch_async if you really need to, or for an expensive 
>> operation, because it will slow down all your dispatch_sync calls
> 
> Saying "dispatch_async will slow down *all* dispatch_sync calls" throws up 
> red flags for me.
> 
> In my mind there's still a possibility that You're Doing It Wrong.

FWIW, I’ve been of the opinion for a while that including dispatch_sync in the 
API was a mistake.  Too often it becomes a crutch used without understanding, 
resulting in stop-start behavior across threads or cores. The correct solution 
is usually to rewrite the code in continuation-passing style, even if that 
means a cascade of blocks. Microsoft realized this when designing WinRT and 
_removed all the synchronous APIs_. They also added the async/await keywords to 
C# to avoid the seventeen-levels-of-indentation problem that plagues ObjC code.
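
As a rough illustration of continuation-passing style, here is a sketch in plain C with function-pointer callbacks instead of blocks. Everything in it is hypothetical: toy_async and toy_drain stand in for a serial dispatch queue (single-threaded here, to keep the sketch self-contained), and store_read_async pretends the stored value is the key times 10. The point is the shape: no caller ever waits for a result; each step hands the next step in as a continuation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy FIFO of pending callbacks; a stand-in for a serial dispatch queue. */
typedef void (*task_fn)(void *ctx);
typedef struct { task_fn fn; void *ctx; } task;

enum { QUEUE_CAPACITY = 16 };
static task pending[QUEUE_CAPACITY];
static size_t queue_head, queue_tail;

void toy_async(task_fn fn, void *ctx) {
    assert(queue_tail - queue_head < QUEUE_CAPACITY);
    pending[queue_tail++ % QUEUE_CAPACITY] = (task){ fn, ctx };
}

/* Runs queued tasks in order, including any that tasks enqueue while running. */
void toy_drain(void) {
    while (queue_head != queue_tail) {
        task t = pending[queue_head++ % QUEUE_CAPACITY];
        t.fn(t.ctx);
    }
}

/* Hypothetical async read: the result is delivered to a continuation. */
typedef void (*read_done)(int value, void *ctx);
typedef struct { int key; read_done done; void *ctx; } read_request;

static void perform_read(void *p) {
    read_request *r = p;
    r->done(r->key * 10, r->ctx);   /* pretend the stored value is key * 10 */
}

void store_read_async(read_request *r) { toy_async(perform_read, r); }

/* Two dependent reads chained by continuations rather than by blocking. */
typedef struct {
    read_request first, second;
    int sum;
    int finished;
} sum_op;

static void got_second(int value, void *ctx) {
    sum_op *op = ctx;
    op->sum += value;
    op->finished = 1;
}

static void got_first(int value, void *ctx) {
    sum_op *op = ctx;
    op->sum = value;
    store_read_async(&op->second);  /* continue with the next step */
}

void sum_two_keys_async(sum_op *op, int key_a, int key_b) {
    op->first = (read_request){ key_a, got_first, op };
    op->second = (read_request){ key_b, got_second, op };
    op->sum = 0;
    op->finished = 0;
    store_read_async(&op->first);
}
```

Calling sum_two_keys_async(&op, 1, 2) returns immediately with op.finished still 0; the result only appears once toy_drain() has run both callbacks. Even in this tiny case the logic is split across three functions, which is the nesting problem that blocks and async/await try to tame.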

Some people at Apple agree that use of dispatch_sync is a code smell, but argue 
that if they hadn't included it, a hundred different re-implementations would 
have arisen, all differently broken.

--Kyle Sluder

Re: GCD killed my performance

2014-04-25 Thread Jens Alfke

On Apr 25, 2014, at 5:42 PM, Roland King  wrote:

> They can be very useful finding places where everything blocks waiting for 
> one piece of code to execute, or you ping madly from thread-to-thread, 
> queue-to-queue. 

Thanks, that sounds very useful. I’ll give it a try when I dive back into this.

Today I got the GCD-enabled code almost back up to the performance it had on 
iOS before it was thread-safe and parallel. Still counts as a victory, because 
with more cores (as on my Mac) it’s a lot faster. Some of the speed came from 
not using dispatch_async for trivial blocks, some from fixing bugs in my code, 
some from allocating fewer objects… Slowly chipping away at it and making 
little gains one at a time.

—Jens

Re: GCD killed my performance

2014-04-25 Thread Roland King
Have you clicked the 'strategy' buttons at the top left of Instruments? For 
some reason I can only do this after I've recorded a trace, not during. They are 
easily overlooked but show per-core or per-thread traces. They can be very 
useful for finding places where everything blocks waiting for one piece of code 
to execute, or where you ping madly from thread to thread and queue to queue. 
There are stack traces available at each event. It's a good tool for 
understanding how dispatch queues are being used and finding out whether your 
dispatch_sync really is staying on the same CPU or not.


On 25 Apr, 2014, at 11:08 pm, Jens Alfke  wrote:

> 
> On Apr 25, 2014, at 1:11 AM, Jonathan Taylor  
> wrote:
> 
>> Have you looked at the output from System Trace on both systems? I often 
>> find that to be informative.
> 
> OK, I tried this and it did turn out to be very informative :) even though I 
> don’t know how to interpret any of the numbers. But just the pretty charts 
> alone told the story:
> - With @synchronized there was very little activity in the System Calls or 
> Scheduling tracks.
> - With GCD there was a whole ton of activity.
> I was surprised there’s this much of a difference, because there’s no actual 
> concurrency in the code at this point! In the commit I’ve rolled back to, all 
> I’ve done is taken my existing single-threaded code and wrapped the C calls 
> with either @synchronized or dispatch_sync. My understanding is that while 
> dispatch_sync is technically switching to a different dispatch queue, if 
> there isn’t any contention it will just do some bookkeeping and run the block 
> on the same thread’s stack. So in this case I wouldn’t expect there to be any 
> actual thread switching going on; except there is.
> 
> … So then I searched the project for “dispatch_async” and found that there 
> was actually _one_ call to it, so my statement about “no actual concurrency” 
> above was a lie. The block it runs doesn’t really need to be async; I was 
> just running it that way because I didn’t need it to complete right away. I 
> changed that call to dispatch_sync, and voila! Almost all the thread 
> scheduling and system calls went away; the system trace now looks like the 
> @synchronized one, and the benchmark times are now slightly better than 
> @synchronized!
> 
> I guess this makes sense: dispatch_sync is super cheap in the uncontended 
> case, but if there’s a dispatch_async pending, then that one obviously has to 
> run first, and it’s probably been scheduled onto another thread, so the 
> dispatch_sync has to either queue onto that thread or at least do some 
> more-expensive locking to wait for the other thread to finish the async call.
> 
> I’m ending up at the opposite of the received wisdom, namely:
> * dispatch_sync is a lot cheaper than dispatch_async
> * only use dispatch_async if you really need to, or for an expensive 
> operation, because it will slow down all your dispatch_sync calls
> 
> I wish there were a big fat super-dense O’Reilly or Big Nerd Ranch book about 
> GCD so I didn’t have to figure all this out on my own...
> 
> —Jens

Re: GCD killed my performance

2014-04-25 Thread Seth Willits
On Apr 25, 2014, at 8:08 AM, Jens Alfke  wrote:

> I’m ending up at the opposite of the received wisdom, namely:
> * dispatch_sync is a lot cheaper than dispatch_async
> * only use dispatch_async if you really need to, or for an expensive 
> operation, because it will slow down all your dispatch_sync calls

Saying "dispatch_async will slow down *all* dispatch_sync calls" throws up red 
flags for me.

In my mind there's still a possibility that You're Doing It Wrong. You haven't 
completely outlined exactly what you've been doing, so it's very hard to offer 
any solid advice. For example, you didn't mention which queues you're 
dispatching to, how often your syncs happen relative to the asyncs, which 
threads they're being dispatched from, or what in particular is happening 
within them, and you didn't show any Instruments traces…

I don't doubt you've found something, but your conclusion doesn't paint the 
whole picture. And as a matter of fact, I think your first email shows this:

"On my MacBook Pro this gave me a nice speedup of 50% or more."

If it was all down to async universally slowing down all dispatch_sync calls, 
then wouldn't you expect it to be slower there too? It seems to me you need a 
better theory as to why the change you made worked. But really, we're flying 
blind here.



--
Seth Willits





Re: GCD killed my performance

2014-04-25 Thread Jens Alfke

On Apr 25, 2014, at 1:11 AM, Jonathan Taylor wrote:

> Have you looked at the output from System Trace on both systems? I often find 
> that to be informative.

OK, I tried this and it did turn out to be very informative :) even though I 
don’t know how to interpret any of the numbers. But just the pretty charts 
alone told the story:

- With @synchronized there was very little activity in the System Calls or 
  Scheduling tracks.
- With GCD there was a whole ton of activity.

I was surprised there’s this much of a difference, because there’s no actual 
concurrency in the code at this point! In the commit I’ve rolled back to, all 
I’ve done is taken my existing single-threaded code and wrapped the C calls 
with either @synchronized or dispatch_sync. My understanding is that while 
dispatch_sync is technically switching to a different dispatch queue, if there 
isn’t any contention it will just do some bookkeeping and run the block on the 
same thread’s stack. So in this case I wouldn’t expect there to be any actual 
thread switching going on; except there is.

… So then I searched the project for “dispatch_async” and found that there was 
actually _one_ call to it, so my statement about “no actual concurrency” above 
was a lie. The block it runs doesn’t really need to be async; I was just 
running it that way because I didn’t need it to complete right away. I changed 
that call to dispatch_sync, and voila! Almost all the thread scheduling and 
system calls went away; the system trace now looks like the @synchronized one, 
and the benchmark times are now slightly better than @synchronized!

I guess this makes sense: dispatch_sync is super cheap in the uncontended case, 
but if there’s a dispatch_async pending, then that one obviously has to run 
first, and it’s probably been scheduled onto another thread, so the 
dispatch_sync has to either queue onto that thread or at least do some 
more-expensive locking to wait for the other thread to finish the async call.

I’m ending up at the opposite of the received wisdom, namely:
* dispatch_sync is a lot cheaper than dispatch_async
* only use dispatch_async if you really need to, or for an expensive operation, 
because it will slow down all your dispatch_sync calls

I wish there were a big fat super-dense O’Reilly or Big Nerd Ranch book about 
GCD so I didn’t have to figure all this out on my own...

—Jens

Re: GCD killed my performance

2014-04-25 Thread Jonathan Taylor
Have you looked at the output from System Trace on both systems? I often find 
that to be informative.

That might or might not be the way to tell, but have you considered that the 
very different CPU characteristics might mean that the actual timing and 
pattern of database commands is different on the iPhone, resulting in, for 
example, a different level of contention for the queues? Suppose dispatch_sync 
is fast on an empty queue but has additional overhead on a full queue. Wouldn't 
that give different performance from one platform to another?


On 25 Apr 2014, at 04:42, cocoa-dev-requ...@lists.apple.com wrote:

> I’m writing an Objective-C API around a database library, and trying to add 
> some optimizations. There’s a lot of room for parallelizing, since tasks like 
> indexing involve a combination of I/O-bound and CPU-bound operations. As a 
> first step, I made my API thread-safe by creating a dispatch queue and 
> wrapping the C database calls in dispatch_sync blocks. Then I did some 
> reorganization of the code so different parts run on different queues, 
> allowing I/O and computation to run in parallel.
> 
> On my MacBook Pro this gave me a nice speedup of 50% or more.
> 
> But when I tested the code on my iPhone 5 today, I found performance had 
> dropped by about a third. Profiling shows that most of the time is being 
> spent in thread/queue management or Objective-C refcount bookkeeping. It 
> looks as though adding GCD introduced a lot of CPU overhead, and the two 
> cores on my iPhone aren’t enough to make up for that, while the eight cores 
> in my MacBook Pro make it worthwhile.
> 
> I tried backing out all the restructuring of my code, so there’s no actual 
> parallelism going on, just the dispatch_sync calls. Predictably, performance 
> is even worse; slightly more than half as fast as without them.
> 
> So, I’m pretty disappointed. I know that dispatch queues aren’t free, but I 
> wasn’t expecting them to be this expensive! I’m not doing anything silly like 
> wrapping dispatch_sync around trivial calls. The APIs I’m using it on do 
> things like reading and writing values from the persistent store. I was 
> expecting the cost of thread-safety to be lost in the noise compared to that.


Re: GCD killed my performance

2014-04-24 Thread Quincey Morris
On Apr 24, 2014, at 22:49 , Jens Alfke  wrote:

> It is, but most of it appears to be memory management _caused_ by GCD, since 
> it goes away when I replace the dispatch calls with @synchronized. GCD is 
> apparently causing a lot of blocks to get copied to the heap.

Well, you know what you’re seeing in Instruments, but this characterization 
seems improbable. Ignoring the GCD-specific entries, if block copy was causing 
a large number of retains and releases (if that’s what you meant by “refcount 
bookkeeping”), I’d expect there to be a large number of block copies, too. If 
block copies (as I’d also expect) are much more time-consuming than the 
retains/releases, by comparison you’d barely notice the retain/release times in 
Instruments. If the retains/releases caused by block copies do dominate, that 
suggests the block copies are comparatively very cheap, which in turn suggests 
a horrible bug in block copies.

With luck someone might jump in with a plausible answer, but this is starting 
to sound TSI-worthy.


Re: GCD killed my performance

2014-04-24 Thread Jens Alfke

On Apr 24, 2014, at 10:30 PM, Quincey Morris wrote:

> Approaching this naively, this result suggests that the block content, while 
> not trivial, is too fine-grained — is divided too finely. For example, if 
> you’re putting (essentially) one database read/write operation (or even a 
> handful) in each block, perhaps that’s too small a unit of work for GCD.

I’m coming to that conclusion. I thought that a file-based b-tree lookup would 
be complex enough to drown out that overhead, but maybe not.
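
One way to picture the granularity question is to count synchronization round trips. The sketch below uses hypothetical names (toy_store, toy_store_put, toy_store_put_batch) and a pthread mutex as a stand-in for the serial queue; it is not the database code discussed here. It performs the same writes two ways: one lock acquisition per write, or one for a whole batch. In GCD terms the second form would correspond to wrapping a group of operations in a single dispatch_sync block.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { STORE_SLOTS = 256 };

/* Hypothetical in-memory store guarded by one lock. */
typedef struct {
    pthread_mutex_t lock;
    int values[STORE_SLOTS];
    long lock_acquisitions;   /* instrumentation, guarded by lock */
} toy_store;

typedef struct { size_t key; int value; } toy_write;

void toy_store_init(toy_store *s) {
    pthread_mutex_init(&s->lock, NULL);
    for (size_t i = 0; i < STORE_SLOTS; i++)
        s->values[i] = 0;
    s->lock_acquisitions = 0;
}

/* Fine-grained: every write pays for its own synchronization. */
void toy_store_put(toy_store *s, size_t key, int value) {
    assert(key < STORE_SLOTS);
    pthread_mutex_lock(&s->lock);
    s->lock_acquisitions++;
    s->values[key] = value;
    pthread_mutex_unlock(&s->lock);
}

/* Coarse-grained: the whole batch shares one synchronization round trip. */
void toy_store_put_batch(toy_store *s, const toy_write *writes, size_t count) {
    pthread_mutex_lock(&s->lock);
    s->lock_acquisitions++;
    for (size_t i = 0; i < count; i++) {
        assert(writes[i].key < STORE_SLOTS);
        s->values[writes[i].key] = writes[i].value;
    }
    pthread_mutex_unlock(&s->lock);
}
```

A hundred individual puts acquire the lock a hundred times, while one batch of a hundred acquires it once. Whether batching helps in practice depends on how expensive each synchronization is relative to the work inside it, which is exactly what is in question on the iPhone.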

> a physical disk access is presumably much slower on average than an access to 
> iOS persistent memory. (Of course, if your MacBook Pro is purely SSD, that 
> might blow that idea out of the water. I don’t know how SSD access speeds 
> compare to iOS memory.)

No, you’ve got it backwards. My 2012 MacBook Pro has a damn fast SSD (one of 
the Apple on-the-motherboard ones), but the flash storage on iOS devices is 
really slow, slower than a decent hard disk.

I interpret the difference in performance as showing that, while there’s a lot 
of extra CPU overhead to using GCD, it’s still a net win when it lets you 
spread your code out over eight cores instead of one. But when there are only 
two cores, it seems to be not worth the expense. 

> Unless I missed something, all of the responses in this thread went to the 
> GCD issue. But if memory management is showing up “hot” like that too, there 
> may be something else to investigate.


It is, but most of it appears to be memory management _caused_ by GCD, since it 
goes away when I replace the dispatch calls with @synchronized. GCD is 
apparently causing a lot of blocks to get copied to the heap.

—Jens

Re: GCD killed my performance

2014-04-24 Thread Quincey Morris
On Apr 24, 2014, at 20:14, Jens Alfke wrote:

> On my MacBook Pro this gave me a nice speedup of 50% or more.
> 
> But when I tested the code on my iPhone 5 today, I found performance had 
> dropped by about a third.

> I know that dispatch queues aren’t free, but I wasn’t expecting them to be 
> this expensive! I’m not doing anything silly like wrapping dispatch_sync 
> around trivial calls. The APIs I’m using it on do things like reading and 
> writing values from the persistent store.

Approaching this naively, this result suggests that the block content, while 
not trivial, is too fine-grained — is divided too finely. For example, if 
you’re putting (essentially) one database read/write operation (or even a 
handful) in each block, perhaps that’s too small a unit of work for GCD. That 
idea is given *some* plausibility by the different outcomes on Mac and iPhone — 
a physical disk access is presumably much slower on average than an access to 
iOS persistent memory. (Of course, if your MacBook Pro is purely SSD, that 
might blow that idea out of the water. I don’t know how SSD access speeds 
compare to iOS memory.)

> Profiling shows that most of the time is being spent in thread/queue 
> management or Objective-C refcount bookkeeping. 

Unless I missed something, all of the responses in this thread went to the GCD 
issue. But if memory management is showing up “hot” like that too, there may be 
something else to investigate.

One option might be to look at what Instruments reports for the hotspots in 
terms of *counts* rather than times, and compare OS X with iOS. This would 
necessitate arranging things so you could get counts for known increments of 
work. If iOS gives comparable counts but much longer times, you will have one 
way of proceeding; if it gives much larger counts but similar or smaller times 
per count, you will have another way of proceeding.
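
The two outcomes described above can be told apart with simple arithmetic on the
per-hotspot numbers. The following is a minimal C sketch, not anything Instruments
provides: `compare_hotspot`, the verdict names, and the `threshold` parameter
(the ratio at which a difference counts as "much larger") are all hypothetical,
and the counts and times are assumed to cover the same fixed increment of work on
both platforms.

```c
#include <assert.h>

typedef enum {
    HOTSPOT_SLOWER_PER_CALL,  /* comparable counts, much more time per call */
    HOTSPOT_MORE_CALLS,       /* many more calls, similar or less time per call */
    HOTSPOT_INCONCLUSIVE      /* neither pattern is clear-cut */
} hotspot_verdict;

/* Compare one hotspot measured on OS X and on iOS for the same unit of work.
 * threshold is a hypothetical cutoff (> 1.0) for "much larger". */
hotspot_verdict compare_hotspot(double mac_count, double mac_seconds,
                                double ios_count, double ios_seconds,
                                double threshold)
{
    assert(mac_count > 0 && ios_count > 0);
    assert(mac_seconds > 0 && ios_seconds > 0);
    assert(threshold > 1.0);

    /* How many more times the hotspot was hit on iOS. */
    double count_ratio = ios_count / mac_count;
    /* How much longer each individual hit took on iOS. */
    double per_call_ratio =
        (ios_seconds / ios_count) / (mac_seconds / mac_count);

    if (count_ratio < threshold && per_call_ratio >= threshold)
        return HOTSPOT_SLOWER_PER_CALL;
    if (count_ratio >= threshold && per_call_ratio < threshold)
        return HOTSPOT_MORE_CALLS;
    return HOTSPOT_INCONCLUSIVE;
}
```

A "slower per call" verdict points at the cost of the primitive itself on the
device; a "more calls" verdict points at something triggering extra work, such as
additional block copies.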

Re: GCD killed my performance

2014-04-24 Thread Dave Fernandes
Is there any way to batch up more work to do in each block? Then your ratio of 
real work to overhead would go up.
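
The batching idea can be illustrated with a small C sketch. It uses a pthread
mutex as a stand-in for whatever per-call synchronization cost is being paid (the
thread is about GCD, so this is an analogy, not a measurement), and `demo_store`,
`demo_put`, and `demo_put_batch` are hypothetical names. The `lock_acquisitions`
counter exists only to make the overhead visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_CAPACITY 64

/* Hypothetical in-memory stand-in for the persistent store. */
typedef struct {
    pthread_mutex_t lock;             /* guards all fields below */
    int items[DEMO_CAPACITY];
    size_t count;
    unsigned long lock_acquisitions;  /* illustration only */
} demo_store;

void demo_store_init(demo_store *s)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    (void)rc;
    s->count = 0;
    s->lock_acquisitions = 0;
}

/* Fine-grained: one lock round trip per item. Returns 0, or -1 if full. */
int demo_put(demo_store *s, int item)
{
    int err = 0;
    pthread_mutex_lock(&s->lock);
    s->lock_acquisitions++;
    if (s->count < DEMO_CAPACITY)
        s->items[s->count++] = item;
    else
        err = -1;
    pthread_mutex_unlock(&s->lock);
    return err;
}

/* Batched: one lock round trip for n items. Stops at the first error. */
int demo_put_batch(demo_store *s, const int *items, size_t n)
{
    int err = 0;
    pthread_mutex_lock(&s->lock);
    s->lock_acquisitions++;
    for (size_t i = 0; i < n; i++) {
        if (s->count == DEMO_CAPACITY) {
            err = -1;
            break;
        }
        s->items[s->count++] = items[i];
    }
    pthread_mutex_unlock(&s->lock);
    return err;
}
```

Writing n items through `demo_put` pays the synchronization cost n times;
`demo_put_batch` pays it once, so the ratio of real work to overhead grows with
the batch size.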

On Apr 25, 2014, at 12:35 AM, Jens Alfke  wrote:

> 
> On Apr 24, 2014, at 9:04 PM, Dave Fernandes wrote:
> 
>> What’s the CPU utilization? Are you actually getting full use of them, or 
>> are your threads blocked waiting for something?
> 
> Fairly high — I think 175% or so (out of 200% possible). The problem is that 
> a large fraction of that is taken up with busywork. In the CPU profile list 
> from Instruments, the top 25 or so stack frames are all system 
> infrastructure; you have to read down quite a ways to find any application 
> code at all.
> 
> —Jens

Re: GCD killed my performance

2014-04-24 Thread Jens Alfke

On Apr 24, 2014, at 9:04 PM, Dave Fernandes  wrote:

> What’s the CPU utilization? Are you actually getting full use of them, or are 
> your threads blocked waiting for something?

Fairly high — I think 175% or so (out of 200% possible). The problem is that a 
large fraction of that is taken up with busywork. In the CPU profile list from 
Instruments, the top 25 or so stack frames are all system infrastructure; you 
have to read down quite a ways to find any application code at all.

—Jens

Re: GCD killed my performance

2014-04-24 Thread Dave Fernandes
What’s the CPU utilization? Are you actually getting full use of them, or are 
your threads blocked waiting for something?

On Apr 24, 2014, at 11:14 PM, Jens Alfke  wrote:

> I’m writing an Objective-C API around a database library, and trying to add 
> some optimizations. There’s a lot of room for parallelizing, since tasks like 
> indexing involve a combination of I/O-bound and CPU-bound operations. As a 
> first step, I made my API thread-safe by creating a dispatch queue and 
> wrapping the C database calls in dispatch_sync blocks. Then I did some 
> reorganization of the code so different parts run on different queues, 
> allowing I/O and computation to run in parallel.
> 
> On my MacBook Pro this gave me a nice speedup of 50% or more.
> 
> But when I tested the code on my iPhone 5 today, I found performance had 
> dropped by about a third. Profiling shows that most of the time is being 
> spent in thread/queue management or Objective-C refcount bookkeeping. It 
> looks as though adding GCD introduced a lot of CPU overhead, and the two 
> cores on my iPhone aren’t enough to make up for that, while the eight cores 
> in my MacBook Pro make it worthwhile.
> 
> I tried backing out all the restructuring of my code, so there’s no actual 
> parallelism going on, just the dispatch_sync calls. Predictably, performance 
> is even worse; slightly more than half as fast as without them.
> 
> So, I’m pretty disappointed. I know that dispatch queues aren’t free, but I 
> wasn’t expecting them to be this expensive! I’m not doing anything silly like 
> wrapping dispatch_sync around trivial calls. The APIs I’m using it on do 
> things like reading and writing values from the persistent store. I was 
> expecting the cost of thread-safety to be lost in the noise compared to that.
> 
> Any suggestions on what to try next?
> 
> —Jens

Re: GCD killed my performance

2014-04-24 Thread Jens Alfke

On Apr 24, 2014, at 8:42 PM, Ken Thomases  wrote:

> You may be aware of this, but dispatch_sync() is not necessary or even 
> particularly relevant to thread-safety.  The use of a serial queue or, 
> possibly, a reader/writer mechanism using barriers, is what achieves thread 
> safety.

Initial experimentation showed that dispatch_async was significantly slower 
than dispatch_sync. This makes sense because dispatch_async has to copy the 
block (thus allocating an object on the heap and retaining any captured object 
variables) while dispatch_sync can get away with running the block before the 
call returns, which avoids all that overhead.

> Using a synchronous call is only necessary if your API has synchronous 
> semantics.  For example, if a call provides immediate results to the caller.  
> Reading from a database would typically have to be synchronous, but writing 
> to it can often be asynchronous.

Yeah, I was torn about making the write calls async. But in the underlying C 
API both the read and write calls return error codes, since there could be disk 
or memory errors, and I didn’t want to ignore the return codes on the write 
functions. (My mama didn’t raise no boys to skip proper error handling.)

—Jens

Re: GCD killed my performance

2014-04-24 Thread Ken Thomases
On Apr 24, 2014, at 10:14 PM, Jens Alfke wrote:

> I’m writing an Objective-C API around a database library, and trying to add 
> some optimizations. There’s a lot of room for parallelizing, since tasks like 
> indexing involve a combination of I/O-bound and CPU-bound operations. As a 
> first step, I made my API thread-safe by creating a dispatch queue and 
> wrapping the C database calls in dispatch_sync blocks.

You may be aware of this, but dispatch_sync() is not necessary or even 
particularly relevant to thread-safety.  The use of a serial queue or, 
possibly, a reader/writer mechanism using barriers, is what achieves thread 
safety.

Using a synchronous call is only necessary if your API has synchronous 
semantics.  For example, if a call provides immediate results to the caller.  
Reading from a database would typically have to be synchronous, but writing to 
it can often be asynchronous.  All you care about is that future reads will 
always see what was previously written, which the serial queue or barriers will 
guarantee.
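
The reader/writer idea can be sketched in plain C. Note the substitution: this
uses a POSIX `pthread_rwlock_t` rather than a GCD queue with barriers, and the
write here is synchronous rather than asynchronous, so it illustrates only the
"many readers or one writer" guarantee. `demo_table` and its functions are
hypothetical names.

```c
/* Request POSIX.1-2008 declarations (pthread_rwlock_t) under strict -std=c11. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a database table holding a single value. */
typedef struct {
    pthread_rwlock_t lock;  /* shared for readers, exclusive for writers */
    int value;
} demo_table;

void demo_table_init(demo_table *t)
{
    int rc = pthread_rwlock_init(&t->lock, NULL);
    assert(rc == 0);
    (void)rc;
    t->value = 0;
}

/* Readers take the shared lock, so any number of them can run at once. */
int demo_table_read(demo_table *t)
{
    pthread_rwlock_rdlock(&t->lock);
    int v = t->value;
    pthread_rwlock_unlock(&t->lock);
    return v;
}

/* A writer takes the exclusive lock: it waits for in-flight readers to
 * finish, and later reads cannot start until the write is complete. */
void demo_table_write(demo_table *t, int v)
{
    pthread_rwlock_wrlock(&t->lock);
    t->value = v;
    pthread_rwlock_unlock(&t->lock);
}
```

Any read that begins after `demo_table_write` returns sees the new value, which
is the same ordering property described above for a serial queue or barriers.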

That doesn't necessarily have any bearing on the overhead of GCD vs. the 
resources of an iPhone, but I thought I'd point it out.

Regards,
Ken


Re: GCD killed my performance

2014-04-24 Thread Jens Alfke
Follow-up: I tried replacing every instance of
dispatch_sync(_queue, ^{ … });
with
@synchronized(self) { … }

Things got faster again: it looks like @synchronized is a few percent slower than 
no thread-safety, but _significantly_ faster than dispatch_sync. That seems to 
contradict what the GCD overview says about dispatch queues being faster than 
regular locking techniques. I looked at the disassembly, and @synchronized 
compiles into calls to objc_sync_enter() and objc_sync_exit(), which in turn 
call pthread_mutex_lock() and pthread_mutex_unlock(); Instruments shows all 
these functions consuming nearly zero CPU time during my benchmark. With GCD, by 
contrast, the dispatch-queue runtime calls were most of the hottest code in the 
entire run.
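
For readers who want to see what that lock-based path boils down to, here is a
minimal C sketch of the same pattern using a plain pthread mutex. It is not the
Objective-C runtime's implementation: `demo_db`, `demo_db_write`, and
`demo_db_write_locked` are hypothetical names, and @synchronized is additionally
reentrant and keyed by object, which this sketch does not model.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the underlying C database handle. */
typedef struct {
    pthread_mutex_t lock;  /* guards every field below */
    int value;             /* pretend persistent store */
} demo_db;

void demo_db_init(demo_db *db)
{
    int rc = pthread_mutex_init(&db->lock, NULL);
    assert(rc == 0);
    (void)rc;
    db->value = 0;
}

/* Hypothetical unsynchronized C database call; returns 0 on success. */
static int demo_db_write(demo_db *db, int v)
{
    db->value = v;
    return 0;
}

/* Thread-safe wrapper: lock, call, unlock, pass the error code through. */
int demo_db_write_locked(demo_db *db, int v)
{
    pthread_mutex_lock(&db->lock);
    int err = demo_db_write(db, v);
    pthread_mutex_unlock(&db->lock);
    return err;
}
```

The wrapper runs the call on the caller's own thread and hands the C call's error
code straight back, with no block to copy and no hand-off to another thread.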

I’m not sure what’s going on here. GCD seems to be pretty well respected by 
people I trust (I read Mike Ash’s blog posts about it pretty thoroughly while 
doing my refactoring, for example) and yet my experience with it so far is that 
the overhead is too high to make all the fun queue-and-block-based programming 
worthwhile, at least on iOS. :(

—Jens

GCD killed my performance

2014-04-24 Thread Jens Alfke
I’m writing an Objective-C API around a database library, and trying to add 
some optimizations. There’s a lot of room for parallelizing, since tasks like 
indexing involve a combination of I/O-bound and CPU-bound operations. As a 
first step, I made my API thread-safe by creating a dispatch queue and wrapping 
the C database calls in dispatch_sync blocks. Then I did some reorganization of 
the code so different parts run on different queues, allowing I/O and 
computation to run in parallel.

On my MacBook Pro this gave me a nice speedup of 50% or more.

But when I tested the code on my iPhone 5 today, I found performance had 
dropped by about a third. Profiling shows that most of the time is being spent 
in thread/queue management or Objective-C refcount bookkeeping. It looks as 
though adding GCD introduced a lot of CPU overhead, and the two cores on my 
iPhone aren’t enough to make up for that, while the eight cores in my MacBook 
Pro make it worthwhile.

I tried backing out all the restructuring of my code, so there’s no actual 
parallelism going on, just the dispatch_sync calls. Predictably, performance is 
even worse; slightly more than half as fast as without them.

So, I’m pretty disappointed. I know that dispatch queues aren’t free, but I 
wasn’t expecting them to be this expensive! I’m not doing anything silly like 
wrapping dispatch_sync around trivial calls. The APIs I’m using it on do things 
like reading and writing values from the persistent store. I was expecting the 
cost of thread-safety to be lost in the noise compared to that.

Any suggestions on what to try next?

—Jens