On Nov 22, 2009, at 11:57 PM, Stefan Behnel wrote:

> Antoine Pitrou, 22.11.2009 18:06:
>> Stefan Behnel <stefan...@...> writes:
>>> Just to be sure, not only the computed goto itself is a non-standard
>>> feature, but also taking the address of a label using "&&", right?
>>
>> Indeed. I'm not sure why you care about computed gotos here. Most  
>> generators
>> only have a single yield statement, so you'll have a single-case  
>> switch
>> statement (or two cases if you take into account the case where the  
>> generator is
>> being entered rather than resumed), so the code will be quite  
>> simple anyway and
>> the CPU's branch predictor should do a good job.
>
> Agreed. However, the common single-yield case can always be  
> optimised into
>
>    if (likely(closure._resume_label == ...))
>       goto __L1_resume_from_yield;
>
> ('likely' being a branch prediction hint here), in which case it  
> wouldn't
> matter if we use an int or a pointer. One test is needed anyway, as  
> the
> label is initially 0. As a result, you'd just get an asymptotic 100%  
> branch
> prediction with a single initial branch.
>
> The main advantage of computed gotos for the multiple-yield cases is  
> that
> the code generation itself becomes simpler, as the generator function
> wouldn't have to know the yield labels (which are created at code
> generation time and only at need, so only the yield node itself  
> knows if it
> gets generated). But given that they are not portable, and assuming  
> that
> there are cases where knowing the labels is simply faster (such as the
> single-yield case above), we will need to propagate that information
> anyway, so we can just as well drop the computed gotos completely.  
> But you
> got me thinking about the branch prediction.
>
> Disclaimer: I'm actually aware that this deals with micro- 
> optimisations
> that might not turn into any performance gain at all. I just found it
> interesting to look at what kind of code we can expect to see.
>
> For the multiple yield case, I think a scenario like the following  
> would be
> quite common:
>
>    def gen():
>        while 1:
>            # calculate some correlated values x,y,z
>            yield x
>            yield y
>            yield z
>
> In that case, there will be zero branch prediction, as the code will  
> jump
> to a different label on each call, despite them being close in the  
> code and
> doing exactly the same thing.
>
> Looking at
>
> http://docs.python.org/library/itertools.html
>
> another common scenario would be this:
>
>    def gen(flag):
>        if flag:
>            while 1:
>                yield ...
>        else:
>            while 1:
>                yield ...
>
> in which case the branch prediction would actually work pretty well,  
> but
> would depend on how the switch code is arranged. Any ideas how  
> computed
> gotos would perform here? They'd always jump to the same address,  
> after
> all, but I have no idea if the CPU learns from that...
>
> A third scenario uses an initial setup run (also from itertools):
>
>    def gen():
>        for x in something:
>            yield f(x)
>        while 1:
>            for x in something_else:
>                 yield f(x)
>
> Here, the branch prediction would only fail in two cases during the  
> life
> cycle (at startup and when switching to the while loop), and the final
> performance would depend on the switch statement again.

I realize this is just an interesting exercise, but one thing to keep  
in mind is that unless the calling code is trivial (and maybe even if  
it is), the branch prediction registers for the generator will  
probably be flushed by the next time you call it.

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Reply via email to