From: Shannon Nelson
> Sent: 16 March 2017 00:18
> To: David Laight; netdev@vger.kernel.org; da...@davemloft.net
> On 3/15/2017 1:50 AM, David Laight wrote:
> > From: Shannon Nelson
> >> Sent: 14 March 2017 17:25
> > ...
> >> +  if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> >> +          dev->stats.multicast++;
> >
> > I'd guess that:
> >     dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
> > generates faster code.
> > Especially if is_multicast_ether_addr(x) is (*x >> 7).

I'd clearly got brain-fade there, mcast bit is the first transmitted bit
(on ethernet) but the bytes are sent LSB first (like async).
> >     David
> 
> Hi David, thanks for the comment.  My local instruction level
> performance guru is on vacation this week so I can't do a quick check
> with him today on this.  However, I"m not too worried here since the
> inline code for is_multicast_ether_addr() is simply
> 
>       return 0x01 & addr[0];
> 
> and objdump tells me that on sparc it compiles down to a simple single
> byte load and compare:
> 
>      325c:    c2 08 80 03     ldub  [ %g2 + %g3 ], %g1
>      3260:    80 88 60 01     btst  1, %g1
>      3264:    32 60 00 b3     bne,a,pn   %xcc, 3530 <vnet_rx_one+0x430>
>      3268:    c2 5c 61 68     ldx  [ %l1 + 0x168 ], %g1
>               dev->stats.multicast++;

Followed by a branch that might be marked 'assume taken' so the
normal path takes the branch.
I guess that is followed by 'add 1 to %g1', 'stx %g1, [ %l1 + 0x168 ]'
and a branch to 3530.
GCC must be using that condition to generate get the bottom of a loop
to 'fallthrough' to its top!

My version should generate something like:
                ldub  [ %g2 + %g3 ], %g1
                ldx   [ %l1 + 0x168 ], %g2
                and   1, %g1
                add   %g1, %g2, %g2
                stx   %g2, [ %l1 + 0x168 ]
While this looks like 5 instructions (rather than 2) it has fewer pipeline
stalls and can be 'spread out' into the surrounding lines of code to
reduce the stalls further.

> I don't think this driver will ever be used on anything bug sparc, so
> I'm not worried about how x86 might compile this.

On x86 gcc is likely to ignore the 'unlikely' and generate a forwards
(predicted not taken) branch around the increment.
I've had to but asm comments in the else part of conditionals like
that to force gcc to generate a forwards jump to the 'unlikely' statements.

        David



Reply via email to