Another comment on relay_write fast pathing

> +static inline unsigned relay_write(struct rchan *chan,
> +                                const void *data,
> +                                unsigned length)
> +{
> +     unsigned long flags;
> +     struct rchan_buf *buf = relay_get_buf(chan, smp_processor_id());
> +
> +     local_irq_save(flags);
> +     if (unlikely(buf->offset + length > chan->subbuf_size))
> +             length = relay_switch_subbuf(buf, length);
> +     memcpy(buf->data + buf->offset, data, length);

I said earlier that gcc would optimize the memcpy, but thinking about
this again I'm not so sure anymore. The problem is that with a variable
buf->offset gcc cannot prove the alignment of the destination, and with
unknown alignment it tends to generate an out-of-line call to memcpy,
which is typically quite slow.

You can take a look at the generated assembly to see whether that's true
or not, but I suspect it is.
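
For what it's worth, a quick way to check is a minimal standalone test
case compiled with -O2 -S; the struct, function name and size below are
made up purely to model the variable-offset destination, nothing relay
specific:

#include <string.h>

struct fake_buf {
        char data[4096];
        unsigned offset;        /* runtime value, so data + offset has unknown alignment */
};

void copy_small(struct fake_buf *buf, const void *src)
{
        /* constant length, but gcc cannot prove the destination alignment */
        memcpy(buf->data + buf->offset, src, 8);
}

Whether you get a call to memcpy or inlined moves in the .s file will of
course depend on the gcc version and architecture, but it should settle
the question quickly.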

To avoid this it may be necessary to play

if (__builtin_constant_p(length))
        switch (length) {
        case 1: /* optimized version for 1 byte */ break;
        case 2: /* ... */ break;
        case 4: /* ... */ break;
        case 8: /* ... */ break;
        default: memcpy(...); break;
        }
else
        memcpy(...);

games like uaccess.h does, for example. The problem is that gcc sometimes
seems to break __builtin_constant_p inside inline functions, so it may be
necessary to move the check into a macro (I would double-check whether
that is really needed in this case though; the code is much nicer with an
inline).
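
If it does turn out to be necessary, the macro version might look roughly
like this. Just a sketch: the name relay_write_sized is made up, only two
sizes are shown, and it assumes an architecture that tolerates unaligned
stores (as x86 does); elsewhere you would need put_unaligned() or byte
copies:

#define relay_write_sized(buf, data, length)                            \
do {                                                                    \
        void *__dst = (buf)->data + (buf)->offset;                      \
                                                                        \
        /* the macro sees the caller's length expression directly, */   \
        /* so __builtin_constant_p gets a fair chance to fold */        \
        if (__builtin_constant_p(length) && (length) == 4)              \
                *(u32 *)__dst = *(const u32 *)(data);                   \
        else if (__builtin_constant_p(length) && (length) == 8)         \
                *(u64 *)__dst = *(const u64 *)(data);                   \
        else                                                            \
                memcpy(__dst, (data), (length));                        \
} while (0)

The memcpy line in relay_write() would then become
relay_write_sized(buf, data, length), with the rest of the inline left
unchanged.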

-Andi                   