On 03/29/2017 01:29 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 01:08:12PM -0700, Vineet Gupta wrote:
> 
>> Hi Al,
>>
>> Thx for taking this up. It seems ARC was missing the INLINE_COPY* switch,
>> likely due to the 2 variants (inline/out-of-line) we already have.
>> I've added a patch for that (attached too) - boot tested the series on ARC.
> 
> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.

Just to be clear, your series was doing this for everyone.
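
For reference, the switch works roughly like this (a simplified model of
the pattern, not the kernel's exact code - the real inline also does the
access checks; raw_copy_from_user() is the per-arch primitive the series
builds on):

unsigned long raw_copy_from_user(void *to, const void *from, unsigned long n);

#ifdef INLINE_COPY_FROM_USER
/* body expanded at every call site */
static inline unsigned long
copy_from_user(void *to, const void *from, unsigned long n)
{
        return raw_copy_from_user(to, from, n);
}
#else
/* single shared definition built out of line instead */
unsigned long copy_from_user(void *to, const void *from, unsigned long n);
#endif

With INLINE_COPY_FROM_USER defined, every caller gets the full expansion,
which is exactly the footprint question here.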

> It's probably arch-dependent and it would be nice if somebody compared
> performance with and without inlining those...  ARC, in particular, has
> __arc_copy_{to,from}_user() inlining a whole lot, even in case of non-constant
> size and your patch, AFAICS, will inline all of it in *all* cases. 

Yes, we do inline all of it: the non-constant case is actually the simpler
one - just a byte loop:

                "       mov.f   lp_count, %0            \n"
                "       lpnz 3f                         \n"
                "       ldb.ab  %1, [%3, 1]             \n"
                "1:     stb.ab  %1, [%2, 1]             \n"
                "       sub     %0, %0, 1               \n"

Doing it out of line would cost 4 instructions at the call site anyway
(3 argument moves plus the call).
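
(For comparison, the call site for the out-of-line version would be
something like this - a sketch assuming the usual ARC convention of
passing arguments in r0-r2:

        mov     r0, dst                 ; arg 0: destination
        mov     r1, src                 ; arg 1: source
        mov     r2, n                   ; arg 2: byte count
        bl      @__arc_copy_to_user     ; call the out-of-line copy

i.e. the 4 instructions mentioned above.)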

For constant size, there's a laddered copy: blocks of 16 bytes plus a
ladder for the 1-15 straggler bytes. We do "manual" constant propagation
there so the straggler part is optimized away at compile time. But yes,
all of this is emitted inline.
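
To illustrate the shape in plain C (a sketch of the idea only; the
ladder_copy() name and the memcpy() calls are mine - the real thing is
hand-written asm in __arc_copy_{to,from}_user):

#include <string.h>

static inline void ladder_copy(void *to, const void *from, unsigned long n)
{
        char *d = to;
        const char *s = from;
        unsigned long blocks = n / 16;

        /* whole 16-byte blocks (the unrolled-loop part) */
        while (blocks--) {
                memcpy(d, s, 16); d += 16; s += 16;
        }
        /* straggler ladder for the remaining 1-15 bytes: with a
         * compile-time constant n, each test folds to a constant and
         * the dead branches are discarded */
        if (n & 8) { memcpy(d, s, 8); d += 8; s += 8; }
        if (n & 4) { memcpy(d, s, 4); d += 4; s += 4; }
        if (n & 2) { memcpy(d, s, 2); d += 2; s += 2; }
        if (n & 1) { *d = *s; }
}

So for, say, a constant n of 10, only the (n & 8) and (n & 2) arms
survive.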


> It might
> end up being a win, but that's not a priori obvious...  Do you have any
> profiling results in that area?

Unfortunately not at the moment. The reason for adding the out-of-line
variant was not so much performance as improving the code footprint for
the -Os case (for some customer, I think).
