On Tue, Jul 11, 2017 at 3:09 AM, Richard Earnshaw (lists)
<[email protected]> wrote:
> On 11/07/17 05:16, Andrew Pinski wrote:
>> I was looking into some bitfield code for aarch64 and was wondering
>> why SLOW_BYTE_ACCESS is set to 0. I can't seem to figure out why
>> though.
>> The header says:
>> Although there's no difference in instruction count or cycles,
>> in AArch64 we don't want to expand to a sub-word to a 64-bit access
>> if we don't have to, for power-saving reasons. */
>>
>> But that does not make sense, because with SLOW_BYTE_ACCESS set to 0, GCC
>> expands a sub-word access to a 64-bit access.
>> When I set SLOW_BYTE_ACCESS to 1, I get between a 38% and a 208% speedup
>> for accesses of bitfields inside a loop on ThunderX CN88xx.
>
> What's the test case?
>
>>
>> Should we change SLOW_BYTE_ACCESS (or maybe better yet get rid of it)?
>>
>
> The documentation for SLOW_BYTE_ACCESS is just plain confusing, IMO.
> And your comment above seems to be contrary to the documentation as well.

Here is the test case that shows the issue:
typedef unsigned long long u64;

/* Six bitfields that together fill one 64-bit word (10+10+9+7+14+14). */
typedef struct
{
  u64 a:10;
  u64 b:10;
  u64 c:9;
  u64 d:7;
  u64 e:14;
  u64 f:14;
} s_t;

/* Store a constant into every field. */
void setting(s_t *a)
{
  a->a = 0x2AA;
  a->b = 0x2AA;
  a->c = 0x155;
  a->d = 0x2A;
  a->e = 0x2AAA;
  a->f = 0x2AAA;
}

/* Store a run-time value into every field. */
void set(s_t *a, int b, int c, int d, int e, int f, int g)
{
  a->a = b;
  a->b = c;
  a->c = d;
  a->d = e;
  a->e = f;
  a->f = g;
}
--- CUT ---
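As a rough illustration of what a single 64-bit access to this struct
amounts to, the sketch below packs the six constants from setting() into
one word by hand. It assumes the fields are allocated from the least
significant bit upward (the usual little-endian layout, but
implementation-defined in C). insert_field, extract_field, and
pack_setting are hypothetical helper names, not anything from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low `width` bits of `value` into `word` at bit offset `pos`.
   Hypothetical helper; assumes 0 < width < 64 and pos + width <= 64. */
uint64_t insert_field(uint64_t word, unsigned pos, unsigned width,
                      uint64_t value)
{
    assert(width > 0 && width < 64 && pos + width <= 64);
    uint64_t mask = ((UINT64_C(1) << width) - 1) << pos;
    return (word & ~mask) | ((value << pos) & mask);
}

/* Read back the `width`-bit field at bit offset `pos`. */
uint64_t extract_field(uint64_t word, unsigned pos, unsigned width)
{
    assert(width > 0 && width < 64 && pos + width <= 64);
    return (word >> pos) & ((UINT64_C(1) << width) - 1);
}

/* Hand-packed equivalent of setting(), ASSUMING an LSB-first layout:
   a at bit 0, b at 10, c at 20, d at 29, e at 36, f at 50. */
uint64_t pack_setting(void)
{
    uint64_t w = 0;
    w = insert_field(w, 0, 10, 0x2AA);   /* a */
    w = insert_field(w, 10, 10, 0x2AA);  /* b */
    w = insert_field(w, 20, 9, 0x155);   /* c */
    w = insert_field(w, 29, 7, 0x2A);    /* d */
    w = insert_field(w, 36, 14, 0x2AAA); /* e */
    w = insert_field(w, 50, 14, 0x2AAA); /* f */
    return w;
}
```

Because the six fields cover all 64 bits, the packed value does not depend
on the previous contents of the word, so under that layout assumption one
64-bit store would be enough in principle.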
With SLOW_BYTE_ACCESS set to 0, we get many more instructions. See
the logic in bit_field_mode_iterator::next_mode, which calls
bit_field_mode_iterator::prefer_smaller_modes, which in turn checks
SLOW_BYTE_ACCESS.
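To make the "smaller mode" question concrete, here is a hypothetical
helper (it is not GCC's bit_field_mode_iterator logic, just a sketch of the
idea) that computes the narrowest naturally aligned access of 8, 16, 32,
or 64 bits that fully contains a bitfield, given its bit offset and size.

```c
#include <assert.h>

/* Return the width in bits (8, 16, 32, or 64) of the narrowest naturally
   aligned unit that contains the whole field [bitpos, bitpos + bitsize).
   Returns 0 if the field straddles a 64-bit boundary.
   Hypothetical illustration only; not taken from GCC. */
unsigned smallest_access_bits(unsigned bitpos, unsigned bitsize)
{
    assert(bitsize > 0);
    for (unsigned width = 8; width <= 64; width *= 2) {
        unsigned first_unit = bitpos / width;
        unsigned last_unit = (bitpos + bitsize - 1) / width;
        if (first_unit == last_unit)
            return width; /* first and last bit share one aligned unit */
    }
    return 0;
}
```

Assuming s_t is laid out from the least significant bit upward (a at bit 0,
b at 10, c at 20, d at 29, e at 36, f at 50), this gives 16 bits for a, c,
and f, 32 bits for b and e, and 64 bits for d, so six narrow accesses would
use a mix of widths where one wide access could cover every field.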
Note that the only other place that checks SLOW_BYTE_ACCESS is dojump.c,
and I think that code might be dead now that we expand directly from SSA.
Thanks,
Andrew Pinski
>
> R.