Re: Inefficient code

Paul Koning Thu, 05 Jul 2018 09:30:01 -0700

> On Jul 5, 2018, at 12:01 PM, Segher Boessenkool <seg...@kernel.crashing.org> 
> wrote:
> 
> On Thu, Jul 05, 2018 at 08:45:30AM -0400, Paul Koning wrote:
>> I have a struct that looks like this:
>> 
>> struct Xrb
>> {
>>    uint16_t xrlen;           /* Length of I/O buffer in bytes */
>>    uint16_t xrbc;            /* Byte count for transfer */
>>    void * xrloc;             /* Pointer to I/O buffer */
>>    uint8_t xrci;             /* Channel number times 2 for transfer */
>>    uint32_t xrblk:24;        /* Random access block number */
>>    uint16_t xrtime;          /* Wait time for terminal input */
>>    uint16_t xrmod;           /* Modifiers */
>> };
>> 
>> When I write to xrblk (that 24 bit field) on my 16 bit target, I get 
>> unexpectedly inefficient output:
>> 
>>    XRB->xrblk = 5;
>> 
>>      movb    #5,10(r0)
>>      clrb    11(r0)
>>      clrb    7(r0)
> 
> (7? not 12?)

Octal offsets.  It's writing the 3 bytes in LSB to MSB order.  (PDP11 -- which 
has funny-endian ordering.)
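
To make the offsets and the byte order concrete, here is a minimal sketch in C. The names store_pdp_endian, demo_pdp_endian and the OFF_* constants are made up for illustration. It assumes the usual PDP-11 convention for 32-bit values (high-order 16-bit word first, each word little-endian). It says nothing about how a given compiler lays out a 24-bit bitfield.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from the listing above, written in octal as the assembler prints
   them: 10, 11 and 7 octal are 8, 9 and 7 decimal.  (Hypothetical names.) */
enum { OFF_LSB = 010, OFF_MID = 011, OFF_MSB = 07 };

/* Store a 32-bit value in PDP-11 "middle-endian" order:
   high-order 16-bit word first, each word stored little-endian. */
void store_pdp_endian(uint8_t out[4], uint32_t value)
{
    uint16_t high = (uint16_t)(value >> 16);
    uint16_t low  = (uint16_t)(value & 0xFFFFu);

    out[0] = (uint8_t)(high & 0xFFu);
    out[1] = (uint8_t)(high >> 8);
    out[2] = (uint8_t)(low & 0xFFu);
    out[3] = (uint8_t)(low >> 8);
}

/* Small self-check: 0x0A0B0C0D ends up in memory as 0B 0A 0D 0C. */
void demo_pdp_endian(void)
{
    uint8_t bytes[4];

    store_pdp_endian(bytes, 0x0A0B0C0Du);
    assert(bytes[0] == 0x0B && bytes[1] == 0x0A);
    assert(bytes[2] == 0x0D && bytes[3] == 0x0C);
    assert(OFF_LSB == 8 && OFF_MID == 9 && OFF_MSB == 7);
}
```

Read that way, the three byte writes touch decimal offsets 8, 9 and 7, and the two low-order bytes at offsets 8 and 9 form an aligned word.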

>> rather than the expected word write to the word-aligned lower half of that 
>> field.
>> 
>> Looking at the dumps, I see it coming into the RTL expand phase as a single 
>> write, which expand then turns into the three insns corresponding to the 
>> above.  But (of course) there is a word (HImode) move also, which has the 
>> same cost as the byte one.
>> 
>> Is there something I have to do in my target definition to get this to come 
>> out right?  This is a strict_alignment target, but alignment is satisfied in 
>> this example.  Also, SLOW_BYTE_ACCESS is 1.
> 
> What is your MOVE_MAX?  It should be 2 probably.

It is. 

I just constructed another test case that shows the same issue more blatantly:

struct s
{
    char a;
    char b;
    char c;
    char d;
    int e;
    int f;
    char h;
    char i;
};

struct s ts;

void setts(void)
{
    ts.a=2;
    ts.b=4;
    ts.c=1;
    ts.d=24;
    ts.e=5;
    ts.f=42;
    ts.h=9;
    ts.i=3;
}

Each of the fields is written separately, even though the adjacent byte 
writes clearly can and should be combined into a single HImode move.  This 
happens both with -O2 and -Os.
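
For illustration, this is roughly the result I would expect, written out by hand as a sketch. The helper store_pair and the function setts_merged are hypothetical names, not anything from the compiler or the target. The word is built with memcpy so the example does not depend on host byte order. Whether a given compiler turns each memcpy pair into a single 16-bit move is an assumption, not something this code guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct s {
    char a;
    char b;
    char c;
    char d;
    int e;
    int f;
    char h;
    char i;
};

/* Write two adjacent bytes as one 16-bit object.  The word is assembled in
   memory order, so "first" always lands at the lower address. */
static void store_pair(void *dst, unsigned char first, unsigned char second)
{
    const unsigned char bytes[2] = { first, second };
    uint16_t word;

    memcpy(&word, bytes, sizeof word);  /* build the 16-bit value */
    memcpy(dst, &word, sizeof word);    /* one word-sized store */
}

/* Hand-merged equivalent of setts(): three word stores replace six byte
   stores; the int fields are written as before. */
void setts_merged(struct s *p)
{
    unsigned char *base = (unsigned char *)p;

    /* The merge is only valid if each pair really is adjacent. */
    assert(offsetof(struct s, b) == offsetof(struct s, a) + 1);
    assert(offsetof(struct s, d) == offsetof(struct s, c) + 1);
    assert(offsetof(struct s, i) == offsetof(struct s, h) + 1);

    store_pair(base + offsetof(struct s, a), 2, 4);
    store_pair(base + offsetof(struct s, c), 1, 24);
    p->e = 5;
    p->f = 42;
    store_pair(base + offsetof(struct s, h), 9, 3);
}
```

Calling setts_merged(&ts) leaves ts with the same field values as setts() above.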

        paul