Re: [AArch64] Implement vset_lane intrinsics in C

James Greenhalgh Mon, 16 Sep 2013 02:30:03 -0700

On Fri, Sep 13, 2013 at 10:47:01PM +0100, Andrew Pinski wrote:
> On Fri, Sep 13, 2013 at 11:57 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
> > Should return '1' whatever your endianness. Throwing together a quick
> > test case, that is the case for current trunk. Do you have a testcase
> > where this goes wrong?
> 
> I was not thinking of that but rather the definition of lanes in ARM64
> is different than from element due to memory ordering of endian.
> That is lane 0 is element 3 in big-endian.  Or is this only for
> aarch32 where the issue is located?
> 
> Thanks,
> Andrew Pinski


Well, AArch64 has the AArch32-style memory ordering for vectors,
which I think differs from what other big-endian architectures use,
but which gives consistent behaviour between vector and array indexing.

So, take the easy case of a byte array

  uint8_t foo [8] = {0, 1, 2, 3, 4, 5, 6, 7};

We would expect both the big and little endian toolchains to lay
this out in memory as:

   0x0             ...         0x8
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

And element 0 would give us '0'. If we take the same array and load it
as a vector with ld1.b, both big and little-endian toolchains would load
it as:

   bit 127 ..   bit 63                           bit 0
   lane 15   | lane 7 |                       |  lane 0 |
  |.....     |    7   | 6 | 5 | 4 | 3 | 2 | 1 |   0     |

So lane 0 is '0', we're OK so far!

For a short array:

  uint16_t foo [4] = {0x0a0b, 0x1a1b, 0x2a2b, 0x3a3b};

The little endian compiler would lay memory out as:

   0x0             ...                0x8
  | 0b | 0a | 1b | 1a | 2b | 2a | 3b | 3a |

And the big endian compiler would lay out memory as:

   0x0             ...                0x8
  | 0a | 0b | 1a | 1b | 2a | 2b | 3a | 3b |

In both cases, element 0 is '0x0a0b'. If we load this array as a
vector with ld1.h, both big and little-endian compilers will load
the vector as:

   bit 127 ..  bit 63                        bit 0
   lane 7    | lane 3  |                   | lane 0  |
  |.....     | 3a | 3b | 2a | 2b | 1a | 1b | 0a | 0b |

And lane 0 is '0x0a0b', so we are OK again!

Lanes and elements should match under our model. I don't think that
is true of other architectures, where I think the whole vector object
is arranged big-endian, such that we would need to lay our byte array
out as:

   0x0             ...         0x8
  | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |

for it to be loaded correctly, at which point there is a discrepancy
between element and lane.

But as I say, that applies to other architectures; AArch64 should be consistent.

Thanks,
James
