On Fri, Sep 13, 2013 at 10:47:01PM +0100, Andrew Pinski wrote:
> On Fri, Sep 13, 2013 at 11:57 AM, James Greenhalgh
> <james.greenha...@arm.com> wrote:
> > Should return '1' whatever your endianness. Throwing together a quick
> > test case, that is the case for current trunk. Do you have a testcase
> > where this goes wrong?
>
> I was not thinking of that but rather the definition of lanes in ARM64
> is different than from element due to memory ordering of endian.
> That is lane 0 is element 3 in big-endian. Or is this only for
> aarch32 where the issue is located?
>
> Thanks,
> Andrew Pinski

Well, AArch64 has the AArch32 style memory ordering for vectors, which
I think is different from what other big-endian architectures use, but
gives consistent behaviour between vector and array indexing.

So, take the easy case of a byte array

  uint8_t foo [8] = {0, 1, 2, 3, 4, 5, 6, 7};

We would expect both the big and little endian toolchains to lay this
out in memory as:

  0x0               ...                 0x8
  | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  |

And element 0 would give us '0'. If we take the same array and load it
as a vector with ld1.b, both big and little-endian toolchains would load
it as:

  bit 128 .. bit 64                              bit 0
  lane 16    | lane 7                       | lane 0 |
  |.....     | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |

So lane 0 is '0', we're OK so far!

For a short array:

  uint16_t foo [4] = {0x0a0b, 0x1a1b, 0x2a2b, 0x3a3b};

The little endian compiler would lay memory out as:

  0x0               ...                 0x8
  | 0b | 0a | 1b | 1a | 2b | 2a | 3b | 3a |

And the big endian compiler would lay out memory as:

  0x0               ...                 0x8
  | 0a | 0b | 1a | 1b | 2a | 2b | 3a | 3b |

In both cases, element 0 is '0x0a0b'. If we load this array as a
vector with ld1.h both big and little-endian compilers will load
the vector as:

  bit 128 .. bit 64                              bit 0
  lane 16    | lane 3                      | lane 0  |
  |.....     | 3b | 3a | 2b | 2a | 1b | 1a | 0b | 0a |

And lane 0 is '0x0a0b'. So we are OK again!

Lanes and elements should match under our model. I don't think that
is true of other architectures, where I think the whole vector object
is arranged big endian, such that we would need to lay our byte array
out as:

  0x0               ...                 0x8
  | 7  | 6  | 5  | 4  | 3  | 2  | 1  | 0  |

for it to be correctly loaded, at which point there is a discrepancy
between element and lane.

But as I say, that is other architectures. AArch64 should be
consistent.

Thanks,
James
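
The element/lane model described in this message can be tried out
without AArch64 hardware. What follows is only a rough sketch in
portable C, not code from GCC or from the thread: it simulates the two
memory layouts of the uint16_t array and then applies two assumed load
models. All function names (store_u16_array, load_lanes_elementwise,
load_lanes_whole_vector_be, print_lane_demo) are hypothetical, and the
"whole vector big-endian" model is just one reading of the behaviour
attributed above to other architectures. Byte order inside the
register is not modelled; only the lane values are.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Lay out a uint16_t array in a byte buffer the way a big- or
   little-endian toolchain would place it in memory. */
void store_u16_array(uint8_t *mem, const uint16_t *src, int n,
                     int big_endian)
{
    for (int i = 0; i < n; i++) {
        uint8_t hi = (uint8_t)(src[i] >> 8);
        uint8_t lo = (uint8_t)(src[i] & 0xff);
        mem[2 * i]     = big_endian ? hi : lo;
        mem[2 * i + 1] = big_endian ? lo : hi;
    }
}

/* Assumed model of the element-wise halfword load ("ld1.h" in the
   text): the element at the lowest address becomes lane 0, and each
   element is read using the memory byte order. */
void load_lanes_elementwise(uint16_t *lanes, const uint8_t *mem, int n,
                            int big_endian)
{
    for (int i = 0; i < n; i++) {
        unsigned b0 = mem[2 * i];
        unsigned b1 = mem[2 * i + 1];
        lanes[i] = (uint16_t)(big_endian ? (b0 << 8) | b1
                                         : (b1 << 8) | b0);
    }
}

/* Assumed model of a "whole vector is big-endian" load: the bytes are
   read as one big-endian integer and lane 0 is its least significant
   16 bits. Limited to n <= 4 so the value fits in 64 bits. */
void load_lanes_whole_vector_be(uint16_t *lanes, const uint8_t *mem,
                                int n)
{
    uint64_t v = 0;
    for (int i = 0; i < 2 * n; i++)
        v = (v << 8) | mem[i];
    for (int i = 0; i < n; i++)
        lanes[i] = (uint16_t)(v >> (16 * i));
}

/* Hypothetical driver: prints lane 0 under each model. */
void print_lane_demo(void)
{
    const uint16_t foo[4] = {0x0a0b, 0x1a1b, 0x2a2b, 0x3a3b};
    uint8_t mem[8];
    uint16_t lanes[4];

    for (int be = 0; be <= 1; be++) {
        store_u16_array(mem, foo, 4, be);
        load_lanes_elementwise(lanes, mem, 4, be);
        assert(lanes[0] == foo[0]);   /* lane 0 matches element 0 */
        printf("%s-endian, element-wise load: lane 0 = 0x%04x\n",
               be ? "big" : "little", (unsigned)lanes[0]);
    }

    store_u16_array(mem, foo, 4, 1);
    load_lanes_whole_vector_be(lanes, mem, 4);
    printf("big-endian, whole-vector load: lane 0 = 0x%04x\n",
           (unsigned)lanes[0]);
}
```

Calling print_lane_demo() from main shows lane 0 equal to element 0
for both byte orders under the element-wise model, while the
whole-vector model puts element 3 in lane 0, which is the discrepancy
the original question was asking about.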