Re: rseq/arm32: choosing rseq code signature

Mathieu Desnoyers Wed, 17 Apr 2019 08:30:57 -0700

----- On Apr 17, 2019, at 10:43 AM, Mathieu Desnoyers [email protected] wrote:


> ----- On Apr 17, 2019, at 6:37 AM, richard earnshaw [email protected] wrote:
> 
>> On 16/04/2019 14:39, Mathieu Desnoyers wrote:
>>> ----- On Apr 15, 2019, at 9:37 AM, Mathieu Desnoyers [email protected] wrote:
>>> 
>>>> ----- On Apr 15, 2019, at 9:30 AM, peter maydell [email protected] wrote:
>>>>
>>>>> On Mon, 15 Apr 2019 at 14:11, Mathieu Desnoyers <[email protected]> wrote:
>>>>>>
>>>>>> ----- On Apr 11, 2019, at 3:55 PM, peter maydell [email protected] wrote:
>>>>>>
>>>>>>> On Thu, 11 Apr 2019 at 18:51, Mathieu Desnoyers <[email protected]> wrote:
>>>>>>>>  * This translates to the following instruction pattern in the T16 instruction set:
>>>>>>>>  *
>>>>>>>>  * little endian:
>>>>>>>>  * def3        udf    #243      ; 0xf3
>>>>>>>>  * e7f5        b.n    <7f5>
>>>>>>>>  *
>>>>>>>>  * big endian:
>>>>>>>>  * e7f5        b.n    <7f5>
>>>>>>>>  * def3        udf    #243      ; 0xf3
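As a cross-check of the two listings above, the following C sketch stores the 32-bit signature word in memory and decodes it as the two T16 halfwords a Thumb CPU would fetch, once with little-endian instruction order and once with big-endian instruction order. The helpers store32(), fetch16(), t16_kind() and decode_sig() are hypothetical names made up for this illustration, and t16_kind() only recognises the two encodings that matter here: UDF (0xDExx) and the unconditional B.N (0xE000 to 0xE7FF).

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit signature value from the comment above. */
#define SIG_WORD 0xe7f5def3u

/* Lay out a 32-bit word in memory, most significant byte first if big_endian. */
static void store32(uint8_t out[4], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(v >> (big_endian ? 24 - 8 * i : 8 * i));
}

/* Fetch one T16 halfword using the instruction byte order. */
static uint16_t fetch16(const uint8_t *p, int big_endian)
{
	return big_endian ? (uint16_t)(p[0] << 8 | p[1])
			  : (uint16_t)(p[1] << 8 | p[0]);
}

/* Recognise only the two T16 encodings of interest here. */
static const char *t16_kind(uint16_t hw)
{
	if ((hw & 0xff00) == 0xde00)	/* UDF #imm8 */
		return "udf";
	if ((hw & 0xf800) == 0xe000)	/* unconditional B.N */
		return "b.n";
	return "other";
}

/* Decode the signature as the two halfwords fetched in program order. */
static void decode_sig(int big_endian, uint16_t hw[2])
{
	uint8_t mem[4];

	store32(mem, SIG_WORD, big_endian);
	hw[0] = fetch16(mem, big_endian);
	hw[1] = fetch16(mem + 2, big_endian);
}
```

decode_sig(0, hw) yields 0xdef3 (udf) followed by 0xe7f5 (b.n); decode_sig(1, hw) yields the same two halfwords in the opposite order.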
>>>>>>>
>>>>>>> Do we really care about big-endian instruction-ordering for Thumb?
>>>>>>> It requires (AIUI) either an ARMv7R CPU which implements and sets
>>>>>>> SCTLR.IE to 1, or a v6-or-earlier CPU using BE32, and it's going to
>>>>>>> be even rarer than normal BE8 big-endian...
>>>>>>
>>>>>> I don't think we care enough about it to look for a trick to
>>>>>> turn the branch into something else (which would not branch away from the
>>>>>> udf instruction), but considering this signature will be ABI, it's good to
>>>>>> be thorough documentation-wise and cover all existing cases.
>>>>>
>>>>> I think if you want to document it it would be helpful to
>>>>> readers to make it clear that this is the ultra-rare
>>>>> big-endian-instruction-order "big endian Thumb", not the only
>>>>> moderately-rare little-endian-instructions-big-endian-data
>>>>> "big endian Thumb".
>>>>
>>>> I'm actually very much concerned about environments with big endian
>>>> data and little endian code. Which gcc compiler flags do I need to
>>>> use to test it?
>>>>
>>>> I'm concerned about a signature mismatch between what is passed to
>>>> the rseq system call ("data-endian signature") and what is generated
>>>> in the code ("instruction-endian signature").
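To make the concern concrete, the following C sketch models three configurations (little endian; big-endian code and data, as in BE32; big-endian data with little-endian code, as in BE8) and compares the registered signature with what a data load reads back from the signature word emitted in the code. It assumes the kernel reads the word in front of the abort handler with an ordinary 32-bit load in data byte order; the enum and the helpers store32(), load32() and sig_matches() are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-order configurations discussed in this thread. */
enum arm_endian_mode {
	ARM_LE,		/* little-endian code and data */
	ARM_BE32,	/* big-endian code and data (e.g. BE32 on v6 and earlier) */
	ARM_BE8,	/* big-endian data, little-endian code (BE8, v6 and later) */
};

static int code_is_be(enum arm_endian_mode m) { return m == ARM_BE32; }
static int data_is_be(enum arm_endian_mode m) { return m != ARM_LE; }

/* Write v to memory in the given byte order. */
static void store32(uint8_t out[4], uint32_t v, int be)
{
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(v >> (be ? 24 - 8 * i : 8 * i));
}

/* Read a 32-bit value from memory in the given byte order. */
static uint32_t load32(const uint8_t in[4], int be)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++)
		v |= (uint32_t)in[i] << (be ? 24 - 8 * i : 8 * i);
	return v;
}

/*
 * Hypothetical model of the signature check: the word placed before the
 * abort handler is stored in instruction byte order, then read back with
 * a data load and compared with the value registered via the syscall.
 */
static int sig_matches(enum arm_endian_mode m, uint32_t registered,
		       uint32_t code_word)
{
	uint8_t mem[4];

	store32(mem, code_word, code_is_be(m));
	return load32(mem, data_is_be(m)) == registered;
}
```

Under that assumption, sig_matches() succeeds for ARM_LE and ARM_BE32 when the same value is used on both sides, but fails for ARM_BE8 unless the registered value is the byte-swapped 0xf3def5e7.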
>>> 
>>> Based on this page:
>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360f/CDFBBCHB.html
>>> 
>>> My understanding is that the situation is as follows (please confirm):
>>> 
>>> - Prior to ARMv6, you could build and run code that is either big or little
>>>   endian, provided you had a Linux kernel of matching endianness. Code and
>>>   data endianness needed to match.
>>> - Starting from ARMv6, only little-endian code is supported. The endianness
>>>   of data accesses can be changed through bit [9], the E bit, of the Program
>>>   Status Register (mixed endianness).
>>> 
>>> Looking at ARM build options for gcc, it seems you can select either big or
>>> little endian (-mbig-endian or -mlittle-endian (default)), which affects both
>>> instruction and data endianness. So I suspect the -mbig-endian option is
>>> really only useful for pre-ARMv6.
>> 
>> -mbig-endian is still correct, even on later architectures.  The linker
>> gets involved, however, and (using the mapping symbol information) swaps
>> the code segments to little-endian form (this is why you have to use
>> .inst rather than .word when inserting instructions, so that the correct
>> mapping symbols are inserted).
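The linker behaviour described above can be sketched as follows: walk the regions delimited by mapping symbols ($a for ARM code, $t for Thumb code, $d for data) and byte-swap each 4-byte ARM or 2-byte Thumb unit, leaving data untouched. This is a simplified model of what a BE8 link (for example with GNU ld's ARM-specific --be8 option) does, not the linker's actual code; struct map_region, swap_units() and be8_convert() are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Region kinds, named after the $a, $t and $d mapping symbols. */
enum map_kind { MAP_ARM, MAP_THUMB, MAP_DATA };

struct map_region {
	size_t offset;		/* start of the region in the section */
	size_t size;		/* length of the region in bytes */
	enum map_kind kind;
};

/* Reverse the bytes within each 'unit'-byte chunk of p[0..size). */
static void swap_units(uint8_t *p, size_t size, size_t unit)
{
	assert(size % unit == 0);
	for (size_t i = 0; i < size; i += unit) {
		for (size_t a = i, b = i + unit - 1; a < b; a++, b--) {
			uint8_t t = p[a];

			p[a] = p[b];
			p[b] = t;
		}
	}
}

/*
 * Simplified BE8 conversion: ARM instructions are swapped per 32-bit word,
 * Thumb instructions per 16-bit halfword (so a 32-bit Thumb-2 instruction
 * is swapped as two halfwords), and data regions keep their big-endian layout.
 */
static void be8_convert(uint8_t *section, const struct map_region *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].kind == MAP_ARM)
			swap_units(section + r[i].offset, r[i].size, 4);
		else if (r[i].kind == MAP_THUMB)
			swap_units(section + r[i].offset, r[i].size, 2);
	}
}
```

Fed the bytes e7 f5 de f3 e7 f5 de f3, with the first word in a $a region (the .inst) and the second in a $d region (the .long), the model turns only the first word into f3 de f5 e7.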
> 
> So what you're saying is that if I have:
> 
> void main()
> {
>        asm volatile (
>                        ".arm\n\t"
>                        ".inst 0xe7f5def3\n\t"  /* instruction, mapped as code */
>                        ".long 0xe7f5def3\n\t"); /* same value, mapped as data */
> }
> 
> and compile it with:
> 
> arm-linux-gnueabihf-gcc -mbig-endian -march=armv6 -c -o arm-big-endianv6.o arm-test-endian.c
> 
> It's expected that the generated .o will have big-endian instructions, matching
> the endianness of the data, e.g.:
> 
> hexdump arm-big-endianv6.o
> 
> [...]
> 0000030 0a00 0900 80b5 00af f5e7 f3de f5e7 f3de
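A note on reading these dumps: hexdump without options prints 16-bit units in host byte order, so assuming the dumps were taken on a little-endian x86 host (as the cross toolchain suggests), "f5e7 f3de" stands for the file bytes e7 f5 de f3, i.e. the big-endian word 0xe7f5def3. The hypothetical helpers below undo that grouping; "hexdump -C" sidesteps the question by printing individual bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Undo the host byte order of a plain `hexdump` listing: each printed
 * 16-bit group is assumed to come from a little-endian host, so its low
 * byte is the one that appears first in the file.
 */
static void hexdump_groups_to_bytes(const uint16_t *groups, size_t n,
				    uint8_t *out)
{
	for (size_t i = 0; i < n; i++) {
		out[2 * i] = (uint8_t)(groups[i] & 0xff);
		out[2 * i + 1] = (uint8_t)(groups[i] >> 8);
	}
}

/* Read four file bytes as a big-endian 32-bit word. */
static uint32_t be32_at(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Read two file bytes as a big-endian 16-bit halfword. */
static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}
```

Applied to the .o line above, the bytes at offsets 4 and 8 both read back as 0xe7f5def3, and the first halfword reads as 0xb580 (push {r7, lr} in Thumb), consistent with big-endian Thumb code in the object file.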
> 
> But it's then at the linking stage that the linker will
> reverse the endianness of the ".inst" (but not .long).
> 
> Let's see:
> 
> arm-linux-gnueabihf-gcc -nodefaultlibs -nostdlib -mbig-endian -march=armv6 -o arm-big-endianv6 arm-big-endianv6.o
> /usr/lib/gcc-cross/arm-linux-gnueabihf/7/../../../../arm-linux-gnueabihf/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000000001b0
> 
> hexdump gives me:
> [...]
> 00001b0 80b5 00af f5e7 f3de f5e7 f3de c046 bd46
> 
> So it has not reversed the instruction endianness.
> 
> What am I doing wrong?

It seems to be specific to gcc 7 when targeting armv6 and armv7*. With gcc 8,
the linked code does seem to be byte-swapped relative to the data.

So we need to figure out whether .inst is the right thing to use to declare
a signature, or whether it's better to use ".long", which would probably
generate an invalid instruction on BE...
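One way to see the trade-off is to model, for a BE8 image, what the CPU fetches and what a data load returns for each directive. The sketch below assumes, as Richard describes, that the linker byte-swaps .inst words and leaves .long words alone; bswap32(), be8_inst(), be8_long() and sig_data_for() are hypothetical names, and sig_data_for() only illustrates one possible pairing (code-endian value emitted with .inst, data-endian value registered with the kernel), not a decided ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* What one directive turns into in a BE8 image (big-endian data, LE code). */
struct be8_view {
	uint32_t fetched_insn;	/* word the CPU decodes when executing it */
	uint32_t loaded_data;	/* word an ordinary 32-bit load returns */
};

/* .inst X: kept as instruction X, so a data load sees it swapped. */
static struct be8_view be8_inst(uint32_t x)
{
	return (struct be8_view){ .fetched_insn = x, .loaded_data = bswap32(x) };
}

/* .long X: kept as big-endian data X, so the CPU would decode it swapped. */
static struct be8_view be8_long(uint32_t x)
{
	return (struct be8_view){ .fetched_insn = bswap32(x), .loaded_data = x };
}

/* Hypothetical pairing: the value to register when the code uses .inst. */
static uint32_t sig_data_for(uint32_t sig_code, int be8)
{
	return be8 ? bswap32(sig_code) : sig_code;
}
```

In a BE8 image, .inst keeps the intended instruction but a data load sees 0xf3def5e7, while .long keeps the data value but the CPU would decode 0xf3def5e7 instead.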

Thanks,

Mathieu

> 
> I'm using:
> 
> gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04)
> GNU ld (GNU Binutils for Ubuntu) 2.30
> 
> Thanks,
> 
> Mathieu
> 
>> 
>>> 
>>> ARMv6+ mixed endianness seems to be a mode that temporarily swaps the
>>> endianness of load/store instructions for specific memory accesses, such as
>>> those communicating with DMA devices, so I don't see any scenario where we
>>> can generate a binary that has little-endian code and big-endian data. If
>>> that is true, then it should be fine to declare the signature with
>>> ".arm .inst" and expect the data endianness to be the same as the code
>>> endianness.
>>> 
>>> Am I missing something?
>>> 
>>> Thanks,
>>> 
>>> Mathieu
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
