Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-08 Thread Rin Okuyama

On 2019/01/08 17:33, Nick Hudson wrote:



On 07/01/2019 10:22, Rin Okuyama wrote:
[snip]


This kind of problems cannot be handled in software unless we

 (1) use cached memory (for which unaligned access is allowed), or
 (2) forbid compiler to generate unaligned access.

This patch

 http://www.netbsd.org/~rin/armv6_cached_dma_memory_20180107.patch

adds ARMV6_CACHED_DMA_MEMORY option. If enabled, DMA memory is forcibly
mapped cacheable on ARMv6+ [option (1) above]. This allows us unaligned
access to DMA buffer, however, as Nick and others pointed out, breaks
drivers that do not invalidate cache appropriately. If this option is
disabled, -mno-unaligned-access is added to CFLAGS [option (2) above].

I've tested both on my RPI3B+ (earmv7hf). Kernel of (1) works more than
12 hours without panic. However, vchiq(4) fails to initialize, and mue(4)
receives strange packets of zero-length (two times in 12 hours). Both
smell like driver bugs. Kernel of (2) works without problems as far as
I can see.


Does it work on rpi1? I forget if ARM11_COMPAT_MMU behaves differently here.


I bought my RPI0W yesterday to test it :-).

If ARMV6_CACHED_DMA_MEMORY option is specified, the kernel hangs
indefinitely when attaching audio0 at vcaudio0; unlike RPI3B+, vchiq0
and vcaudio0 are attached, however the system freezes before attaching
message of audio0. If vcaudio0 is disabled in the kernel config file,
the system worked without any troubles more than 12 hours.

The kernel built with -mno-unaligned-access works fine. Also, even if
neither ARMV6_CACHED_DMA_MEMORY nor -mno-unaligned-access options are
specified, axe(4) works without my hack in the current revision. I
confirmed that an instruction is generated which does unaligned access
to USB buffer.

This indicates that unaligned access is permitted even for non-cached
memory on ARMv6 in backward compatible MMU mode. Is it right? (I could
not find any documents about it...)

If it is true, we can possibly restrict ARMV6_CACHED_DMA_MEMORY option
to !ARM_MMU_V6C case. For arch/arm/arm32/bus_dma.c:


#if defined(ARMV6_CACHED_DMA_MEMORY) && !ARM_MMU_V6C
/*
 * Force cached mapping for ARMv6+, which allows us
 * unaligned access to DMA buffer. For ARMv6 MMU in
 * backward compatible mode, unaligned access is
 * permitted for non-cached memory.
 */
if (!CPU_IS_ARMV6_P() && !CPU_IS_ARMV7_P())
#endif
{
bool uncached = (flags & BUS_DMA_COHERENT);
...


Also, we can omit -mno-unaligned-access option in
arch/arm/conf/std/Makefile.arm:


.if !empty(MACHINE_ARCH:Mearmv6*) || !empty(MACHINE_ARCH:Mearmv7*)
. if ARMV6_CACHED_DMA_MEMORY
# Use cached memory for DMA on ARMv6+, which allows us unaligned
# access to DMA buffer.
CPPFLAGS+=  -DARMV6_CACHED_DMA_MEMORY
. elif ARM11_COMPAT_MMU
# For ARMv6 MMU in backward compatible mode, unaligned access is
# permitted for non-cached memory.
. else
# Otherwise, we need strictly-aligned access to DMA buffer.
CFLAGS+=-mno-unaligned-access
. endif
.endif


I wrote a patch including both changes:

http://www.netbsd.org/~rin/armv6_cached_dma_memory_20180109.patch

How do you think?


I also carried out simple benchmarks of building lang/perl5 of pkgsrc.
The working directory is USB SSD, TMPDIR is tmpfs, and terminal is ssh.
The difference is negligible: (1) 25:36.53 and (2) 25:34.81.


Using cached memory for DMA is slower?


Yes, at least for this single tries. However, it is only about 0.1%
difference, and I think it is within error margin. We may obtain
opposite results in another tries. Anyway, the difference would be
negligibly small.


We should use cached memory for DMA in the future. However, it may break
more drivers than I observed on RPI. Therefore, I would like to propose
a compromise plan:

(a) Before branching netbsd-9, disable ARMV6_CACHED_MEMORY, and use
 -mno-unaligned-access.

(b) After branching netbsd-9, enable ARMV6_CACHED_MEMORY, and stop using
 -mno-unaligned-access.

(c) After debugging drivers, use cached memory for DMA unconditionally
 on ARMv6+ and remove ARMV6_CACHED_DMA_MEMORY option.

Thoughts? Nick, does this look reasonable for you?


OK.

Let's backport all related fixes and plan to remove ARMV6_CACHED_DMA_MEMORY 
everywhere. (pending the rpi result)


Yeah, I would like to commit

the original patch
http://www.netbsd.org/~rin/armv6_cached_dma_memory_20180107.patch

or the revised one (explained above)
http://www.netbsd.org/~rin/armv6_cached_dma_memory_20180109.patch

if there's no objection until this weekend. Also, I will send a PR on
vchiq(4) problems. Which patch do you prefer? Or other ideas?

Thanks,
rin


Thanks,
Nick



Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-08 Thread Nick Hudson




On 07/01/2019 10:22, Rin Okuyama wrote:
[snip]


This kind of problems cannot be handled in software unless we

     (1) use cached memory (for which unaligned access is allowed), or
     (2) forbid compiler to generate unaligned access.

This patch

     http://www.netbsd.org/~rin/armv6_cached_dma_memory_20180107.patch

adds ARMV6_CACHED_DMA_MEMORY option. If enabled, DMA memory is forcibly
mapped cacheable on ARMv6+ [option (1) above]. This allows us unaligned
access to DMA buffer, however, as Nick and others pointed out, breaks
drivers that do not invalidate cache appropriately. If this option is
disabled, -mno-unaligned-access is added to CFLAGS [option (2) above].

I've tested both on my RPI3B+ (earmv7hf). Kernel of (1) works more than
12 hours without panic. However, vchiq(4) fails to initialize, and mue(4)
receives strange packets of zero-length (two times in 12 hours). Both
smell like driver bugs. Kernel of (2) works without problems as far as
I can see.


Does it work on rpi1? I forget if ARM11_COMPAT_MMU behaves differently here.



I also carried out simple benchmarks of building lang/perl5 of pkgsrc.
The working directory is USB SSD, TMPDIR is tmpfs, and terminal is ssh.
The difference is negligible: (1) 25:36.53 and (2) 25:34.81.


Using cached memory for DMA is slower?



We should use cached memory for DMA in the future. However, it may break
more drivers than I observed on RPI. Therefore, I would like to propose
a compromise plan:

(a) Before branching netbsd-9, disable ARMV6_CACHED_MEMORY, and use
     -mno-unaligned-access.

(b) After branching netbsd-9, enable ARMV6_CACHED_MEMORY, and stop using
     -mno-unaligned-access.

(c) After debugging drivers, use cached memory for DMA unconditionally
     on ARMv6+ and remove ARMV6_CACHED_DMA_MEMORY option.

Thoughts? Nick, does this look reasonable for you?


OK.

Let's backport all related fixes and plan to remove 
ARMV6_CACHED_DMA_MEMORY everywhere. (pending the rpi result)	


Thanks,
Nick


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-07 Thread Rin Okuyama

On 2019/01/07 10:59, matthew green wrote:

http://netbsd.org/~kamil/kubsan/0007-boot-real-hardware.txt


i fixed the hdafg.c ones here.  not sure about the hdaudio.c
ones, since they are already 1u << 31.  leaving:

x86/pci/pci_machdep.c
ahcisata_core.c
amd64/kobj_machdep.c
netinet/tcp_input.c

beyond the xhci one, that actually doesn't matter since the
alignment is not required in the copy of the structure.


Well, there are two different problems on alignment. One is that in
structures, which should be fixed in software. The other is that
cannot be resolved in software, as pointed out by Michael.

Going back to the example of axe(4), more than one ethernet frames are
contained in RX buffer in general. Each frame has 4-byte H/W header
before it. The problem is that H/W headers are 2-byte aligned instead
of 4, which results in unaligned word-wise load by __builtin_memcpy().

++-+-++-
|H/W |ethernet | |H/W |ethernet
|hdr |frame| |hdr |frame
++-+-++-

This kind of problems cannot be handled in software unless we

(1) use cached memory (for which unaligned access is allowed), or
(2) forbid compiler to generate unaligned access.

This patch

http://www.netbsd.org/~rin/armv6_cached_dma_memory_20180107.patch

adds ARMV6_CACHED_DMA_MEMORY option. If enabled, DMA memory is forcibly
mapped cacheable on ARMv6+ [option (1) above]. This allows us unaligned
access to DMA buffer, however, as Nick and others pointed out, breaks
drivers that do not invalidate cache appropriately. If this option is
disabled, -mno-unaligned-access is added to CFLAGS [option (2) above].

I've tested both on my RPI3B+ (earmv7hf). Kernel of (1) works more than
12 hours without panic. However, vchiq(4) fails to initialize, and mue(4)
receives strange packets of zero-length (two times in 12 hours). Both
smell like driver bugs. Kernel of (2) works without problems as far as
I can see.

I also carried out simple benchmarks of building lang/perl5 of pkgsrc.
The working directory is USB SSD, TMPDIR is tmpfs, and terminal is ssh.
The difference is negligible: (1) 25:36.53 and (2) 25:34.81.

We should use cached memory for DMA in the future. However, it may break
more drivers than I observed on RPI. Therefore, I would like to propose
a compromise plan:

(a) Before branching netbsd-9, disable ARMV6_CACHED_MEMORY, and use
-mno-unaligned-access.

(b) After branching netbsd-9, enable ARMV6_CACHED_MEMORY, and stop using
-mno-unaligned-access.

(c) After debugging drivers, use cached memory for DMA unconditionally
on ARMv6+ and remove ARMV6_CACHED_DMA_MEMORY option.

Thoughts? Nick, does this look reasonable for you?

Thanks,
rin


re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread matthew green
> http://netbsd.org/~kamil/kubsan/0007-boot-real-hardware.txt

i fixed the hdafg.c ones here.  not sure about the hdaudio.c
ones, since they are already 1u << 31.  leaving:

x86/pci/pci_machdep.c
ahcisata_core.c
amd64/kobj_machdep.c
netinet/tcp_input.c

beyond the xhci one, that actually doesn't matter since the
alignment is not required in the copy of the structure.


.mrg.


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Christos Zoulas
On Jan 6,  9:53am, nick.hud...@gmx.co.uk (Nick Hudson) wrote:
-- Subject: Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys

| I'm pretty sure this is the same as http://gnats.netbsd.org/50038

Yes, this looks like the same issue; we should not be patching individual
drivers like this. The compiler should be producing correct code that handles
unaligned access.

| Maybe I should be brave enough to stop using PMAP_NOCACHE in bus_dma.c 
| for armv6+ and fix any missing bus_dmamap_sync calls.

Isn't that orthogonal?

christos


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Christos Zoulas
On Jan 6,  3:59pm, nick.hud...@gmx.co.uk (Nick Hudson) wrote:
-- Subject: Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys

| > Isn't that orthogonal?
| 
| Nope, because normal cached memory allows unaligned access (kernel and 
| userland).
| 

I did not realize that the i_axe case was referencing uncached memory. If
that's the case, then we should turn on the pmap flag and start fixing the
drivers :-) But perhaps we don't want to inflict that pain to everyone until
things are mostly fixed...

christos


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Nick Hudson




On 06/01/2019 15:31, Christos Zoulas wrote:

On Jan 6,  9:53am, nick.hud...@gmx.co.uk (Nick Hudson) wrote:
-- Subject: Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys

| I'm pretty sure this is the same as http://gnats.netbsd.org/50038

Yes, this looks like the same issue; we should not be patching individual
drivers like this. The compiler should be producing correct code that handles
unaligned access.

| Maybe I should be brave enough to stop using PMAP_NOCACHE in bus_dma.c
| for armv6+ and fix any missing bus_dmamap_sync calls.

Isn't that orthogonal?


Nope, because normal cached memory allows unaligned access (kernel and 
userland).


Nick


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Christos Zoulas
On Jan 6,  5:46pm, rokuy...@rk.phys.keio.ac.jp (Rin Okuyama) wrote:
-- Subject: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev

| I guess other codes can be miscompiled if -mno-unaligned-access is
| not specified. Can I commit the patch?

I believe this is the right way to do it, since we don't want unaligned
accesses in the kernel.

Best,

christos
| 
| Thanks,
| rin
| 
| Index: sys/arch/arm/conf/Makefile.arm
| ===
| RCS file: /home/netbsd/src/sys/arch/arm/conf/Makefile.arm,v
| retrieving revision 1.49
| diff -p -u -r1.49 Makefile.arm
| --- sys/arch/arm/conf/Makefile.arm22 Sep 2018 12:24:01 -  1.49
| +++ sys/arch/arm/conf/Makefile.arm6 Jan 2019 08:14:56 -
| @@ -53,6 +53,13 @@ CPPFLAGS.cpufunc_asm_armv6.S+= -mcpu=arm
|   CPPFLAGS.cpufunc_asm_arm11.S+=  -mcpu=arm1136j-s
|   CPPFLAGS.cpufunc_asm_xscale.S+= -mcpu=xscale
|   
| +# For GCC, -munaligned-access is enabled by default for ARMv6+.
| +# But the unaligned access is forbidden in the supervisor mode.
| +.if (!empty(MACHINE_ARCH:Mearmv6*) || !empty(MACHINE_ARCH:Mearmv7*)) \
| +&& ${ACTIVE_CC} == "gcc"
| +CFLAGS+= -mno-unaligned-access
| +.endif
| +
|   ##
|   ## (3) libkern and compat
|   ##
-- End of excerpt from Rin Okuyama




Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Kamil Rytarowski
kUBSan detected a number of unaligned accesses in USB code:

http://netbsd.org/~kamil/kubsan/0007-boot-real-hardware.txt

On 06.01.2019 09:46, Rin Okuyama wrote:
> (CC added to port-...@netbsd.org)
> 
> Let me summarize the problem briefly. In axe(4), there is a code where
> memcpy() is carried out from 2-byte aligned buffer to 4-byte structure:
> 
> https://nxr.netbsd.org/xref/src/sys/dev/usb/if_axe.c#1284
> 
> This results in kernel panic due to alignment fault on earmv[67]hf:
> 
> https://twitter.com/furandon_pig/status/1071771151418908672
> 
> In short, this is because -munaligned-access is enabled on ARMv6+ by
> default for GCC. As the unaligned memory access is forbidden in the
> supervisor mode unlike in the user mode, we need to explicitly specify
> -mno-unaligned-access for kernel on ARMv6+.
> 
> On 2019/01/06 12:59, matthew green wrote:
>> Christos Zoulas writes:
>>> In article <20190106003905.60969f...@cvs.netbsd.org>,
>>> Rin Okuyama  wrote:
 -=-=-=-=-=-

 Module Name:    src
 Committed By:    rin
 Date:    Sun Jan  6 00:39:05 UTC 2019

 Modified Files:
 src/sys/dev/usb: if_axe.c

 Log Message:
 Fix kernel panic on arm reported by @furandon_pig on Twitter.

 Hardware header is 2-byte aligned in RX buffer, not 4-byte.
 For some architectures, __builtin_memcpy() of GCC 6 attempts to
 copy 4-byte header at once, which results in alignment error.
>>>
>>> This is really ugly..
>>>
>>> https://stackoverflow.com/questions/24883410/armcc-problems-with-memcpy-alignment-exceptions
>>>
>>>
>>> Perhaps there is a better solution? Can't memcpy be smarter?
>>
>> hmmm, what happens if struct axe_sframe_hdr is not marked
>> "packed"?  this feels like a compiler bug, but perhaps it
>> is assuming it can write 4 bytes to the structure when it
>> is only 2 byte aligned.
>>
>> is there a small test case that reproduces the problem?
>> preferably in user land?
> 
> On 2019/01/06 15:25, m...@netbsd.org wrote:
>> Are we building ARM with -mstrict-alignment?
> 
> I tried to reproduce the problem on userland. objdump(1) shows an
> unaligned load is generated. However, alignment fault does not occur:
> 
> % uname -p
> earmv7hf
> % cat test.c
> #include 
> #include 
> 
> int
> main()
> {
>     char buf[sizeof(int) + 1];
>     int i;
> 
>     fread(buf, 1, sizeof(buf), stdin);
>     memcpy(, [1], sizeof(i));
>     printf("0x%x\n", i);
>     return 0;
> }
> % cc -g -O2 test.c && cc test.o
> % objdump test.o
> ...
>   28:   e51b1013    ldr r1, [fp, #-19]  ; 0xffed
> ...
> % ./a.out
> 01234
> 0x34333231
> 
> This is because unaligned access is permitted for the user mode on
> ARMv6+. For GCC, -munaligned-access is enabled by default on ARMv6+.
> However, the unaligned access is forbidden in the supervisor mode.
> So, we need to explicitly specify -mno-unaligned-access for kernel
> on ARMv6+.
> 
> By reverting if_axe.c r1.94 and applying the attached patch, axe(4)
> works fine on earmv7hf. We can see that the instruction is changed
> from word-wise load to byte-wise load by specifying
> -mno-unaligned-access:
> 
> % armv7--netbsdelf-eabihf-objdump -d if_axe.o
> (before) 364:   e4983004    ldr r3, [r8], #4
> --->
> (after)  364:   e5d6    ldrb    r0, [r6]
> 
> I guess other codes can be miscompiled if -mno-unaligned-access is
> not specified. Can I commit the patch?
> 
> Thanks,
> rin
> 
> Index: sys/arch/arm/conf/Makefile.arm
> ===
> RCS file: /home/netbsd/src/sys/arch/arm/conf/Makefile.arm,v
> retrieving revision 1.49
> diff -p -u -r1.49 Makefile.arm
> --- sys/arch/arm/conf/Makefile.arm    22 Sep 2018 12:24:01 -    1.49
> +++ sys/arch/arm/conf/Makefile.arm    6 Jan 2019 08:14:56 -
> @@ -53,6 +53,13 @@ CPPFLAGS.cpufunc_asm_armv6.S+=    -mcpu=arm
>  CPPFLAGS.cpufunc_asm_arm11.S+=    -mcpu=arm1136j-s
>  CPPFLAGS.cpufunc_asm_xscale.S+=    -mcpu=xscale
>  
> +# For GCC, -munaligned-access is enabled by default for ARMv6+.
> +# But the unaligned access is forbidden in the supervisor mode.
> +.if (!empty(MACHINE_ARCH:Mearmv6*) || !empty(MACHINE_ARCH:Mearmv7*)) \
> +    && ${ACTIVE_CC} == "gcc"
> +CFLAGS+=    -mno-unaligned-access
> +.endif
> +
>  ##
>  ## (3) libkern and compat
>  ##




signature.asc
Description: OpenPGP digital signature


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Martin Husemann
On Sun, Jan 06, 2019 at 05:52:55AM -0800, Jason Thorpe wrote:
> That's probably a good idea in any case, because there will almost
> certainly be a performance benefit, but I still think ensuring that
> drivers don't perform unaligned accesses is desirable.

It is a bit tricky. We do this only when compiling for CPUs that can do
the unaligned access, i.e. when compiling kernels for arm v5 we tell
gcc to not generate unaligned access ops.

IIUC in this case the driver code was innocent (using proper memcpy),
but gcc optimized the memcpy (which was fine too, given the flags we
pass to it).

Martin


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Jason Thorpe



> On Jan 6, 2019, at 5:36 AM, Martin Husemann  wrote:
> 
> On Sun, Jan 06, 2019 at 08:31:53AM -0500, Greg Troxel wrote:
>> Why do we generate code with unaligned access in user space?  That seems
>> surprising, if the processor isn't happy about it.
> 
> The processor is happy with it, both in user- and kernel space.
> Only special memory regions mapped uncached make it trap.

There is a performance penalty for unaligned accesses, and not even all ARM 
versions can do it in a way that produces the expected results.  The device in 
question here is a USB device, and we support pre-ARMv6 systems that have USB 
capability.

> Nick suggest to change the mapping for bus_dma to cached, which would avoid
> the issue but may expose bugs in device drivers.

That's probably a good idea in any case, because there will almost certainly be 
a performance benefit, but I still think ensuring that drivers don't perform 
unaligned accesses is desirable.

-- thorpej



Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Martin Husemann
On Sun, Jan 06, 2019 at 08:31:53AM -0500, Greg Troxel wrote:
> Why do we generate code with unaligned access in user space?  That seems
> surprising, if the processor isn't happy about it.

The processor is happy with it, both in user- and kernel space.
Only special memory regions mapped uncached make it trap.

Nick suggest to change the mapping for bus_dma to cached, which would avoid
the issue but may expose bugs in device drivers.

Martin


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Greg Troxel
matthew green  writes:

>> In short, this is because -munaligned-access is enabled on ARMv6+ by
>> default for GCC. As the unaligned memory access is forbidden in the
>> supervisor mode unlike in the user mode, we need to explicitly specify
>> -mno-unaligned-access for kernel on ARMv6+.
>
> i think this seems like the right thing to do here.
>
> othewise we'd have to patch this all over..

Why do we generate code with unaligned access in user space?  That seems
surprising, if the processor isn't happy about it.



re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread matthew green
> In short, this is because -munaligned-access is enabled on ARMv6+ by
> default for GCC. As the unaligned memory access is forbidden in the
> supervisor mode unlike in the user mode, we need to explicitly specify
> -mno-unaligned-access for kernel on ARMv6+.

i think this seems like the right thing to do here.

othewise we'd have to patch this all over..


.mrg.


Re: Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Nick Hudson

On 06/01/2019 08:46, Rin Okuyama wrote:

(CC added to port-...@netbsd.org)

Let me summarize the problem briefly. In axe(4), there is a code where
memcpy() is carried out from 2-byte aligned buffer to 4-byte structure:

https://nxr.netbsd.org/xref/src/sys/dev/usb/if_axe.c#1284

This results in kernel panic due to alignment fault on earmv[67]hf:

https://twitter.com/furandon_pig/status/1071771151418908672

In short, this is because -munaligned-access is enabled on ARMv6+ by
default for GCC. As the unaligned memory access is forbidden in the
supervisor mode unlike in the user mode, we need to explicitly specify
-mno-unaligned-access for kernel on ARMv6+.



I'm pretty sure this is the same as http://gnats.netbsd.org/50038

Maybe I should be brave enough to stop using PMAP_NOCACHE in bus_dma.c 
for armv6+ and fix any missing bus_dmamap_sync calls.


Nick



Unaligned access in kernel on ARMv6+ (Re: CVS commit: src/sys/dev/usb)

2019-01-06 Thread Rin Okuyama

(CC added to port-...@netbsd.org)

Let me summarize the problem briefly. In axe(4), there is a code where
memcpy() is carried out from 2-byte aligned buffer to 4-byte structure:

https://nxr.netbsd.org/xref/src/sys/dev/usb/if_axe.c#1284

This results in kernel panic due to alignment fault on earmv[67]hf:

https://twitter.com/furandon_pig/status/1071771151418908672

In short, this is because -munaligned-access is enabled on ARMv6+ by
default for GCC. As the unaligned memory access is forbidden in the
supervisor mode unlike in the user mode, we need to explicitly specify
-mno-unaligned-access for kernel on ARMv6+.

On 2019/01/06 12:59, matthew green wrote:

Christos Zoulas writes:

In article <20190106003905.60969f...@cvs.netbsd.org>,
Rin Okuyama  wrote:

-=-=-=-=-=-

Module Name:src
Committed By:   rin
Date:   Sun Jan  6 00:39:05 UTC 2019

Modified Files:
src/sys/dev/usb: if_axe.c

Log Message:
Fix kernel panic on arm reported by @furandon_pig on Twitter.

Hardware header is 2-byte aligned in RX buffer, not 4-byte.
For some architectures, __builtin_memcpy() of GCC 6 attempts to
copy 4-byte header at once, which results in alignment error.


This is really ugly..

https://stackoverflow.com/questions/24883410/armcc-problems-with-memcpy-alignment-exceptions

Perhaps there is a better solution? Can't memcpy be smarter?


hmmm, what happens if struct axe_sframe_hdr is not marked
"packed"?  this feels like a compiler bug, but perhaps it
is assuming it can write 4 bytes to the structure when it
is only 2 byte aligned.

is there a small test case that reproduces the problem?
preferably in user land?


On 2019/01/06 15:25, m...@netbsd.org wrote:

Are we building ARM with -mstrict-alignment?


I tried to reproduce the problem on userland. objdump(1) shows an
unaligned load is generated. However, alignment fault does not occur:

% uname -p
earmv7hf
% cat test.c
#include 
#include 

int
main()
{
char buf[sizeof(int) + 1];
int i;

fread(buf, 1, sizeof(buf), stdin);
memcpy(, [1], sizeof(i));
printf("0x%x\n", i);
return 0;
}
% cc -g -O2 test.c && cc test.o
% objdump test.o
...
  28:   e51b1013ldr r1, [fp, #-19]  ; 0xffed
...
% ./a.out
01234
0x34333231

This is because unaligned access is permitted for the user mode on
ARMv6+. For GCC, -munaligned-access is enabled by default on ARMv6+.
However, the unaligned access is forbidden in the supervisor mode.
So, we need to explicitly specify -mno-unaligned-access for kernel
on ARMv6+.

By reverting if_axe.c r1.94 and applying the attached patch, axe(4)
works fine on earmv7hf. We can see that the instruction is changed
from word-wise load to byte-wise load by specifying
-mno-unaligned-access:

% armv7--netbsdelf-eabihf-objdump -d if_axe.o
(before) 364:   e4983004ldr r3, [r8], #4
--->
(after)  364:   e5d6ldrbr0, [r6]

I guess other codes can be miscompiled if -mno-unaligned-access is
not specified. Can I commit the patch?

Thanks,
rin

Index: sys/arch/arm/conf/Makefile.arm
===
RCS file: /home/netbsd/src/sys/arch/arm/conf/Makefile.arm,v
retrieving revision 1.49
diff -p -u -r1.49 Makefile.arm
--- sys/arch/arm/conf/Makefile.arm  22 Sep 2018 12:24:01 -  1.49
+++ sys/arch/arm/conf/Makefile.arm  6 Jan 2019 08:14:56 -
@@ -53,6 +53,13 @@ CPPFLAGS.cpufunc_asm_armv6.S+=   -mcpu=arm
 CPPFLAGS.cpufunc_asm_arm11.S+= -mcpu=arm1136j-s
 CPPFLAGS.cpufunc_asm_xscale.S+=-mcpu=xscale
 
+# For GCC, -munaligned-access is enabled by default for ARMv6+.

+# But the unaligned access is forbidden in the supervisor mode.
+.if (!empty(MACHINE_ARCH:Mearmv6*) || !empty(MACHINE_ARCH:Mearmv7*)) \
+&& ${ACTIVE_CC} == "gcc"
+CFLAGS+=   -mno-unaligned-access
+.endif
+
 ##
 ## (3) libkern and compat
 ##