>Synopsis:	<alignment fault on armv7 (omap) using carp(4)>
>Category:	arm
>Environment:
	System      : OpenBSD 5.9
	Details     : OpenBSD 5.9 (DBGGENERIC) #0: Sat Feb 6 12:22:27 EST 2016
			 r...@beagle2.mit.edu:/usr/src/sys/arch/armv7/compile/DBGGENERIC
	Architecture: OpenBSD.armv7
	Machine     : armv7
>Description:
	With two BeagleBone Blacks running -current, an alignment fault occurs
	at ip_input.c:262 in ipv4_input() when they are configured to use
	carp(4) to share the same IP address.

	Source context from ip_input.c (the alignment fault occurs when
	ip->ip_dst.s_addr is loaded at line 262):

	258:			ip = mtod(m, struct ip *);
	259:		}
	260:
	261:		/* 127/8 must not appear on wire - RFC1122 */
	262:		if ((ntohl(ip->ip_dst.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
	263:		    (ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) {
	264:			if ((ifp->if_flags & IFF_LOOPBACK) == 0) {
	265:				ipstat.ips_badaddr++;
	266:				goto bad;

	ddb(4) output:

	$ Fatal kernel mode data abort: 'Alignment Fault 1'
	trapframe: 0xcb2d8e40
	DFSR=00000001, DFAR=c4cb401e, spsr=80000013
	r0 =c924d400, r1 =00000003, r2 =00000045, r3 =00000038
	r4 =c4cb400e, r5 =c06f2ca4, r6 =00000014, r7 =c4d65800
	r8 =c0710e50, r9 =c069294c, r10=c0692918, r11=cb2d8eb8
	r12=60000093, ssp=cb2d8e8c, slr=c040bc88, pc =c04616ec

	Stopped at	ipv4_input+0x9c:	ldrls	r3, [r4, #0x010]
	ddb> trace
	ipv4_input+0xc
		scp=0xc046165c rlv=0xc0461ab4 (ipintr+0x24)
		rsp=0xcb2d8ebc rfp=0xcb2d8ecc
		r10=0xc0692918 r8=0xc0710e50
		r7=0xc06edd88 r6=0xc06edd88 r5=0x00000000 r4=0x00000004
	ipintr+0xc
		scp=0xc0461a9c rlv=0xc041b290 (netintr+0xa0)
		rsp=0xcb2d8ed0 rfp=0xcb2d8ef0
	netintr+0xc
		scp=0xc041b1fc rlv=0xc053f3d0 (softintr_dispatch+0x84)
		rsp=0xcb2d8ef4 rfp=0xcb2d8f10
		r7=0x00000000 r6=0xc0710eb4 r5=0xc0710ec0 r4=0xc89e13a0
	softintr_dispatch+0x18
		scp=0xc053f364 rlv=0xc053eef8 (arm_do_pending_intr+0x110)
		rsp=0xcb2d8f14 rfp=0xcb2d8f40
		r6=0xc0710190 r5=0x20000013 r4=0x00000004
	arm_do_pending_intr+0x10
		scp=0xc053edf8 rlv=0xc040d9a8 (if_input_process+0xcc)
		rsp=0xcb2d8f44 rfp=0xcb2d8f78
		r10=0xc0692918 r9=0x00000000 r8=0x00000000
		r7=0xcb2d8f44 r6=0x00000000 r5=0xc4d65800 r4=0xc4d57480
	if_input_process+0xc
		scp=0xc040d8e8 rlv=0xc03b5c2c (taskq_thread+0x90)
		rsp=0xcb2d8f7c rfp=0xcb2d8fb0
		r10=0xc06e643c r8=0xc06e65d8
		r7=0xcb2d8f7c r6=0x00000001 r5=0xc89e2040 r4=0xc03b5b04
	taskq_thread+0xc
		scp=0xc03b5ba8 rlv=0xc0536c10 (proc_trampoline+0x18)
		rsp=0xcb2d8fb4 rfp=0xc07f3edc
		r7=0x00000000 r6=0x00000000 r5=0xc89e2040 r4=0xc03b5b9c
	Bad frame pointer: 0xc07f3edc

	This problem has also been encountered with both BeagleBones running
	-stable.
>How-To-Repeat:
	Install either -current or -stable on two BeagleBone Blacks, named
	beagle1 and beagle2. On a LAN 192.168.123.0/24 with default gateway
	192.168.123.2, set /etc/mygate to 192.168.123.2 on beagle1 and
	beagle2, then set /etc/hostname.cpsw0 on beagle1 to

		inet 192.168.123.201 255.255.255.0 NONE

	and on beagle2 to

		inet 192.168.123.202 255.255.255.0 NONE

	Then run the following commands on both to use carp(4):

		doas ifconfig carp0 create
		doas ifconfig carp0 vhid 1 pass tyrell carpdev cpsw0 192.168.123.222 netmask 255.255.255.0

	Shortly thereafter, a BeagleBone will encounter an alignment fault.
>Fix:
	The cause of this problem is unknown to me. I would speculate that
	the issue lies in m_pullup() mishandling alignment, given that
	networking on the BeagleBone Black usually functions normally, and
	that there are branches prior to the crash in which m_pullup() is
	used to derive the pointer ip, which is apparently misaligned when
	carp(4) is in use.

	In investigating this issue further, I replaced the offending 32-bit
	loads in the kernel with calls to get_unaligned_le32(), defined as
	(from linux/unaligned/packed_struct.h):

	struct __una_u32 { u32 x; } __packed;

	static inline u32 get_unaligned_le32(const void *p)
	{
		const struct __una_u32 *ptr = (const struct __una_u32 *)p;
		return ptr->x;
	}

	Besides the replacements in ip_input.c, udp_usrreq.c was also
	changed, as were the macros IN6_IS_ADDR_UNSPECIFIED,
	IN6_IS_ADDR_LOOPBACK, IN6_IS_ADDR_V4COMPAT, and IN6_IS_ADDR_V4MAPPED
	in in6.h.
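
	The register values in the ddb(4) output can be checked against the
	faulting instruction, and the packed-struct load has a portable
	equivalent. What follows is only a minimal userland sketch, not
	kernel code: load_u32_unaligned(), is_aligned4(), and
	fault_address() are hypothetical names introduced here for
	illustration, and the 0x10 offset is taken from the faulting
	instruction "ldrls r3, [r4, #0x010]".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not an OpenBSD kernel interface: copy four bytes
 * out of a possibly misaligned address instead of issuing a word load.
 * The value is returned in host byte order, like the packed-struct
 * version quoted above.
 */
static uint32_t
load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical helper: nonzero when addr is a multiple of 4. */
static int
is_aligned4(uint32_t addr)
{
	return (addr & 0x3u) == 0;
}

/*
 * Address that "ldrls r3, [r4, #0x010]" accesses, given the r4 value
 * from the trap frame.
 */
static uint32_t
fault_address(uint32_t r4)
{
	return r4 + 0x10u;
}
```

	With r4 = 0xc4cb400e, fault_address() gives 0xc4cb401e, which
	matches DFAR and is 2-byte but not 4-byte aligned. The low byte of
	r4, 0x0e, equals 14, the length of an Ethernet header, which would
	be consistent with the IP header directly following an Ethernet
	header at the start of a 4-byte-aligned buffer. That is an inference
	from the address alone and has not been verified.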

	With these replacements, carp(4) appeared to function normally, but
	beagle1 and beagle2 repeatedly lost networking temporarily, and
	recurrent 'device timeout' messages appeared in dmesg (along with
	carp(4) messages reporting state changes from master to slave and
	vice versa). To me, that behavior suggests the problem might be
	deeper than a bookkeeping mistake in aligning memory in an mbuf.

dmesg:
OpenBSD 5.9 (DBGGENERIC) #0: Sat Feb 6 12:22:27 EST 2016
    r...@beagle2.mit.edu:/usr/src/sys/arch/armv7/compile/DBGGENERIC
real mem = 536870912 (512MB)
avail mem = 518074368 (494MB)
warning: no entropy supplied by boot loader
mainbus0 at root
cpu0 at mainbus0: ARM Cortex A8 R3 rev 2 (ARMv7 core)
cpu0: DC enabled IC enabled WB disabled EABT branch prediction enabled
cpu0: 32KB(64b/l,4way) I-cache, 32KB(64b/l,4way) wr-back D-cache
omap0 at mainbus0: TI AM335x BeagleBone
prcm0 at omap0 rev 0.2
sitaracm0 at omap0: control module, rev 1.0
intc0 at omap0 rev 5.0
edma0 at omap0 rev 0.0
dmtimer0 at omap0 rev 3.1
dmtimer1 at omap0 rev 3.1
omdog0 at omap0 rev 0.1
omgpio0 at omap0: rev 0.1
gpio0 at omgpio0: 32 pins
omgpio1 at omap0: rev 0.1
gpio1 at omgpio1: 32 pins
omgpio2 at omap0: rev 0.1
gpio2 at omgpio2: 32 pins
omgpio3 at omap0: rev 0.1
gpio3 at omgpio3: 32 pins
omap0: device tiiic unit 0 not configured
omap0: device tiiic unit 1 not configured
omap0: device tiiic unit 2 not configured
ommmc0 at omap0
sdmmc0 at ommmc0
ommmc1 at omap0
sdmmc1 at ommmc1
com0 at omap0: ti16750, 64 byte fifo
com0: console
cpsw0 at omap0: version 1.12 (0), address 84:eb:18:e4:61:3a
ukphy0 at cpsw0 phy 0: Generic IEEE 802.3u media interface, rev. 1: OUI 0x0001f0, model 0x000f
scsibus0 at sdmmc0: 2 targets, initiator 0
sd0 at scsibus0 targ 1 lun 0: <SD/MMC, Drive #01, > SCSI2 0/direct fixed
sd0: 30436MB, 512 bytes/sector, 62333952 sectors
scsibus1 at sdmmc1: 2 targets, initiator 0
sd1 at scsibus1 targ 1 lun 0: <SD/MMC, Drive #01, > SCSI2 0/direct fixed
sd1: 3648MB, 512 bytes/sector, 7471104 sectors
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
boot device: sd0
root on sd0a (c38fd352429a26ad.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
WARNING: CHECK AND RESET THE DATE!