> -----Original Message-----
> From: David Christensen <[email protected]>
> Sent: Friday, July 26, 2019 3:36 AM
> To: Jerin Jacob Kollanukkaran <[email protected]>; Bruce Richardson
> <[email protected]>; hgovindh
> <[email protected]>
> Cc: Remy Horton <[email protected]>; Marko Kovacevic
> <[email protected]>; Ori Kam <[email protected]>; Pablo de
> Lara <[email protected]>; Radu Nicolau
> <[email protected]>; Akhil Goyal <[email protected]>; Tomasz
> Kantecki <[email protected]>; [email protected];
> [email protected]; [email protected]; Gavin Hu
> <[email protected]>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v2] examples/l3fwd: fix unaligned
> memory access
>
>
>>>> Fix unaligned memory access when reading IPv6 header which leads to
> >>>> segmentation fault by changing aligned memory read to unaligned
> >>>> memory read.
> >>>>
> >>>> Bugzilla ID: 279
> >>>> Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")
> >>>> Cc: [email protected]
> >>>> Cc: [email protected]
> >>>> Signed-off-by: hgovindh <[email protected]>
> >>>> ---
> >>>> V2: Added functions which will do unaligned load based on the
> >>>> underlying architecture
> >>>> ---
> >>>> ---
> >>>> examples/l3fwd/l3fwd_em.c | 26 ++++++++++++++++++++++++--
> >>>> 1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/examples/l3fwd/l3fwd_em.c
> b/examples/l3fwd/l3fwd_em.c
> >>>> index fa8f82be6..f2641586b 100644
> >>>> --- a/examples/l3fwd/l3fwd_em.c
> >>>> +++ b/examples/l3fwd/l3fwd_em.c
> >>>> @@ -244,6 +244,29 @@ em_mask_key(void *key, xmm_t
> mask) #error No
> >>>> vector engine (SSE, NEON, ALTIVEC) available, check your toolchain
> >>>> #endif
> >>>>
> >>>> +#if defined(RTE_MACHINE_CPUFLAG_SSE2) static inline xmm_t
> >>>> +em_load_key(void *key) {
> >>>> + return _mm_loadu_si128((__m128i *)(key)); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_NEON)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return vld1q_s32((int32_t *)key); } #elif
> >>>> +defined(RTE_MACHINE_CPUFLAG_ALTIVEC)
> >>>> +static inline xmm_t
> >>>> +em_load_key(void *key)
> >>>> +{
> >>>> + return vec_ld(0, (xmm_t *)(key)); }
> >>
> >> Added power pc maintainer
> >
> >> Not sure all architecture need SIMD instructions for access to
> >> unaligned memory location.
> >>
> >> @hgovindh,
> >> Could you provide exact setup details for reproducing this issue, I
> >> can test it on arm64.
> >> Like l3fwd command, Traffic generator traffic pattern
> >
> > The vec_ld() function requires 16 byte alignment. (My understanding
> > is that GCC code will mask the lower four bits of the address to
> > enforce the requirement:
> > https://gcc.gcc.gnu.narkive.com/cJndcMpR/vec-ld-versus-vec-vsx-ld-on-p
> > ower8)
> > Power 8 and later processors support the vec_vsx_ld() function which
> > does not have the same memory alignment requirements.
> >
> > I'll need to try and reproduce the original bug to see what code is
> > actually being generated. Outside of vector instructions I wouldn't
> > expect to see errors with unaligned data references.
>
> Tested original bugzilla 279 on Power 9 system with RHEL 7.6 and gcc 4.8.5, no
> segmentation fault observed after 30 minutes (observed segmentation fault
> on Intel system immediately).
>
> Code dissassembly:
> (gdb) info line l3fwd_em.c:290
> Line 290 of "/home/davec/src/dpdk/examples/l3fwd/l3fwd_em.c" starts at
> address 0x10146fbc <em_main_loop+1660>
> and ends at 0x10146fc0 <em_main_loop+1664>.
> (gdb) disass /m 0x10146fbc,0x10146fc0
> Dump of assembler code from 0x10146fbc to 0x10146fc0:
> 290 key.xmm[1] = *(xmm_t *)data1;
> 0x0000000010146fbc <em_main_loop+1660>: li r7,20
>
> End of assembler dump.
>
> Since vector element ordering is different on Intel vs Power/ARM, suggest
> only applying vector operation to Intel code at this time otherwise additional
> steps may be required to modify MASK values to match the new vector
> operations.
On arm64, Generated assembly is following. Where LDUR and STR works
With unaligned memory(i.e no need for special handling).
I would suggest to have eal function to abstract The difference between x86 vs
Power/ARM
to avoid ifdef clutter in all the applications.
key.xmm[1] = *(xmm_t *)data1;
0x00000000004ebed4 <+1188>: 60 40 c1 3c ldur q0, [x3, #20]
0x00000000004ebedc <+1196>: a0 73 80 3d str q0, [x29, #448]
0x00000000004ec064 <+1588>: 41 40 c1 3c ldur q1, [x2, #20]
0x00000000004ec06c <+1596>: a1 73 80 3d str q1, [x29, #448]
>
> Dave