On Thursday, February 7, 2019 5:27:09 AM IST Michael Ellerman wrote: > Chandan Rajendra <chan...@linux.ibm.com> writes: > > On Wednesday, February 6, 2019 5:20:04 PM IST Michael Ellerman wrote: > >> Chandan Rajendra <chan...@linux.ibm.com> writes: > >> > On Friday, February 1, 2019 4:43:52 PM IST Michael Ellerman wrote: > >> >> Michael Ellerman <m...@ellerman.id.au> writes: > >> >> > >> >> > Adding Simon who wrote the code. > >> >> > > >> >> > Chandan Rajendra <chan...@linux.ibm.com> writes: > >> >> >> When executing fstests' generic/026 test, I hit the following call > >> >> >> trace, > >> >> >> > >> >> >> [ 417.061038] BUG: Unable to handle kernel data access at > >> >> >> 0xc00000062ac40000 > >> >> >> [ 417.062172] Faulting instruction address: 0xc000000000092240 > >> >> >> [ 417.062242] Oops: Kernel access of bad area, sig: 11 [#1] > >> >> >> [ 417.062299] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries > >> >> >> [ 417.062366] Modules linked in: > >> >> >> [ 417.062401] CPU: 0 PID: 27828 Comm: chacl Not tainted > >> >> >> 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1 > >> >> >> [ 417.062495] NIP: c000000000092240 LR: c00000000066a55c CTR: > >> >> >> 0000000000000000 > >> >> >> [ 417.062567] REGS: c00000062c0c3430 TRAP: 0300 Not tainted > >> >> >> (5.0.0-rc2-next-20190115-00001-g6de6dba64dda) > >> >> >> [ 417.062660] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> > >> >> >> CR: 44000842 XER: 20000000 > >> >> >> [ 417.062750] CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: > >> >> >> 40000000 IRQMASK: 0 > >> >> >> GPR00: 0000000000000000 c00000062c0c36c0 > >> >> >> c0000000017f4c00 c00000000121a660 > >> >> >> GPR04: c00000062ac3fff9 0000000000000004 > >> >> >> 0000000000000020 00000000275b19c4 > >> >> >> GPR08: 000000000000000c 46494c4500000000 > >> >> >> 5347495f41434c5f c0000000026073a0 > >> >> >> GPR12: 0000000000000000 c0000000027a0000 > >> >> >> 0000000000000000 0000000000000000 > >> >> >> GPR16: 0000000000000000 0000000000000000 > >> >> >> 0000000000000000 0000000000000000 > >> >> >> GPR20: c00000062ea70020 c00000062c0c38d0 > >> >> >> 0000000000000002 0000000000000002 > >> >> >> GPR24: c00000062ac3ffe8 00000000275b19c4 > >> >> >> 0000000000000001 c00000062ac30000 > >> >> >> GPR28: c00000062c0c38d0 c00000062ac30050 > >> >> >> c00000062ac30058 0000000000000000 > >> >> >> [ 417.063563] NIP [c000000000092240] memcmp+0x120/0x690 > >> >> >> [ 417.063635] LR [c00000000066a55c] > >> >> >> xfs_attr3_leaf_lookup_int+0x53c/0x5b0 > >> >> >> [ 417.063709] Call Trace: > >> >> >> [ 417.063744] [c00000062c0c36c0] [c00000000066a098] > >> >> >> xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable) > >> >> >> [ 417.063851] [c00000062c0c3760] [c000000000693f8c] > >> >> >> xfs_da3_node_lookup_int+0x32c/0x5a0 > >> >> >> [ 417.063944] [c00000062c0c3820] [c0000000006634a0] > >> >> >> xfs_attr_node_addname+0x170/0x6b0 > >> >> >> [ 417.064034] [c00000062c0c38b0] [c000000000664ffc] > >> >> >> xfs_attr_set+0x2ac/0x340 > >> >> >> [ 417.064118] [c00000062c0c39a0] [c000000000758d40] > >> >> >> __xfs_set_acl+0xf0/0x230 > >> >> >> [ 417.064190] [c00000062c0c3a00] [c000000000758f50] > >> >> >> xfs_set_acl+0xd0/0x160 > >> >> >> [ 417.064268] [c00000062c0c3aa0] [c0000000004b69b0] > >> >> >> set_posix_acl+0xc0/0x130 > >> >> >> [ 417.064339] [c00000062c0c3ae0] [c0000000004b6a88] > >> >> >> posix_acl_xattr_set+0x68/0x110 > >> >> >> [ 417.064412] [c00000062c0c3b20] [c0000000004532d4] > >> >> >> __vfs_setxattr+0xa4/0x110 > >> >> >> [ 417.064485] [c00000062c0c3b80] [c000000000454c2c] > >> >> >> __vfs_setxattr_noperm+0xac/0x240 > >> >> >> [ 417.064566] [c00000062c0c3bd0] [c000000000454ee8] > >> >> >> vfs_setxattr+0x128/0x130 > >> >> >> [ 417.064638] [c00000062c0c3c30] [c000000000455138] > >> >> >> setxattr+0x248/0x600 > >> >> >> [ 417.064710] [c00000062c0c3d90] [c000000000455738] > >> >> >> path_setxattr+0x108/0x120 > >> >> >> [ 417.064785] [c00000062c0c3e00] [c000000000455778] > >> >> >> sys_setxattr+0x28/0x40 > >> >> >> [ 417.064858] [c00000062c0c3e20] [c00000000000bae4] > >> >> >> system_call+0x5c/0x70 > >> >> >> [ 417.064930] Instruction dump: > >> >> >> [ 417.064964] 7d201c28 7d402428 7c295040 38630008 38840008 408201f0 > >> >> >> 4200ffe8 2c050000 > >> >> >> [ 417.065051] 4182ff6c 20c50008 54c61838 7d201c28 <7d402428> > >> >> >> 7d293436 7d4a3436 7c295040 > >> >> >> [ 417.065150] ---[ end trace 0d060411b5e3741b ]--- > >> >> >> > >> >> >> > >> >> >> Both the memory locations passed to memcmp() had "SGI_ACL_FILE" and > >> >> >> len > >> >> >> argument of memcmp() was set to 12. s1 argument of memcmp() had the > >> >> >> value > >> >> >> 0x00000000f4af0485, while s2 argument had the value > >> >> >> 0x00000000ce9e316f. > >> >> >> > >> >> >> The following is the code path within memcmp() that gets executed > >> >> >> for the > >> >> >> above mentioned values, > >> >> >> > >> >> >> - Since len (i.e. 12) is greater than 7, we branch to .Lno_short. > >> >> >> - We then prefetch the contents of r3 & r4 and branch to > >> >> >> .Ldiffoffset_8bytes_make_align_start. > >> >> >> - Under .Ldiffoffset_novmx_cmp, Since r3 is unaligned we end up > >> >> >> comparing > >> >> >> "SGI" part of the string. r3's value is then aligned. r4's value is > >> >> >> incremented by 3. For comparing the remaining 9 bytes, we jump to > >> >> >> .Lcmp_lt32bytes. > >> >> >> - Here, 8 bytes of the remaining 9 bytes are compared and execution > >> >> >> moves to > >> >> >> .Lcmp_rest_lt8bytes. > >> >> >> - Here we execute "LD rB,0,r4". In the case of this bug, r4 has an > >> >> >> unaligned > >> >> >> value and hence ends up accessing the "next" double word. The > >> >> >> "next" double > >> >> >> word happens to occur after the last page mapped into the kernel's > >> >> >> address > >> >> >> space and hence this leads to the previously listed oops. > >> >> > > >> >> > Thanks for the analysis. > >> >> > > >> >> > This is just a bug, we can't read past the end of the source or dest. > >> >> > >> >> How about this, works for me. > >> >> > >> >> cheers > >> >> > >> >> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S > >> >> index 844d8e774492..2a302158cb53 100644 > >> >> --- a/arch/powerpc/lib/memcmp_64.S > >> >> +++ b/arch/powerpc/lib/memcmp_64.S > >> >> @@ -215,20 +215,29 @@ _GLOBAL_TOC(memcmp) > >> >> beq .Lzero > >> >> > >> >> .Lcmp_rest_lt8bytes: > >> >> - /* Here we have only less than 8 bytes to compare with. at > >> >> least s1 > >> >> - * Address is aligned with 8 bytes. > >> >> - * The next double words are load and shift right with > >> >> appropriate > >> >> - * bits. > >> >> + /* > >> >> + * Here we have less than 8 bytes left to compare with. We > >> >> mustn't read > >> >> + * past the end of either source or dest. > >> >> */ > >> >> - subfic r6,r5,8 > >> >> - slwi r6,r6,3 > >> >> - LD rA,0,r3 > >> >> - LD rB,0,r4 > >> >> - srd rA,rA,r6 > >> >> - srd rB,rB,r6 > >> >> - cmpld cr0,rA,rB > >> >> + > >> >> + /* If we have less than 4 bytes, just do byte at a time */ > >> >> + cmpwi cr1, r5, 4 > >> >> + blt cr1, .Lshort > >> >> + > >> >> + /* Compare 4 bytes */ > >> >> + LW rA,0,r3 > >> >> + LW rB,0,r4 > >> >> + cmpd cr0,rA,rB > >> >> bne cr0,.LcmpAB_lightweight > >> >> - b .Lzero > >> >> + > >> >> + /* If we had exactly 4 bytes left, we're done now */ > >> >> + beq cr1, .Lzero > >> >> + > >> >> + /* Otherwise do what ever's left a byte at a time */ > >> >> + subi r5, r5, 4 > >> >> + addi r3, r3, 4 > >> >> + addi r4, r4, 4 > >> >> + b .Lshort > >> >> > >> >> .Lnon_zero: > >> >> mr r3,rC > >> >> > >> >> > >> > > >> > With the above patch, Linux kernel does not end up in oops. Hence, > >> > > >> > Tested-by: Chandan Rajendra <chan...@linux.ibm.com> > >> > >> Thanks. > >> > >> How many times had you hit the original oops? ie. was it easily > >> reproducible? > > > > I could recreate the issue within 20 iterations of the test. For verifying > > your patch, I let the test run for 500 iterations. > > OK great. Thanks. > > I guess I should be running fstests. What's the easiest way to do that, > is there an avocado thing for it?
I just execute the main script shipped with fstests. You can get fstests from git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git. Please refer to the README file for a list of devel packages that needs to be installed before building fstests. Build fstests by invoking "make". You can then add the following into a file called local.config at the root directory of fstests, ------------------------------ export TEST_DEV=/dev/loop0 export TEST_DIR=/mnt/test export SCRATCH_DEV=/dev/loop1 export SCRATCH_MNT=/mnt/scratch MKFS_OPTIONS="-b size=65536" ------------------------------ Create xfs filesystem on $TEST_DEV i.e. "mkfs.xfs -f -b size=65536 /dev/loop0". Finally, invoke the test using, # ./check generic/026 -- chandan