On Jul 19 2023, Aneesh Kumar K V wrote: > On 7/19/23 10:34 AM, Hugh Dickins wrote: > > On Tue, 18 Jul 2023, Aneesh Kumar K.V wrote: > >> Hugh Dickins <hu...@google.com> writes: > >> > >>> Instead of pte_lockptr(), use the recently added pte_offset_map_nolock() > >>> in assert_pte_locked(). BUG if pte_offset_map_nolock() fails: this is > >>> stricter than the previous implementation, which skipped when pmd_none() > >>> (with a comment on khugepaged collapse transitions): but wouldn't we want > >>> to know, if an assert_pte_locked() caller can be racing such transitions? > >>> > >> > >> The reason we had that pmd_none check there was to handle khugpaged. In > >> case of khugepaged we do pmdp_collapse_flush and then do a ptep_clear. > >> ppc64 had the assert_pte_locked check inside that ptep_clear. > >> > >> _pmd = pmdp_collapse_flush(vma, address, pmd); > >> .. > >> ptep_clear() > >> -> asset_ptep_locked() > >> ---> pmd_none > >> -----> BUG > >> > >> > >> The problem is how assert_pte_locked() verify whether we are holding > >> ptl. It does that by walking the page table again and in this specific > >> case by the time we call the function we already had cleared pmd . > > > > Aneesh, please clarify, I've spent hours on this. > > > > From all your use of past tense ("had"), I thought you were Acking my > > patch; but now, after looking again at v3.11 source and today's, > > I think you are NAKing my patch in its present form. > > > > Sorry for the confusion my reply created. > > > You are pointing out that anon THP's __collapse_huge_page_copy_succeeded() > > uses ptep_clear() at a point after pmdp_collapse_flush() already cleared > > *pmd, so my patch now leads that one use of assert_pte_locked() to BUG. > > Is that your point? > > > > Yes. I haven't tested this yet to verify that it is indeed hitting that BUG. > But a code inspection tells me we will hit that BUG on powerpc because of > the above details. > Hi Aneesh,
After testing it, I can confirm that it encountered a BUG on powerpc. Log report as attached Thanks, Jay Patel > > I can easily restore that khugepaged comment (which had appeared to me > > out of date at the time, but now looks still relevant) and pmd_none(*pmd) > > check: but please clarify. > > > > That is correct. if we add that pmd_none check back we should be good here. > > > -aneesh
[ 53.513058][ T105] ------------[ cut here ]------------ [ 53.513080][ T105] kernel BUG at arch/powerpc/mm/pgtable.c:327! [ 53.513090][ T105] Oops: Exception in kernel mode, sig: 5 [#1] [ 53.513099][ T105] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 53.513109][ T105] Modules linked in: bonding pseries_rng rng_core vmx_crypto gf128mul ibmveth crc32c_vpmsum fuse autofs4 [ 53.513135][ T105] CPU: 3 PID: 105 Comm: khugepaged Not tainted 6.5.0-rc1-gebfaf626e99f-dirty #1 [ 53.513146][ T105] Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries [ 53.513156][ T105] NIP: c000000000079478 LR: c00000000007946c CTR: 0000000000000000 [ 53.513165][ T105] REGS: c000000008e9b930 TRAP: 0700 Not tainted (6.5.0-rc1-gebfaf626e99f-dirty) [ 53.513175][ T105] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002882 XER: 20040000 [ 53.513202][ T105] CFAR: c000000000412a30 IRQMASK: 0 [ 53.513202][ T105] GPR00: c00000000007946c c000000008e9bbd0 c0000000012d3500 0000000000000001 [ 53.513202][ T105] GPR04: 0000000011000000 c000000008e9bbb0 c000000008e9bbf0 ffffffffffffffff [ 53.513202][ T105] GPR08: 00000000000003ff 0000000000000000 0000000000000000 000000000000000a [ 53.513202][ T105] GPR12: 00000000497b0000 c00000001ec9d480 c00c00000016fe00 c000000051455000 [ 53.513202][ T105] GPR16: 0000000000000000 ffffffffffffffff 0000000000000001 0000000000000001 [ 53.513202][ T105] GPR20: c000000002912430 c000000051455000 0000000000000000 c00000000946e650 [ 53.513202][ T105] GPR24: c0000000029800e8 0000000011000000 c00c000000145168 c000000002980180 [ 53.513202][ T105] GPR28: 0000000011000000 8603f85b000080c0 c000000008e9bc70 c00000001bd0d680 [ 53.513304][ T105] NIP [c000000000079478] assert_pte_locked.part.18+0x168/0x1b0 [ 53.513318][ T105] LR [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0 [ 53.513329][ T105] Call Trace: [ 53.513335][ T105] [c000000008e9bbd0] [c00000000007946c] assert_pte_locked.part.18+0x15c/0x1b0 (unreliable) [ 53.513350][ T105] [c000000008e9bc00] [c00000000048e10c] collapse_huge_page+0x11dc/0x1700 [ 53.513362][ T105] [c000000008e9bd40] [c00000000048ed18] hpage_collapse_scan_pmd+0x6e8/0x850 [ 53.513374][ T105] [c000000008e9be30] [c000000000492544] khugepaged+0x7e4/0xb70 [ 53.513386][ T105] [c000000008e9bf90] [c00000000015f038] kthread+0x138/0x140 [ 53.513399][ T105] [c000000008e9bfe0] [c00000000000dd58] start_kernel_thread+0x14/0x18 [ 53.513411][ T105] Code: 7c852378 7c844c36 794a1564 7c894038 794af082 79241f24 78eaf00e 7c8a2214 48399541 60000000 7c630074 7863d182 <0b030000> e9210020 81290000 7d290074 [ 53.513448][ T105] ---[ end trace 0000000000000000 ]--- [ 53.516544][ T105] [ 53.516551][ T105] note: khugepaged[105] exited with irqs disabled [ 182.648447][ T1952] mconf[1952]: segfault (11) at 110efa38 nip 1001e97c lr 1001e8a8 code 1 [ 182.648482][ T1952] mconf[1952]: code: 60420000 4bfffd59 4bffffd0 60000000 60420000 e93f0070 2fa90000 409e0014 [ 182.648494][ T1952] mconf[1952]: code: 480000cc e9290000 2fa90000 419e00c0 <81490008> 2f8a0005 409effec e9490028 [ 962.694079][T39811] sda2: Can't mount, would change RO state